Our standard architecture for private clouds includes security appliances that provide a fairly comprehensive set of security features, from stateless firewalling to full intrusion detection. We like to use Juniper SRX devices, and I admit I'm a big fan of JunOS.
In the same vein as this post by Thomas, a client wished to use a Linux system to connect to their SRX-powered IPSec VPN using the Shrew Soft client. Searching the tubes for other people's experience connecting these two endpoints yielded... nothing. But after some poking and prodding, I got the client to connect. The main elements of the configuration that weren't immediately obvious were:
- Use "push" configuration mode
- Manually add the policies needed for the given VPN
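In the Shrew Soft site configuration, those two items boil down to a handful of settings (excerpted from the full configuration below; 10.10.10.0/24 is simply the remote network used in this setup):

    s:client-auto-mode:push
    n:policy-list-auto:0
    s:policy-level:auto
    n:policy-nailed:1
    s:policy-list-include:10.10.10.0 / 255.255.255.0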
I find the configuration modes are always named differently from one client to another -- Shrew Soft has push, pull, and dhcp; VPN Tracker has mode config, mode config (active), and mode config (passive); &c. Standard terminology would definitely simplify cross-vendor integrations. I would have preferred to have this VPN client pull its network policies from the SRX. VPN Tracker is able to do this, and the Shrew Soft client has a configuration option for it, but it didn't work in my test setup. Below is the complete client configuration. The test node was a Debian 6 VM; the only additions beyond the base install were the packages needed to build the client (version 2.1.7, current at the time of writing).
    b:auth-mutual-psk:eA==
    n:client-addr-auto:1
    n:client-banner-enable:1
    n:client-dns-auto:1
    n:client-dns-used:1
    n:network-dpd-enable:1
    n:network-frag-size:540
    n:network-ike-port:500
    n:network-mtu-size:1380
    n:network-natt-port:4500
    n:network-natt-rate:60
    n:network-notify-enable:1
    n:phase1-dhgroup:2
    n:phase1-keylen:256
    n:phase1-life-kbytes:0
    n:phase1-life-secs:28800
    n:phase2-keylen:256
    n:phase2-life-kbytes:0
    n:phase2-life-secs:3600
    n:phase2-pfsgroup:2
    n:policy-list-auto:0
    n:policy-nailed:1
    n:vendor-chkpt-enable:0
    n:version:2
    s:client-ip-addr:0.0.0.0
    s:client-ip-mask:255.255.255.255
    s:client-dns-addr:126.96.36.199
    s:client-dns-suffix:
    s:network-host:vpn.example.com
    s:client-auto-mode:push
    s:client-iface:virtual
    s:network-natt-mode:enable
    s:network-frag-mode:enable
    s:auth-method:mutual-psk-xauth
    s:ident-client-type:fqdn
    s:ident-client-data:vpn.example.com
    s:ident-server-type:address
    s:phase1-exchange:aggressive
    s:phase1-cipher:aes
    s:phase1-hash:sha1
    s:phase2-transform:aes
    s:phase2-hmac:sha1
    s:ipcomp-transform:disabled
    s:policy-level:auto
    s:policy-list-include:10.10.10.0 / 255.255.255.0
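For completeness, here is a rough sketch of how a saved site file like this can be used with the Shrew Soft command-line tools; the sites directory and the ikec flag are my assumptions from the 2.1.x client, so verify them against the documentation for your build:

    # the Shrew Soft key daemon must be running before the client can connect
    sudo iked
    # site configurations are read from the user's ~/.ike/sites directory
    # (path assumed; adjust to wherever your build looks for them)
    mkdir -p ~/.ike/sites
    cp vpn-example-com ~/.ike/sites/     # file name is illustrative
    # the console client selects a site with -r; XAuth credentials are
    # prompted for interactively
    ikec -r vpn-example-com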
A client recently asked us to recommend a proxy (layer 4 and layer 7) that would scale better than the appliance they were using without breaking the bank. I didn't hesitate to point them to haproxy -- it's fast, stable, scalable, widely deployed and well-supported.
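For anyone curious what that looks like in practice, here is a minimal layer 7 haproxy configuration along the lines of such a deployment; the names, addresses, and timeouts are illustrative assumptions, not the client's actual setup:

    # minimal HTTP (layer 7) proxy sketch; addresses and names are placeholders
    global
        maxconn 20000

    defaults
        mode http
        timeout connect 5000
        timeout client  30000
        timeout server  30000

    frontend www
        bind 0.0.0.0:80
        default_backend web_pool

    backend web_pool
        balance roundrobin
        server web1 10.0.0.11:80 check
        server web2 10.0.0.12:80 check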
Not long after deploying a pair of Dell R610s running haproxy and heartbeat, the client (wisely) initiated some benchmark testing to evaluate the capabilities of their new proxy layer. The results fell far short of their expectations and ours. A pool of several servers only managed 600 -- 700 sessions per second. Worse yet, running the benchmark directly against one server (bypassing the proxy layer) yielded better results, around 700 -- 800 sessions per second. It's a sad day when a single server outperforms a cluster. What could be holding things back?
I first tried to reproduce the inadequate performance in my dev environment. Although the client was using some just-deployed XenServer 6.0 chassis, my dev environment was still on 5.6; the problem was easily reproduced despite this difference (unfortunately showing that 6.0 has the same capacity limit as 5.6). My test case used a static document of about 10KB, served by Apache 2. The benchmarks were run with ab (from the Apache package), httperf, and siege, using various concurrencies and durations. I wasn't overly impressed with httperf; ab got the job done pretty well, and I found siege to work beautifully. Regardless of my personal evaluations of these tools, though, all three agreed fairly closely on the performance capabilities of my test VM: 600 -- 800 sessions per second, depending on the test parameters. In general, low concurrencies yielded moderately acceptable session service rates, but anything above about 50 -- 100 concurrent sessions caused the overall session rate to plunge. Given that the client was starting their tests at 250 concurrent sessions and moving up from there (hoping for at least 1000 sessions per second), this would never suffice.
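For reference, the invocations looked something like the following; the URL, request counts, and concurrency values are illustrative rather than the exact parameters used in the tests:

    # ab: 50,000 requests at 100 concurrent connections (illustrative values)
    ab -n 50000 -c 100 http://10.0.0.11/test-10k.html
    # siege: 100 concurrent users for one minute
    siege -c 100 -t 60S http://10.0.0.11/test-10k.html
    # httperf: fixed connection rate, useful for probing a target sessions/sec
    httperf --server 10.0.0.11 --uri /test-10k.html --num-conns 50000 --rate 800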
While running benchmark after benchmark, I poked at various system statistics on the VMs involved and saw nothing useful: haproxy was running at about 10% CPU (not terrible given that the network driver was for a generic software "NIC"), Apache was peaking at around 30% CPU, and iostat and vmstat showed nothing that would limit service performance. So I poked around on the chassis hosting haproxy and found a process eating 70% of the dom0 CPU capacity -- netback. The software network stack in XenServer itself was unable to handle the load the benchmark was generating.
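This is easy to see for yourself from the dom0 console; something like the following (a sketch, assuming shell access to the XenServer host) shows netback climbing while a benchmark runs:

    # from the XenServer dom0 console, while a benchmark is running:
    xentop -d 2        # per-domain CPU use; dom0 itself climbs under load
    top -d 2           # netback shows up near the top of dom0's process list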
To show definitively that the proxies were not contributing to the problem in any substantial way, I reworked my test setup on the deployed R610 proxies (which were not yet live due to the performance issues). Using the same static test file, Apache 2, and haproxy configuration, sending traffic from one R610 to the other (the latter hosting both Apache and haproxy), I pulled a cool 10000 sessions per second. More than a ten-fold increase! Watching the processes in top(1), I noted that the reported CPU loads were very volatile during a benchmark run. Pinning the processes to specific CPUs helped here. Only one haproxy process was running (despite the R610 having 8 CPU cores), so I pinned it to a single core, Apache to another single core, and top to a third core. This allowed haproxy to break 11000 sessions per second. That's pretty respectable for a 10KB document on a 1GbE network.
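The pinning itself was nothing exotic; here is a sketch using taskset(1), where the core numbers and the apache2 process name (Debian's default) are illustrative assumptions:

    # pin the single haproxy process to core 0
    taskset -pc 0 $(pidof haproxy)
    # pin all Apache workers to core 1 (process name assumed to be apache2)
    for pid in $(pidof apache2); do taskset -pc 1 "$pid"; done
    # run top itself pinned to core 2 so it doesn't perturb the other cores
    taskset -c 2 top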
With a spread from 800/s to 10000/s, it's clear that the overhead of the virtualized network is a serious bottleneck. Optimal performance demands minimal overhead!