Fun With vpnc

I recently got a new laptop at work and I decided to put OpenSolaris on it. This meant I had to setup vpnc in order to access the server networks and wireless here. I installed my vpnc package, copied the profile from my Ubuntu workstation, and started it up. It connected, but no packets flowed. I didn’t have time to investigate, so I decided to work on it some more at home.

The strange thing is that it connected from home with the very same profile and everything worked fine. I immediately suspected something was wrong with the routing tables, like maybe some of the routes installed by vpnc-script were conflicting with the routes necessary to talk to the VPN concentrator. I endlessly compared the routing tables between work and home and my working Ubuntu workstation, removing routes, adding routes, and manually constructing the routing table until I was positive it could not be that.

Everything I pinged worked. I could ping the concentrator. I could ping the gateway. I could ping the tunnel device. I could ping the physical interface—or so I thought.

As I was preparing to write a message to the vpnc-devel mailing list requesting help, I did some pings to post the output in the email. I ran

$ ping <concentrator ip>
<concentrator ip> is alive

which looked good, but I wanted the full ping output, so I ran

$ ping -s <concentrator ip>
PING <concentrator ip>: 56 data bytes
^C
----<concentrator ip> PING Statistics----
4 packets transmitted, 1 packets received, 75% packet loss
round-trip (ms)  min/avg/max/stddev = 9223372036854776.000/0.000/0.000/-NaN

For some reason, only the first ping was getting through. The rest were getting hung up somewhere. The really strange thing was that I saw the same behavior on the local physical interface:

$ ifconfig bge0
bge0: flags=1004843 mtu 1500 index 3
        inet 161.253.143.151 netmask ffffff00 broadcast 161.253.143.255
$ ping -s 161.253.143.151
PING 161.253.143.151: 56 data bytes
^C
----161.253.143.151 PING Statistics----
5 packets transmitted, 1 packets received, 80% packet loss
round-trip (ms)  min/avg/max/stddev = 9223372036854776.000/0.000/0.000/-NaN

I have never seen a situation where you couldn’t even ping a local physical interface! I checked and double checked that IPFilter wasn’t running. Finally I started a packet capture of the physical interface to see what was happening to my pings:

# snoop -d bge0 icmp
Using device bge0 (promiscuous mode)
161.253.143.151 -> <concentrator ip> ICMP Destination unreachable (Bad protocol 50)
161.253.143.151 -> <concentrator ip> ICMP Destination unreachable (Bad protocol 50)
161.253.143.151 -> <concentrator ip> ICMP Destination unreachable (Bad protocol 50)
^C

That’s when by chance I saw messages being sent to the VPN concentrator saying “bad protocol 50.” IP protocol 50 represents “ESP”, commonly used for IPsec. Apparently Solaris eats these packets. Haven’t figured out why.

I remembered seeing something in the vpnc manpage about ESP packets:

--natt-mode <natt/none/force-natt/cisco-udp>

      Which NAT-Traversal Method to use:
      o    natt -- NAT-T as defined in RFC3947
      o    none -- disable use of any NAT-T method
      o    force-natt -- always use NAT-T encapsulation  even
           without presence of a NAT device (useful if the OS
           captures all ESP traffic)
      o    cisco-udp -- Cisco proprietary UDP  encapsulation,
           commonly over Port 10000

I enabled force-natt mode, which encapsulates the ESP packet in a UDP packet, normally to get past NAT, and it started working! In retrospect, I should have been able to figure that out much easier. First, it pretty much says it on the vpnc homepage: “Solaris (7 works, 9 only with –natt-mode forced).” I didn’t even notice that. Second, I should have realized that I was behind a NAT at home and not at work, so they would be using a different NAT-traversal mode by default. Oh well, it was a good diagnostic exercise, hence the post to share the experience.

In other vpnc related news, I’ve ported Kazuyoshi’s patch to the open_tun and solaris_close_tun functions of OpenVPN to the tun_open and tun_close functions of vpnc. His sets up the tunnel interface a little bit differently and adds TAP support. It solves the random problems vpnc had with bringing up the tunnel interface such as:

# ifconfig tun0
tun0: flags=10010008d0<POINTOPOINT,RUNNING,NOARP,MULTICAST,IPv4,FIXEDMTU> mtu 1412 index 8
        inet 128.164.xxx.yy --> 128.164.xxx.yy netmask ffffffff 
        ether f:ea:1:ff:ff:ff
# ifconfig tun0 up
ifconfig: setifflags: SIOCSLIFFLAGS: tun0: no such interface
# dmesg | grep tun0
Jul 23 14:56:05 swan ip: [ID 728316 kern.error] tun0: DL_BIND_REQ failed: DL_OUTSTATE

The changes are in the latest vpnc package available from my package repository.

VPNC for OpenSolaris

I’ve compiled VPNC and the requisite TUN/TAP driver for OpenSolaris so that I can access my work network from home. Kazuyoshi’s driver adds TAP functionality to the original TUN driver which hasn’t been updated in nine years. It’s a real testament to the stability of the OpenSolaris kernel ABI that the module still compiles, loads, and works properly.

All of the software can be installed from my repository onto build 111 or higher:

$ pfexec pkg set-publisher -O http://pkg.thestaticvoid.com/ thestaticvoid
$ pfexec pkg install vpnc

The tun driver should load automatically and create /dev/tun. Now create a VPN profile configuration in /etc/vpnc/. The configuration contains a lot of private information so I’m not going to share mine here, but /etc/vpnc/default.conf is a good start.

One thing I do like to do is make sure only certain subnets are tunneled through the VPN. That way connecting to the VPN doesn’t interrupt any connections that are already established (for example, AIM). To do that I create a script /etc/vpnc/gwu-networks-script containing

#!/bin/sh

# Only tunnel GWU networks through VPN
CISCO_SPLIT_INC=2
CISCO_SPLIT_INC_0_ADDR=161.253.0.0
CISCO_SPLIT_INC_0_MASK=255.255.0.0
CISCO_SPLIT_INC_0_MASKLEN=16
CISCO_SPLIT_INC_0_PROTOCOL=0
CISCO_SPLIT_INC_0_SPORT=0
CISCO_SPLIT_INC_0_DPORT=0
CISCO_SPLIT_INC_1_ADDR=128.164.0.0
CISCO_SPLIT_INC_1_MASK=255.255.0.0
CISCO_SPLIT_INC_1_MASKLEN=16
CISCO_SPLIT_INC_1_PROTOCOL=0
CISCO_SPLIT_INC_1_SPORT=0
CISCO_SPLIT_INC_1_DPORT=0

. /etc/vpnc/vpnc-script

then add Script /etc/vpnc/gwu-networks-script to the end of my VPN profile configuration.

Connecting to the VPN you should see messages like:

$ pfexec vpnc gwu
Enter password for jameslee@<no>: 
which: no ip in (/sbin:/usr/sbin:/usr/gnu/bin:/usr/bin:/usr/sbin:/sbin)
which: no ip in (/sbin:/usr/sbin:/usr/gnu/bin:/usr/bin:/usr/sbin:/sbin)
add net 128.164.<no>: gateway 128.164.<no>
add host 128.164.<no>: gateway 161.253.<no>
add net 161.253.0.0: gateway 128.164.<no>
add net 128.164.0.0: gateway 128.164.<no>
add net 128.164.<no>: gateway 128.164.<no>
add net 128.164.<no>: gateway 128.164.<no>
VPNC started in background (pid: 594)...

The vpnc-script will modify your /etc/resolv.conf and routing tables so be sure to run vpnc-disconnect when you are done with the connection to restore the original configuration.

Thanks to the good folks at OpenConnect for a well-maintained vpnc-script which works on Solaris. Spec files for these packages are available from my GitHub repository if you want to roll your own.