Fun With vpnc

I recently got a new laptop at work and I decided to put OpenSolaris on it. This meant I had to set up vpnc in order to access the server networks and the wireless here. I installed my vpnc package, copied the profile from my Ubuntu workstation, and started it up. It connected, but no packets flowed. I didn’t have time to investigate, so I decided to work on it some more at home.
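For anyone unfamiliar with vpnc, a profile is just a small plain-text configuration file (typically /etc/vpnc/<name>.conf). Mine looks roughly like this, with the site-specific values elided:

IPSec gateway <concentrator ip>
IPSec ID <group id>
IPSec secret <group secret>
Xauth username <username>

and the connection is brought up by running vpnc <name> as root.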

The strange thing was that, with the very same profile, it connected from home and everything worked fine. I immediately suspected something was wrong with the routing tables, like maybe some of the routes installed by vpnc-script were conflicting with the routes needed to reach the VPN concentrator. I endlessly compared the routing tables at work, at home, and on my working Ubuntu workstation, removing routes, adding routes, and constructing the routing table by hand, until I was positive that wasn’t the problem.
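In practice that meant staring at netstat -rn output on each machine and shuffling entries around with route, along these lines (the actual destinations depend on what vpnc-script installs for your site, so treat these as placeholders):

$ netstat -rn
# route delete <vpn network> <tunnel peer>
# route add <vpn network> <tunnel peer>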

Everything I pinged worked. I could ping the concentrator. I could ping the gateway. I could ping the tunnel device. I could ping the physical interface—or so I thought.

As I was preparing to write a message to the vpnc-devel mailing list asking for help, I gathered some ping output to include in the email. I ran

$ ping <concentrator ip>
<concentrator ip> is alive

which looked good, but I wanted the full ping output, so I ran

$ ping -s <concentrator ip>
PING <concentrator ip>: 56 data bytes
^C
----<concentrator ip> PING Statistics----
4 packets transmitted, 1 packets received, 75% packet loss
round-trip (ms)  min/avg/max/stddev = 9223372036854776.000/0.000/0.000/-NaN

For some reason, only the first ping was getting through. The rest were getting hung up somewhere. The really strange thing was that I saw the same behavior on the local physical interface:

$ ifconfig bge0
bge0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 3
        inet 161.253.143.151 netmask ffffff00 broadcast 161.253.143.255
$ ping -s 161.253.143.151
PING 161.253.143.151: 56 data bytes
^C
----161.253.143.151 PING Statistics----
5 packets transmitted, 1 packets received, 80% packet loss
round-trip (ms)  min/avg/max/stddev = 9223372036854776.000/0.000/0.000/-NaN

I had never seen a situation where you couldn’t even ping a local physical interface! I checked and double-checked that IPFilter wasn’t running. Finally I started a packet capture on the physical interface to see what was happening to my pings:

# snoop -d bge0 icmp
Using device bge0 (promiscuous mode)
161.253.143.151 -> <concentrator ip> ICMP Destination unreachable (Bad protocol 50)
161.253.143.151 -> <concentrator ip> ICMP Destination unreachable (Bad protocol 50)
161.253.143.151 -> <concentrator ip> ICMP Destination unreachable (Bad protocol 50)
^C

And there it was: the machine was sending ICMP messages back to the VPN concentrator complaining about “Bad protocol 50.” IP protocol 50 is ESP (Encapsulating Security Payload), the protocol IPsec uses to carry the encrypted traffic. Apparently Solaris eats these packets; I still haven’t figured out why.

I remembered seeing something in the vpnc manpage about ESP packets:

--natt-mode <natt/none/force-natt/cisco-udp>

      Which NAT-Traversal Method to use:
      o    natt -- NAT-T as defined in RFC3947
      o    none -- disable use of any NAT-T method
      o    force-natt -- always use NAT-T encapsulation  even
           without presence of a NAT device (useful if the OS
           captures all ESP traffic)
      o    cisco-udp -- Cisco proprietary UDP  encapsulation,
           commonly over Port 10000

I enabled force-natt mode, which encapsulates each ESP packet in a UDP packet (normally done to get past NAT), and it started working! In retrospect, I should have been able to figure that out much more easily. First, it pretty much says so right on the vpnc homepage: “Solaris (7 works, 9 only with --natt-mode forced).” I didn’t even notice that. Second, I should have realized that I was behind a NAT at home but not at work, so vpnc would be negotiating a different NAT-traversal mode by default in each place. Oh well, it was a good diagnostic exercise, hence this post to share the experience.
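For reference, the fix is a one-line addition to the profile, assuming the usual vpnc.conf spelling of the option:

NAT Traversal Mode force-natt

or, equivalently, passing --natt-mode force-natt on the vpnc command line.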

In other vpnc-related news, I’ve ported Kazuyoshi’s patch to the open_tun and solaris_close_tun functions of OpenVPN over to the tun_open and tun_close functions of vpnc. His code sets up the tunnel interface a little differently and adds TAP support. It solves the random problems vpnc had bringing up the tunnel interface, such as:

# ifconfig tun0
tun0: flags=10010008d0<POINTOPOINT,RUNNING,NOARP,MULTICAST,IPv4,FIXEDMTU> mtu 1412 index 8
        inet 128.164.xxx.yy --> 128.164.xxx.yy netmask ffffffff 
        ether f:ea:1:ff:ff:ff
# ifconfig tun0 up
ifconfig: setifflags: SIOCSLIFFLAGS: tun0: no such interface
# dmesg | grep tun0
Jul 23 14:56:05 swan ip: [ID 728316 kern.error] tun0: DL_BIND_REQ failed: DL_OUTSTATE

The changes are in the latest vpnc package available from my package repository.