I recently got a new laptop at work and decided to put OpenSolaris on it. That meant setting up vpnc so I could reach the server networks and the wireless network here. I installed my vpnc package, copied the profile from my Ubuntu workstation, and started it up. It connected, but no packets flowed. I didn’t have time to investigate, so I decided to work on it some more at home.
The strange thing is that it connected from home with the very same profile and everything worked fine. I immediately suspected the routing tables: maybe some of the routes installed by vpnc-script conflicted with the routes needed to reach the VPN concentrator. I endlessly compared the routing tables at work, at home, and on my working Ubuntu workstation. I removed routes, added routes, and built the routing table by hand until I was positive the routes were not the problem.
Everything I pinged worked. I could ping the concentrator. I could ping the gateway. I could ping the tunnel device. I could ping the physical interface—or so I thought.
As I was preparing to write a message to the vpnc-devel mailing list asking for help, I ran a few pings so I could include the output in the email. First I ran:
$ ping <concentrator ip>
<concentrator ip> is alive
which looked good, but I wanted the full ping output, so I ran
$ ping -s <concentrator ip>
PING <concentrator ip>: 56 data bytes
^C
----<concentrator ip> PING Statistics----
4 packets transmitted, 1 packets received, 75% packet loss
round-trip (ms)  min/avg/max/stddev = 9223372036854776.000/0.000/0.000/-NaN
For some reason, only the first ping was getting through. The rest were getting hung up somewhere. The really strange thing was that I saw the same behavior on the local physical interface:
$ ifconfig bge0
bge0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 3
        inet 161.253.143.151 netmask ffffff00 broadcast 161.253.143.255
$ ping -s 161.253.143.151
PING 161.253.143.151: 56 data bytes
^C
----161.253.143.151 PING Statistics----
5 packets transmitted, 1 packets received, 80% packet loss
round-trip (ms)  min/avg/max/stddev = 9223372036854776.000/0.000/0.000/-NaN
I had never seen a situation where you couldn’t even ping a local physical interface! I checked and double-checked that IPFilter wasn’t running. Finally I started a packet capture on the physical interface to see what was happening to my pings:
# snoop -d bge0 icmp
Using device bge0 (promiscuous mode)
161.253.143.151 -> <concentrator ip> ICMP Destination unreachable (Bad protocol 50)
161.253.143.151 -> <concentrator ip> ICMP Destination unreachable (Bad protocol 50)
161.253.143.151 -> <concentrator ip> ICMP Destination unreachable (Bad protocol 50)
^C
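That’s when I noticed, more or less by chance, that my laptop was sending ICMP “Destination unreachable (Bad protocol 50)” messages back to the VPN concentrator. IP protocol 50 is ESP (Encapsulating Security Payload), the IPsec protocol that carries the encrypted tunnel traffic. Apparently Solaris eats these packets instead of handing them to vpnc. I haven’t figured out why.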
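To make the snoop output concrete, here is a minimal shell sketch that pulls the protocol number out of one of those ICMP lines and maps it to a name. The helper names bad_proto_number and proto_name are hypothetical. The small lookup table covers only a few well-known IANA protocol numbers rather than reading the system protocols database.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of vpnc or Solaris.

# Print the N from a snoop line containing "(Bad protocol N)".
bad_proto_number() {
  printf '%s\n' "$1" | sed -n 's/.*(Bad protocol \([0-9][0-9]*\)).*/\1/p'
}

# Map a few well-known IP protocol numbers (IANA assignments) to names.
proto_name() {
  case "$1" in
    1)  echo icmp ;;
    6)  echo tcp ;;
    17) echo udp ;;
    50) echo esp ;;  # Encapsulating Security Payload (IPsec)
    51) echo ah ;;   # Authentication Header (IPsec)
    *)  echo unknown ;;
  esac
}

line='161.253.143.151 -> <concentrator ip> ICMP Destination unreachable (Bad protocol 50)'
n=$(bad_proto_number "$line")
echo "protocol $n is $(proto_name "$n")"
```

Run as-is, it prints "protocol 50 is esp". That is the traffic vpnc expected to receive, and the kernel rejected it instead.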
I remembered seeing something in the vpnc manpage about ESP packets:
Which NAT-Traversal Method to use:
o natt -- NAT-T as defined in RFC3947
o none -- disable use of any NAT-T method
o force-natt -- always use NAT-T encapsulation even
without presence of a NAT device (useful if the OS
captures all ESP traffic)
o cisco-udp -- Cisco proprietary UDP encapsulation,
commonly over Port 10000
I enabled force-natt mode, which wraps each ESP packet in a UDP packet (normally a way to get past NAT devices), and it started working! In retrospect, I should have figured that out much more easily. First, the vpnc homepage pretty much says it outright: “Solaris (7 works, 9 only with --natt-mode forced).” I didn’t even notice that. Second, I should have realized that I was behind a NAT at home but not at work. At home vpnc detected the NAT and used NAT-T automatically, so the ESP traffic arrived inside UDP. At work it sent plain ESP, which Solaris swallowed. Oh well, it was a good diagnostic exercise, hence this post to share the experience.
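If you would rather not pass the option on every run, the NAT-T mode can live in the profile itself. Below is a minimal shell sketch. It assumes vpnc’s “NAT Traversal Mode” config-file key, which is the file equivalent of --natt-mode. The function name force_natt and the profile path in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a vpnc profile so it always uses NAT-T,
# i.e. ESP wrapped in UDP even when no NAT device is detected.
# Assumes the profile uses vpnc's "NAT Traversal Mode" config-file key.
force_natt() {
  profile=$1
  tmp=$(mktemp) || return 1
  # Copy every line except an existing NAT Traversal Mode setting.
  grep -v '^NAT Traversal Mode' "$profile" > "$tmp"
  # Append the forced mode.
  echo 'NAT Traversal Mode force-natt' >> "$tmp"
  # Write back in place so the profile keeps its owner and permissions.
  cat "$tmp" > "$profile"
  rm -f "$tmp"
}
```

For example, force_natt /etc/vpnc/work.conf followed by vpnc /etc/vpnc/work.conf, where work.conf is a made-up profile name. For a one-off test without touching the profile, running vpnc --natt-mode force-natt /etc/vpnc/work.conf should have the same effect.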
In other vpnc-related news, I’ve ported Kazuyoshi’s patch for OpenVPN’s open_tun and solaris_close_tun functions to vpnc’s tun_open and tun_close functions. His version sets up the tunnel interface a little differently and adds TAP support. It fixes the intermittent problems vpnc had bringing up the tunnel interface, such as:
# ifconfig tun0
tun0: flags=10010008d0<POINTOPOINT,RUNNING,NOARP,MULTICAST,IPv4,FIXEDMTU> mtu 1412 index 8
        inet 128.164.xxx.yy --> 128.164.xxx.yy netmask ffffffff
        ether f:ea:1:ff:ff:ff
# ifconfig tun0 up
ifconfig: setifflags: SIOCSLIFFLAGS: tun0: no such interface
# dmesg | grep tun0
Jul 23 14:56:05 swan ip: [ID 728316 kern.error] tun0: DL_BIND_REQ failed: DL_OUTSTATE
The changes are in the latest vpnc package available from my package repository.