On binding datagram (UDP) sockets to the ANY addresses

This story goes back a long time. For around 10 years now, people have been requesting that PowerDNS learn how to automatically listen on all available IP addresses. And for slightly less than that time, we’ve been telling people we would not be adding that feature.

First, if you run a nameserver, you should *know* what IP addresses you listen on! How else could people delegate to you, or rely on you to resolve their queries? Second, running services by default on ‘all’ IP addresses is a security risk. For this reason, the PowerDNS Recursor binds to 127.0.0.1 by default.

But people still wanted this feature, and we still didn’t build it, because we knew it would be hard work. There, the truth is out. We finally bit the bullet and figured out how to do it. This page shares that knowledge, including the fact that the Linux manpages tell you to do the wrong thing.

There are two ways to listen on all addresses. The first is to enumerate all interfaces, collect all their IP addresses, and bind to each of them. That is a lot of work, and non-portable work too, and you also need to watch for new addresses appearing later. We really did not want to do that.
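
For comparison, the enumeration approach might be sketched as below, using getifaddrs(3), which Linux and the BSDs provide but POSIX does not require (part of the portability headache). count_local_addrs is a hypothetical helper that only counts and optionally prints the addresses; a real server would create and bind one socket per address instead.

```c
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: walk every address configured on any interface and
   count the IPv4 and IPv6 ones, optionally printing them. A server taking
   this route would bind one UDP socket per address found here, and would
   have to rescan whenever addresses are added or removed. */
int count_local_addrs(int print)
{
    struct ifaddrs *list, *ifa;
    int count = 0;

    if (getifaddrs(&list) < 0)
        return -1;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        const void *addr;
        char text[INET6_ADDRSTRLEN];

        if (ifa->ifa_addr == NULL)
            continue;
        if (ifa->ifa_addr->sa_family == AF_INET)
            addr = &((const struct sockaddr_in *)ifa->ifa_addr)->sin_addr;
        else if (ifa->ifa_addr->sa_family == AF_INET6)
            addr = &((const struct sockaddr_in6 *)ifa->ifa_addr)->sin6_addr;
        else
            continue;   /* skip link-layer and other address families */

        if (print && inet_ntop(ifa->ifa_addr->sa_family, addr, text, sizeof text) != NULL)
            printf("%s: %s\n", ifa->ifa_name, text);
        count++;
    }
    freeifaddrs(list);
    return count;
}
```

Note that the result is only a snapshot: an address configured a second later is simply missed, which is exactly the monitoring burden mentioned above.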

Secondly, just bind to 0.0.0.0 and ::! These are the IPv4 and IPv6 wildcard (“any”) addresses. This works very well for TCP and other connection-oriented protocols, but can fail silently for UDP and other connectionless protocols. How come? When a packet arrives on a socket bound to 0.0.0.0, we don’t know which of our IP addresses it was sent to. And this is a problem when replying to such a packet: what should the source address of the reply be? Because the protocol is connectionless (and the socket therefore keeps no per-peer state), the kernel doesn’t know either.

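So the kernel picks what it considers the most appropriate source address, typically based on the route back to the sender, and that may be the wrong one. Some kernels use heuristics that get this right more often, but there are no guarantees. A client that receives a reply from an address it did not query will typically discard it, which is why this failure tends to be silent.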
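
A minimal sketch of that naive setup, assuming plain BSD sockets on Linux or a BSD; bind_udp_any4 and naive_echo_once are hypothetical names. Nothing in naive_echo_once tells the kernel which local address the query arrived on, so the source address of the reply is whatever the kernel chooses.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a UDP socket bound to 0.0.0.0:port (port 0 = any free port). */
int bind_udp_any4(unsigned short port)
{
    struct sockaddr_in sin;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);   /* the IPv4 wildcard 0.0.0.0 */
    sin.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&sin, sizeof sin) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* The naive reply path: recvfrom() tells us who sent the datagram but not
   which local address it was sent to, so sendto() leaves the source address
   of the reply entirely up to the kernel. */
void naive_echo_once(int fd)
{
    char buf[512];
    struct sockaddr_storage peer;
    socklen_t peerlen = sizeof peer;
    ssize_t n = recvfrom(fd, buf, sizeof buf, 0, (struct sockaddr *)&peer, &peerlen);

    if (n >= 0)
        sendto(fd, buf, (size_t)n, 0, (struct sockaddr *)&peer, peerlen);
}
```

On a host with a single address this usually behaves; the trouble typically starts once the host has several addresses that could plausibly be used to reach the client.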

When receiving packets on datagram sockets, we usually use recvfrom(2), but it does not provide the missing piece of information: which IP address the packet was actually sent to. There is no recvfromto(). Enter the very powerful recvmsg(2). Besides the payload and the sender’s address, recvmsg() can return a whole set of per-datagram ancillary data (“control messages”), whose kinds are selected in advance via setsockopt().

One of the parameters we can request is the original destination IP address of the packet.
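
The general shape of a recvmsg(2) call, as a sketch: the payload goes into an iovec, the sender’s address into msg_name, and the ancillary data into msg_control, which is then walked with the standard CMSG_FIRSTHDR/CMSG_NXTHDR macros. The function name and the 256-byte control buffer are arbitrary choices for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Receive one datagram with recvmsg(2) and walk the control messages the
   kernel attached to it. Which ones appear depends on the options enabled
   earlier with setsockopt(). Returns the payload length or -1 on error;
   *ncmsg receives the number of control messages seen. */
ssize_t recv_and_walk_cmsgs(int fd, char *buf, size_t buflen, int *ncmsg)
{
    struct sockaddr_storage peer;   /* filled with the sender's address */
    union {                         /* the union keeps the buffer aligned for cmsghdr */
        struct cmsghdr align;
        char bytes[256];
    } ctl;
    struct iovec iov = { .iov_base = buf, .iov_len = buflen };
    struct msghdr msg;
    struct cmsghdr *cm;
    ssize_t n;

    memset(&msg, 0, sizeof msg);
    msg.msg_name = &peer;
    msg.msg_namelen = sizeof peer;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.bytes;
    msg.msg_controllen = sizeof ctl.bytes;

    n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;

    *ncmsg = 0;
    for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm)) {
        printf("control message: level %d, type %d\n", cm->cmsg_level, cm->cmsg_type);
        (*ncmsg)++;
    }
    return n;
}
```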

IPv6

For IPv6, this is actually standardized in RFC 3542, which tells us to enable the IPV6_RECVPKTINFO socket option via setsockopt(). recvmsg(2) will then deliver an IPV6_PKTINFO control message with each datagram.

This parameter is sent to us as a struct in6_pktinfo, and its ipi6_addr member contains the original destination IPv6 address of the query.
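
A sketch of the IPv6 receive side, assuming the RFC 3542 API as exposed by glibc (which needs _GNU_SOURCE for struct in6_pktinfo) and macOS (which needs __APPLE_USE_RFC_3542). enable_ipv6_dst_info and get_ipv6_dst are hypothetical helpers; get_ipv6_dst expects a msghdr that recvmsg(2) has already filled in, with a control buffer of at least CMSG_SPACE(sizeof(struct in6_pktinfo)) bytes.

```c
#define _GNU_SOURCE            /* glibc: needed for struct in6_pktinfo */
#define __APPLE_USE_RFC_3542   /* macOS: select the RFC 3542 definitions */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Ask the kernel to attach an IPV6_PKTINFO control message to every
   datagram subsequently received on fd. */
int enable_ipv6_dst_info(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IPV6, IPV6_RECVPKTINFO, &on, sizeof on);
}

/* Given a msghdr already filled in by recvmsg(2), copy the query's original
   destination address out of the IPV6_PKTINFO control message.
   Returns 0 on success, -1 if no such message is present. */
int get_ipv6_dst(struct msghdr *msg, struct in6_addr *dst)
{
    struct cmsghdr *cm;

    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == IPPROTO_IPV6 && cm->cmsg_type == IPV6_PKTINFO &&
            cm->cmsg_len >= CMSG_LEN(sizeof(struct in6_pktinfo))) {
            struct in6_pktinfo pi;
            /* memcpy avoids alignment and aliasing assumptions about CMSG_DATA */
            memcpy(&pi, CMSG_DATA(cm), sizeof pi);
            *dst = pi.ipi6_addr;
            return 0;
        }
    }
    return -1;
}
```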

When replying to a packet from a socket bound to ::, we have the reverse problem: how to specify which *source* address to use. To do so, use sendmsg(2) and attach an IPV6_PKTINFO control message, which again contains a struct in6_pktinfo, this time with ipi6_addr set to the desired source address.
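
The matching send side, again as a sketch with hypothetical helper names: set_ipv6_src builds the IPV6_PKTINFO control message, and send_from6 hands it to sendmsg(2). Setting ipi6_ifindex to 0 is an assumption that lets the routing table choose the outgoing interface; a server could instead reuse the interface index it received with the query.

```c
#define _GNU_SOURCE            /* glibc: needed for struct in6_pktinfo */
#define __APPLE_USE_RFC_3542   /* macOS: select the RFC 3542 definitions */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer for exactly one IPV6_PKTINFO message, aligned for cmsghdr. */
union ipv6_src_ctl {
    struct cmsghdr align;
    char bytes[CMSG_SPACE(sizeof(struct in6_pktinfo))];
};

/* Point msg at ctl and fill in an IPV6_PKTINFO message asking the kernel
   to send from src. ipi6_ifindex = 0 leaves the interface to routing. */
void set_ipv6_src(struct msghdr *msg, union ipv6_src_ctl *ctl, const struct in6_addr *src)
{
    struct in6_pktinfo pi;
    struct cmsghdr *cm;

    memset(ctl, 0, sizeof *ctl);
    memset(&pi, 0, sizeof pi);
    pi.ipi6_addr = *src;
    pi.ipi6_ifindex = 0;

    msg->msg_control = ctl->bytes;
    msg->msg_controllen = sizeof ctl->bytes;

    cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = IPPROTO_IPV6;
    cm->cmsg_type = IPV6_PKTINFO;
    cm->cmsg_len = CMSG_LEN(sizeof pi);
    memcpy(CMSG_DATA(cm), &pi, sizeof pi);
}

/* Send buf to `to`, using src (typically the destination address of the
   query being answered) as the source address. */
ssize_t send_from6(int fd, const void *buf, size_t len,
                   const struct sockaddr_in6 *to, const struct in6_addr *src)
{
    union ipv6_src_ctl ctl;
    struct iovec iov;
    struct msghdr msg;

    iov.iov_base = (void *)buf;
    iov.iov_len = len;

    memset(&msg, 0, sizeof msg);
    msg.msg_name = (void *)to;
    msg.msg_namelen = sizeof *to;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    set_ipv6_src(&msg, &ctl, src);

    return sendmsg(fd, &msg, 0);
}
```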

And we are done!

To get this to work on OS X, #define __APPLE_USE_RFC_3542 before including any system headers; otherwise this feature is portable across FreeBSD, OS X and Linux. (Please let me know about Windows; I want to make this page as valuable as possible.)

IPv4
For IPv4 the situation is more complicated. Linux and the BSDs picked slightly different ways of doing this, since they had no RFC to guide them. Confusingly, the Linux manpages document this incorrectly (I’ll submit a patch to the manpages as soon as everybody agrees that this page describes things correctly).
For BSD, enable the IP_RECVDSTADDR socket option via setsockopt() to request the original destination address. It then arrives via recvmsg() as an IP_RECVDSTADDR control message carrying a struct in_addr. Note that this is a bare address: unlike a struct sockaddr_in, it holds no port number.
For Linux, enable the IP_PKTINFO socket option via setsockopt(). recvmsg() will then deliver an IP_PKTINFO control message carrying a struct in_pktinfo, whose ipi_addr field holds the 4-byte destination IPv4 address.
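
A sketch of the IPv4 receive side that covers both conventions with the preprocessor. The helper names are hypothetical, and preferring IP_PKTINFO whenever it is defined is an assumption that fits Linux (macOS defines it too); other systems that define IP_PKTINFO may behave differently, so check your platform’s documentation.

```c
#define _GNU_SOURCE            /* glibc: make sure struct in_pktinfo is visible */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Ask the kernel to report each datagram's original IPv4 destination
   address. Linux uses IP_PKTINFO, FreeBSD uses IP_RECVDSTADDR. */
int enable_ipv4_dst_info(int fd)
{
    int on = 1;
#if defined(IP_PKTINFO)
    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof on);
#elif defined(IP_RECVDSTADDR)
    return setsockopt(fd, IPPROTO_IP, IP_RECVDSTADDR, &on, sizeof on);
#else
#error "no known way to learn the IPv4 destination address on this platform"
#endif
}

/* Given a msghdr already filled in by recvmsg(2), extract the original
   destination address. Returns 0 on success, -1 if it is not present. */
int get_ipv4_dst(struct msghdr *msg, struct in_addr *dst)
{
    struct cmsghdr *cm;

    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level != IPPROTO_IP)
            continue;
#if defined(IP_PKTINFO)
        if (cm->cmsg_type == IP_PKTINFO &&
            cm->cmsg_len >= CMSG_LEN(sizeof(struct in_pktinfo))) {
            struct in_pktinfo pi;
            memcpy(&pi, CMSG_DATA(cm), sizeof pi);
            *dst = pi.ipi_addr;   /* destination address from the IP header */
            return 0;
        }
#elif defined(IP_RECVDSTADDR)
        if (cm->cmsg_type == IP_RECVDSTADDR &&
            cm->cmsg_len >= CMSG_LEN(sizeof(struct in_addr))) {
            memcpy(dst, CMSG_DATA(cm), sizeof *dst);   /* bare address, no port */
            return 0;
        }
#endif
    }
    return -1;
}
```
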
Conversely, to send on Linux, pass an IP_PKTINFO control message to sendmsg() containing a struct in_pktinfo, with the desired source address in its ipi_spec_dst field.
On FreeBSD, pass an IP_SENDSRCADDR control message containing a struct in_addr with the desired source address. As on the receive side, there is no port in it, which is fine: your socket is bound to exactly one port number anyway (even if it covers many IP addresses).
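
And the IPv4 send side, as a sketch with hypothetical names. On Linux the kernel takes the source address from ipi_spec_dst (ipi_addr is ignored on send), while the FreeBSD branch passes a bare struct in_addr via IP_SENDSRCADDR. The IP_PKTINFO branch assumes the Linux struct in_pktinfo layout and semantics.

```c
#define _GNU_SOURCE            /* glibc: make sure struct in_pktinfo is visible */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#if defined(IP_PKTINFO)            /* Linux: struct in_pktinfo */
#define SRC_CMSG_TYPE IP_PKTINFO
#define SRC_CMSG_SIZE sizeof(struct in_pktinfo)
#elif defined(IP_SENDSRCADDR)      /* FreeBSD: bare struct in_addr */
#define SRC_CMSG_TYPE IP_SENDSRCADDR
#define SRC_CMSG_SIZE sizeof(struct in_addr)
#else
#error "no known way to choose the IPv4 source address on this platform"
#endif

/* Control buffer for exactly one source-address message, aligned for cmsghdr. */
union ipv4_src_ctl {
    struct cmsghdr align;
    char bytes[CMSG_SPACE(SRC_CMSG_SIZE)];
};

/* Point msg at ctl and fill in a control message asking the kernel to send from src. */
void set_ipv4_src(struct msghdr *msg, union ipv4_src_ctl *ctl, struct in_addr src)
{
    struct cmsghdr *cm;

    memset(ctl, 0, sizeof *ctl);
    msg->msg_control = ctl->bytes;
    msg->msg_controllen = sizeof ctl->bytes;

    cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = IPPROTO_IP;
    cm->cmsg_type = SRC_CMSG_TYPE;
    cm->cmsg_len = CMSG_LEN(SRC_CMSG_SIZE);   /* Linux requires exactly this length */
#if defined(IP_PKTINFO)
    {
        struct in_pktinfo pi;
        memset(&pi, 0, sizeof pi);   /* ipi_ifindex = 0: routing picks the interface */
        pi.ipi_spec_dst = src;       /* the source address; ipi_addr is ignored on send */
        memcpy(CMSG_DATA(cm), &pi, sizeof pi);
    }
#else
    memcpy(CMSG_DATA(cm), &src, sizeof src);  /* address only; the port is the socket's */
#endif
}

/* Send buf to `to` with src as the source address. */
ssize_t send_from4(int fd, const void *buf, size_t len,
                   const struct sockaddr_in *to, struct in_addr src)
{
    union ipv4_src_ctl ctl;
    struct iovec iov;
    struct msghdr msg;

    iov.iov_base = (void *)buf;
    iov.iov_len = len;

    memset(&msg, 0, sizeof msg);
    msg.msg_name = (void *)to;
    msg.msg_namelen = sizeof *to;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    set_ipv4_src(&msg, &ctl, src);

    return sendmsg(fd, &msg, 0);
}
```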

Binding to :: for IPv6 *and* IPv4 purposes

On Linux, one can bind to :: and get packets destined for both IPv6 and IPv4. The good news is that this combines well with the above, and Linux delivers an IPv4 IP_PKTINFO for IPv4 packets, and will also honour the IP_PKTINFO for outgoing IPv4 packets on such a combined IPv4/IPv6 socket.
On FreeBSD, and probably other BSD-derived systems, one should bind explicitly to :: and 0.0.0.0 to cover IPv6 and IPv4 respectively. This separation is arguably cleaner. To get the same behaviour on Linux, set the IPV6_V6ONLY socket option with setsockopt() before binding, or set /proc/sys/net/ipv6/bindv6only to 1 (which changes the default for all IPv6 sockets on the system).
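
A sketch of binding to :: with IPV6_V6ONLY set explicitly, so the behaviour no longer depends on the system-wide bindv6only default. bind_udp_any6 is a hypothetical helper; with v6only set to 1 you would pair it with a second socket bound to 0.0.0.0, and either way you would still enable the pktinfo options described above on each socket. Some BSDs may refuse v6only = 0 altogether.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket bound to [::]:port. With v6only = 1 it only sees IPv6
   traffic (the BSD-style setup, needing a separate 0.0.0.0 socket). With
   v6only = 0, Linux also delivers IPv4 datagrams to it, with the peer shown
   as an IPv4-mapped address (::ffff:a.b.c.d). */
int bind_udp_any6(unsigned short port, int v6only)
{
    struct sockaddr_in6 sin6;
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Must be set before bind(); overrides the bindv6only sysctl default. */
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &v6only, sizeof v6only) < 0) {
        close(fd);
        return -1;
    }

    memset(&sin6, 0, sizeof sin6);
    sin6.sin6_family = AF_INET6;
    sin6.sin6_addr = in6addr_any;   /* the IPv6 wildcard :: */
    sin6.sin6_port = htons(port);

    if (bind(fd, (struct sockaddr *)&sin6, sizeof sin6) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```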

Actual source code

To see all this in action, head over to https://github.com/PowerDNS/pdns/blob/master/pdns/nameserver.cc – it contains the relevant setsockopt(), sendmsg() and recvmsg() calls.

6 comments

  1. gladiac

    Thanks for this blog post! I’ve recently implemented support for IP_PKTINFO in socket_wrapper (see http://cwrap.org/ ). I wondered why I only find IPV6_PKTINFO in FreeBSD. Well you provided the answer. Time to implement support for IP_RECVDSTADDR.

  2. Pingback: New features in socket_wrapper 1.1.0 • Andreas Schneider
  3. Cody

    I was looking for something else (I use BIND and I’m not actually after anything DNS related) but I noticed something that you might want to change (or then again maybe not). Admittedly I’m still waking up and generally I am sleep deprived, but when I first saw this sentence (half-way awake):

    > Secondly, just bind to 0.0.0.0 and ::!
    I thought immediately that it was a typo and you meant ::1 (which would be localhost6). It took a moment for it to register, though: just like 0.0.0.0 is the wildcard IPv4 address so too is :: for the IPv6 wildcard address. But the font face makes it look like it is part of the address and given the similarity to 1 and the exclamation mark it makes it seem like you meant the ‘::1’.

    Just a random thought that might or might not be only as valuable as ‘::’ indicates…

  4. Pingback: Binding to an IPv6 Subnet | blabs.apnic.net
  5. cizixs

    Thanks for this great post!
    I had an issue related to UDP source address selection when listening on `0.0.0.0`, and this post helps identify the problem.
    The application protocol is Kerberos instead of DNS, but this issue stands correct for all UDP packets.

    By the way, the example link at the end redirects to github wiki page, and I was unable to find the source code. Can you please update the link?
