On binding datagram (UDP) sockets to the ANY addresses
This story goes back a long time. For around 10 years now, people have been requesting that PowerDNS learn how to automatically listen on all available IP addresses. And for slightly less than that time, we’ve been telling people we would not be adding that feature.
For one, if you run a nameserver, you should *know* what IP addresses you listen on! How else could people delegate to you, or rely on you to resolve their queries? Secondly, running services by default on ‘all’ IP addresses is a security risk. The PowerDNS Recursor for this reason binds to 127.0.0.1 by default.
But still, people wanted this feature, and we didn’t do it. Because we knew it’d be hard work. There, the truth is out. But we finally bit the bullet and had to figure out how to do it. This page shares that knowledge, including the fact that the Linux manpages tell you to do the wrong thing.
There are two ways to listen on all addresses. The first is to enumerate all interfaces, grab all their IP addresses, and bind to each of them. You also need to keep monitoring for new addresses arriving. Lots of work, and non-portable work too. We really did not want to do that.
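For comparison, the enumeration approach could start from something like the sketch below. It assumes getifaddrs(3) is available (it is on Linux, the BSDs and OSX, but as far as I know it is not part of POSIX), and list_local_addresses() is a hypothetical helper name, not PowerDNS code. It only takes a snapshot; noticing addresses that appear later needs a separate, platform-specific mechanism.

```c
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: print every IPv4/IPv6 address configured right now.
   Returns the number of addresses found, or -1 if getifaddrs() fails. */
int list_local_addresses(void)
{
    struct ifaddrs *ifap = NULL;
    if (getifaddrs(&ifap) != 0)
        return -1;

    int count = 0;
    for (struct ifaddrs *ifa = ifap; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL)
            continue; /* interface without an address */

        int family = ifa->ifa_addr->sa_family;
        const void *raw;
        if (family == AF_INET)
            raw = &((struct sockaddr_in *)ifa->ifa_addr)->sin_addr;
        else if (family == AF_INET6)
            raw = &((struct sockaddr_in6 *)ifa->ifa_addr)->sin6_addr;
        else
            continue; /* not an IP address (e.g. a link-layer entry) */

        char text[INET6_ADDRSTRLEN];
        if (inet_ntop(family, raw, text, sizeof text) != NULL) {
            printf("%s: %s\n", ifa->ifa_name, text);
            count++;
        }
    }
    freeifaddrs(ifap);
    return count;
}
```

Each printed address would then need its own socket and bind(2) call, which is exactly the bookkeeping we wanted to avoid.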
Secondly, just bind to 0.0.0.0 and ::! This works very well for TCP and other connection-oriented protocols, but can fail silently for UDP and other connectionless protocols. How come? When a packet comes in on a socket bound to 0.0.0.0, we don’t know which IP address it was sent to. And this is a problem when replying to such a packet – what would the correct source address be? Because we are connectionless (and therefore stateless), the kernel doesn’t know what to do.
So it picks what it considers the most appropriate address, and that may be the wrong one. There are some heuristics that make some kernels do the right thing more reliably, but there are no guarantees.
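Binding to the wildcard address itself is the easy part. A minimal IPv4 sketch follows; the helper names are made up for illustration. Note that getsockname(2) on such a socket keeps reporting 0.0.0.0, so the socket alone cannot tell us which local address a datagram was sent to.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a UDP socket to 0.0.0.0 on the given port
   (0 lets the kernel pick one). Returns the descriptor, or -1 on error. */
int bind_udp_any4(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in sin;
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY); /* the IPv4 ANY address */
    sin.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&sin, sizeof sin) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if the socket's local address is still the wildcard,
   0 if it is a specific address, -1 on error. */
int local_addr_is_wildcard4(int fd)
{
    struct sockaddr_in local;
    socklen_t len = sizeof local;
    if (getsockname(fd, (struct sockaddr *)&local, &len) < 0)
        return -1;
    return local.sin_addr.s_addr == htonl(INADDR_ANY);
}
```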
When receiving packets on datagram sockets, we usually use recvfrom(2), but this does not provide the missing bit of data: which IP address the packet was actually sent to. There is no recvfromto(). Enter the very powerful recvmsg(2). recvmsg() can deliver a boatload of extra parameters with each datagram, as requested via setsockopt().
One of the parameters we can request is the original destination IP address of the packet.
IPv6
For IPv6, this is actually standardized in RFC 3542, which tells us to request parameter IPV6_RECVPKTINFO via setsockopt(), which will lead to the delivery of the IPV6_PKTINFO parameter when we use recvmsg(2).
This parameter is sent to us as a struct in6_pktinfo, and its ipi6_addr member contains the original destination IPv6 address of the query.
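The receiving side could look roughly like the sketch below. The function names (enable_recv_pktinfo6, extract_dst_addr6, recvfromto6) are invented for this page and are not taken from PowerDNS. The _GNU_SOURCE define is there because, to my knowledge, glibc only exposes struct in6_pktinfo when it is set; treat that as an assumption about your libc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* assumed necessary for in6_pktinfo on glibc */
#endif
#define __APPLE_USE_RFC_3542   /* needed on OSX, before any #include */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Ask the kernel to attach IPV6_PKTINFO to every datagram received on fd. */
int enable_recv_pktinfo6(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_IPV6, IPV6_RECVPKTINFO, &one, sizeof one);
}

/* Walk the control messages of a received msghdr and copy out the original
   destination address. Returns 0 on success, -1 if no IPV6_PKTINFO is there. */
int extract_dst_addr6(struct msghdr *msg, struct in6_addr *dst)
{
    for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c != NULL;
         c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IPV6 && c->cmsg_type == IPV6_PKTINFO) {
            struct in6_pktinfo info;
            memcpy(&info, CMSG_DATA(c), sizeof info);
            *dst = info.ipi6_addr;
            return 0;
        }
    }
    return -1;
}

/* The missing "recvfromto()": one datagram, its sender, and the address it
   was sent to. Returns the payload length, or -1 on error. */
ssize_t recvfromto6(int fd, void *buf, size_t len,
                    struct sockaddr_in6 *from, struct in6_addr *dst)
{
    union {                    /* the union keeps the buffer aligned */
        char buf[CMSG_SPACE(sizeof(struct in6_pktinfo))];
        struct cmsghdr align;
    } control;
    struct iovec iov;
    iov.iov_base = buf;
    iov.iov_len = len;

    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_name = from;
    msg.msg_namelen = sizeof *from;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof control.buf;

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0 || extract_dst_addr6(&msg, dst) != 0)
        return -1;
    return n;
}
```

A server would call enable_recv_pktinfo6() once after bind(2) and then use recvfromto6() wherever it previously called recvfrom(2).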
When replying to a packet from a socket bound to ::, we have the reverse problem: how to specify which *source* address to use. To do so, use sendmsg(2) and specify an IPV6_PKTINFO parameter, which again contains a struct in6_pktinfo.
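A sketch of the sending side, under the same caveats: set_src_addr6() and sendfromto6() are hypothetical names, and the _GNU_SOURCE define reflects my understanding of glibc rather than anything stated in RFC 3542. Leaving ipi6_ifindex at zero lets the kernel choose the outgoing interface.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* assumed necessary for in6_pktinfo on glibc */
#endif
#define __APPLE_USE_RFC_3542   /* needed on OSX, before any #include */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define PKTINFO6_SPACE CMSG_SPACE(sizeof(struct in6_pktinfo))

/* Attach an IPV6_PKTINFO control message carrying the wanted source address.
   cbuf must be suitably aligned and at least PKTINFO6_SPACE bytes long.
   Returns 0 on success, -1 if the buffer is too small. */
int set_src_addr6(struct msghdr *msg, void *cbuf, size_t cbuflen,
                  const struct in6_addr *src)
{
    if (cbuflen < PKTINFO6_SPACE)
        return -1;
    memset(cbuf, 0, PKTINFO6_SPACE);
    msg->msg_control = cbuf;
    msg->msg_controllen = PKTINFO6_SPACE;

    struct cmsghdr *c = CMSG_FIRSTHDR(msg);
    c->cmsg_level = IPPROTO_IPV6;
    c->cmsg_type = IPV6_PKTINFO;
    c->cmsg_len = CMSG_LEN(sizeof(struct in6_pktinfo));

    struct in6_pktinfo info;
    memset(&info, 0, sizeof info);   /* ipi6_ifindex = 0: kernel decides */
    info.ipi6_addr = *src;
    memcpy(CMSG_DATA(c), &info, sizeof info);
    return 0;
}

/* Send one datagram to `to`, asking the kernel to use `src` as the source.
   Returns the number of bytes sent, or -1 on error. */
ssize_t sendfromto6(int fd, const void *buf, size_t len,
                    const struct in6_addr *src, const struct sockaddr_in6 *to)
{
    union {                    /* the union keeps the buffer aligned */
        char buf[PKTINFO6_SPACE];
        struct cmsghdr align;
    } control;
    struct iovec iov;
    iov.iov_base = (void *)buf;
    iov.iov_len = len;

    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_name = (void *)to;
    msg.msg_namelen = sizeof *to;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    if (set_src_addr6(&msg, control.buf, sizeof control.buf, src) != 0)
        return -1;
    return sendmsg(fd, &msg, 0);
}
```

When answering a query, src would simply be the ipi6_addr value that came in with the query.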
And we are done!
To get this to work on OSX, please #define __APPLE_USE_RFC_3542 before including any headers. Apart from that, this feature is portable across FreeBSD, OSX and Linux. (Please let me know about Windows; I want to make this page as valuable as possible.)
Thanks for this blog post! I’ve recently implemented support for IP_PKTINFO in socket_wrapper (see http://cwrap.org/ ). I wondered why I could only find IPV6_PKTINFO in FreeBSD. Well, you provided the answer. Time to implement support for IP_RECVDSTADDR.
cwrap has a new wrapper called resolv_wrapper which can be used to specify your own resolv.conf for testing or to fake DNS queries!
I was looking for something else (I use BIND and I’m not actually after anything DNS related) but I noticed something that you might want to change (or then again maybe not). Admittedly I’m still waking up and generally I am sleep deprived, but when I first saw this sentence (half-way awake):
> Secondly, just bind to 0.0.0.0 and ::!
I thought immediately that it was a typo and you meant ::1 (which would be localhost6). It took a moment for it to register, though: just like 0.0.0.0 is the wildcard IPv4 address so too is :: for the IPv6 wildcard address. But the font face makes it look like it is part of the address and given the similarity to 1 and the exclamation mark it makes it seem like you meant the ‘::1’.
Just a random thought that might or might not be only as valuable as ‘::’ indicates…
Thanks for this great post!
I had an issue related to UDP source address selection when listening on `0.0.0.0`, and this post helps identify the problem.
The application protocol is Kerberos instead of DNS, but the issue applies to all UDP traffic.
By the way, the example link at the end redirects to a GitHub wiki page, and I was unable to find the source code. Can you please update the link?
IP_SENDSRCADDR does not seem to exist on OSX, even though IP_RECVDSTADDR is defined.
Try doing the first thing the article suggests?
#define __APPLE_USE_RFC_3542
And it should be done before any #include …
If you’ve tried that, you might want to look for another name; different platforms have different names for the constants. For instance, EWOULDBLOCK is sometimes called EAGAIN, and the two may even have the same value.
I should say for some constants, rather.
Thanks for the wonderful explanation. Saved me lots of time.
So in FreeBSD struct in_addr contains a port? – because in linux it doesn’t … (it’s specified as .sin_port in struct sockaddr_in; the .sin_addr member is of type struct in_addr and only contains an ip as .s_addr) …