With some care, it turns out to be possible to spoof fake DNS responses using fragmented datagrams. While preparing a presentation for XS4ALL back in 2009, I found out how this could be done, but I never got round to formally publishing the technique. The presentation was however made available.
In 2013, Amir Herzberg & Haya Shulman (while at Bar Ilan University) published a paper called Fragmentation Considered Poisonous. In this paper they explain how fragmented DNS responses can be used for cache poisoning. Later that year CZNIC presented about this paper and its techniques at RIPE 67.
A stunning 72 papers cite the original article, but as of 2018 not too many people know about this cache poisoning method.
More recently, The Register reported that another team, also involving Dr Shulman (now at Fraunhofer Institute for Secure Information Technology), has been able to use fragmented DNS responses to acquire certificates for domain names whose nameservers they do not control. They were able to demonstrate this in real life, which is a remarkable achievement. Incidentally, this team includes Amit Klein who in 2008 discovered & reported a weakness in PowerDNS.
Full details will be presented at the ACM Conference on Computer and Communications Security in Toronto, October 18. This presentation will also propose countermeasures.
Meanwhile, in this post, I hope to explain a (likely) part of their technique.
Whole datagram DNS spoofing
To match bona fide DNS responses to their corresponding queries, resolvers and operating systems check:
- Name of the query
- Type of the query
- Source/destination address
- Destination port (16 bits)
- DNS transaction ID (16 bits)
The first three items can be predictable, the last two aren’t supposed to be. Spoofing a false response therefore means we need to guess 32 random bits. To do so, the attacker needs to send the resolver lots and lots of fake answers with guesses for the destination port and the transaction ID. Over (prolonged) time, their chosen response arrives ahead of the authentic response, is accepted, and they are able to spoof a domain name. Profit.
In practice this turns out to be very hard to do. The 32 bit requirement plus the short timeframe in which to send false responses means that as far as I know, this has been demonstrated in a lab setting just once. Anecdotal reports of blindly spoofing a fully randomized source port resolver have not been substantiated.
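To get a feeling for why 32 bits is such a high bar, here is a rough back-of-the-envelope sketch. It assumes that every guess within one query window is distinct and that the resolver picks a fresh port and transaction ID for each query; the function name and the example numbers are made up for illustration.

```python
# Illustrative arithmetic only; the names and numbers here are assumptions.
SEARCH_SPACE = 2 ** 32  # 16-bit destination port x 16-bit transaction ID


def spoof_success_probability(guesses_per_query, queries=1):
    """Chance that at least one spoofed response matches within `queries` attempts.

    Assumes distinct guesses inside one query window, and a fresh random
    port and transaction ID for every new query.
    """
    per_query = min(guesses_per_query, SEARCH_SPACE) / SEARCH_SPACE
    return 1.0 - (1.0 - per_query) ** queries


if __name__ == "__main__":
    # 10,000 fake answers squeezed in before the real one arrives (hypothetical).
    for queries in (1, 1_000, 1_000_000):
        print(queries, spoof_success_probability(10_000, queries))
```

Even with thousands of fake answers per window, the attacker needs a very large number of query windows before success becomes likely.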
DNS queries and responses can be carried in UDP datagrams. A UDP datagram can be many kilobytes in size – far larger than most UDP packets. This means that a sufficiently large UDP response datagram can get split up into multiple packets. These are then called fragments.
Such fragments travel the network separately, to be joined together again on receipt.
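As a sketch of how such a split works out for IPv4, the snippet below computes fragment offsets and sizes. It assumes a 1500 byte MTU, a 20 byte IP header without options and an 8 byte UDP header; every fragment except the last carries a multiple of 8 bytes. The function name is hypothetical.

```python
def fragment_layout(dns_payload_len, mtu=1500, ip_header_len=20, udp_header_len=8):
    """Return (offset, length) pairs, in bytes of IP payload, one per fragment.

    Assumes IPv4 without options. The UDP header travels in the first
    fragment only; later fragments carry bare payload bytes.
    """
    total = udp_header_len + dns_payload_len
    # Non-final fragments must carry a multiple of 8 bytes.
    per_fragment = (mtu - ip_header_len) // 8 * 8
    layout = []
    offset = 0
    while offset < total:
        length = min(per_fragment, total - offset)
        layout.append((offset, length))
        offset += length
    return layout


if __name__ == "__main__":
    # A 3000 byte DNS response needs three fragments under these assumptions.
    print(fragment_layout(3000))
```

Under these assumptions the first fragment has room for 1472 bytes of DNS data, since the remaining 8 bytes go to the UDP header.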
Fragmented DNS responses happen occasionally with DNSSEC, for example in this case:
$ dig -t mx isc.org @ams.sns-pb.isc.org +dnssec -4 +bufsize=16000
43.028963 IP 192.168.1.228.44751 > 220.127.116.11.53: 20903+ [1au] MX? isc.org. (48)
43.035379 IP 18.104.22.168.53 > 192.168.1.228.44751: 20903*- 3/5/21 MX mx.ams1.isc.org. 20, MX mx.pao1.isc.org. 10, RRSIG (1472)
43.035391 IP 22.214.171.124 > 192.168.1.228: ip-proto-17
The final line represents a fragment, which only notes it is UDP (protocol 17).
Matching fragments together is quite comparable to matching DNS queries to responses. Every IP packet, even a fragment, carries a 16 bit number called an IPID. This IPID is not copied from the query to the response, it is picked by the DNS responder.
On receipt, fragments are grouped by IPID, after which the checksum of the reassembled datagram is checked. If correct, the DNS response gets forwarded to the resolver process.
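The grouping step can be sketched roughly as follows. This is a toy model, not how any particular operating system does it: real stacks also deal with overlapping fragments, timeouts and memory limits, and the UDP checksum test happens after reassembly. The class and parameter names are hypothetical.

```python
from collections import defaultdict


class ToyReassembler:
    """Toy model of IP fragment reassembly, keyed on the IPID."""

    def __init__(self):
        # (src, dst, protocol, ipid) -> {offset: (payload, more_fragments)}
        self._pending = defaultdict(dict)

    def add(self, src, dst, proto, ipid, offset, payload, more_fragments):
        """Store one fragment; return the whole datagram once it is complete."""
        key = (src, dst, proto, ipid)
        parts = self._pending[key]
        parts[offset] = (payload, more_fragments)
        data = b""
        expected = 0
        while expected in parts:
            chunk, more = parts[expected]
            data += chunk
            if not more:
                del self._pending[key]
                return data
            if not chunk:
                break  # an empty non-final fragment cannot make progress
            expected += len(chunk)
        return None  # still waiting for one or more fragments
```

Note that in this model a fragment that arrives early simply waits until the other fragments with the same key show up.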
If we want to spoof a DNS response, we could pick a DNS query that leads to a fragmented datagram, and then try to spoof only the second fragment. At first sight, this does not appear to be much easier as we now need to guess the IPID (16 bits) and we also need to make sure the checksum of the whole datagram matches (another 16 bits). This then also requires a 32 bit guess to succeed.
However, if we send a server a DNS query, it will most of the time send the same DNS response to everyone who asks (also for fragmented answers). In other words, if the attacker wants to spoof a certain response, it will know exactly what that response looks like – with the exception of the destination port and the DNS transaction ID (32 bits).
But note that both of these unpredictable parts are in the first fragment. The second fragment is completely static, except for the IPID. Now for the clever bit.
The ‘internet checksum’ is literally .. a sum. So the checksum of the entire datagram consists of the checksum of the first fragment plus the checksum of the second fragment (modulo 16 bits).
This means that to make sure the whole reassembled datagram passes the checksum test, all we have to do is make sure that our fake second fragment has the same known partial checksum as the original. We can pick the checksum of our fake second fragment easily through the TTL of our chosen response record.
This leaves us with only 16 bits to guess, which given the birthday paradox is not that hard.
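The checksum trick can be illustrated with a small sketch. It assumes the adjustable field is a 16 bit word at an even offset within the fragment (standing in for the low half of a TTL) and that the fake fragment has the same length as the original. The byte strings in the demo are invented stand-ins, not real DNS data, and all function names are hypothetical.

```python
import struct


def ones_complement_sum(data):
    """16-bit ones' complement sum, the core of the internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def fold_add(a, b):
    """Add two partial sums with end-around carry."""
    total = a + b
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def compensate(original, fake, field_offset):
    """Rewrite the 16-bit word at `field_offset` so `fake` sums like `original`."""
    if field_offset % 2:
        raise ValueError("this sketch only handles even offsets")
    zeroed = fake[:field_offset] + b"\x00\x00" + fake[field_offset + 2:]
    target = ones_complement_sum(original)
    rest = ones_complement_sum(zeroed)
    field = fold_add(target, ~rest & 0xFFFF)  # target minus rest, ones' complement
    return fake[:field_offset] + struct.pack("!H", field) + fake[field_offset + 2:]


if __name__ == "__main__":
    first = bytes(range(16))  # stand-in for the unpredictable first fragment
    original = b"\x00\x00\x0e\x10" + bytes([192, 0, 2, 1]) + b"pad!"
    fake = b"\x00\x00\x00\x00" + bytes([198, 51, 100, 7]) + b"pad!"
    fixed = compensate(original, fake, 2)
    # The partial sum matches, so the sum of the whole datagram matches too.
    print(ones_complement_sum(first + fixed) == ones_complement_sum(first + original))
```

Because the sum over the whole datagram is just the folded sum of the per-fragment sums, matching the partial sum of the second fragment is enough, whatever the first fragment contains.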
Randomness of the IPID
So how random is the IPID, does it even represent a 16-bit challenge? According to the 2013 paper, some operating systems pick the IPID from a global counter. This means an attacker can learn the currently used IPID and predict the one used for the next response with pretty good accuracy.
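For the global counter case, the prediction could look something like the sketch below. It assumes the counter goes up by one for every packet the server sends, so other traffic moves it along; the window size and the function name are arbitrary choices for illustration.

```python
def candidate_ipids(observed_ipid, window=16):
    """IPIDs a global-counter server might use next, wrapping at 16 bits.

    `observed_ipid` is the value seen in a response to the attacker's own
    probe. `window` allows for packets the server sends to others meanwhile.
    """
    return [(observed_ipid + step) % 65536 for step in range(1, window + 1)]


if __name__ == "__main__":
    print(candidate_ipids(65534, window=4))
```

The attacker would then send one fake second fragment per candidate value.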
Other operating systems use an IPID that increments per destination which means we can’t remotely guess the IPID. It turns out however that through clever use of multiple fragments, this still allows an attacker to “capture” one of these. See the original paper for details.
Is that it?
Definitely not. Getting a certificate issued falsely using this technique requires several additional elements. First, we must be able to force many questions. Secondly, we must make sure that the original authoritative server fragments the answer just right. There are ways to do both, but they are not easy.
I await the presentation at the ACM conference in October eagerly – but I’m pretty sure it will build on the technique outlined above.
In the meantime, DNSSEC does actually protect against this vulnerability, but it does require that your domain is signed and that your CA validates. This may not yet be the case.
DNS lookups occur for every website visited. The processor of DNS requests gets a complete picture of what a household or phone is doing on the internet. In addition, DNS can be used to block sites or to discover if devices are accessing malware or are part of a botnet.
(for the tl;dr, please skip right to the summary at the end)
Recently, we’ve seen Cloudflare (rumoured to be heading to IPO soon) get interested in improving your DNS privacy. Through a collaboration with Mozilla, Cloudflare is offering to move Firefox DNS lookups from the subscriber’s service provider straight onto its own systems. From a variety of blog posts it appears that Mozilla is aiming to make this the new default, although we also hear the decision has not yet been taken and that other organizations beyond Cloudflare may be involved. This new DNS service will be encrypted, using a protocol called DNS over HTTPS.
We are currently living in strange times where companies are willing to offer us services for “free” in return for access to our data. This data can then be used for profiling purposes (targeted advertising) or competitive analysis (market intelligence, for example what kinds of people visit what sites etc). In this way, if you are getting something for free, you frequently aren’t the customer, you are the product.
In addition, once our data flows through a third party, it is possible for that third party to influence what we see or how well things work: Gmail moving your school newsletter to the never opened ‘Promotional’ tab, Facebook suddenly no longer displaying your updates to users unless you pay up, Outlook.com deciding that most independent email providers should end up in the spam folder.
At Open-Xchange and PowerDNS, we think further centralization of the internet is a bad thing in and of itself, so we are not happy about the idea of moving DNS to a large, central, third party. Centralization means permissionless innovation becomes harder, when it was this very permissionless innovation that gave us the internet as we know it today.
We do of course applaud giving users a choice of encrypted DNS providers. Our worry is about the mulled plan to switch users over by default, or asking users to make an uninformed choice to switch to “better, more private DNS”, without making sure consumers know what is going on. Because that ‘OK, Got It’ button will frequently just get clicked.
Beyond our worries about centralization however there are concrete reasons to think twice before changing the DNS trust model & moving queries to a third party by default.
What will change?
When a user wants to visit ‘www.wikipedia.org’, the browser first looks up the IP address for this site. As it stands, by default, the service provider nameserver is consulted for this purpose. The setting for this is hidden in the Cable/DSL/FTTH-modem or phone. In the newly proposed world, the browser would ask Cloudflare for the IP address of ‘www.wikipedia.org’. Cloudflare says it takes your privacy more seriously than telecommunication service providers do because this DNS query will be encrypted, unlike regular DNS. They also promise not to sell your data or engage in user profiling.
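To make the new lookup path a bit more concrete, here is a rough sketch of how a query for ‘www.wikipedia.org’ could be packaged for DNS over HTTPS using the GET form described in RFC 8484. The resolver URL is a placeholder, not a real service, and the function names are made up for illustration; nothing is sent over the network.

```python
import base64
import struct


def build_query(name, qtype=1):
    """Build a minimal DNS query in wire format (qtype 1 is an A record)."""
    # ID 0, flags with only 'recursion desired' set, one question.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").encode("ascii").split(b".")
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class 1 is IN


def doh_get_url(base_url, name):
    """Encode the query as base64url without padding in a 'dns' parameter."""
    encoded = base64.urlsafe_b64encode(build_query(name)).rstrip(b"=")
    return base_url + "?dns=" + encoded.decode("ascii")


if __name__ == "__main__":
    # "https://doh.example/dns-query" is a hypothetical endpoint.
    print(doh_get_url("https://doh.example/dns-query", "www.wikipedia.org"))
```

The DNS question itself is unchanged; what changes is who receives it and that it travels inside HTTPS.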
Interestingly, this claim cannot be true in Europe. The EU GDPR and telecom regulations greatly limit what ISPs could do with the data. Selling it on is absolutely forbidden. Service providers would be risking 4% revenue fines because doing this secretly would be in stark violation of the GDPR, Europe’s privacy regulation.
In other countries, service providers do indeed study and use their users’ traffic patterns for marketing purposes.
So given this, under what circumstances would it be ok for Cloudflare (or any other third party) to take over our DNS by default?
Cloudflare is a Content Delivery Network (CDN). CDNs serve website content & videos from servers across the globe, so that content is closer to the end-user. As it stands, large scale CDNs like Akamai, Fastly, Google, Level3 and Cloudflare cooperate and coordinate intimately with service providers, to the point of co-locating caches within ISP networks to guarantee rapid delivery of content. When connecting to ‘www.whitehouse.gov’ for example, it is entirely possible to end up on an Akamai server hosted within your own service provider in the city you live in. Only two companies were then involved in delivering that page to you: your ISP and Akamai. Neither your request, nor the response ever left your own country.
In the proposed future where Cloudflare does our DNS, all queries go through their networks first before we reach content hosted by them, or their competitors. We can legitimately wonder if Cloudflare will diligently work to protect the interests of its competitors and deliver the best service it can.
Interestingly enough, as of today, at least for KPN (a national service provider in The Netherlands) and www.whitehouse.gov this is not true: the IP address we mostly get from the KPN servers is 20% closer in terms of latency, and is reached through Internet peering. The IP address we get via Cloudflare is slower and additionally reached through IP transit, which is more expensive for both KPN and Akamai. Cloudflare is therefore slowing down access to an Akamai hosted website, at higher cost for everyone involved. Cloudflare, incidentally, explains that this is because of privacy reasons.
Any new default DNS provider should commit to working with all its competitors to deliver service that is as good as would have been provided through the service providers’ DNS.
Any chokepoint of communications is susceptible to government blocking orders and legal procedures. In some countries the government shows up with a (long) list of what domains to block, in other countries this happens only after a series of long-winded lawsuits. In addition, child pornography researchers (& law enforcement organizations) frequently provide lists of domains they think should be blocked, and these often are.
Local service providers typically fight attempts to block popular content, since their subscribers don’t like it. Once an international DNS provider is the default for lookups, it can also expect government orders and other legal efforts aimed to get domain names blocked.
A new default DNS provider should document its policies on how it will deal with lawsuits and government orders commanding it to block traffic. At the very least, blocks should be constrained regionally. It should also document what content it would block of its own accord.
Without going all “Snowden” on this subject, many governments grant themselves rights to intercept foreign communications with far less oversight than if they were intercepting national traffic. In other words, citizens of country X enjoy far less privacy protection in country Y. This is not a controversial statement and is explicitly written out in many countries’ interception laws and regulations. But the upshot is that for privacy, it pays to keep DNS within the country where you are a citizen.
In addition, most countries have legislated that communications service providers can and must break their own contracts, terms and conditions to comply with government interception orders. In other words, even though a company has committed in writing to not share your data with anyone, if the government shows up, they can be forced to do so anyhow.
It may well be that a third party DNS provider operates under a regime that has an interest in the DNS traffic that gets sent to it from all over the world.
New centralised DNS providers should document which governments have interception powers over them and be honest about their chances of standing up to such interception.
DNS is currently under control of your network provider – which could be your employer, your coffee shop or frequently, your (Internet) service provider. Enterprise environments often filter DNS for malware related traffic, blocking requests for known harmful domain names. They will also use query logs to spot infected devices. Increasingly, large scale service providers are also offering DNS based malware filtering, especially in the UK.
When moving DNS to a centralised provider, such local filtering no longer functions. Enterprise network administrators will also lose visibility into what traverses their network. From the standpoint of the individual employee this may be great but it is not what the network operator wanted.
Interestingly enough, DNS over HTTPS has specifically been designed to be hard to block, as the designers envisioned that network operators would attempt to use firewall rules to disable forms of DNS they could not monitor or control.
When asking users if they should move their DNS to a new provider, they should be reminded they may be losing protection that was previously provided to them by their service provider or employer network administrators.
Is your service provider actually spying on you?
If we want to assess the benefit of moving DNS to a third party by default, it is important to know if we are being spied upon in the first place. In some cases and in some countries, this is definitely true. In Russia and China, DNS is routinely intercepted and even changed. Also, some providers replace ‘this domain does not exist’ DNS answers by the IP address of a ‘search page’ with advertisements.
But in many places, local service providers are bound by stringent rules that forbid any spying or profiling, mostly countries that fall under the European GDPR or GDPR inspired legislation.
It has been argued that users are not sophisticated enough to reason about this subject and that the DNS move should happen by default, with an opt-out for those that care. Another idea that has been raised is a startup dialogue that proposes a more secure internet experience and a ‘Got it!’ button. This clearly does not go far enough in educating users about the change they will be authorizing.
Before moving DNS to a third party, users should be surveyed if they feel their current provider is spying on them or not, and if they think the new third party DNS provider would be an improvement. The outcome will likely be different per region. This survey could then lead to a well-designed, localized, opt-in procedure.
Having a choice of (encrypted) DNS providers is good. Mozilla is pondering moving DNS resolution to a third party by default, initially Cloudflare. Before doing so, any third party should commit to:
- Network neutrality: promise to work with competitors to ensure performance for other CDNs does not deteriorate compared to when the service provider DNS was used
- A policy on blocking: how will the provider deal with government blocking requests or lawsuits demanding that content be blocked?
- Warning users the new DNS may not offer safety features they got from the network DNS provider
- Being clear about the legislations it operates under: which governments could force it into large scale interception?
Finally, Mozilla should survey its users to find out their attitudes towards moving DNS from their current service provider to Cloudflare. To do so, those users must first be well informed about what such a move would mean. Based on the survey results, an honest consent page can be generated that makes sure users know what they are agreeing to.
We want to thank Rudolf van der Berg and Remco van Mook for their comments & input for this post. These opinions are ours alone though.
We’ve just released PowerDNS Recursor version 4.1.4. This is a maintenance release with no major changes.
One new setting was added: max-udp-queries-per-round, which controls the maximum number of messages the Recursor will handle before other mthreads are scheduled. Its default should be high enough for nearly all users.
The changelog looks as follows:
We’re pleased to announce the availability of the PowerDNS Authoritative Server version 4.1.3. This is a maintenance release addressing a performance issue in the GeoIP backend and fixes several other issues.
The changelog is below; the full changelog can be found in the documentation.
- #6441, #6614: Restrict creation of OPT and TSIG RRsets
- #6228, #6370: Fix handling of user-defined axfr filters return values
- #6584, #6585, #6608: Prevent the GeoIP backend from copying NetMaskTrees around, fixes slow-downs in certain configurations (Aki Tuomi)
- #6654, #6659: Ensure alias answers over TCP have correct name
The tarball is on the downloads website (sig), packages for CentOS 6 and 7, Ubuntu Trusty, Xenial, Artful and Bionic, Debian Jessie and Stretch and Raspbian Jessie are available from the repositories.
This is the third release in the 4.1 train. Besides bug fixes, it contains some performance and usability improvements.
Please find the most important changes below. For full details, visit the changelog.
- API: increase serial after dnssec related updates
- Dnsreplay: bail out on a too small outgoing buffer
- lower ‘packet too short’ loglevel
- Make check-zone error on rows that have content but shouldn’t
- avoid an insane amount of new backend connections during an axfr
- Report unparseable data in stoul invalid_argument exception
- recheck serial when axfr is done
- add tcp support for alias
- allocate new statements after reconnecting to postgresql
- bindbackend: only compare ips in ismaster() (Kees Monshouwer)
- Rather than crash, sheepishly report no file/linenum
- Document undocumented config vars
- prevent cname + other data with dnsupdate
This release improves the stability and resiliency of the RPZ implementation and fixes several issues related to EDNS Client Subnet.
The full changelog looks like this:
- #6344: Add FFI version of
- #6336, #6293, #6237: Retry loading RPZ zones from server when they fail initially.
- #6300: Fix ECS-based cache entry refresh code.
- #6320: Fix ECS-specific NS AAAA not being returned from the cache.
This week was my first IETF visit. Although I’ve been active in several IETF WGs for nearly twenty years, I had never bothered to show up in person. I now realize this was a very big mistake – I thoroughly enjoyed meeting an extremely high concentration of capable and committed people. While RIPE, various NOG/NOFs and DNS-OARC are great venues as well, nothing is quite the circus of activity that an IETF meeting is. Much recommended!
Before visiting I read up on recent DNS standardization activity, and I noted a ton of stuff was going on. In our development work, I had also been noticing that many of the new DNS features interact in unexpected ways. In fact, there appears to be somewhat of a combinatorial explosion going on in terms of complexity.
As an example, DNAME and DNSSEC are separate features, but it turns out DNAME can only work with DNSSEC with special handling. And every time a new outgoing feature is introduced, for example DNS cookies, new probing is required to detect authoritative servers that get confused by such newfangled stuff.
This led me to propose a last minute talk (video!) to the DNSOP Working Group, which I tentatively called “The DNS Camel, or, how many features can we add to this protocol before it breaks”. This ended up on the agenda as “The DNS Camel” (with no further explanation) which intrigued everyone greatly. I want to thank DNSOP chairs Suzanne and Tim for accommodating my talk which was submitted at the last moment!
Note: My “DNS is too big” story is far from original! Earlier work includes “DNS Complexity” by Paul Vixie in the ACM Queue and RFC 8324 “DNS Privacy, Authorization, Special Uses, Encoding, Characters, Matching, and Root Structure: Time for Another Look” by John Klensin. Randy Bush presented on this subject in 2000 and even has a slide describing DNS as a camel!
Based on a wonderful chart compiled by ISC, I found that DNS is now described by at least 185 RFCs. Some shell-scripting and HTML scraping later, I found that this adds up to 2781 printed pages, comfortably more than two copies of “The C++ Programming Language (4th edition)”. This book is not known for its brevity.
In graph form, I summarised the rise of DNS complexity as above. My claim is that this rise is not innocent. As DNS becomes more complex, the number of people that “get it” also goes down. Notably, the advent of DNSSEC caused a number of implementations to drop out (MaraDNS, MyDNS, for example).
Also, with the rise in complexity and the decrease in the number of capable contributors, the inevitable result is a drop in quality:
And in fact, with the advent of DNSSEC this is what we found. For several years, security & stability bugs in popular nameserver implementations were absolutely dominated by DNSSEC and cryptography related issues.
My claim is that we are heading for that territory again.
So how did this happen? We all love DNS and we don’t want to see it harmed in any way. Traditionally, protocol or product evolution is guided by forces pulling and pushing on it.
Requirements from operators ‘pull’ DNS in the direction of greater complexity. Implementors meanwhile usually push back on such changes because they fear future bugs, and because they usually have enough to do already. Operators, additionally, are wary of complexity: they are the ones on call 24/7 to fix problems. They don’t want their 3AM remedial work to be any harder than it has to be.
Finally, the standardization community may also find things that need fixing. Standardizers work hard to make the internet better (the new IETF motto I think), and they find lots of things that could be improved – either practically or theoretically.
In the DNS world, we have the unique situation that (resolver) operator feedback is largely absent. Only a few operators manifest themselves in the standardization community (Cloudflare, Comcast, Google, Salesforce being notably present). Specifically, almost no resolver operator (access provider) ever speaks at WG meetings or writes on mailing lists. In reality, large scale resolver operators are exceptionally wary of new DNS features and turn off whatever features they can to preserve their night time rest.
On the developer front, the DNS world is truly blessed with some of the most gifted programmers in the world. The current crop of resolvers and authoritative servers is truly excellent. DNS may well be the best served protocol in existence today. This high level of skill also has a downside however. DNS developers frequently see immense complexity not as a problem but as a welcome challenge to be overcome. We say yes to things we should say no to. Less gifted developer communities would have to say no automatically since they simply would not be able to implement all that new stuff. We do not have this problem. We’re also too proud to say we find something (too) hard.
Finally, the standardization community has its own issues. A ‘show of hands’ made it clear that almost no one in the WG session was actually on call for DNS issues. Standardizers enjoy complexity but do not personally bear the costs of that complexity. Standardizers are not on 24/7 call as there rarely is a need for an emergency 3AM standardization session!
Notably, a few years ago I was informed by RFC authors that ‘NSEC3’ was easy. We in the implementation community meanwhile were pondering that the ‘3’ in NSEC3 probably stood for the number of people that understood this RRTYPE! I can also report that as of 2018, the major DNSSEC validator implementations still encounter NSEC3 corner cases where it is not clear what the intended behaviour is.
Note that our standardizers, like our developers, are extremely smart people. This however is again a mixed blessing – this talent creates at the very least an acceptance of complexity and a desire to conquer really hard problems, possibly in very clever ways.
The net result of the various forces on DNS not being checked is obvious: more and more complex features.
Orthogonality of features
As noted above, adding a lot of features can lead to a combinatorial explosion. DNSSEC has to know about DNAME. CZNic related the following gem they discovered during the implementation of ‘aggressive NSEC for NXDOMAIN detection’: it collides with trust-anchor signalling. The TA signalling happens in the form of a query to the root that leads to an NXDOMAIN, with associated NSEC records. These NSEC records then shut up further TA signalling, as no TA related names apparently exist! And here two unrelated features now need to know about each other: aggressive NSEC needs to be disabled for TA signalling.
If even a limited number of features overlap (i.e., are not fully orthogonal), soon the whole codebase consists of features interacting with each other.
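As a rough illustration of how quickly this grows, the sketch below lists every pair of features that might need to know about each other. The feature names are just the ones mentioned in this post, and treating every pair as a potential interaction is a deliberate worst-case assumption; the function name is hypothetical.

```python
from itertools import combinations


def pairwise_interactions(features):
    """Every pair of features that could, in the worst case, interact."""
    return list(combinations(sorted(features), 2))


if __name__ == "__main__":
    features = ["DNSSEC", "DNAME", "aggressive NSEC", "TA signalling"]
    pairs = pairwise_interactions(features)
    print(len(pairs), "potential interactions for", len(features), "features")
    # The count grows as n * (n - 1) / 2, so 50 features give 1225 pairs.
    print(len(pairwise_interactions(range(50))))
```

And this only counts pairs; interactions between three or more features come on top.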
We’re well on our way there, and this will lead to a reduction in quality, likely followed by a period of stasis where NO innovation is allowed anymore. And this would be bad. DNS is still not private and there is a lot of work to do.
I rounded off my talk with a few simple suggestions:
A 20-person queue quickly formed at the mic. It turns out that while I may have correctly diagnosed a problem, and while there is wide agreement that we are digging a hole for ourselves, I had not given sufficient thought to any solutions.
IETF GROW WG chair Job Snijders noted that the BGP-related WGs have implemented different constituencies (vendors, operators) that all have to agree. In addition, interoperable implementations are a requirement before a draft can progress to standard. This alone would cut back significantly on the flow of new standards.
Other speakers with experience in hardware and commercial software noted that in their world the commercial vendors provided ample feedback to not make life too difficult, or that such complexity would at least come at huge monetary cost. Since in open source features are free, we do not “benefit” from that feedback.
There was enthusiasm for the idea of going through the “200 DNS RFCs” and deprecating stuff we no longer thought was a good idea. This enthusiasm was more in theory than in practice though as it is known to be soul crushing work.
The concept however of reducing at least the growth in DNS complexity was very well received. And in fact, in subsequent days, there was frequent discussion about the “DNS Camel”:
And in fact, a draft has even been written that simplifies DNS by specifying DNS implementations no longer need to probe for EDNS0 support. The name of the draft? draft-spacek-edns-camel-diet-00!
I’m somewhat frightened of the amount of attention my presentation got, but happy to conclude it apparently struck a nerve that needed to be struck.
So what are the next steps? There is a lot to ponder.
I’ve been urged by several very persuasive people to not only rant about the problem but to also contribute to the solution, and I’ve decided these people are right. So please watch this space!