Diverting recursor-to-auth attacks

We get frequent reports from users/customers about various DNS-related attacks they are facing on either their authoritatives or recursors. This post focuses on one kind of attack that involves both. (Cloudmark wrote about this some time ago as well).

The attack works like this: given a target domain of example.com, the attacker takes his botnet, and has it fire off high amounts of $RANDOM.example.com queries. These queries go from the infected hosts to their recursors (i.e. normal ISP recursors). The recursors then need to go out to the auths, after all they don’t have any random string in cache.

When this attack starts, there is no packet count amplification – bot sends query to recursor, recursor sends to auth, answer flows back, done. However, if enough of this happens, one or more of the auths for example.com may get overloaded, and start responding more slowly, or not respond at all. The recursors will then either retry or move on to other auths, spreading the attack in the most effective and destructive way over the auths.

These attacks are painful, especially for authoritatives backed by (SQL) databases, like many PowerDNS users are running. Legitimate traffic for existing names gets cached very well inside pdns_server, but even if you put a wildcard in your database, these random queries will cause an SQL query, and those are costly.

Because SQL and random names are a bad fit, we get requests for being able to combine the bindbackend and an SQL backend in one pdns_server process. This works, but does not have the desired effect of offloading the SQL – we query both before sending out a response. So, something else needs to happen. While pondering that question this week, a few ideas came up:

  1. use IPTables u32 to match queries for the victim domain, and redirect them (I understand this can be done without generating a lot of state)
  2. teach dnsdist to pick backends based on domain name
  3. somehow get the recursors to redirect their traffic

I did not try ideas 1 and 2; I trust they will work in practice, and will effectively remove load from the SQL backends, but they still involve handling the whole malicious query load on the same server pipe. Luckily, it turns out idea 3 is feasible.

The idea behind 3 is to convince a recursor it is talking to the wrong machines, by virtue of sending it a new NSset in the AUTHORITY section of a response. Some authoritative servers will include the NSset from the zone in every response, but PowerDNS does not do this – so we need another trick.

Some time ago we added an experimental, internal-use-only feature to the Authoritative Server called lua-prequery, to be used specifically for our Recursor regression tests. While never designed for production usage, we can abuse it to make idea 3 work.

require 'posix'

function endswith(s, send)
 return #s >= #send and s:find(send, #s-#send+1, true) and true or false
end

function prequery ( dnspacket )
 qname, qtype = dnspacket:getQuestion()
 print(os.time(), qname,qtype)
 if endswith(qname, '.example.com') and posix.stat('/etc/powerdns/dropit')
 then
   dnspacket:setRcode(pdns.NXDOMAIN)
   ret = {}
   ret[1] = {qname='example.com', qtype=pdns.NS, content="ns-nosql.example.com", place=2, ttl=30}
   ret[2] = {qname='example.com', qtype=pdns.NS, content="ns-nosql2.example.com", place=2, ttl=30}
   dnspacket:addRecords(ret)
   return true
 end
 return false
end

(A careful reader noted that the stat() call, while cached, may not be the most efficient way to enable/disable this thing. Caveat emptor.)

This piece of code, combined with a reference to it in pdns.conf (‘lua-prequery-script=/etc/powerdns/prequery.lua‘), will cause pdns_server to send authoritative NXDOMAINs for any query ending in example.com, and include a new NSset, suggesting the recursor go look ‘over there’.

In our testing, BIND simply ignored the new NSset (we did not investigate why). PowerDNS Recursor believes what you tell it, and will stick to it until the TTL (30 seconds in this example) runs out. Unbound will also believe you, but if none of the machines you redirect it to actually work, it will come back. So, in general we recommend you point the traffic to a set of machines that can give valid replies.

In a lab setting, we found that with both Unbound and PowerDNS Recursor, this approach can move -all- traffic from your normal nameservers to the offload hosts, except for a few packets every TTL seconds. Depending on attack rate and TTL, this easily means offloading >99.9% of traffic, assuming no BIND is involved. In the real world, where some ISPs do use BIND for recursion, you won’t hit 99% or 90% but this approach may still help a lot.

We have not tried this on a real world attack, yet.

What’s next?

If you are under such an attack, and would like to give this a shot, please contact us, we’d love to try this on a real attack!

If you feel like toying around with this (I really want to find out how to make BIND cooperate, but I ran out of time), please get in touch (IRC preferred), I want to talk to you :-)

PowerDNS “Graphing as a Service”

Over the past few months,we’ve worked on our graphing tool, which has proved to be a wonderful aid in debugging. If you want to get the best help from us in diagnosing your problems, read on.

PowerDNS Authoritative Server and PowerDNS Recursor can both emit statistics using the ‘carbon’ format as used by Graphite. This means that feeding your stats into the very powerful Graphite software is as easy as setting:

carbon-server=2001:888:2000:1d::2

And from that point on, your PowerDNS product will send statistics to that IP address every 30 seconds. The target address should either run Graphite, or our own developed Metronome. Metronome is less powerful than Graphite, but very easy to setup. Graphs can be configured in Javascript, and out of the box, Metronome comes with support for graphing all PowerDNS products, plus the output of a small statistics program we wrote that emits network and CPU statistics.

Screenshot from 2014-12-11 14:31:11

To make this easier for everyone, we also run a public instance of Metronome on http://xs.powerdns.com/metronome/, and you are welcome to configure your PowerDNS products to send statistics by putting the following in pdns.conf or recursor.conf:

carbon-server=82.94.213.34    # the IPv6 address works as well

carbon-ourname=pick-something

Your data will then appear in the dropdown with the name ‘pick-something’. If you don’t pick your own name, PowerDNS will use your hostname, but you might consider that to be too revealing. It might also trample existing data. Carbon travels over port 2003, and if PowerDNS can’t connect to the server, nothing is interrupted.

For the PowerDNS Recursor, you can even enable carbon reporting at runtime using:

rec_control set-carbon-server 82.94.213.34 pick-something

To disable, run ‘rec_control set-carbon-server’.

If you have any (performance) issues with PowerDNS you want help with, it is tremendously useful to turn on reporting to our Metronome instance – very often we can spot your problem from the graph quickly.

Screenshot from 2014-12-11 14:37:20

Now, our public Metronome service is of course public, but you can obscure your data by picking an innocuous carbon-ourname.

Private Metronome service is also available for holders of PowerDNS support agreements. Finally, Metronome is easy to install, so you can also benefit from it locally.

Enjoy!

PowerDNS Security Advisory 2014-02

PowerDNS Security Advisory 2014-02: PowerDNS Recursor 3.6.1 and earlier can be made to provide bad service

Hi everybody,

Please be aware of PowerDNS Security Advisory 2014-02, which you can also find below. The good news is that the currently released version of the PowerDNS Recursor is safe. The bad news is that users of older versions will have to upgrade.

PowerDNS Recursor 3.6.2, released late October, is in wide production use and has been working well for our users. If however you have reasons not to upgrade, the advisory below contains a link to a patch which applies to older versions.

Finally, if you have problems upgrading, please either contact us on our mailing lists, or privately via powerdns.support@powerdns.com (should you wish to make use of our SLA-backed support program).

We want to thank Florian Maury of French government information security agency ANSSI for bringing this issue to our attention and coordinating the security release with us and other nameserver vendors.

  • CVE: CVE-2014-8601
  • Date: 8th of December 2014
  • Credit: Florian Maury (ANSSI)
  • Affects: PowerDNS Recursor versions 3.6.1 and earlier
  • Not affected: PowerDNS Recursor 3.6.2; no versions of PowerDNS Authoritative Server
  • Severity: High
  • Impact: Degraded service
  • Exploit: This problem can be triggered by sending queries for specifically configured domains
  • Risk of system compromise: No
  • Solution: Upgrade to PowerDNS Recursor 3.6.2
  • Workaround: None known. Exposure can be limited by configuring the allow-from setting so only trusted users can query your nameserver.

Recently we released PowerDNS Recursor 3.6.2 with a new feature that strictly limits the amount of work we’ll perform to resolve a single query. This feature was inspired by performance degradations noted when resolving  domains hosted by ‘ezdns.it’, which can require thousands of queries to  resolve.

During the 3.6.2 release process, we were contacted by a government security agency with news that they had found that all major caching nameservers, including PowerDNS, could be negatively impacted by specially configured, hard to resolve domain names. With their permission, we continued the 3.6.2 release process with the fix for the issue already in there.

We recommend that all users upgrade to 3.6.2 if at all possible. Alternatively, if you want to apply a minimal fix to your own tree, it can be found here, including patches for older versions.

As for workarounds, only clients in allow-from are able to trigger the degraded service, so this should be limited to your userbase.

Note that in addition to providing bad service, this issue can be abused to send unwanted traffic to an unwilling third party. Please see ANSSI’s report for more information.

Recursor 3.6.2

Note
Version 3.6.2 is a bugfix update to 3.6.1. Released on the 30th of October 2014.

Official download page

A list of changes since 3.6.1 follows.

  • commit ab14b4f: expedite servfail generation for ezdns-like failures (fully abort query resolving if we hit more than 50 outqueries)
  • commit 42025be: PowerDNS now polls the security status of a release at startup and periodically. More detail on this feature, and how to turn it off, can be found in Section 2, “Security polling”.
  • commit 5027429: We did not transmit the right ‘local’ socket address to Lua for TCP/IP queries in the recursor. In addition, we would attempt to lookup a filedescriptor that wasn’t there in an unlocked map which could conceivably lead to crashes. Closes ticket 1828, thanks Winfried for reporting
  • commit 752756c: Sync embedded yahttp copy. API: Replace HTTP Basic auth with static key in custom header
  • commit 6fdd40d: add missing #include <pthread.h> to rec-channel.hh (this fixes building on OS X).

Authoritative Server 3.4.1

Warning
Version 3.4.1. of the PowerDNS Authoritative Server is a major upgrade if you are coming from 2.9.x. Additionally, if you are coming from any 3.x version (including 3.3.1), there is a mandatory SQL schema upgrade. Please refer toSection 6, “From PowerDNS Authoritative Server 3.3.1 to 3.4.0” and any relevant sections before it, before deploying this version. There are no 3.4.1 upgrade notes.
[Note] Note
Released October 30th, 2014

Find the downloads on our download page.

This is a bugfix update to 3.4.0 and any earlier version.

A list of changes since 3.4.0 follows.

PowerDNS Security Status Polling

PowerDNS software sadly sometimes has critical security bugs. Even though we send out notifications of these via all channels available, our recent security releases have taught us that not everybody actually finds out about important security updates via our mailing lists, Facebook and Twitter.

To solve this, the development versions of PowerDNS software have been updated to  poll for security notifications over DNS, and log these periodically. Secondly, the security status of the software is available for monitoring using the built-in metrics. This allows operators to poll for the PowerDNS security status and alert on it.

In the implementation of this idea, we have taken the unique role of operating system distributors into account. Specifically, we can deal with backported security fixes.

This feature can easily be disabled, and operators can also point the queries point at their own status service.

In this post, we want to inform you that the most recent snapshots of PowerDNS now include security polling, and we want to solicit your rapid feedback before this feature becomes part of the next PowerDNS releases.

Implementation

PowerDNS software periodically tries to resolve ‘auth-x.y.z.security-status.secpoll.powerdns.com|TXT’ or ‘recursor-x.y.z.security-status.secpoll.powerdns.com|TXT’ (if the security-poll-suffix setting is left at the default of secpoll.powerdns.com). No other data is included in the request.

The data returned is in one of the following forms:

  • NXDOMAIN or resolution failure
  • “1 Ok” -> security-status=1
  • “2 Upgrade recommended for security reasons, see http://powerdns.com/..” -> security-status=2
  • “3 Upgrade mandatory for security reasons, see http://powerdns.com/..” -> security-status=3

In cases 2 or 3, periodic logging commences at syslog level ‘Error’. The metric security-status is set to 2 or 3 respectively. The security status could be lowered however if we discover the issue is less urgent than we thought.

If resolution fails, and the previous security-status was 1, the new security-status becomes 0 (‘no data’). If the security-status was higher than 1, it will remain that way, and not get set to 0. In this way, security-status of 0 really means ‘no data’, and can not mask a known problem.

Distributions

Distributions frequently backport security fixes to the PowerDNS versions they ship. This might lead to a version number that is known to us to be insecure to be secure in reality.

To solve this issue, PowerDNS can be compiled with a distribution setting which will move the security polls from: ‘auth-x.y.z.security-status.secpoll.powerdns.com’ to ‘auth-x.y.z-n.debian.security-status.secpoll.powerdns.com

Note two things, one, there is a separate namespace for debian, and secondly, we use the package version of this release. This allows us to know that 3.6.0-1 (say) is insecure, but that 3.6.0-2 is not.

Details and how to disable

The configuration setting ‘security-poll-suffix’ is by default set to ‘secpoll.powerdns.com’. If empty, nothing is polled. This can be moved to ‘secpoll.yourorganization.com’. Our up to date secpoll zonefile is available on github for this purpose.

If compiled with PACKAGEVERSION=3.1.6-abcde.debian, queries will be sent to “auth-3.1.6-abcde.debian.security-status.security-poll-suffix”.

Delegation

If a distribution wants to host its own file with version information, we can delegate dist.security-status.secpoll.powerdns.com to their nameservers directly.

PowerDNS Authoritative Server 3.4.0 released

[Warning] Warning
Version 3.4.0 of the PowerDNS Authoritative Server is a major upgrade if you are coming from 2.9.x. Additionally, if you are coming from any 3.x version (including 3.3.1), there is a mandatory SQL schema upgrade. Please refer to Section 6, “From PowerDNS Authoritative Server 3.3.1 to 3.4.0” and any relevant sections before it, before deploying this version.
[Note] Note
Released September 30th, 2014

Find the downloads on our download page.

This is a performance, feature, bugfix and conformity update to 3.3.1 and any earlier version. It contains a huge amount of work by various contributors, to whom we are very grateful.

A list of changes since 3.3.1 follows.

Changes between RC2 and 3.4.0:

Changes between RC1 and RC2:

Changes between 3.3.1 and 3.4.0-RC1 follow.

DNSSEC changes:

  • commit bba8413: add option (max-signature-cache-entries) to limit the maximum number of cached signatures.
  • commit 28b66a9: limit the number of NSEC3 iterations (see RFC5155 10.3), with the max-nsec3-iterations option.
  • commit b50efd6: drop the ‘superfluous NSEC3′ option that old BIND validators need.
  • The bindbackend ‘hybrid’ mode was reintroduced by Kees Monshouwer. Enable it with bind-hybrid.
  • Aki Tuomi contributed experimental PKCS#11 support for DNSSEC key management with a (Soft)HSM.
  • Direct RRSIG queries now return NOTIMP.
  • commit fa37777: add secure-all-zones command to pdnssec
  • Unrectified zones can now get rectified ‘on the fly’ during outgoing AXFR. This makes it possible to run a hidden signing master without rectification.
  • commit 82fb538: AXFR in: don’t accept zones with a mixture of Opt-Out NSEC3 RRs and non-Opt-Out NSEC3 RRs
  • Various minor bugfixes, mostly from the unstoppable Kees Monshouwer.
  • commit 0c4c552: set non-zero exit status in pdnssec if an exception was thrown, for easier automatic usage.
  • commit b8bd119: pdnssec -v show-zone: Print all keys instead of just entry point keys.
  • commit 52e0d78: answer direct NSEC queries without DO bit
  • commit ca2eb01: output ZSK DNSKEY records if experimental-direct-dnskey support is enabled
  • commit 83609e2: SOA-EDIT: fix INCEPTION-INCREMENT handling
  • commit ac4a2f1: AXFR-out can handle secure and insecure NSEC3 optout delegations
  • commit ff47302: AXFR-in can handle secure and insecure NSEC3 optout delegations

New features:

  • DNAME support. Enable with experimental-dname-processing.
  • PowerDNS can now send stats directly to Carbon servers. Enable with carbon-server, tweak with carbon-ourname and carbon-interval.
  • commit 767da1a: Add list-zone capability to pdns_control
  • commit 51f6bca: Add delete-zone to pdnssec.
  • The gsql backends now support record comments, and disabling records.
  • The new reuseport config option allows setting SO_REUSEPORT, which allows for some performance improvements.
  • local-address-nonexist-fail and local-ipv6-nonexist-fail allow pdns to start up even if some addresses fail to bind.
  • ‘AXFR-SOURCE’ in domainmetadata sets the source address for an AXFR retrieval.
  • commit 451ba51: Implement pdnssec get-meta/set-meta
  • Experimental RFC2136/DNS UPDATE support from Ruben d’Arco, with extensive testing by Kees Monshouwer.
  • pdns_control bind-add-zone
  • New option bind-ignore-broken-records ignores out-of-zone records while loading zone files.
  • pdnssec now has commands for TSIG key management.
  • We now support other algorithms than MD5 for TSIG.
  • commit ba7244a: implement pdns_control qtypes
  • Support for += syntax for options

Bugfixes:

  • We verify the algorithm used for TSIG queries, and use the right algorithm in signing if there is possible confusion. Plus a few minor TSIG-related fixes.
  • commit ff99a74: making *-threads settings empty now yields a default of one instead of zero.
  • commit 9215e60: we had a deadly embrace in getUpdatedMasters in bindbackend reimplementation, thanks to Winfried for detailed debugging!
  • commit 9245fd9: don’t addSuckRequest after supermaster zone creation to avoid one cause of simultaneous AXFR for the same zone
  • commit 719f902: fix dual-stack superslave when multiple namservers share a ip
  • commit 33966bf: avoid address truncation in doNotifications
  • commit eac85b1: prevent duplicate slave notications caused by different ipv6 address formatting
  • commit 3c8a711: make notification queue ipv6 compatible
  • commit 0c13e45: make isMaster ip check more tolerant for different ipv6 notations
  • Various fixes for possible issues reported by Coverity Scan (commit f17c93b, )
  • commit 9083987: don’t rely on included polarssl header files when using system polarssl. Spotted by Oden Eriksson of Mandriva, thanks!
  • Various users reported pdns_control hangs, especially when using the guardian. We are confident that all causes of these hangs are now gone.
  • Decreasing the webserver ringbuffer size could cause crashes.
  • commit 4c89cce: nproxy: Add missing chdir(“/”) after chroot()
  • commit 016a0ab: actually notice timeout during AXFR retrieve, thanks hkraal

REST API changes:

  • The REST API was much improved and is nearing stability, thanks to Christian Hofstaedtler and others.
  • Mark Schouten at Tuxis contributed a zone importer.

Other changes:

  • Our tarballs and packages now include *.sql schema files for the SQL backends.
  • The webserver (including API) now has an ACL (webserver-allow-from).
  • Webserver (including API) is now powered by YaHTTP.
  • Various autotools usage improvements from Ruben Kerkhof.
  • The dist tarball is now bzip2-compressed instead of gzip.
  • Various remotebackend updates, including replacing curl with (included) yahttp.
  • Dynamic module loading is now allowed on Mac OS X.
  • The AXFR ACL (allow-axfr-ips) now defaults to 127.0.0.0/8,::1 instead of the whole world.
  • commit ba91c2f: remove unused gpgsql-socket option and document postgres socket usage
  • Improved support for Lua 5.2.
  • The edns-subnet option code is now fixed at 8, and the edns-subnet-option-numbers option has been removed.
  • geobackend now has very limited edns-subnet support – it will use the ‘real’ remote if available.
  • pipebackend ABI v4 adds the zone name to the AXFR command.
  • We now avoid getaddrinfo() as much as possible.
  • The packet cache now handles (forwarded) recursive answers better, including TTL aging and respecting allow-recursion.
  • commit ff5ba4f: pdns_server –help no longer exits with 1.
  • Mark Zealey contributed an experimental LMDB backend. Kees Monshouwer added experimental DNSSEC support to it. Thanks, both!
  • commit 81859ba: No longer attempt to answer questions coming in from port 0, reply would not reach them anyhow. Thanks to Niels Bakker and sid3windr for insight & debugging. Closes ticket 844.
  • RCodes are now reported in text in various places, thanks Aki.
  • Kees Monshouwer set up automatic testing for the oracle and goracle backends, and fixed various issues in them.
  • Leftovers of previous support for Windows have been removed, thanks to Kees Monshouwer, Aki Tuomi.
  • Bundled PolarSSL has been upgraded to 1.3.2
  • PolarSSL replaced previously bundled implementations of AES (commit e22d9b4) and SHA (commit 9101035)
  • bindbackend is now a module
  • commit 14a2e52: Use the inet data type for supermasters.ip on postgrsql.
  • We now send an empty SERVFAIL when a CNAME chain is too long, instead of including the partial chain.
  • commit 3613a51: Show built-in features in –version output
  • commit 4bd7d35: make domainmetadata queries case insensitive
  • commit 088c334: output warning message when no to be notified NS’s are found
  • commit 5631b44: gpsqlbackend: use empty defaults for dbname and user; libpq will use the current user name for both by default
  • commit d87ded3: implement udp-truncation-threshold to override the previous 1680 byte maximum response datagram size – no matter what EDNS0 said. Plus document it.
  • Implement udp-truncation-threshold to override the previous 1680 byte maximum response datagram size – no matter what EDNS0 said.
  • On shutdown, PowerDNS now attempts to stop all processes in its process group, especially useful for pipe/remotebackend users. Feature donated by Spotify.
  • Removed settings related to fancy records, as we haven’t supported those since version 3.0
  • Based on earlier work by Mark Zealey, Kees Monshouwer increased our packet cache performance between 200% and 500% depending on the situation, by simplifying some code in commit 801812e and commit 8403ade.