Security Update: PowerDNS Recursor 3.6.1

Hi everybody,

We regret that we have to announce a PowerDNS Recursor security release:

Issue:    A specific sequence of packets can crash PowerDNS Recursor 3.6.0 remotely
CVE:      CVE-2014-3614
Affected: All deployments of PowerDNS Recursor 3.6.0 
Not Affected: 
          PowerDNS Authoritative Server, PowerDNS Recursor versions other than 3.6.0 
Workaround: 
          1) Only users from netmasks specified in 'allow-from' can cause the crash 
          2) add automated restarting
Remediation: 
          Upgrade to 3.6.1, or apply our minimal patch and recompile
          Distributions shipping 3.6.0 have been notified and will be providing updates very soon

Recently, we’ve discovered that PowerDNS Recursor 3.6.0 (but NOT earlier) can crash when exposed to a specific sequence of malformed packets. This sequence happened spontaneously with one of our largest deployments, and the packets did not appear to have a malicious origin.

Yet, this crash can be triggered remotely, leading to a denial of service attack. There appears to be no way to use this crash for system compromise or stack overflow.

PowerDNS Recursor 3.6.1 packages and sources are available from https://www.powerdns.com/downloads.html

In addition, if you want to apply a minimal fix, it can be found on: https://xs.powerdns.com/tmp/minipatch-3.6.1

Finally, distributions that ship PowerDNS Recursor 3.6.0 have been notified and will be providing updated packages soon.

As for workarounds, only clients in allow-from are able to trigger the crash, so this should be limited to your userbase. Secondly, https://github.com/PowerDNS/pdns/blob/master/contrib/upstart-recursor.conf
and https://github.com/PowerDNS/pdns/blob/master/contrib/systemd-pdns-recursor.service
can be used to enable Upstart and Systemd to restart the PowerDNS Recursor in case of a crash.

3.6.1 release notes:

In addition to various fixes related to this potential crash, 3.6.1 fixes a few minor issues and adds a debugging feature:

  • We could not encode IPv6 AAAA records that mapped to IPv4 addresses in some cases (:ffff.1.2.3.4). Fixed in commit c90fcbd , closing ticket 1663.
  • Improve systemd startup timing with respect to network availability (commit cf86c6a), thanks to Morten Stevens.
  • Realtime telemetry can now be enabled at runtime, for example with ‘rec_control carbon-server 82.94.213.34 ourname1234′. This ties in to our existing carbon-server and carbon-ourname settings, but now at runtime. This specific invocation will make your stats appear automatically on our public telemetry server.

We want to thank the dedicated PowerDNS users that spent months investigating the rare crashes they observed. Without such an engaged community, we would never be able to chase down issues like these.

If you need any help with upgrading, please contact us either on the mailing lists or via our website or phone.

Problems posting to our mailing lists

Hi everybody,

Over the past few months, we’d been receiving some reports of people having problems posting to our lists. At first, it appeared the problem was caused by subscribers emailing to the wrong list address. We have told a few of you that it was your fault.

It turns out that even though we know a thing or two about DNS, we managed to mess up our own. ‘mailman.powerdns.com’ was a CNAME. This in turn led some mail transfer agent software to rewrite posts to pdns-users at mailman.powerdns.com to pdns-users at xs.powerdns.com, and this would then fail within mailman.

We want to apologize for 1) causing this mess and 2) blaming our users for it. Sorry.

Thanks to Leo, Winfried and Ruben for nagging us about the issue. You were right.

The situation has now been resolved, and you should no longer have problems posting to our lists.

Bert

Authoritative Server 3.4.0 Release Candidate 1

[Warning] Warning
Version 3.4.0 of the PowerDNS Authoritative Server is a major upgrade if you are coming from 2.9.x. Additionally, if you are coming from any 3.x version (including 3.3.1), there is a mandatory SQL schema upgrade. Please refer to Section 6, “From PowerDNS Authoritative Server 3.3.1 to 3.4.0” and any relevant sections before it, before deploying this version.

This is a performance, feature, bugfix and conformity update to 3.3.1 and any earlier version. It contains a huge amount of work by various contributors, to whom we are very grateful.

A list of changes since 3.3.1 follows.

DNSSEC changes:

  • commit bba8413: add option (max-signature-cache-entries) to limit the maximum number of cached signatures.
  • commit 28b66a9: limit the number of NSEC3 iterations (see RFC5155 10.3), with the max-nsec3-iterations option.
  • commit b50efd6: drop the ‘superfluous NSEC3′ option that old BIND validators need.
  • The bindbackend ‘hybrid’ mode was reintroduced by Kees Monshouwer. Enable it with bind-hybrid.
  • Aki Tuomi contributed experimental PKCS#11 support for DNSSEC key management with a (Soft)HSM.
  • Direct RRSIG queries now return NOTIMP.
  • commit fa37777: add secure-all-zones command to pdnssec
  • Unrectified zones can now get rectified ‘on the fly’ during outgoing AXFR. This makes it possible to run a hidden signing master without rectification.
  • commit 82fb538: AXFR in: don’t accept zones with a mixture of Opt-Out NSEC3 RRs and non-Opt-Out NSEC3 RRs
  • Various minor bugfixes, mostly from the unstoppable Kees Monshouwer.
  • commit 0c4c552: set non-zero exit status in pdnssec if an exception was thrown, for easier automatic usage.
  • commit b8bd119: pdnssec -v show-zone: Print all keys instead of just entry point keys.
  • commit 52e0d78: answer direct NSEC queries without DO bit
  • commit ca2eb01: output ZSK DNSKEY records if experimental-direct-dnskey support is enabled
  • commit 83609e2: SOA-EDIT: fix INCEPTION-INCREMENT handling
  • commit ac4a2f1: AXFR-out can handle secure and insecure NSEC3 optout delegations
  • commit ff47302: AXFR-in can handle secure and insecure NSEC3 optout delegations

New features:

  • DNAME support. Enable with experimental-dname-processing.
  • PowerDNS can now send stats directly to Carbon servers. Enable with carbon-server, tweak with carbon-ourname and carbon-interval.
  • commit 767da1a: Add list-zone capability to pdns_control
  • commit 51f6bca: Add delete-zone to pdnssec.
  • The gsql backends now support record comments, and disabling records.
  • The new reuseport config option allows setting SO_REUSEPORT, which allows for some performance improvements.
  • local-address-nonexist-fail and local-ipv6-nonexist-fail allow pdns to start up even if some addresses fail to bind.
  • ‘AXFR-SOURCE’ in domainmetadata sets the source address for an AXFR retrieval.
  • commit 451ba51: Implement pdnssec get-meta/set-meta
  • Experimental RFC2136/DNS UPDATE support from Ruben d’Arco, with extensive testing by Kees Monshouwer.
  • pdns_control bind-add-zone
  • New option bind-ignore-broken-records ignores out-of-zone records while loading zone files.
  • pdnssec now has commands for TSIG key management.
  • We now support other algorithms than MD5 for TSIG.
  • commit ba7244a: implement pdns_control qtypes
  • Support for += syntax for options

Bugfixes:

  • We verify the algorithm used for TSIG queries, and use the right algorithm in signing if there is possible confusion. Plus a few minor TSIG-related fixes.
  • commit ff99a74: making *-threads settings empty now yields a default of one instead of zero.
  • commit 9215e60: we had a deadly embrace in getUpdatedMasters in bindbackend reimplementation, thanks to Winfried for detailed debugging!
  • commit 9245fd9: don’t addSuckRequest after supermaster zone creation to avoid one cause of simultaneous AXFR for the same zone
  • commit 719f902: fix dual-stack superslave when multiple namservers share a ip
  • commit 33966bf: avoid address truncation in doNotifications
  • commit eac85b1: prevent duplicate slave notications caused by different ipv6 address formatting
  • commit 3c8a711: make notification queue ipv6 compatible
  • commit 0c13e45: make isMaster ip check more tolerant for different ipv6 notations
  • Various fixes for possible issues reported by Coverity Scan (commit f17c93b, )
  • commit 9083987: don’t rely on included polarssl header files when using system polarssl. Spotted by Oden Eriksson of Mandriva, thanks!
  • Various users reported pdns_control hangs, especially when using the guardian. We are confident that all causes of these hangs are now gone.
  • Decreasing the webserver ringbuffer size could cause crashes.
  • commit 4c89cce: nproxy: Add missing chdir(“/”) after chroot()
  • commit 016a0ab: actually notice timeout during AXFR retrieve, thanks hkraal

REST API changes:

  • The REST API was much improved and is nearing stability, thanks to Christian Hofstaedtler and others.
  • Mark Schouten at Tuxis contributed a zone importer.

Other changes:

  • Our tarballs and packages now include *.sql schema files for the SQL backends.
  • The webserver (including API) now has an ACL (webserver-allow-from).
  • Webserver (including API) is now powered by YaHTTP.
  • Various autotools usage improvements from Ruben Kerkhof.
  • The dist tarball is now bzip2-compressed instead of gzip.
  • Various remotebackend updates, including replacing curl with (included) yahttp.
  • Dynamic module loading is now allowed on Mac OS X.
  • The AXFR ACL (allow-axfr-ips) now defaults to 127.0.0.0/8,::1 instead of the whole world.
  • commit ba91c2f: remove unused gpgsql-socket option and document postgres socket usage
  • Improved support for Lua 5.2.
  • The edns-subnet option code is now fixed at 8, and the edns-subnet-option-numbers option has been removed.
  • geobackend now has very limited edns-subnet support – it will use the ‘real’ remote if available.
  • pipebackend ABI v4 adds the zone name to the AXFR command.
  • We now avoid getaddrinfo() as much as possible.
  • The packet cache now handles (forwarded) recursive answers better, including TTL aging and respecting allow-recursion.
  • commit ff5ba4f: pdns_server –help no longer exits with 1.
  • Mark Zealey contributed an experimental LMDB backend. Kees Monshouwer added experimental DNSSEC support to it. Thanks, both!
  • commit 81859ba: No longer attempt to answer questions coming in from port 0, reply would not reach them anyhow. Thanks to Niels Bakker and sid3windr for insight & debugging. Closes ticket 844.
  • RCodes are now reported in text in various places, thanks Aki.
  • Kees Monshouwer set up automatic testing for the oracle and goracle backends, and fixed various issues in them.
  • Leftovers of previous support for Windows have been removed, thanks to Kees Monshouwer, Aki Tuomi.
  • Bundled PolarSSL has been upgraded to 1.3.2
  • PolarSSL replaced previously bundled implementations of AES (commit e22d9b4) and SHA (commit 9101035)
  • bindbackend is now a module
  • commit 14a2e52: Use the inet data type for supermasters.ip on postgrsql.
  • We now send an empty SERVFAIL when a CNAME chain is too long, instead of including the partial chain.
  • commit 3613a51: Show built-in features in –version output
  • commit 4bd7d35: make domainmetadata queries case insensitive
  • commit 088c334: output warning message when no to be notified NS’s are found
  • commit 5631b44: gpsqlbackend: use empty defaults for dbname and user; libpq will use the current user name for both by default
  • commit d87ded3: implement udp-truncation-threshold to override the previous 1680 byte maximum response datagram size – no matter what EDNS0 said. Plus document it.
  • Implement udp-truncation-threshold to override the previous 1680 byte maximum response datagram size – no matter what EDNS0 said.
  • On shutdown, PowerDNS now attempts to stop all processes in its process group, especially useful for pipe/remotebackend users. Feature donated by Spotify.
  • Removed settings related to fancy records, as we haven’t supported those since version 3.0
  • Based on earlier work by Mark Zealey, Kees Monshouwer increased our packet cache performance between 200% and 500% depending on the situation, by simplifying some code in commit 801812e and commit 8403ade.

Recursor 3.6.0 released

This is a performance, feature and bugfix update to 3.5/3.5.3. It contains important fixes for slightly broken domain names, which your users expect to work anyhow. It also brings robust resilience against certain classes of attacks.

Changes between RC1 and release:

 

New features:

  • commit aadceba: Implement minimum-ttl-override config setting, plus runtime configurability via ‘rec_control set-minimum-ttl’.
  • Lots of work on the JSON API, which is exposed via Aki Tuomi’s ‘yahttp’. Massive thanks to Christian Hofstaedtler for delivering this exciting new functionality. Documentation & demo forthcoming, but code to use it is available on GitHub.
  • Lua modules can now use ‘pdnslog(INFO..’), as described in ticket 1074, implemented in commit 674a305
  • Adopt any-to-tcp feature to the recursor. Based on a patch by Winfried Angele. Closes ticket 836commit 56b4d21 and commit e661a20.
  • commit 2c78bd5: implement built-in statistics dumper using the ‘carbon’ protocol, which is also understood by metronome (our mini-graphite). Use ‘carbon-server’, ‘carbon-ourname’ and ‘carbon-interval’ settings.
  • New setting ‘udp-truncation-threshold’ to configure from how many bytes we should truncate. commit a09a8ce.
  • Proper support for CHaos class for CHAOS TXT queries. commit c86e1f2, addition for lua in commit f94c53d, some warnings in commit 438db54 however.
  • Added support for Lua scripts to drop queries w/o further processing. commit 0478c54.
  • Kevin Holly added qtype statistics to recursor and rec_control (get-qtypelist) (commit 79332bf)
  • Add support for include-files in configuration, also reload ACLs and zones defined in them (commit 829849dcommit 242b90ecommit 302df81).
  • Paulo Anes contributed server-down-max-fails which helps combat Recursive DNS based amplification attacks. Described in this post. Also comes with new metric ‘failed-host-entries’ in commit 406f46f.
  • commit 21e7976: Implement “followCNAMERecords” feature in the Lua hooks.

Improvements:

  • commit 06ea901: make pdns-distributes-queries use a hash so related queries get sent to the same thread. Original idea by Winfried Angele. Astoundingly effective, approximately halves CPU usage!
  • commit b13e737: –help now writes to stdout instead of stderr. Thanks Winfried Angele.
  • To aid in limiting DoS attacks, when truncating a response, we actually truncate all the way so only the question remains. Suggested inticket 1092, code in commit add935a.
  • No longer experimental, the switch ‘pdns-distributes-queries’ can improve multi-threaded performance on Linux (various cleanup commits).
  • Update to embedded PolarSSL, plus remove previous AES implementation and shift to PolarSSL (commit e22d9b4commit 990ad9a)
  • commit 92c0733 moves various Lua magic constants into an enum namespace.
  • set group and supplementary groups before chroot (commit 6ee50ceticket 1198).
  • commit 4e9a20e: raise our socket buffer setting so it no longer generates a warning about lowering it.
  • commit 4e9a20e: warn about Linux suboptimal IPv6 settings if we detect them.
  • SIGUSR2 turns on a ‘trace’ of all DNS traffic, a second SIGUSR2 now turns it off again. commit 4f217ce.
  • Various fixes for Lua 5.2.
  • commit 81859ba: No longer attempt to answer questions coming in from port 0, reply would not reach them anyhow. Thanks to Niels Bakker and ‘sid3windr’ for insight & debugging. Closes ticket 844.
  • commit b1a2d6c: now, I’m not one to get OCD over things, but that log message about stats based on 1801 seconds got to me. 1800 now.

Fixes:

  • 0c9de4fc: stay away from getaddrinfo unless we really can’t help it for ascii ipv6 conversions to binary
  • commit 08f3f63: fix average latency calculation, closing ticket 424.
  • commit 75ba907: Some of our counters were still 32 bits, now 64.
  • commit 2f22827: Fix statistics and stability when running with pdns-distributes-queries.
  • commit 6196f90: avoid merging old and new additional data, fixes an issue caused by weird (but probably legal) Akamai behaviour
  • commit 3a8a4d6: make sure we don’t exceed the number of available filedescriptors for mthreads. Raises performance in case of DoS. See this post for further details.
  • commit 7313fe6: implement indexed packet cache wiping for recursor, orders of magnitude faster. Important when reloading all zones, which causes massive cache cleaning.
  • rec_control get-all would include ‘cache-bytes’ and ‘packetcache-bytes’, which were expensive operations, too expensive for frequent polling. Removed in commit 8e42d27.
  • All old workarounds for supporting Windows of the XP era have been removed.
  • Fix issues on S390X based systems which have unsigned characters (commit 916a0fd)

PowerDNS Jobs: are you available?

Hi everybody,

In short: there is a market for (small) PowerDNS jobs, and if you are available for such work, read on for where we’ll be sending people who need PowerDNS work done!

The longer story:

As PowerDNS use continues to increase, so does the number of inquiries we receive from operators that need (private) help. Such requests range from “can someone help me fix this error” or “can someone upgrade my PowerDNS from 2.9.22″ to designing & deploying installations for millions of users or domains.

PowerDNS itself and our certified consultants are able to field some of these requests, but not all of them. Sometimes we don’t have time, but other requests simply do not fit well with us or our consultants.

For example, we will typically not be able to do small scale one off upgrades or installations ourselves.

However, we do frequently get questions for such work. What we have now decided is whenever we can’t help a user commercially, we will be asking them to forward their job to elance.com and odesk.com. We’ll also be providing some guidance on how to identify experienced PowerDNS people.

So: if you have PowerDNS skills, and you are available for doing such jobs, know that we will be sending interested users to elance.com and odesk.com.

And if you have a profile there that mentions PowerDNS, you could be expecting some business.

If you are already doing consulting work, but via another site, please let us know, we can also send people your way there then.

Finally, please be advised that this is all in addition to our ‘regular’ support platforms: community based (mailing list, irc), professional (support agreements) and certified consultants. There is no change to those programs.

Thanks!

Recursor 3.6.0 Release Candidate 1

This is a performance, feature and bugfix update to 3.5/3.5.3. It contains important fixes for slightly broken domain names, which your users expect to work anyhow. It also brings robust resilience against certain classes of attacks.

New features:

  • commit aadceba: Implement minimum-ttl-override config setting, plus runtime configurability via ‘rec_control set-minimum-ttl’.
  • Lots of work on the JSON API, which is exposed via Aki Tuomi’s ‘yahttp’. Massive thanks to Christian Hofstaedtler for delivering this exciting new functionality. Documentation & demo forthcoming, but code to use it is available on GitHub.
  • Lua modules can now use ‘pdnslog(INFO..’), as described in ticket 1074, implemented in commit 674a305
  • Adopt any-to-tcp feature to the recursor. Based on a patch by Winfried Angele. Closes ticket 836commit 56b4d21 and commit e661a20.
  • commit 2c78bd5: implement built-in statistics dumper using the ‘carbon’ protocol, which is also understood by metronome (our mini-graphite). Use ‘carbon-server’, ‘carbon-ourname’ and ‘carbon-interval’ settings.
  • New setting ‘udp-truncation-threshold’ to configure from how many bytes we should truncate. commit a09a8ce.
  • Proper support for CHaos class for CHAOS TXT queries. commit c86e1f2, addition for lua in commit f94c53d, some warnings in commit 438db54 however.
  • Added support for Lua scripts to drop queries w/o further processing. commit 0478c54.
  • Kevin Holly added qtype statistics to recursor and rec_control (get-qtypelist) (commit 79332bf)
  • Add support for include-files in configuration, also reload ACLs and zones defined in them (commit 829849dcommit 242b90ecommit 302df81).
  • Paulo Anes contributed server-down-max-fails which helps combat Recursive DNS based amplification attacks. Described in this post. Also comes with new metric ‘failed-host-entries’ in commit 406f46f.
  • commit 21e7976: Implement “followCNAMERecords” feature in the Lua hooks.

Improvements:

  • commit 06ea901: make pdns-distributes-queries use a hash so related queries get sent to the same thread. Original idea by Winfried Angele. Astoundingly effective, approximately halves CPU usage!
  • commit b13e737: –help now writes to stdout instead of stderr. Thanks Winfried Angele.
  • To aid in limiting DoS attacks, when truncating a response, we actually truncate all the way so only the question remains. Suggested inticket 1092, code in commit add935a.
  • No longer experimental, the switch ‘pdns-distributes-queries’ can improve multi-threaded performance on Linux (various cleanup commits).
  • Update to embedded PolarSSL, plus remove previous AES implementation and shift to PolarSSL (commit e22d9b4commit 990ad9a)
  • commit 92c0733 moves various Lua magic constants into an enum namespace.
  • set group and supplementary groups before chroot (commit 6ee50ceticket 1198).
  • commit 4e9a20e: raise our socket buffer setting so it no longer generates a warning about lowering it.
  • commit 4e9a20e: warn about Linux suboptimal IPv6 settings if we detect them.
  • SIGUSR2 turns on a ‘trace’ of all DNS traffic, a second SIGUSR2 now turns it off again. commit 4f217ce.
  • Various fixes for Lua 5.2.
  • commit 81859ba: No longer attempt to answer questions coming in from port 0, reply would not reach them anyhow. Thanks to Niels Bakker and ‘sid3windr’ for insight & debugging. Closes ticket 844.
  • commit b1a2d6c: now, I’m not one to get OCD over things, but that log message about stats based on 1801 seconds got to me. 1800 now.

Fixes:

  • 0c9de4fc: stay away from getaddrinfo unless we really can’t help it for ascii ipv6 conversions to binary
  • commit 08f3f63: fix average latency calculation, closing ticket 424.
  • commit 75ba907: Some of our counters were still 32 bits, now 64.
  • commit 2f22827: Fix statistics and stability when running with pdns-distributes-queries.
  • commit 6196f90: avoid merging old and new additional data, fixes an issue caused by weird (but probably legal) Akamai behaviour
  • commit 3a8a4d6: make sure we don’t exceed the number of available filedescriptors for mthreads. Raises performance in case of DoS. See this post for further details.
  • commit 7313fe6: implement indexed packet cache wiping for recursor, orders of magnitude faster. Important when reloading all zones, which causes massive cache cleaning.
  • rec_control get-all would include ‘cache-bytes’ and ‘packetcache-bytes’, which were expensive operations, too expensive for frequent polling. Removed in commit 8e42d27.
  • All old workarounds for supporting Windows of the XP era have been removed.
  • Fix issues on S390X based systems which have unsigned characters (commit 916a0fd)

A surprising discovery on converting IPv6 addresses: we no longer prefer getaddrinfo()

Yesterday, we were contacted by PowerDNS user James Baer who noted strange crashes in PowerDNS (on Linux) upon adding thousands and thousands of IP addresses to his system. Notably, PowerDNS did not even use any of those thousands of addresses, but it still crashed. As James noted, this should not even be possible.

Quick investigation by PowerDNS contributors Aki Tuomi and Imre Gergely (thanks!) showed that the crashes emanated from a call to getaddrinfo() to convert an IPv6 address. Now, here over at PowerDNS, we are big fans of IPv6, to the point that we even presented at RIPE66, “Implementing Full IPv6 Support: More than Binding to an AF_INET6 Socket“. In this presentation, we noted “getaddrinfo() is _the only way_ to convert presentation format IPv4 and IPv6 addresses. Ignore anything else”.

In this post, we sadly have to recant this statement. In fact, we’d like to replace it with ‘getaddrinfo() considered slow and potentially dangerous in 2014′.

So, what did we find this morning? We were expecting our calls to getaddrinfo() to be a quick string conversion of things like ‘::1′ to the binary representation ‘1’. On reading the backtraces however, we found that getaddrinfo() was talking to the kernel, and enumerating all local IP addresses (!). I was actually stunned. We also found that Akamai had noted that this was crashing their code when using 64000 IP addresses. Because talking to the kernel this way is expensive, getaddrinfo() maintains a local cache for one second. This however adds the need for locking and memory management, which it predictably gets wrong.

And lo, bug reports about getaddrinfo() not being thread safe, slow, or crash prone abound. We’ve found issues reported for Linux, Solaris, FreeBSD and OSX.

Now, it should be noted that getaddrinfo()  offers us quite a few things. It can go out on the wire and do DNS queries. It can parse gai.conf to find how we want to sort our addresses, and which address families we prefer. These days, it can even perform Internationalized Domain Name (IDN) conversions! Also, getaddrinfo() is tasked with making sure ‘local IP addresses’ get returned in preference to remote ones (‘Rule 7′). These are indeed things that may require kernel communication.

However, no such complication is required when wanting to convert “::1″ into 1. And in fact, using the flags, we can tell getaddrinfo() not to do anything complicated (by omitting AI_ADDRCONFIG). Sadly, the vast majority of getaddrinfo() implementations currently in production get it wrong. This is probably an artifact of the huge mission statement that has been heaped on this function.

So, as of 2014, we consider getaddrinfo() to be avoided for IPv6 address conversions under any except the least demanding, single threaded, applications. Within PowerDNS, we have reverted to using inet_pton() whenever we can get away with it – which is almost always, except in the case of scoped addresses.

Over time, the most common C library implementations may get their act together, or they may not. But for now, to convert from representation format, inet_pton() it is.