Category: powerdns

Recursor 3.6.0 released

This is a performance, feature and bugfix update to 3.5/3.5.3. It contains important fixes for slightly broken domain names, which your users expect to work anyhow. It also brings robust resilience against certain classes of attacks.

Changes between RC1 and release:

 

New features:

  • commit aadceba: Implement minimum-ttl-override config setting, plus runtime configurability via ‘rec_control set-minimum-ttl’.
  • Lots of work on the JSON API, which is exposed via Aki Tuomi’s ‘yahttp’. Massive thanks to Christian Hofstaedtler for delivering this exciting new functionality. Documentation & demo forthcoming, but code to use it is available on GitHub.
  • Lua modules can now use ‘pdnslog(INFO..’), as described in ticket 1074, implemented in commit 674a305
  • Adopt any-to-tcp feature to the recursor. Based on a patch by Winfried Angele. Closes ticket 836commit 56b4d21 and commit e661a20.
  • commit 2c78bd5: implement built-in statistics dumper using the ‘carbon’ protocol, which is also understood by metronome (our mini-graphite). Use ‘carbon-server’, ‘carbon-ourname’ and ‘carbon-interval’ settings.
  • New setting ‘udp-truncation-threshold’ to configure from how many bytes we should truncate. commit a09a8ce.
  • Proper support for CHaos class for CHAOS TXT queries. commit c86e1f2, addition for lua in commit f94c53d, some warnings in commit 438db54 however.
  • Added support for Lua scripts to drop queries w/o further processing. commit 0478c54.
  • Kevin Holly added qtype statistics to recursor and rec_control (get-qtypelist) (commit 79332bf)
  • Add support for include-files in configuration, also reload ACLs and zones defined in them (commit 829849dcommit 242b90ecommit 302df81).
  • Paulo Anes contributed server-down-max-fails which helps combat Recursive DNS based amplification attacks. Described in this post. Also comes with new metric ‘failed-host-entries’ in commit 406f46f.
  • commit 21e7976: Implement “followCNAMERecords” feature in the Lua hooks.

Improvements:

  • commit 06ea901: make pdns-distributes-queries use a hash so related queries get sent to the same thread. Original idea by Winfried Angele. Astoundingly effective, approximately halves CPU usage!
  • commit b13e737: –help now writes to stdout instead of stderr. Thanks Winfried Angele.
  • To aid in limiting DoS attacks, when truncating a response, we actually truncate all the way so only the question remains. Suggested inticket 1092, code in commit add935a.
  • No longer experimental, the switch ‘pdns-distributes-queries’ can improve multi-threaded performance on Linux (various cleanup commits).
  • Update to embedded PolarSSL, plus remove previous AES implementation and shift to PolarSSL (commit e22d9b4commit 990ad9a)
  • commit 92c0733 moves various Lua magic constants into an enum namespace.
  • set group and supplementary groups before chroot (commit 6ee50ceticket 1198).
  • commit 4e9a20e: raise our socket buffer setting so it no longer generates a warning about lowering it.
  • commit 4e9a20e: warn about Linux suboptimal IPv6 settings if we detect them.
  • SIGUSR2 turns on a ‘trace’ of all DNS traffic, a second SIGUSR2 now turns it off again. commit 4f217ce.
  • Various fixes for Lua 5.2.
  • commit 81859ba: No longer attempt to answer questions coming in from port 0, reply would not reach them anyhow. Thanks to Niels Bakker and ‘sid3windr’ for insight & debugging. Closes ticket 844.
  • commit b1a2d6c: now, I’m not one to get OCD over things, but that log message about stats based on 1801 seconds got to me. 1800 now.

Fixes:

  • 0c9de4fc: stay away from getaddrinfo unless we really can’t help it for ascii ipv6 conversions to binary
  • commit 08f3f63: fix average latency calculation, closing ticket 424.
  • commit 75ba907: Some of our counters were still 32 bits, now 64.
  • commit 2f22827: Fix statistics and stability when running with pdns-distributes-queries.
  • commit 6196f90: avoid merging old and new additional data, fixes an issue caused by weird (but probably legal) Akamai behaviour
  • commit 3a8a4d6: make sure we don’t exceed the number of available filedescriptors for mthreads. Raises performance in case of DoS. See this post for further details.
  • commit 7313fe6: implement indexed packet cache wiping for recursor, orders of magnitude faster. Important when reloading all zones, which causes massive cache cleaning.
  • rec_control get-all would include ‘cache-bytes’ and ‘packetcache-bytes’, which were expensive operations, too expensive for frequent polling. Removed in commit 8e42d27.
  • All old workarounds for supporting Windows of the XP era have been removed.
  • Fix issues on S390X based systems which have unsigned characters (commit 916a0fd)

Recursor 3.6.0 Release Candidate 1

This is a performance, feature and bugfix update to 3.5/3.5.3. It contains important fixes for slightly broken domain names, which your users expect to work anyhow. It also brings robust resilience against certain classes of attacks.

New features:

  • commit aadceba: Implement minimum-ttl-override config setting, plus runtime configurability via ‘rec_control set-minimum-ttl’.
  • Lots of work on the JSON API, which is exposed via Aki Tuomi’s ‘yahttp’. Massive thanks to Christian Hofstaedtler for delivering this exciting new functionality. Documentation & demo forthcoming, but code to use it is available on GitHub.
  • Lua modules can now use ‘pdnslog(INFO..’), as described in ticket 1074, implemented in commit 674a305
  • Adopt any-to-tcp feature to the recursor. Based on a patch by Winfried Angele. Closes ticket 836commit 56b4d21 and commit e661a20.
  • commit 2c78bd5: implement built-in statistics dumper using the ‘carbon’ protocol, which is also understood by metronome (our mini-graphite). Use ‘carbon-server’, ‘carbon-ourname’ and ‘carbon-interval’ settings.
  • New setting ‘udp-truncation-threshold’ to configure from how many bytes we should truncate. commit a09a8ce.
  • Proper support for CHaos class for CHAOS TXT queries. commit c86e1f2, addition for lua in commit f94c53d, some warnings in commit 438db54 however.
  • Added support for Lua scripts to drop queries w/o further processing. commit 0478c54.
  • Kevin Holly added qtype statistics to recursor and rec_control (get-qtypelist) (commit 79332bf)
  • Add support for include-files in configuration, also reload ACLs and zones defined in them (commit 829849dcommit 242b90ecommit 302df81).
  • Paulo Anes contributed server-down-max-fails which helps combat Recursive DNS based amplification attacks. Described in this post. Also comes with new metric ‘failed-host-entries’ in commit 406f46f.
  • commit 21e7976: Implement “followCNAMERecords” feature in the Lua hooks.

Improvements:

  • commit 06ea901: make pdns-distributes-queries use a hash so related queries get sent to the same thread. Original idea by Winfried Angele. Astoundingly effective, approximately halves CPU usage!
  • commit b13e737: –help now writes to stdout instead of stderr. Thanks Winfried Angele.
  • To aid in limiting DoS attacks, when truncating a response, we actually truncate all the way so only the question remains. Suggested inticket 1092, code in commit add935a.
  • No longer experimental, the switch ‘pdns-distributes-queries’ can improve multi-threaded performance on Linux (various cleanup commits).
  • Update to embedded PolarSSL, plus remove previous AES implementation and shift to PolarSSL (commit e22d9b4commit 990ad9a)
  • commit 92c0733 moves various Lua magic constants into an enum namespace.
  • set group and supplementary groups before chroot (commit 6ee50ceticket 1198).
  • commit 4e9a20e: raise our socket buffer setting so it no longer generates a warning about lowering it.
  • commit 4e9a20e: warn about Linux suboptimal IPv6 settings if we detect them.
  • SIGUSR2 turns on a ‘trace’ of all DNS traffic, a second SIGUSR2 now turns it off again. commit 4f217ce.
  • Various fixes for Lua 5.2.
  • commit 81859ba: No longer attempt to answer questions coming in from port 0, reply would not reach them anyhow. Thanks to Niels Bakker and ‘sid3windr’ for insight & debugging. Closes ticket 844.
  • commit b1a2d6c: now, I’m not one to get OCD over things, but that log message about stats based on 1801 seconds got to me. 1800 now.

Fixes:

  • 0c9de4fc: stay away from getaddrinfo unless we really can’t help it for ascii ipv6 conversions to binary
  • commit 08f3f63: fix average latency calculation, closing ticket 424.
  • commit 75ba907: Some of our counters were still 32 bits, now 64.
  • commit 2f22827: Fix statistics and stability when running with pdns-distributes-queries.
  • commit 6196f90: avoid merging old and new additional data, fixes an issue caused by weird (but probably legal) Akamai behaviour
  • commit 3a8a4d6: make sure we don’t exceed the number of available filedescriptors for mthreads. Raises performance in case of DoS. See this post for further details.
  • commit 7313fe6: implement indexed packet cache wiping for recursor, orders of magnitude faster. Important when reloading all zones, which causes massive cache cleaning.
  • rec_control get-all would include ‘cache-bytes’ and ‘packetcache-bytes’, which were expensive operations, too expensive for frequent polling. Removed in commit 8e42d27.
  • All old workarounds for supporting Windows of the XP era have been removed.
  • Fix issues on S390X based systems which have unsigned characters (commit 916a0fd)

Authoritative Server Database Schema Changes

A heads-up for everybody running (git) snapshots of the PowerDNS Authoritative Server: earlier today we merged Pull #1327 from Kees Monshouwer.

This merge deprecates the ‘old’ (pre-DNSSEC) schema for the gsql backends. In other words, from this point on, the new schema is mandatory. This simplifies documentation and debugging, and also makes development easier on our end.

Please read the upgrade notes carefully to see what changes need to be applied to your database!

Related to recent DoS attacks: Recursor configuration file guidance

Hi everybody,

Over the past week we’ve been contacted by a few users reporting their PowerDNS Recursor became unresponsive under a moderate denial of service attack, one which PowerDNS should be expected to weather without issues.

In the course of investigating this issue, we’ve found that many PowerDNS installations on Linux are configured to consume (far) more filedescriptors than are actually available, wasting resources, potentially leading to unresponsiveness.

To check if this is the case for you, multiply the ‘max-mthreads’ setting by the ‘threads’ setting. Default values are 2048 and 2, leading to a theoretical FD consumption of 4096. Many Linux distributions default to 1024. So, our defaults exceed the Linux defaults by a large margin!

(FreeBSD defaults are far higher, and should not pose an issue).

To fix, there are four options:

  1. Reduce max-mthreads to 512 (or threads to 1 and max-mthreads to 1024) (max-mthreads was introduced in Recursor 3.2; but if you are running a version that old, please upgrade it!)
  2. Run ‘ulimit -n 32768’ before starting (perhaps put this in /etc/init.d/ script). There’s little reason to skip on this number.
  3. Investigate defaults in /etc/security/limits.conf
  4. Apply the patch in https://github.com/Habbie/pdns/commit/e24b124a4c7b49f38ff8bcf6926cd69077d16ad8 (this patch is in our 3.6.0 release. We recommend just upgrading!)

The patch automates 1 and 2, either raising the limit if possible, or  reducing max-mthreads until “it fits”.

Thank you for your attention, and if you have results to report to us on previous or current DoS attacks, please contact us privately.

PowerDNS Authoritative Server version 3.3.1

[Warning] Warning
Version 3.3.1 of the PowerDNS Authoritative Server is a major upgrade if you are coming from 2.9.x. There are also some important changes if you are coming from 3.0, 3.1 or 3.2. Please refer to Section 1, “From PowerDNS Authoritative Server 2.9.x to 3.0”Section 2, “From PowerDNS Authoritative Server 3.0 to 3.1”Section 3, “From PowerDNS Authoritative Server 3.1 to 3.2”Section 4, “From PowerDNS Authoritative Server 3.2 to 3.3” and Section 5, “From PowerDNS Authoritative Server 3.3 to 3.3.1” for important information on correct and stable operation, as well as notes on performance and memory use.
[Note] Note
Released December 17th, 2013

Downloads:

 

This is a bugfix update to 3.3.

Changes since 3.3:

On Ragel and char types

In which I spend a lot of time proving a platform difference is not actually a platform difference, and eventually end up proven wrong.

PowerDNS make check/testrunner output on Debian 7.0/s390x:

test-dnsrecords_cc.cc(199): error in "test_record_types": Failed to verify TXT: Unable to parse DNS TXT '"ÅLAND ISLANDS"'

We don’t observe this failure on other systems, suggesting this is an s390x-specific issue. I find this very unlikely, so I’m trying to figure out what environmental difference is causing this. So far, I have come up short (but keep reading).

There are clues, of course. The output of ragel dnslabeltext.rl is different on s390x. Diffing the output directly did not yield anything useful for me, but Ragel has a graphviz output mode (-V), and diffing a (sorted) version of that output does help:

$ diff -u <(sort amd64.dot) <(sort s390x.dot)
--- /dev/fd/63     2013-10-24 18:19:41.000000000 +0200
+++ /dev/fd/62     2013-10-24 18:19:41.000000000 +0200
@@ -1,12 +1,12 @@
      1 -> 2 [ label = "34 / segmentBegin" ];
-     2 -> 2 [ label = "DEF / reportPlain" ];
+     2 -> 2 [ label = "-128..33, 35..91, 93..127 / reportPlain" ];
      2 -> 3 [ label = "92" ];
      2 -> 7 [ label = "34" ];
      3 -> 2 [ label = "DEF / reportEscaped" ];
      3 -> 4 [ label = "48..57 / reportEscapedNumber" ];
      4 -> 5 [ label = "48..57 / reportEscapedNumber" ];
      5 -> 6 [ label = "48..57 / reportEscapedNumber" ];
-     6 -> 2 [ label = "DEF / doneEscapedNumber, reportPlain" ];
+     6 -> 2 [ label = "-128..33, 35..91, 93..127 / doneEscapedNumber, reportPlain" ];
      6 -> 3 [ label = "92 / doneEscapedNumber" ];
      6 -> 7 [ label = "34 / doneEscapedNumber" ];
      7 -> 2 [ label = "34 / segmentEnd, segmentBegin" ];

In short, on s390x, instead of the DEFault transition, a limited set is used.

The relevant bits of Ragel input:

escaped = '\\' (([^0-9]@reportEscaped) | ([0-9]{3}$reportEscapedNumber%doneEscapedNumber));
plain = ((extend-'\\'-'"')|'\n'|'\t') $ reportPlain;
txtElement = escaped | plain;

main := (('"' txtElement* '"' space?) >segmentBegin %segmentEnd)+;

The whole graphviz plot is too wide to include here, so I’ll show some snippets with explanation:

ragel-1

Parsing starts in state 1. From there, we demand 34 (the ASCII code for a double quote “) to go into state 2.

ragel-2

In state 2, there are a few options:

  1. We see 92 (a backslash), and we go into state 3 to handle an escaped character.
  2. We see something else. We pick the character up and drop right back into state 2.
  3. We get (unescaped!) 34 (“) and go into state 7 (EOF / segmentEnd) (not shown).

States 3-6 deal with escaped characters; there does not appear to be an issue there.

Where the pictures say DEF / reportPlain, on s390x instead we get a set of ranges (-128..33, 35..91, 93..127), which comes down to `everything from -128 to 127 inclusive, except for 34 and 92′. At first glance, this seems equivalent to the amd64 version, which says ’34 and 92 are special, the rest is not’. But obviously this is not true — the parser is rejecting our non-ASCII input on s390x.

From the Ragel docs:

extend: Ascii extended characters. This is the range -128..127 for signed alphabets and the range 0..255 for unsigned alphabets.

In other words, extend should cover the full 8-bit range. Then why is Ragel on s390x refusing to cooperate? Could the signed/unsigned difference be a clue?

No: as a test, I added -'x' to the definition of plain (on amd64), thus changing the parser such that no longer all of ASCII is covered:

plain = ((extend-'\\'-'"'-'x')|'\n'|'\t') $ reportPlain;

If I do this, DEF turns into -128..33, 35..91, 93..119, 121..127 — similar to the s390x version, with one more character (the ASCII code for x is 120) excluded, of course. With this change, the tests still pass (except for one that actually contains an x).

This means that on both platforms, Ragel considers our character set to be signed. Yet, on s390x it is rejecting our non-ASCII characters (that would, as far as I can see, be covered under -128..0).

In a test on amd64, I can see that the two bytes in Å (UTF-8), 0xc3 0x85, are mapped as -61 and -123 respectively, matching our understanding that Ragel considers our characters to be signed.

HOWEVER, if I stick the same character/bytes into dnslabeltext.rl on s390x, I see that they are mapped to 133 and 195. This means that although Ragel is generating a parser for signed characters on both platforms, it is considering the input to be unsigned on s390x.

Debian has kindly provided a table of how various types operate on different architectures. Indeed, this shows that char is unsigned on s390x but signed on amd64.

While trying to find more information on the signed/unsigned distinction in Ragel, I ran into the doc section about alphtype, which allows one to specify the type of the parsed alphabet. Setting it to signed is not allowed (!), suggesting that Ragel was never meant to cope with signed char alphabets. Setting it to unsigned on s390x fixed my issue, without breaking anything on amd64.

Workaround now in our source tree.

For me, this closes the case. I emailed ragel-users but no definitive reply has come in.

Recursor 3.5.3 released

This is a bugfix and performance update to 3.5.2. It brings serious performance improvements for dual stack users.

Changes since 3.5.2:

  • 3.5 replaced our ANY query with A+AAAA for users with IPv6 enabled. Extensive measurements by Darren Gamble showed that this change had a non-trivial performance impact. We now do the ANY query like before, but fall back to the individual A+AAAA queries when necessary. Change in commit 1147a8b.
  • The IPv6 address for d.root-servers.net was added in commit 66cf384, thanks Ralf van der Enden.
  • We now drop packets with a non-zero opcode (i.e. special packets like DNS UPDATE) earlier on. If the experimental pdns-distributes-queries flag is enabled, this fix avoids a crash. Normal setups were never susceptible to this crash. Code in commit 35bc40d, closes ticket 945.
  • TXT handling was somewhat improved in commit 4b57460, closing ticket 795.