FOSDEM 2018 DNS devroom CfP!

Hello DNS-enthusiasts and other developers,

After two successful BoF sessions at FOSDEM 2016 and 2017, FOSDEM 2018 will see a real DNS devroom! We hope to host talks ranging from hardcore protocol topics to practical sessions for programmers who are not directly involved with DNS but have to deal with it in their day-to-day coding, and for system administrators responsible for DNS infrastructure.

We have been allotted half a day on Sunday 4 February 2018. We expect to schedule 30 minutes per talk, including questions, but this is open to discussion.

If you have something you’d like to share with your fellow developers, please head to pentabarf at https://penta.fosdem.org/submission/FOSDEM18. Examples of topics are measuring, monitoring, DNS libraries, and anecdotes on how you’ve (ab)used the DNS.

The deadline for submission is December 8th. If you have a FOSDEM pentabarf account from a previous year, please use that account. Reach out to dns-devroom-manager@fosdem.org if you run into any trouble.

We are also looking for volunteers to help with cameras etc. Please drop us an email at dns-devroom-manager@fosdem.org if you’re interested in helping out.

See you there!

Cheers,
Peter van Dijk, Shane Kerr, Pieter Lexis

PowerDNS Recursor 4.1.0 Release Candidate 3 Available

PowerDNS Recursor 4.1.0 RC3 is here!

We’d like to thank everyone that has helped us test the previous Recursor release candidates.

The third Release Candidate adds support for Botan 2.x (and removes support for Botan 1.10!), has some important DNSSEC fixes, features a cleaned up web UI and has miscellaneous minor improvements.

Also thanks to Jan-Piet Mens for help on the documentation!

The full changelog looks like this:

Improvements

  • #5895: Add the DNSSEC validation state to the DNSQuestion Lua object (although the ability to update the validation state from these hooks is postponed to after 4.1.0).
  • #5498: Add support for Botan 2.x and remove support for Botan 1.10.
  • #5876: Print more details of trust anchors. In addition, the trace output that mentions if data from authoritative servers gets accepted now also prints the TTL and clarifies the ‘place’ number previously printed.
  • #5616: Better support for deleting entries in NetmaskTree and NetmaskGroup.

Bug Fixes

  • #5889: Prevent possible downgrade attacks in the recursor.
  • #5885: Split NODATA / NXDOMAIN NSEC wildcard denial proof of existence. Otherwise there is a very real risk that an NSEC will cover a more specific wildcard and we end up with what looks like an NXDOMAIN proof but is a NODATA one.
  • #5904: Fix incomplete validation of cached entries.
  • #5912: Fix going Insecure on NSEC3 hashes with too many iterations. We could have gone Bogus on a positive answer synthesized from a wildcard if the corresponding NSEC3 had more iterations than we were willing to accept, while the correct result is Insecure.
  • #5877: Sort NS addresses by speed and remove old ones.
  • #5896: Purge nsSpeeds entries even if we get less than 2 new entries.
  • #5881: Add EDNS to truncated, servfail answers.
  • #5917: Use _exit() when we really really want to exit, for example after a fatal error. This stops us dying while we die. A call to exit() will trigger destructors, which may paradoxically stop the process from exiting, taking down only one thread, but harming the rest of the process.
  • #5930: In the recursor secpoll code, we assumed the TXT record would be the first record we received. Sometimes it was the RRSIG, leading to a silent error and no secpoll check. Fixed the assumption and added an error.
  • #5938: Don’t crash when asked to run with zero threads.
  • #5939: Only accept types not matching the query if we asked for ANY. Even from forward-recurse servers.
  • #5937: Allow the use of a ‘self-resolving’ NS if cached A / AAAA exists. Before this, we could skip a perfectly valid NS for which we had retrieved the A and / or AAAA entries, for example via a glue.
  • #5961: Add the config-name argument to the definition of configname. There was a bug where the config-name parameter was not used to change the path of the config file. This meant that some commands via rec_control (e.g. reload-acls) would fail when run against a recursor which had config-name defined. The correct behaviour was present in some, but not all, definitions of configname. (@jake2184)

The tarball is available on downloads.powerdns.com (signature) and packages for CentOS 6 and 7, Debian Jessie and Stretch, Ubuntu Artful, Trusty, Xenial and Zesty are available from repo.powerdns.com.  (The Raspberry Pi packages will follow Monday morning.)

We invite you to test this release candidate and send us all feedback and issues you might have via the mailinglist, or in case of a bug, via GitHub.

Enjoy!

PowerDNS Authoritative Server 4.1.0 Release Candidate 3 Available

We present what should be our last release candidate for PowerDNS Authoritative Server 4.1.0: Release Candidate 3!

If no major issues are found we expect to release the final version within the next two weeks.

Thanks to everyone who tested the previous release candidates.

This release features various bug fixes, some improvements to pdnsutil, documentation improvements by Christian Hofstaedtler and logging message improvements by Job Snijders.

The Raspbian packages will follow Monday since the builder is still working on them.

The full changelog looks like this:

New Features

  • #5936: Make it possible to disable DNSSEC via the API; this is equivalent to doing pdnsutil disable-dnssec.
  • #5883: Add add-meta command to pdnsutil that can be used to append to existing metadata without clobbering it.

Improvements

  • #5616: Better support for deleting entries in NetmaskTree and NetmaskGroup.
  • #5935: Throw exception for metadata endpoint with wrong zone. Before, we would happily accept this POST.
  • #5879: Warn if records in a zone are occluded.

Bug Fixes

  • #5917: Use _exit() when we really really want to exit, for example after a fatal error. This stops us dying while we die. A call to exit() will trigger destructors, which may paradoxically stop the process from exiting, taking down only one thread, but harming the rest of the process.
  • #5884: Fix messages created by pdnsutil generate-tsig-key.
  • #5928: Add back missing output details to rectifyZone.
  • #5905: Use 302 redirects in the webserver for ringbuffer reset or resize. With the current 301 redirect it is only possible to reset or resize once. Every next duplicate action is replaced by the destination cached in the browser.

The tarball is available on downloads.powerdns.com (signature) and packages for CentOS 6 and 7, Debian Jessie and Stretch, Ubuntu Artful, Trusty, Xenial and Zesty are available from repo.powerdns.com. (The Raspbian packages will come later, possibly Monday, because they are still building.)

We invite you to test this release candidate and send us all feedback and issues you might have via the mailinglist, or in case of a bug, via GitHub.

PowerDNS Authoritative Server 4.1.0 Release Candidate 2 Available

Now that the Release Candidate 1 has simmered for a while, we present PowerDNS Authoritative Server Release Candidate 2!

Thanks to everyone who tested RC1.

This release has several performance improvements, stability and correctness fixes.

Of course, this release also has corrected typos, improvements to the documentation and a lot of minor improvements of issues discovered by Jan-Piet Mens.

Thanks for fixing issues not mentioned below go out to: Kees Monshouwer.

Additional documentation improvements by: Anhad Jai Singh and Christian Hofstaedtler.

The full changelog looks like this:

New Features

  • #5779: Rectify zones via the API. (Nils Wisiol)
    • Move the pdnsutil rectification code to the DNSSECKeeper
    • Generate DNSSEC keys for a zone when “dnssec” is true in an API POST/PATCH for zones
    • Rectify DNSSEC zones after POST/PATCH when API-RECTIFY metadata is 1
    • Allow setting this metadata via the “api-rectify” param in a Zone object
    • Show “nsec3param” and “nsec3narrow” in Zone API responses
    • Add an “rrsets” request parameter for a zone to skip sending RRSets in the response
    • Add rectify endpoint in the API
  • #5665: Add PKCS#11 support to packages on Operating Systems that support it.

Improvements

  • #5498: Add support for Botan 2.x and drop support for Botan 1.10 (the latter thanks to Kees Monshouwer).
  • #5810: Fix issues when b2b-migrating from the BIND backend to a database:
    • No masters were set in the target db (#5807)
    • Only the last master in the list of masters would be added to the target database
    • The BIND backend was not fully aware of native zones
  • #5584: Add support for new record types to the LDAP backend.
  • #5842: Add log-timestamp option. This option can be used to disable printing timestamps to stdout, which is useful when using systemd-journald or another supervisor that timestamps stdout by itself, so that the logs do not carry two timestamps.
  • #5838: Stop doing individual RRSIG queries during outbound AXFR. (Kees Monshouwer)

Bug Fixes

  • #5684: Improve trailing dot handling internally, which led to a segfault in pdnsutil before.
  • #5678: Treat requestor’s payload size lower than 512 as equal to 512. Before, we did not follow RFC 6891 section 6.2.3 correctly.
  • #5766: Correctly purge entries from the caches after a transfer. Since the QC/PC split up, we only removed entries for the AXFR’d domain from the packet cache, not the query cache. We also did not remove entries in case of IXFR.
  • #5791: When throwing because of bogus content in the tinydns database, report the offending name+type so the admin can find the offending record.
  • #5696: For zone PATCH requests, add new “X-PDNS-Old-Serial” and “X-PDNS-New-Serial” response headers with the zone serials before and after the changes.
  • #5704: Make default options singular and use defaults in Cryptokey API-endpoint
  • #5729: Remove printing of DS records from “pdnsutil export-zone-dnskey …”. This was not only inconsistent behaviour but also done incorrectly.
  • #5702: Make bindbackend startTransaction to return false when it has failed. (Aki Tuomi)
  • #5820: Log the needed size when a MySQL result was truncated.
  • #5710: Remove “” around secpoll result which fixes “pdns_control show security-status” not working.
  • #5722: Make the auth also publish CDS/CDNSKEY records for inactive keys, as this is needed to roll without double sigs.
  • #5734: Fix a crash when getting a public GOST key if the private one is not set.
  • #5815: Ignore SOA-EDIT for PRESIGNED zones.
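The payload size rule behind #5678 can be illustrated with a minimal Python sketch. RFC 6891 section 6.2.3 says that a requestor's payload size below 512 must be treated as 512. The function name and the `server_limit` cap are hypothetical, chosen only for illustration; this is not the PowerDNS implementation.

```python
RFC6891_MINIMUM = 512  # RFC 6891 section 6.2.3: lower values are treated as 512


def effective_payload_size(advertised, server_limit=1232):
    """Return the UDP payload size to honour for one query.

    `server_limit` is a hypothetical local cap, not a PowerDNS default.
    """
    # Raise too-small advertisements to 512, then apply the local cap.
    return min(max(advertised, RFC6891_MINIMUM), server_limit)


for advertised in (0, 300, 1000, 4096):
    print(advertised, "->", effective_payload_size(advertised))
```

A client advertising 300 bytes is thus handled exactly like one advertising 512.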

The tarball is available on downloads.powerdns.com (signature) and packages for CentOS 6 and 7, Debian Jessie and Stretch, Ubuntu Trusty, Yakkety, Xenial and Zesty are available from repo.powerdns.com. (The Raspbian packages will come later, possibly Monday, because they are still building.)

We invite you to test this release candidate and send us all feedback and issues you might have via the mailinglist, or in case of a bug, via GitHub.

DNS performance metrics: the logarithmic percentile histogram

DNS performance is always a hot topic. No DNS-OARC, RIPE or IETF conference is complete without new presentations on DNS performance measurements.

Most of these benchmarks focus on denial-of-service resistance: what is the maximum query load that can be served? This is indeed a good metric to know.

Less discussed however is performance under normal conditions. Every time a nameserver is slow, a user somewhere is waiting. And not only is a user waiting, some government agencies, notably the UK’s OFCOM, take a very strong interest in DNS latencies.  In addition, in contractual relations, there is frequently the desire to specify guaranteed performance levels.

So how well is a nameserver doing?

“There are three kinds of lies: lies, damned lies, and statistics.” – unknown

It is well known that when Bill Gates walks into a bar, on average everyone inside becomes a billionaire. The average alone is therefore not sufficient to characterize the wealth distribution in a bar.

A popular and frequently better statistic is the median. 50% of numbers will be below the median, 50% will be above. So for our hypothetical bar, if most people in there made x a year, this would also be the median (more or less). Now that Bill Gates is there, the median shifts up only a little. In many cases, the median is a great way to describe a distribution, but it is not perfect for DNS performance. The way DNS performance impacts user experience makes it useful to compare it to ambulance arrival times.

If on average an ambulance arrives within 10 minutes of being called, this is rather good. But if this is achieved by arriving within 1 minute 95% of the time, and after 200 minutes 5% of the time, it is pretty bad news for those one in twenty cases. In other words, being very late is a lot worse than being early is good. 

The median in this case is somewhat less than one minute, and the median therefore also does not show that we let 5% of cases wait for more than three hours.
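To make the arithmetic concrete, here is a minimal Python sketch with made-up numbers: 95 calls served in exactly one minute and 5 calls served after exactly 200 minutes. These figures are illustrative assumptions (the worst case within each group), so the mean lands just under 11 minutes and the median at exactly one minute.

```python
from statistics import mean, median

# Hypothetical arrival times in minutes: 95 calls answered in 1 minute,
# 5 calls answered only after 200 minutes.
arrivals = [1.0] * 95 + [200.0] * 5

average = mean(arrivals)    # pulled up by the few very late arrivals
middle = median(arrivals)   # blind to the slow tail
late_share = sum(1 for t in arrivals if t > 180) / len(arrivals)

print(f"mean={average:.2f} min, median={middle:.1f} min, "
      f"waiting over three hours: {late_share:.0%}")
```

Neither single number reveals the tail: the mean looks acceptable and the median looks excellent, while one in twenty callers waits more than three hours.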

To do better, for the ambulance, a simple histogram works well:

[Figure: histogram of ambulance arrival times]

This graph immediately makes it clear there is a problem, and that our ’10 minute average arrival time’ is misleading.

Although a late DNS answer is of course nowhere near as lethal as a late ambulance (unless you are doing the DNS for the ambulance dispatchers!), the analogy is apt. A DNS response that arrives 2 seconds late is absolutely useless.

Sadly, it turns out that making an arrival time graph of a typical recursive nameserver is not very informative:

[Figure: linear-scale histogram of response times for a recursive nameserver]

From this, we can see that almost all traffic arrives in one bin, likely somewhere near 0.1 milliseconds, but otherwise it doesn’t teach us a lot.

A common enough trick is to use logarithmic scales, and this does indeed show far more detail:

[Figure: the same response-time histogram on logarithmic scales]

From this, we can see quite some structure – it appears we have a bunch of answers coming in real quick, and also somewhat of a peak around 10 milliseconds.
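As a rough illustration of why logarithmic bins reveal more structure, the following Python sketch counts answers per power-of-ten latency bucket. The function name and the sample latencies are made up for the example; the real graphs were not produced with this code.

```python
import math
from collections import Counter


def decade_histogram(latencies_ms):
    """Count answers per power-of-ten bucket, keyed by the exponent.

    Key -1 covers 0.1 to 1 ms, key 0 covers 1 to 10 ms, and so on.
    """
    counts = Counter()
    for value in latencies_ms:
        if value <= 0:
            continue  # log10 is undefined for zero or negative latencies
        counts[math.floor(math.log10(value))] += 1
    return dict(sorted(counts.items()))


# Hypothetical latencies in milliseconds.
sample_latencies = [0.15, 0.2, 0.4, 3.0, 12.0, 15.0, 950.0]
print(decade_histogram(sample_latencies))
```

With linear bins of, say, 100 milliseconds, everything except the last sample would land in the first bin; the decade buckets keep the fast answers and the slower cluster apart.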

But the question remains, how happy are our users? This is what we spent an outrageous amount of time on, inspired by a blog post we can no longer find. We proudly present:

The logarithmic percentile histogram

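[Figure: logarithmic percentile histogram for the KPN office fiber and a PowerDNS installation in the Middle East]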

So what does this graph mean? On the x-axis are the “slowest percentiles”. So for example, at x=1 we find the 1% of answers that were slowest. On the y-axis we find the average latency of the answers in that “slowest 1%” bin: around 8 milliseconds for the KPN fiber in our office, and around 90 milliseconds for a PowerDNS installation in the Middle East.

As another example, the 0.01 percentile represents the “slowest 1/10,000” of queries, and we see that these get answered in around 1200 milliseconds – at the outer edge of being useful.

On the faster side, we see that on the KPN fiber installation, 99% of queries are answered within 0.4 milliseconds on average – enough to please any regulator! The PowerDNS user in the Middle East is faring a lot less well, taking around 60 milliseconds at that point.
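One plausible way to compute such a histogram is sketched below in Python. It is an interpretation of the description above, not the algorithm used by our tools: answers are sorted from slowest to fastest, cut at logarithmically spaced percentile edges, and averaged per bin. The function name, the edge values and the sample data are all assumptions made for the example.

```python
from statistics import mean


def log_percentile_histogram(latencies_ms,
                             edges_pct=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """Average latency per 'slowest percentile' bin.

    Each bin holds the answers ranked between the previous edge and the
    current edge (in percent) when sorted from slowest to fastest.
    """
    ordered = sorted(latencies_ms, reverse=True)
    n = len(ordered)
    result = []
    start = 0
    for pct in edges_pct:
        end = min(n, max(start, round(n * pct / 100.0)))
        if end > start:  # skip bins too small to hold a single answer
            result.append((pct, mean(ordered[start:end])))
        start = end
    return result


# Hypothetical data set of 10,000 answers, latencies in milliseconds.
sample = ([1200.0] + [300.0] * 9 + [50.0] * 90
          + [5.0] * 900 + [0.1] * 9000)

for pct, avg in log_percentile_histogram(sample):
    print(f"slowest {pct:g}%: {avg:g} ms")
```

Plotting the resulting pairs with both axes on a logarithmic scale gives a graph of the kind shown above. With fewer samples the smallest bins are simply skipped, since a "slowest 0.01%" bin needs at least 10,000 answers to contain one.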

Finally, we can spruce up the graph further with the cumulative average:

[Figure: logarithmic percentile histogram with cumulative average]

From this we see clearly that even though latencies go up for the slower percentiles, this has little impact on the average latency, ending up at 2.3 milliseconds for our KPN office fiber and 4.5 milliseconds for the Middle East installation.

So what can we do with these graphs?

Through a ton of measurements in various places, we have found the logarithmic percentile histogram to be incredibly robust. Over time, the shape of the graph barely moves, unless something really changes, for example by adding a dnsdist caching layer:

[Figure: logarithmic percentile histogram with and without a dnsdist caching layer]

We can see that dnsdist speeds up both the fastest and slowest response times, but as could be expected does not make cache misses (in the middle) any faster. The reason the slowest response times are better is that the dnsdist caching layer frees up the PowerDNS Recursor to fully focus on problematic (slow) domains.

Another fun plot is the “worst case” impact of DNSSEC, measured from a cold cache:

[Figure: logarithmic percentile histogram showing the impact of DNSSEC validation from a cold cache]

As we can see from this graph, for the vast majority of cases, the impact of DNSSEC validation using the PowerDNS Recursor 4.1 is extremely limited. A rerun on a hot cache shows no difference in performance at all (which is so surprising we repeated the measurement at other deployments where we learned the same thing).

Monitoring/alerting based on logarithmic percentile histogram

As noted, the shape of these graphs is very robust. Temporary outliers barely show up for example. Only real changes in network or server conditions make the graph move. This makes these percentiles exceptionally suitable for monitoring. Setting limits on ‘1%’ and ‘0.1%’ slowest performance is both sensitive and specific: it detects all real problems, and everything it detects is a real problem.
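A monitoring check along these lines could look like the following Python sketch. The function names, the limits and the sample data are hypothetical; sensible limits depend on the deployment and would be derived from its own baseline graph.

```python
from statistics import mean


def slowest_share_average(latencies_ms, share_pct):
    """Average latency of the slowest `share_pct` percent of answers."""
    ordered = sorted(latencies_ms, reverse=True)
    count = max(1, round(len(ordered) * share_pct / 100.0))
    return mean(ordered[:count])


def check_limits(latencies_ms, limits_ms):
    """Return {share_pct: observed average} for every limit that is exceeded."""
    breaches = {}
    for share_pct, limit in limits_ms.items():
        observed = slowest_share_average(latencies_ms, share_pct)
        if observed > limit:
            breaches[share_pct] = observed
    return breaches


# Hypothetical limits in milliseconds for the slowest 1% and 0.1%.
limits = {1.0: 20.0, 0.1: 500.0}

# Hypothetical measurement windows of 1,000 answers each.
healthy = [0.2] * 990 + [8.0] * 9 + [100.0]
degraded = [0.2] * 990 + [40.0] * 9 + [100.0]

print(check_limits(healthy, limits))
print(check_limits(degraded, limits))
```

In the second window the median is unchanged, yet the slowest 1% has become markedly slower, which is exactly the kind of change such a limit is meant to catch.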

How to get these graphs and numbers

In our development branch, dnsreplay and dnsscope have gained a “--log-histogram” feature which will output data suitable for plotting. Helpfully, the data output includes a gnuplot script that will generate graphs as shown above. A useful output mode is svg, which creates graphs suitable for embedding in web pages:

 

[Figure: SVG logarithmic percentile histogram that also plots the median response time]

Note that this graph also plots the median response time, which here comes in at 21 microseconds.

Now that we have the code available to calculate these numbers, they might show up in the dnsdist web interface, or in the metrics we generate. But for now, dnsreplay and dnsscope are where it is at.

Enjoy!

PowerDNS Recursor 4.1.0 Release Candidate 2 Available

Hot on the heels of RC1, PowerDNS Recursor 4.1.0 RC2 is here!

We’d like to thank everyone that has helped us test Recursor RC1.

The second Release Candidate contains several correctness fixes for DNSSEC, mostly in the area of verifying negative responses.

Also thanks to Christian Hofstaedtler for help on the documentation!

The full changelog looks like this:

Bug Fixes

  • #5808: Check that the NSEC covers an empty non-terminal when looking for NODATA.
  • #5835: Disable validation for infrastructure queries (e.g. when recursing for a name). Also validate entries from the Negative cache if they were not validated before.
  • #5868: Fix DNSSEC validation for denial of wildcards in negative answers and denial of existence proofs in wildcard-expanded positive responses.
  • #5873: Fix DNSSEC validation when using -flto.
  • #5740: Lowercase all outgoing qnames when lowercase-outgoing is set.
  • #5762: Create socket-dir from the init-script.
  • #5803: Fix crashes with uncaught exceptions in MThreads.

Improvements

  • #5834: Don’t directly store NSEC3 records in the positive cache.
  • #5805: Improve logging for the built-in webserver and the Carbon sender.
  • #5824: New b.root IPv4 address (Kees Monshouwer).
  • #5774: Add experimental metrics that track the time spent inside PowerDNS per query. These metrics ignore time spent waiting for the network.
  • #5842: Add log-timestamp setting. This option can be used to disable printing timestamps to stdout, which is useful when using systemd-journald or another supervisor that timestamps output by itself.

The tarball is available on downloads.powerdns.com (signature) and packages for CentOS 6 and 7, Debian Jessie and Stretch, Ubuntu Trusty, Yakkety, Xenial and Zesty are available from repo.powerdns.com.  (The Raspberry Pi packages will follow tomorrow morning.)

We invite you to test this release candidate and send us all feedback and issues you might have via the mailinglist, or in case of a bug, via GitHub.

Enjoy!

PowerDNS Recursor 4.1.0 Release Candidate 1 Available

PowerDNS Recursor 4.1.0 RC1 is here!

The RC1 release features many fixes to the DNSSEC validation code, reported by different users. Other improvements include: logging, RPZ and the Remote Logger.

While not specifically mentioned in the ChangeLog, also thanks to Winfried Angele for bringing a documentation issue to our attention!

The full changelog looks like this:

Bug Fixes

  • #5569: Don’t fetch the DNSKEY of a zone to validate the DS of the same zone.
  • #5614: Improve DNSSEC debug logging.
  • #5672: Add NSEC records on nx-trust cache hits.
  • #5671: Handle NSEC wrap-around.
  • #5670: Fix erroneous check for section 4.1 of RFC 6840.
  • #5715: Handle direct NSEC queries.
  • #5716: Detect zone cuts by asking for DS instead of NS.
  • #5738: Do not allow direct queries for RRSIG or NSEC3.
  • #5771: The target zone being insecure doesn’t mean that the denial of the DS is too, if the parent zone is Secure.
  • #5530: Add a missing header for PRId64 in the negative cache, required on EL5/EL6.
  • #5549: Prevent an infinite loop if we need auth and the best match is not.
  • #5570: Be more careful about the validation of negative answers.
  • #5599: Fix libatomic detection on ppc64. (Sander Hoentjen)
  • #5615: Fix sortlist in the presence of CNAME. (Benoit Perroud thanks for reporting this issue!)
  • #5515: Fix cache handling of ECS queries with a source length of 0.
  • #5328: Handle SNMP alarms so we can reconnect to the master.
  • #5662: Fix Recursor 4.1.0 alpha 1 compilation on FreeBSD. (@RvdE)
  • #5739: Remove pdns.PASS and pdns.TRUNCATE.
  • #5734: Fix a crash when getting a public GOST key if the private one is not set.
  • #5773: Don’t negcache entries for longer than their RRSIG validity.
  • #5792: Gracefully handle Socket::accept() returning a null pointer on EAGAIN.

Improvements

  • #5756: Improve --quiet=false output to include DNSSEC and more timing details.
  • #5733: Add DNSSEC test vectors for RSA, ECDSA, ed25519 and GOST.
  • #5543: Wrap the webserver’s and Resolver::tryGetSOASerial objects into smart pointers (also thanks to Christian Hofstaedtler for reviewing!)
  • #5545: Add more unit tests for the NetmaskTree and ECS cache index.
  • #5588: Switch the default webserver’s ACL to 127.0.0.1, ::1.
  • #5598: Add help text on autodetecting systemd support. (Ruben Kerkhof thanks for reporting!)
  • #5622: Add log-rpz-changes to log RPZ additions and removals.
  • #5621: Log the policy type (QName, Client IP, NS IP…) over protobuf.
  • #5637: Remove unused SortList compare operator for ComboAddress.
  • #5620: Add support for dumping the in-memory RPZ zones to a file.
  • #5646: Support for identifying devices by id such as MAC address.
  • #5699: Implement dynamic cache sizing.
  • #5755: Improve dnsbulktest experience in Travis for more robustness.
  • #5772: Set TC=1 if we had to omit part of the AUTHORITY section.
  • #5764: autoconf: set --enable-libsodium to auto.

The tarball is available on downloads.powerdns.com (signature) and packages for CentOS 6 and 7, Debian Jessie and Stretch, Ubuntu Trusty, Yakkety, Xenial and Zesty are available from repo.powerdns.com.

We invite you to test this release candidate and send us all feedback and issues you might have via the mailinglist, or in case of a bug, via GitHub.