Category: security

Enabling continuous fuzzing of PowerDNS products

We are very happy to announce that PowerDNS recently joined the OSS-Fuzz initiative, enabling continuous fuzzing for critical parts of our products. We are taking this opportunity to share some of our existing security processes, because we are proud of the progress we have made.

For quite some time now, we have been working on making our products safer by identifying and fixing existing security weaknesses, and deploying processes to prevent new ones from being introduced while reducing the impact should one slip through the cracks.

These efforts have led to the discovery of quite a few security issues over the past years, some of which had been present in the code base for several years. Nowadays, we are finding these issues because we are actively looking for them, using manual review, automated code analysis tools and fuzzing. But more on that later.

Very often, the line between a bug and a possible security issue is very thin, and vendors have been known to downplay the impact of an issue whenever possible. At PowerDNS, we choose instead to be forthcoming and open about the possible impact of the bugs that are discovered in our products. For one, there is no point in trying to downplay or obfuscate the real impact, as anyone can read the source code of our open-source products and decide that for themselves. But most of all, we believe our users should be as informed as possible regarding the severity of the issues fixed in a release, allowing them to decide for themselves how quickly they need to upgrade.

Now, let’s review the processes we have in place to prevent and find issues in our code base.

Secure development practices

In 4.0, we switched to a new version of the C++ language, C++11, which introduced several new features that allow us to avoid common pitfalls often associated with C and C++: for example, when they are properly used, smart pointers kill whole classes of bugs and security issues, including memory leaks and use-after-free.

At the same time, we also regularly stop using features that have proven brittle. For example, we recently removed our last usage of variable-length arrays (VLA), because we noticed it was very easy to shoot yourself in the foot and smash the stack by not properly checking the maximum size of such an array. We now rely on a specific compiler flag, -Werror=vla, to prevent us from ever reintroducing them by mistake.

Peer review and continuous testing

Every change to the code base goes through the same process:

  • a pull request is opened;
  • our continuous testing tools make sure that it builds properly, and passes both our unit and functional tests;
  • the change is reviewed by at least one person from our team;
  • finally, it is merged into the appropriate branch.

To increase the likelihood of catching bugs during the continuous testing phase, our products are built with various sanitizing options enabled whenever applicable:

  • Address Sanitizer: catching out-of-bounds accesses, use-after-free, double-free, invalid free and memory leaks;
  • Undefined Behavior Sanitizer: catching the use of misaligned or null pointers, signed integer overflow, and conversion to, from, or between floating-point types which would overflow the destination.

We also use several static analysis tools to detect mistakes that eluded both the original author and the reviewer.

Independent audit of the code base

Last year, we contracted a third-party security company, Nixu, to perform an independent audit of our code base, mainly focused on dnsdist and the Recursor but also covering a large part of the Authoritative Server. Their assessment included a manual code review, supported by threat analysis using the STRIDE method, and the use of various static analysis tools, fuzzing and manual testing with ad-hoc tools built for that project.
A total of 11 findings were reported to us. A few of them could have allowed an authenticated user to escalate privileges, using defects in our API and web management interface. The remaining issues were denial-of-service issues, most of them in the DNSSEC handling code of the Recursor.
All of these issues were fixed shortly after the report was handed to us, and we have published the relevant security advisories.

Bug Bounty Program

We believe that working with skilled security researchers across the globe is crucial to identifying weaknesses in any technology. Therefore, in addition to our website advertising clear and simple ways to securely report any concern to us, we have also been running a public bug bounty program on the HackerOne platform for more than two years now: https://hackerone.com/powerdns

This program covers all our open-source products and every type of vulnerability, allowing anyone who finds a security issue in our products to do the right thing by reporting it to us, while still being financially rewarded.

Hardening measures

To further restrict the impact of a security issue in one of our products, we enable several hardening measures by default.
Some of these detect pathological behavior as early as possible, making it very hard to exploit and almost always turning it into a crash instead; the crash is then mitigated by supervisors like systemd, which restart the service:

  • buffer overflow and format string detection and prevention (stack-protector, FORTIFY_SOURCE=2);
  • full address space layout randomization via Position-Independent Executable (PIE);
  • fully read-only relocations (-Wl,-z,relro,-z,now).

Other measures restrict the privileges of the process, thus limiting what an attacker could do even after taking full control of the process’s execution path:

  • filesystem protection (chroot; systemd’s PrivateTmp, PrivateDevices, ProtectSystem and ProtectHome);
  • privilege dropping (uid/gid switching; systemd’s CapabilityBoundingSet, NoNewPrivileges and RestrictAddressFamilies).

Fuzzing

We used to run fuzzing sessions manually using AFL and Address Sanitizer after every important change to one of our parsers, since they deal with user-provided input and are therefore the most exposed part of our code. While we discovered a few issues that way, catching them before they could make it into a release, we were not very thrilled with this method, because it required human interaction.

That’s why we are so happy to have joined the OSS-Fuzz project, enabling continuous fuzzing of our parsers. Continuous fuzzing means that every change to the code will get automatically tested and that, should an issue be found, a ticket will be automatically generated and we will be notified.
The OSS-Fuzz infrastructure will not only run our targets with AFL as we used to do manually, it will also use the other well-known fuzzing engine, libFuzzer. Both of them will be run with the Address and Undefined Behavior sanitizers enabled, catching issues that could otherwise go completely undetected.

Security issue handling

Now that we have explained how we prevent, catch and mitigate security issues, a few words about the way we handle them when they occur in a supported version.

As soon as a security concern has been reported to us, be it via our website, in a private e-mail or through our program on HackerOne, we immediately acknowledge receipt of the message and start working on reproducing and assessing the severity of the issue. We then begin working on a patch for all supported versions, and communicate a timeline and a draft of the security advisory to the reporter when applicable.

Every security issue is then communicated to our customers in advance, at least a week before the public release, along with patched packages. We also notify various vendors via the distros security mailing list. After the release, every security issue is the subject of a public security announcement, and these announcements are publicly available on our website.

dnsdist 1.3.3 released

We are very happy to announce the 1.3.3 release of dnsdist. This release contains a few new features, but mostly fixes a security issue reported since the release of dnsdist 1.3.2.

Security fix

While working on a new feature, Richard Gibson noticed that a remote attacker could craft a DNS query with trailing data such that the addition of a record by dnsdist, for example an OPT record when adding EDNS Client Subnet, might result in the trailing data being smuggled to the backend as a valid record while not being seen by dnsdist. This is an issue when dnsdist is deployed as a DNS Firewall and used to filter some records that should not be received by the backend. This issue occurs only when either the ‘useClientSubnet’ or the experimental ‘addXPF’ parameter is used when declaring a new backend.
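
For reference, a minimal sketch of a backend declaration using the affected option; the address is a placeholder:

-- a backend declared with EDNS Client Subnet enabled (affected configuration)
newServer({address="192.0.2.10:53", useClientSubnet=true})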

While dnsdist had not had any important security issues until now, we decided this was a good time to implement the same security polling mechanism that the Authoritative Server and the Recursor have had for years. Starting with this release, dnsdist will regularly perform a security check using a DNS query to determine whether the current version is up to date security-wise, and let the administrator know otherwise.
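
For operators who want to tune this behavior, a rough configuration sketch; the directive names and values below are assumptions from memory, so please check the dnsdist documentation for your version:

-- poll once an hour instead of the default interval (assumed directive name)
setSecurityPollInterval(3600)
-- an empty suffix disables the security polling entirely (assumed directive name)
setSecurityPollSuffix("")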

Important changes

It is sometimes very useful to be able to generate answers directly from dnsdist, to quickly return a “No such domain” answer, spoof an “A” or “AAAA” answer, or even just reply with the TC bit set so that legitimate clients retry over TCP. Until now, answers generated that way mirrored the flags and EDNS options, if any, of the initial query. This was not great, because it could mislead the client into thinking that dnsdist, or the server behind it, supported features or a UDP payload size that it did not.

Starting with this release, dnsdist generates a proper EDNS payload if the query had one, and responds without EDNS otherwise. This behavior can be turned off using the new setAddEDNSToSelfGeneratedResponses() directive if needed.

We must, however, provide a responder’s maximum payload size in this record, and since we can’t easily know the maximum payload size of the actual backend, we picked a safe default value of 1500, which can be overridden using the new setPayloadSizeOnSelfGeneratedAnswers() directive.
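
A minimal sketch combining the two directives mentioned above; the 1232-byte value is only an example override, not a recommendation from this release:

-- keep adding EDNS to self-generated answers (the new default behavior)
setAddEDNSToSelfGeneratedResponses(true)
-- advertise a smaller UDP payload size than the 1500-byte default
setPayloadSizeOnSelfGeneratedAnswers(1232)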

New features and improvements

A new load-balancing policy named “chashed” has been introduced, based on consistent hashing. This new policy load-balances the incoming queries based on a hash of the requested name, like the existing “whashed” one, but has the interesting property that adding or removing a server will only cause a very small portion of the incoming queries to be mapped to a different server than they were before, keeping the caches warm.
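
Switching to the new policy is a one-line change; a sketch with placeholder backend addresses:

newServer({address="192.0.2.1:53"})
newServer({address="192.0.2.2:53"})
-- pick a backend based on a consistent hash of the requested name
setServerPolicy(chashed)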

While we have supported the export of metrics using the well-known carbon protocol from day one, we have seen increasing demand for supporting the emerging Prometheus protocol. Thanks to the work of Pavel Odintsov and Kai S, dnsdist now supports it natively.
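
A rough sketch of exporting metrics both ways; the addresses and password are placeholders, and the Prometheus metrics are scraped from dnsdist’s built-in webserver:

-- keep sending metrics to a carbon/graphite relay
carbonServer("192.0.2.20")
-- the built-in webserver also exposes the Prometheus metrics endpoint
webserver("127.0.0.1:8083", "supersecretpassword")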

Operators of very large installations of the DNS over TLS feature introduced in 1.3.0 reported several issues that we addressed in this release:

  • dnsdist did not set TCP_NODELAY on its TLS sockets, causing needless latency;
  • it was not possible to configure the number of stored TLS sessions;
  • our OpenSSL implementation had a memory leak when some clients aborted prematurely because of a negotiation error during the TLS handshake.
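
For completeness, a sketch of a DNS over TLS listener; the certificate paths are placeholders and the numberOfStoredSessions option name is an assumption, so double-check it against the documentation:

addTLSLocal("0.0.0.0:853", "/etc/dnsdist/fullchain.pem", "/etc/dnsdist/privkey.pem",
            { numberOfStoredSessions=2048 })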

While fixing that last issue, we seized the opportunity to refactor the part of the code handling TLS connections to use smart pointers, making sure that this kind of memory leak will not happen again.

In 1.3.2, the optimized DynBlockRulesGroup introduced in 1.3.0 gained the ability to whitelist and blacklist ranges from dynamic rules, for example to prevent some clients from ever being blocked by a rate-limiting rule. This feature has now been made available when our in-kernel eBPF filtering feature is used as well. At the same time, we introduced the ability to add warning rates to the dynamic rules, making it possible to get an alert without blocking clients when they reach a configured rate, and to block them should they reach a higher one.
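
A sketch of how such a dynamic rule group could look; the rates, durations and excluded range are placeholders, and the warning-rate knob is left out since its exact parameters should be checked against the documentation:

local dbr = dynBlockRulesGroup()
-- block clients sending more than 30 qps over the last 10 seconds, for 60 seconds
dbr:setQueryRate(30, 10, "Exceeded query rate", 60)
-- never block clients from this range
dbr:excludeRange("192.0.2.0/24")
-- evaluate the rules from dnsdist's periodic maintenance hook
function maintenance()
  dbr:apply()
end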

Finally, we introduced several new rules to our existing set (a configuration sketch follows the list):

  • EDNSOptionRule, to be able to filter based on the presence of a given EDNS option;
  • DSTPortRule, offering the ability to route queries by looking at their destination port;
  • PoolAvailableRule, to be able to route queries based on whether a pool has at least one usable backend.
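
A rough configuration sketch using these new rules; the pool names, the port and the EDNS option code (8, EDNS Client Subnet) are just examples:

-- route queries arriving on a non-standard destination port to another pool
addAction(DSTPortRule(5300), PoolAction("alternate"))
-- drop queries carrying EDNS option 8 (EDNS Client Subnet)
addAction(EDNSOptionRule(8), DropAction())
-- send queries to a backup pool when the "primary" pool has no usable backend
addAction(NotRule(PoolAvailableRule("primary")), PoolAction("backup"))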

Please see the dnsdist website for the more complete changelog and the current documentation.

Release tarballs are available on the downloads website.

Several packages are also available on our repository. Please be aware that we have enabled a few additional features in our packages, like DNS over TLS and DNSTap support, on distributions where the required dependencies were available.

dnsdist 1.2.0 released

We are very pleased to announce the availability of dnsdist 1.2.0, bringing a lot of new features and fixes since 1.1.0.

This release also addresses two security issues of low severity, CVE-2016-7069 and CVE-2017-7557. The first issue can lead to a denial of service on 32-bit systems if a backend sends crafted answers, and the second to an alteration of dnsdist’s ACL if the API is enabled, writable, and an authenticated user is tricked into visiting a crafted website. More information can be found in our security advisories 2017-01 and 2017-02.

Highlights include:

  • applying rules on cache hits
  • addition of runtime-changeable rules that match an IP address for a certain time: TimedIPSetRule
  • SNMP support, exporting statistics and sending traps
  • preventing the packet cache from ageing responses when deployed in front of authoritative servers
  • TTL alteration capabilities
  • consistent hash results over multiple deployments
  • exporting CNAME records over protobuf
  • tuning the size of the ringbuffers used to keep track of recent queries and responses
  • various DNSCrypt-related fixes and improvements, including automatic key rotation

Users upgrading from a previous version should be aware that:

  • the truncateTC option is now off by default, to follow the principle of least astonishment
  • the signature of the addLocal() and setLocal() functions has been changed, to make it easier to add new parameters without breaking existing configurations
  • the packet cache does not cache answers without any TTL anymore, to prevent them from being cached forever
  • blockfilter has been removed, since it was completely redundant

This release also deprecates a number of functions, which will be removed in 1.3.0. Those functions had the drawback of making dnsdist’s configuration less consistent by hiding the fact that each rule is composed of a selector and an action. They are still supported in 1.2.0 but a warning is displayed whenever they are used, and a replacement suggested.
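
To illustrate the selector-plus-action composition that the deprecated helpers used to hide, a small sketch using the generic addAction() form; the domain, rate and delay are placeholders:

-- selector: a domain suffix, action: drop the query
addAction("malware.example.", DropAction())
-- selector: more than 50 qps from a single IP, action: delay the response by 100 ms
addAction(MaxQPSIPRule(50), DelayAction(100))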

For the many other new features, improvements and bug fixes, please see the dnsdist website for the more complete changelog, the current documentation, and the upgrade guide.

Release tarballs are available on the downloads website.

Several packages are also available on our repository.

Diverting recursor-to-auth attacks

We get frequent reports from users/customers about various DNS-related attacks they are facing on either their authoritatives or recursors. This post focuses on one kind of attack that involves both. (Cloudmark wrote about this some time ago as well).

The attack works like this: given a target domain of example.com, the attacker takes his botnet and has it fire off large numbers of $RANDOM.example.com queries. These queries go from the infected hosts to their recursors (i.e. normal ISP recursors). The recursors then need to go out to the auths; after all, they don’t have these random names in cache.

When this attack starts, there is no packet count amplification – bot sends query to recursor, recursor sends to auth, answer flows back, done. However, if enough of this happens, one or more of the auths for example.com may get overloaded, and start responding more slowly, or not respond at all. The recursors will then either retry or move on to other auths, spreading the attack in the most effective and destructive way over the auths.

These attacks are painful, especially for authoritatives backed by (SQL) databases, which many PowerDNS users run. Legitimate traffic for existing names gets cached very well inside pdns_server, but even if you put a wildcard in your database, these random queries will each cause an SQL query, and those are costly.

Because SQL and random names are a bad fit, we get requests to be able to combine the bindbackend and an SQL backend in one pdns_server process. This works, but does not have the desired effect of offloading the SQL – we query both before sending out a response. So, something else needs to happen. While pondering that question this week, a few ideas came up:

  1. use IPTables u32 to match queries for the victim domain, and redirect them (I understand this can be done without generating a lot of state)
  2. teach dnsdist to pick backends based on domain name
  3. somehow get the recursors to redirect their traffic

I did not try ideas 1 and 2; I trust they will work in practice, and will effectively remove load from the SQL backends, but they still involve handling the whole malicious query load on the same server pipe. Luckily, it turns out idea 3 is feasible.

The idea behind 3 is to convince a recursor it is talking to the wrong machines, by virtue of sending it a new NSset in the AUTHORITY section of a response. Some authoritative servers will include the NSset from the zone in every response, but PowerDNS does not do this – so we need another trick.

Some time ago we added an experimental, internal-use-only feature to the Authoritative Server called lua-prequery, to be used specifically for our Recursor regression tests. While never designed for production usage, we can abuse it to make idea 3 work.

-- needs the luaposix module, used below to check for a marker file on disk
require 'posix'

-- returns true if the string 's' ends in 'send'
function endswith(s, send)
 return #s >= #send and s:find(send, #s-#send+1, true) and true or false
end

-- called by pdns_server for every incoming query when lua-prequery-script is set;
-- returning true tells the server we produced the answer ourselves
function prequery ( dnspacket )
 qname, qtype = dnspacket:getQuestion()
 print(os.time(), qname,qtype)
 -- only divert while the marker file exists, so this can be toggled at runtime
 if endswith(qname, '.example.com') and posix.stat('/etc/powerdns/dropit')
 then
   dnspacket:setRcode(pdns.NXDOMAIN)
   ret = {}
   -- place=2 puts these NS records in the AUTHORITY section, steering the
   -- recursor towards the offload hosts for the next 30 seconds
   ret[1] = {qname='example.com', qtype=pdns.NS, content="ns-nosql.example.com", place=2, ttl=30}
   ret[2] = {qname='example.com', qtype=pdns.NS, content="ns-nosql2.example.com", place=2, ttl=30}
   dnspacket:addRecords(ret)
   return true
 end
 return false
end

(A careful reader noted that the stat() call, while cached, may not be the most efficient way to enable/disable this thing. Caveat emptor.)

This piece of code, combined with a reference to it in pdns.conf (‘lua-prequery-script=/etc/powerdns/prequery.lua’), will cause pdns_server to send authoritative NXDOMAINs for any query ending in example.com, and include a new NSset, suggesting the recursor go look ‘over there’.

In our testing, BIND simply ignored the new NSset (we did not investigate why). PowerDNS Recursor believes what you tell it, and will stick to it until the TTL (30 seconds in this example) runs out. Unbound will also believe you, but if none of the machines you redirect it to actually work, it will come back. So, in general we recommend you point the traffic to a set of machines that can give valid replies.

In a lab setting, we found that with both Unbound and PowerDNS Recursor, this approach can move -all- traffic from your normal nameservers to the offload hosts, except for a few packets every TTL seconds. Depending on attack rate and TTL, this easily means offloading >99.9% of traffic, assuming no BIND is involved. In the real world, where some ISPs do use BIND for recursion, you won’t hit 99% or 90% but this approach may still help a lot.

We have not tried this on a real world attack, yet.

What’s next?

If you are under such an attack, and would like to give this a shot, please contact us – we’d love to try this on a real attack!

If you feel like toying around with this (I really want to find out how to make BIND cooperate, but I ran out of time), please get in touch (IRC preferred) – I want to talk to you 🙂

PowerDNS Security Advisory 2014-02

PowerDNS Security Advisory 2014-02: PowerDNS Recursor 3.6.1 and earlier can be made to provide bad service

Hi everybody,

Please be aware of PowerDNS Security Advisory 2014-02, which you can also find below. The good news is that the currently released version of the PowerDNS Recursor is safe. The bad news is that users of older versions will have to upgrade.

PowerDNS Recursor 3.6.2, released late October, is in wide production use and has been working well for our users. If however you have reasons not to upgrade, the advisory below contains a link to a patch which applies to older versions.

Finally, if you have problems upgrading, please either contact us on our mailing lists, or privately via powerdns.support@powerdns.com (should you wish to make use of our SLA-backed support program).

We want to thank Florian Maury of French government information security agency ANSSI for bringing this issue to our attention and coordinating the security release with us and other nameserver vendors.

  • CVE: CVE-2014-8601
  • Date: 8th of December 2014
  • Credit: Florian Maury (ANSSI)
  • Affects: PowerDNS Recursor versions 3.6.1 and earlier
  • Not affected: PowerDNS Recursor 3.6.2; no versions of PowerDNS Authoritative Server
  • Severity: High
  • Impact: Degraded service
  • Exploit: This problem can be triggered by sending queries for specifically configured domains
  • Risk of system compromise: No
  • Solution: Upgrade to PowerDNS Recursor 3.6.2
  • Workaround: None known. Exposure can be limited by configuring the allow-from setting so only trusted users can query your nameserver.

Recently we released PowerDNS Recursor 3.6.2 with a new feature that strictly limits the amount of work we’ll perform to resolve a single query. This feature was inspired by performance degradations noted when resolving domains hosted by ‘ezdns.it’, which can require thousands of queries to resolve.

During the 3.6.2 release process, we were contacted by a government security agency with news that they had found that all major caching nameservers, including PowerDNS, could be negatively impacted by specially configured, hard to resolve domain names. With their permission, we continued the 3.6.2 release process with the fix for the issue already in there.

We recommend that all users upgrade to 3.6.2 if at all possible. Alternatively, if you want to apply a minimal fix to your own tree, it can be found here, including patches for older versions.

As for workarounds, only clients in allow-from are able to trigger the degraded service, so exposure should be limited to your own user base.

Note that in addition to providing bad service, this issue can be abused to send unwanted traffic to an unwilling third party. Please see ANSSI’s report for more information.

Related to recent DoS attacks: Recursor configuration file guidance

Hi everybody,

Over the past week we’ve been contacted by a few users reporting their PowerDNS Recursor became unresponsive under a moderate denial of service attack, one which PowerDNS should be expected to weather without issues.

In the course of investigating this issue, we’ve found that many PowerDNS installations on Linux are configured to consume (far) more file descriptors than are actually available, wasting resources and potentially leading to unresponsiveness.

To check if this is the case for you, multiply the ‘max-mthreads’ setting by the ‘threads’ setting. Default values are 2048 and 2, leading to a theoretical FD consumption of 4096. Many Linux distributions default to a limit of 1024 file descriptors per process. So, our defaults exceed the Linux defaults by a large margin!

(FreeBSD defaults are far higher, and should not pose an issue).

To fix, there are four options:

  1. Reduce max-mthreads to 512 (or threads to 1 and max-mthreads to 1024) (max-mthreads was introduced in Recursor 3.2; but if you are running a version that old, please upgrade it!)
  2. Run ‘ulimit -n 32768’ before starting (perhaps put this in the /etc/init.d/ script). There’s little reason to skimp on this number.
  3. Investigate defaults in /etc/security/limits.conf
  4. Apply the patch in https://github.com/Habbie/pdns/commit/e24b124a4c7b49f38ff8bcf6926cd69077d16ad8 (this patch is in our 3.6.0 release. We recommend just upgrading!)

The patch automates 1 and 2, either raising the limit if possible, or reducing max-mthreads until “it fits”.

Thank you for your attention, and if you have results to report to us on previous or current DoS attacks, please contact us privately.

DNSSEC validation for the Recursor

The PowerDNS Recursor has a proven track record — it has been serving recursive answers for millions of users for many years, with very few complaints. To preserve this robustness that people have come to rely on, any major changes should happen very carefully. In this blog post, we aim to explain our plan for adding DNSSEC to the Recursor, without compromising our current stability.

It is because of this robustness that we have decided that, at least initially, DNSSEC validation for the PowerDNS Recursor will be a separate project and a separate daemon. As it turns out, there are also technical and development benefits to this split. Eventually, from a functional standpoint, the split might become a technical detail that need not bother the administrator.

Traditionally (unbound, BIND), DNSSEC validators have had the recursor and the validator in a single process, providing some performance benefits (pass around pointers instead of sending packets, some cache sharing, grabbing DS records while iterating downwards, the list goes on). However, this combination may increase complexity of both parts if they are coupled too closely — although we have also seen issues in other implementations that appear to stem directly from a lack of information sharing between the two.

The DNSSEC RFCs (4033/4034/4035 and 6840) explicitly designate various roles that can be implemented in separate applications. While, as said, current implementations do not separate these roles, they can be told (through configuration or through DNS query flags) to stick to specific roles.

Our plan is two-fold.

One, we will upgrade the PowerDNS Recursor to perform, in words inspired by RFC 4033, the role of a ‘Non-Validating Security-Aware Resolver’. This is the role that other DNSSEC-aware recursors play when they receive ‘Checking Disabled’ queries. In other words, the Recursor supports all relevant DNSSEC records (RRSIG, DS, NSEC[3]), understands how they interact, and knows when to send those records along with query results. The current Recursor [3.3/3.5.x], as used in production by many parties, is a very lean communication machine that prefers to spend time waiting on the network, instead of doing calculations, and the design reflects this. As such, adding cryptography operations to the Recursor would destroy many of the benefits of the current design. Because of this, there will be no crypto in the Recursor — at least for now.

Two, we will build a Validator that is a client to a Security-Aware Resolver, expecting no validation from it. This validator does no recursion of its own, relying on the Resolver it is pointed to for that. In that sense, it is a bit like a ‘Validating Security-Aware Stub Resolver’, except that it takes queries from clients. This is similar to running other validating recursors in a forwarding mode. The validator receives queries from clients, collects all the data it needs to decide on the security of an answer (keeping in mind the four security states defined in RFC 4033 section 5 or RFC 4035 section 4.3), and returns a useful response to the client.

In theory (and, from what we have seen so far, also in practice!) this means that either part can be replaced with another validating recursor (like BIND or Unbound) and the system as a whole will keep operating. This allows individual developers to focus on one side of the PowerDNS Validating Recursor equation at a time, while relying on proven code for the other side. In the end, we will provide robust implementations of both sides, of course.

We are currently experimenting with implementation details on both ends, making sure our behaviour checks out with both reality and the relevant standards, and making sure we interoperate with the existing validators and existing authoritative implementations. As our code is in heavy flux right now, we are holding off on releasing for just a bit. We hope to release a stable base for the new Validator and the modified Recursor within a few weeks, and we cordially invite the community to join us at that time — either to make things or to break them! Implementing a full DNSSEC validation stack means ticking a lot of boxes, and we do not expect to tick them in a few weeks.

If you have questions, please let us know in the comments, via Twitter (@PowerDNS), via IRC or via our mailing lists. If there is demand, we will post a follow-up article with some more technical details.