An introduction to PowerDNS Recursor 3.7.0

Today we released the first Release Candidate (RC) of PowerDNS Recursor 3.7.0. And although we do document (almost) all of our features, documentation is not the same as effectively informing our users of what our software can do.

In this blog, I’ll be highlighting some of the things that are new in 3.7.0, or which arrived recently in the 3.6 series.

Binding to 0.0.0.0 and ::

For a long time, PowerDNS Recursor did not support binding to the ‘any’ addresses, because Unix & POSIX semantics mean that a socket bound to such an any address has no control over which IP address it answers from. So for example, if your server is on 192.168.1.2 and 192.168.1.3, and a question comes in on a socket bound to 0.0.0.0, the answer might go out from either the .2 or the .3 address – no matter what address the question arrived on!

Binding to 0.0.0.0 (and ::, of course) is very useful for load balancing purposes, so as of 3.7.0, we use some specific calls to implement ‘sendfromto()’ so we can force the source address to be correct. Just bind to 0.0.0.0 or :: and the Recursor will do the right thing.

Live graphs

The PowerDNS Recursor has supported our HTTP-based RESTful JSON API for a while now, although we have not publicized this sufficiently while we stabilized the API. In 3.7.0, this API was enhanced to enable it to power a rather impressive (even if we say so ourselves) live display of traffic, updating every second.

For now, development of this live website is proceeding faster than the PowerDNS release schedule, which means you can find the HTML for this page (which is independent of PowerDNS itself) on GitHub, where you can also find instructions on how to enable this live display in the stock version of Recursor 3.7.0.

Graphing as a service

We’ve explained this at length in a previous post, but with our built-in Carbon/Graphite support, it is a breeze to get pretty graphs of your own recursor, or, if you have problems, to send your statistics our way. With a live view of your system, we can quickly diagnose issues. Recommended!

Enhanced Lua scripting

For the last few months, the world of DNS has been about one thing only: attacks. Amplification attacks, reflection attacks and ‘infinite recursion queries’. Our vision is that PowerDNS should be as robust as possible by generally being fast and smart, and therefore be resilient against attacks. Sadly, some attacks can be rather nasty, and can’t be fended off with generic solutions.

PowerDNS has extensive support for Lua scripting, and in 3.7.0 we added a number of features intended to serve as the basis of “scripts against the attack of the day”.

Next to the existing hooks (preresolve, which intercepts a packet after parsing, postresolve which can edit a packet after the answer is in, and nxdomain and noerror which are called for those two results), we’ve added two new ones: ipfilter and preoutquery.

‘ipfilter’ hooks in before client packet parsing even starts, and allows you to block a packet before it has used a lot of CPU cycles. This is a great place to (dynamically) stop attacks.

‘preoutquery’ is a new kind of hook. It does not fire based on user (client) traffic, but before a remote nameserver is consulted. Frequently, attack traffic is for loads of random domain names, but in the end only a few (fixed) IP addresses are the target of the attack. If you know these addresses, you can trigger on them in ‘preoutquery’, and drop the query quickly before it uses more CPU and network. But even better, once you know the client IP address that is performing the attack on you, you can block it with ipfilter!

Here is a sample script that does just that:

lethalgroup=iputils.newnmgroup()
lethalgroup:add("192.121.121.0/24") -- touch these nameservers and you die

blockset=iputils.newipset() -- which client IP addresses we block

function preoutquery(remoteip, domain, qtype)
--  print("pdns wants to ask "..remoteip:tostring().." about "..domain.." "..qtype.." on behalf of requestor "..getlocaladdress())
    if(lethalgroup:match(remoteip))
    then
--      print("We matched the group "..lethalgroup:tostring().."!", "killing query dead & adding requestor "..getlocaladdress().." to block list")
        blockset[iputils.newca(getlocaladdress())]=1
        return -3,{} --   -3 means 'kill'
    end
    return -1,{}         --   -1 means 'no opinion'
end


local delcount=0

function ipfilter(remoteip)
    delcount=delcount+1

    if((delcount % 10000)==0)
    then
--      print("Clearing blockset!")
        blockset=iputils.newipset()  -- clear it
    end

    if(blockset[remoteip] ~= nil) then
        return 1         -- block!
    end
    return -1                -- no opinion
end

This script blocks an attacker for up to the next 10000 incoming queries (the block list is cleared every 10000 packets) if they ever ‘touched’ a nameserver known to be involved in malicious traffic.

The ‘ipset’ used above is fast enough that a 10 million long list of blocked IP addresses doesn’t noticeably affect performance!
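
To make the interplay between the two hooks easier to follow, here is a minimal Python model of the same logic. It is a sketch for illustration only: the class and method signatures are hypothetical stand-ins and not part of any PowerDNS API; only the netmask and the 10000 interval are taken from the Lua script above.

```python
import ipaddress

# Hypothetical stand-ins for the Lua script's globals; not part of PowerDNS.
LETHAL_NETWORKS = [ipaddress.ip_network("192.121.121.0/24")]
CLEAR_INTERVAL = 10000  # same constant as in the Lua script


class BlockList:
    """Models the preoutquery/ipfilter pair from the Lua script."""

    def __init__(self):
        self.blocked = set()  # client addresses we currently drop
        self.seen = 0         # number of client packets inspected so far

    def preoutquery(self, nameserver_ip, client_ip):
        """Return True (kill the query) if the nameserver is in a lethal network."""
        address = ipaddress.ip_address(nameserver_ip)
        if any(address in network for network in LETHAL_NETWORKS):
            self.blocked.add(client_ip)  # remember who made us ask
            return True
        return False

    def ipfilter(self, client_ip):
        """Return True if the client packet should be dropped."""
        self.seen += 1
        if self.seen % CLEAR_INTERVAL == 0:
            self.blocked = set()  # forget everyone, as the Lua script does
        return client_ip in self.blocked
```

Because the list is cleared on a global packet counter, a client stays blocked for at most 10000 incoming packets, and possibly far fewer if it was added shortly before the next clear.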

Improved ringbuffers

To investigate traffic, ringbuffers of the last x-thousand (by default 10000) queries or errors are available. Some sample output:

# rec_control top-queries
Over last 10000 entries:
4.04% e3191.dscc.akamaiedge.net.|A
4.04% e3191.dscc.akamaiedge.net.|AAAA
1.50% pool.ntp.org.|A
1.44% us-courier.push-apple.com.akadns.net.|A
1.25% twitter.com.|A
1.24% twitter.com.|AAAA
0.42% www.google.com.|A
(...)
77.40% rest
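
A report like this can be pictured as a count over a fixed-size buffer. The Python sketch below shows the idea; it is not the Recursor's implementation, and the function name and the ‘name|TYPE’ entry format are assumptions based on the output above.

```python
from collections import Counter, deque


def top_entries(ring, limit=7):
    """Return [(percentage, entry), ...] for the most common entries, plus 'rest'."""
    total = len(ring)
    if total == 0:
        return []
    counts = Counter(ring).most_common(limit)
    report = [(100.0 * n / total, entry) for entry, n in counts]
    shown = sum(n for _, n in counts)
    if shown < total:
        # Everything that did not make the top list is lumped together.
        report.append((100.0 * (total - shown) / total, "rest"))
    return report


# A ring of the last 10000 queries; the oldest entries fall off automatically.
ring = deque(maxlen=10000)
ring.append("twitter.com.|A")
```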

When under attack, typically a lot of random traffic comes in which makes it harder to see what is going on. For this purpose, we’ve added the ‘pub’ filters, which group queries via the Mozilla Public Suffix list:

# rec_control top-pub-queries
Over last 10000 entries:
5.26% google.com.|A
5.21% google.com.|AAAA
4.38% akamaiedge.net.|AAAA
4.08% akamaiedge.net.|A
2.35% gstatic.com.|A
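
The grouping itself can be sketched in a few lines of Python. This is an illustration under stated assumptions: the suffix set is a tiny hypothetical excerpt, the function names are made up, and the wildcard and exception rules of the real Public Suffix List are ignored.

```python
from collections import Counter

# Tiny hypothetical excerpt; the real Public Suffix List has thousands of entries.
PUBLIC_SUFFIXES = {"com", "net", "org", "co.uk"}


def pub_name(qname):
    """Reduce a query name to its public suffix plus one label."""
    labels = qname.rstrip(".").lower().split(".")
    # Try the longest candidate suffix first, so 'co.uk' wins over 'uk'.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            start = max(i - 1, 0)
            return ".".join(labels[start:]) + "."
    return qname  # no known suffix: leave the name untouched


def group_by_pub(entries):
    """Count 'name|TYPE' ring entries per registered domain and type."""
    grouped = Counter()
    for entry in entries:
        qname, qtype = entry.split("|")
        grouped[pub_name(qname) + "|" + qtype] += 1
    return grouped
```

With this grouping, thousands of distinct random names under one victim domain collapse into a single line of the report, which is what makes the attack visible.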

Other interesting rings:

  • top-servfail-queries, listing the top queries leading to servfail responses. (top-pub-servfail-queries grouped)
  • top-largeanswer-remotes, listing top clients receiving large answers (as typically used in reflection attacks)
  • top-remotes, listing clients causing most queries
  • top-servfail-remotes, listing clients causing most servfail responses

Finally

We hope you enjoy these new features, and we’d love to hear your thoughts on whether they do the job, or how they could be better!

Diverting recursor-to-auth attacks

We get frequent reports from users/customers about various DNS-related attacks they are facing on either their authoritatives or recursors. This post focuses on one kind of attack that involves both. (Cloudmark wrote about this some time ago as well).

The attack works like this: given a target domain of example.com, the attacker takes his botnet, and has it fire off large numbers of $RANDOM.example.com queries. These queries go from the infected hosts to their recursors (i.e. normal ISP recursors). The recursors then need to go out to the auths, because a freshly generated random name is never in their cache.

When this attack starts, there is no packet count amplification – bot sends query to recursor, recursor sends to auth, answer flows back, done. However, if enough of this happens, one or more of the auths for example.com may get overloaded, and start responding more slowly, or not respond at all. The recursors will then either retry or move on to other auths, spreading the attack in the most effective and destructive way over the auths.

These attacks are painful, especially for authoritatives backed by (SQL) databases, like many PowerDNS users are running. Legitimate traffic for existing names gets cached very well inside pdns_server, but even if you put a wildcard in your database, these random queries will cause an SQL query, and those are costly.

Because SQL and random names are a bad fit, we get requests for being able to combine the bindbackend and an SQL backend in one pdns_server process. This works, but does not have the desired effect of offloading the SQL – we query both before sending out a response. So, something else needs to happen. While pondering that question this week, a few ideas came up:

  1. use iptables u32 to match queries for the victim domain, and redirect them (I understand this can be done without generating a lot of state)
  2. teach dnsdist to pick backends based on domain name
  3. somehow get the recursors to redirect their traffic

I did not try ideas 1 and 2; I trust they will work in practice, and will effectively remove load from the SQL backends, but they still involve handling the whole malicious query load on the same server pipe. Luckily, it turns out idea 3 is feasible.

The idea behind 3 is to convince a recursor it is talking to the wrong machines, by virtue of sending it a new NSset in the AUTHORITY section of a response. Some authoritative servers will include the NSset from the zone in every response, but PowerDNS does not do this – so we need another trick.

Some time ago we added an experimental, internal-use-only feature to the Authoritative Server called lua-prequery, to be used specifically for our Recursor regression tests. While never designed for production usage, we can abuse it to make idea 3 work.

require 'posix'

function endswith(s, send)
 return #s >= #send and s:find(send, #s-#send+1, true) and true or false
end

function prequery ( dnspacket )
 local qname, qtype = dnspacket:getQuestion()
 print(os.time(), qname,qtype)
 if endswith(qname, '.example.com') and posix.stat('/etc/powerdns/dropit')
 then
   dnspacket:setRcode(pdns.NXDOMAIN)
   local ret = {}
   ret[1] = {qname='example.com', qtype=pdns.NS, content="ns-nosql.example.com", place=2, ttl=30}
   ret[2] = {qname='example.com', qtype=pdns.NS, content="ns-nosql2.example.com", place=2, ttl=30}
   dnspacket:addRecords(ret)
   return true
 end
 return false
end

(A careful reader noted that the stat() call, while cached, may not be the most efficient way to enable/disable this thing. Caveat emptor.)

This piece of code, combined with a reference to it in pdns.conf (‘lua-prequery-script=/etc/powerdns/prequery.lua’), will cause pdns_server to send authoritative NXDOMAINs for any query ending in example.com, and include a new NSset, suggesting the recursor go look ‘over there’.

In our testing, BIND simply ignored the new NSset (we did not investigate why). PowerDNS Recursor believes what you tell it, and will stick to it until the TTL (30 seconds in this example) runs out. Unbound will also believe you, but if none of the machines you redirect it to actually work, it will come back. So, in general we recommend you point the traffic to a set of machines that can give valid replies.

In a lab setting, we found that with both Unbound and PowerDNS Recursor, this approach can move -all- traffic from your normal nameservers to the offload hosts, except for a few packets every TTL seconds. Depending on attack rate and TTL, this easily means offloading >99.9% of traffic, assuming no BIND is involved. In the real world, where some ISPs do use BIND for recursion, you won’t hit 99% or 90% but this approach may still help a lot.
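
The arithmetic behind these percentages can be sketched as follows. All inputs are hypothetical: the attack rate, the number of packets that still reach the normal nameservers per TTL window, and the share of traffic coming from resolvers that ignore the new NSset are assumptions, not measurements.

```python
def offloaded_fraction(attack_qps, ttl_seconds, leaked_per_ttl, ignoring_share=0.0):
    """Estimate the fraction of attack traffic moved to the offload hosts.

    attack_qps      -- attack queries per second (assumed)
    ttl_seconds     -- TTL of the injected NS records
    leaked_per_ttl  -- packets still hitting the normal auths per TTL window (assumed)
    ignoring_share  -- share of traffic from resolvers that ignore the NSset (assumed)
    """
    queries_per_window = attack_qps * ttl_seconds
    if queries_per_window <= 0:
        raise ValueError("attack rate and TTL must be positive")
    cooperating = 1.0 - min(leaked_per_ttl / queries_per_window, 1.0)
    return (1.0 - ignoring_share) * cooperating
```

For example, at 1000 queries per second, a 30 second TTL and 3 leaked packets per window, 3 out of 30000 queries still reach the normal nameservers, which works out to 99.99% offloaded. If 20% of the traffic came from resolvers that ignore the new NSset, the overall figure would drop to just under 80%.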

We have not tried this on a real world attack, yet.

What’s next?

If you are under such an attack, and would like to give this a shot, please contact us. We’d love to try this on a real attack!

If you feel like toying around with this (I really want to find out how to make BIND cooperate, but I ran out of time), please get in touch (IRC preferred), I want to talk to you :-)

PowerDNS “Graphing as a Service”

Over the past few months, we’ve worked on our graphing tool, which has proved to be a wonderful aid in debugging. If you want to get the best help from us in diagnosing your problems, read on.

PowerDNS Authoritative Server and PowerDNS Recursor can both emit statistics using the ‘carbon’ format as used by Graphite. This means that feeding your stats into the very powerful Graphite software is as easy as setting:

carbon-server=2001:888:2000:1d::2

And from that point on, your PowerDNS product will send statistics to that IP address every 30 seconds. The target address should run either Graphite or Metronome, which we developed ourselves. Metronome is less powerful than Graphite, but very easy to set up. Graphs can be configured in JavaScript, and out of the box, Metronome comes with support for graphing all PowerDNS products, plus the output of a small statistics program we wrote that emits network and CPU statistics.

To make this easier for everyone, we also run a public instance of Metronome on http://xs.powerdns.com/metronome/, and you are welcome to configure your PowerDNS products to send statistics by putting the following in pdns.conf or recursor.conf:

carbon-server=82.94.213.34    # the IPv6 address works as well

carbon-ourname=pick-something

Your data will then appear in the dropdown with the name ‘pick-something’. If you don’t pick your own name, PowerDNS will use your hostname, but you might consider that to be too revealing. It might also trample existing data. Carbon travels over port 2003, and if PowerDNS can’t connect to the server, nothing is interrupted.
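
For those curious what travels over port 2003: the Carbon plaintext protocol is one line per metric, consisting of a dotted path, a value and a Unix timestamp. The Python sketch below formats such lines; the path layout ‘pdns.<ourname>.<product>.<metric>’ and the function name are assumptions for illustration, so check your own Graphite or Metronome instance for the names PowerDNS really emits.

```python
import time


def carbon_lines(ourname, product, stats, timestamp=None):
    """Format metrics as Carbon plaintext lines: '<path> <value> <timestamp>\\n'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return "".join(
        # Assumed path layout; the exact metric paths are not given in this post.
        f"pdns.{ourname}.{product}.{name} {value} {ts}\n"
        for name, value in sorted(stats.items())
    )


payload = carbon_lines("pick-something", "recursor", {"questions": 42}, 1418308271)
```

Sending the payload is then a matter of opening a TCP connection to the carbon server on port 2003 and writing the encoded bytes.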

For the PowerDNS Recursor, you can even enable carbon reporting at runtime using:

rec_control set-carbon-server 82.94.213.34 pick-something

To disable, run ‘rec_control set-carbon-server’.

If you have any (performance) issues with PowerDNS you want help with, it is tremendously useful to turn on reporting to our Metronome instance – very often we can spot your problem from the graph quickly.

Now, our public Metronome service is of course public, but you can obscure your data by picking an innocuous carbon-ourname.

Private Metronome service is also available for holders of PowerDNS support agreements. Finally, Metronome is easy to install, so you can also benefit from it locally.

Enjoy!

PowerDNS Security Advisory 2014-02

PowerDNS Security Advisory 2014-02: PowerDNS Recursor 3.6.1 and earlier can be made to provide bad service

Hi everybody,

Please be aware of PowerDNS Security Advisory 2014-02, which you can also find below. The good news is that the currently released version of the PowerDNS Recursor is safe. The bad news is that users of older versions will have to upgrade.

PowerDNS Recursor 3.6.2, released late October, is in wide production use and has been working well for our users. If however you have reasons not to upgrade, the advisory below contains a link to a patch which applies to older versions.

Finally, if you have problems upgrading, please either contact us on our mailing lists, or privately via powerdns.support@powerdns.com (should you wish to make use of our SLA-backed support program).

We want to thank Florian Maury of French government information security agency ANSSI for bringing this issue to our attention and coordinating the security release with us and other nameserver vendors.

  • CVE: CVE-2014-8601
  • Date: 8th of December 2014
  • Credit: Florian Maury (ANSSI)
  • Affects: PowerDNS Recursor versions 3.6.1 and earlier
  • Not affected: PowerDNS Recursor 3.6.2; no versions of PowerDNS Authoritative Server
  • Severity: High
  • Impact: Degraded service
  • Exploit: This problem can be triggered by sending queries for specifically configured domains
  • Risk of system compromise: No
  • Solution: Upgrade to PowerDNS Recursor 3.6.2
  • Workaround: None known. Exposure can be limited by configuring the allow-from setting so only trusted users can query your nameserver.

Recently we released PowerDNS Recursor 3.6.2 with a new feature that strictly limits the amount of work we’ll perform to resolve a single query. This feature was inspired by performance degradations noted when resolving domains hosted by ‘ezdns.it’, which can require thousands of queries to resolve.
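
The idea of such a work limit can be sketched as a per-query budget. This Python model is only an illustration with hypothetical names, not the Recursor's code; the limit of 50 outgoing queries is the figure mentioned in the 3.6.2 changelog.

```python
class TooMuchWork(Exception):
    """Raised when one client query needs more outgoing queries than allowed."""


class QueryBudget:
    """Tracks how many outgoing queries a single client query has caused."""

    def __init__(self, max_outqueries=50):
        self.max_outqueries = max_outqueries
        self.used = 0

    def charge(self):
        """Account for one outgoing query; abort resolution past the limit."""
        self.used += 1
        if self.used > self.max_outqueries:
            raise TooMuchWork(
                f"more than {self.max_outqueries} outgoing queries, answer SERVFAIL"
            )
```

A resolver loop would call charge() before every outgoing query and turn the exception into a SERVFAIL for the client, so a single hard-to-resolve name cannot consume unbounded resources.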

During the 3.6.2 release process, we were contacted by a government security agency with news that they had found that all major caching nameservers, including PowerDNS, could be negatively impacted by specially configured, hard to resolve domain names. With their permission, we continued the 3.6.2 release process with the fix for the issue already in there.

We recommend that all users upgrade to 3.6.2 if at all possible. Alternatively, if you want to apply a minimal fix to your own tree, it can be found here, including patches for older versions.

As for workarounds, only clients in allow-from are able to trigger the degraded service, so this should be limited to your userbase.

Note that in addition to providing bad service, this issue can be abused to send unwanted traffic to an unwilling third party. Please see ANSSI’s report for more information.

Recursor 3.6.2

Note
Version 3.6.2 is a bugfix update to 3.6.1. Released on the 30th of October 2014.

Official download page

A list of changes since 3.6.1 follows.

  • commit ab14b4f: expedite servfail generation for ezdns-like failures (fully abort query resolving if we hit more than 50 outqueries)
  • commit 42025be: PowerDNS now polls the security status of a release at startup and periodically. More detail on this feature, and how to turn it off, can be found in Section 2, “Security polling”.
  • commit 5027429: We did not transmit the right ‘local’ socket address to Lua for TCP/IP queries in the recursor. In addition, we would attempt to lookup a filedescriptor that wasn’t there in an unlocked map which could conceivably lead to crashes. Closes ticket 1828, thanks Winfried for reporting
  • commit 752756c: Sync embedded yahttp copy. API: Replace HTTP Basic auth with static key in custom header
  • commit 6fdd40d: add missing #include <pthread.h> to rec-channel.hh (this fixes building on OS X).

Authoritative Server 3.4.1

Warning
Version 3.4.1 of the PowerDNS Authoritative Server is a major upgrade if you are coming from 2.9.x. Additionally, if you are coming from any 3.x version (including 3.3.1), there is a mandatory SQL schema upgrade. Please refer to Section 6, “From PowerDNS Authoritative Server 3.3.1 to 3.4.0” and any relevant sections before it, before deploying this version. There are no 3.4.1 upgrade notes.
Note
Released October 30th, 2014

Find the downloads on our download page.

This is a bugfix update to 3.4.0 and any earlier version.

A list of changes since 3.4.0 follows.

PowerDNS Security Status Polling

PowerDNS software sadly sometimes has critical security bugs. Even though we send out notifications of these via all channels available, our recent security releases have taught us that not everybody actually finds out about important security updates via our mailing lists, Facebook and Twitter.

To solve this, the development versions of PowerDNS software have been updated to poll for security notifications over DNS, and log these periodically. Secondly, the security status of the software is available for monitoring using the built-in metrics. This allows operators to poll for the PowerDNS security status and alert on it.

In the implementation of this idea, we have taken the unique role of operating system distributors into account. Specifically, we can deal with backported security fixes.

This feature can easily be disabled, and operators can also point the queries at their own status service.

In this post, we want to inform you that the most recent snapshots of PowerDNS now include security polling, and we want to solicit your rapid feedback before this feature becomes part of the next PowerDNS releases.

Implementation

PowerDNS software periodically tries to resolve ‘auth-x.y.z.security-status.secpoll.powerdns.com|TXT’ or ‘recursor-x.y.z.security-status.secpoll.powerdns.com|TXT’ (if the security-poll-suffix setting is left at the default of secpoll.powerdns.com). No other data is included in the request.

The data returned is in one of the following forms:

  • NXDOMAIN or resolution failure
  • “1 Ok” -> security-status=1
  • “2 Upgrade recommended for security reasons, see http://powerdns.com/..” -> security-status=2
  • “3 Upgrade mandatory for security reasons, see http://powerdns.com/..” -> security-status=3

In cases 2 or 3, periodic logging commences at syslog level ‘Error’. The metric security-status is set to 2 or 3 respectively. The security status could be lowered however if we discover the issue is less urgent than we thought.

If resolution fails, and the previous security-status was 1, the new security-status becomes 0 (‘no data’). If the security-status was higher than 1, it will remain that way, and not get set to 0. In this way, security-status of 0 really means ‘no data’, and can not mask a known problem.
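
The rules above amount to a small state update, sketched here in Python. The function name is hypothetical, and the sketch assumes a well-formed TXT record that starts with the numeric status; it does not handle malformed data.

```python
def next_security_status(previous, txt_record):
    """Return the new security-status given the previous one and the poll result.

    txt_record is the TXT content (for example "1 Ok"), or None when the
    lookup ended in NXDOMAIN or a resolution failure.
    """
    if txt_record is None:
        # 'No data' may replace 'unknown' or 'ok', but never a known problem.
        return 0 if previous <= 1 else previous
    # The status is the number before the first space.
    return int(txt_record.split(" ", 1)[0])
```

Note that a successful poll may lower the status again, for instance from 3 back to 1, while a failed poll never does.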

Distributions

Distributions frequently backport security fixes to the PowerDNS versions they ship. As a result, a version number that we know to be insecure may in reality be secure.

To solve this issue, PowerDNS can be compiled with a distribution setting which will move the security polls from ‘auth-x.y.z.security-status.secpoll.powerdns.com’ to ‘auth-x.y.z-n.debian.security-status.secpoll.powerdns.com’.

Note two things: first, there is a separate namespace for debian, and second, we use the package version of this release. This allows us to know that 3.6.0-1 (say) is insecure, but that 3.6.0-2 is not.

Details and how to disable

The configuration setting ‘security-poll-suffix’ is by default set to ‘secpoll.powerdns.com’. If empty, nothing is polled. This can be moved to ‘secpoll.yourorganization.com’. Our up-to-date secpoll zonefile is available on GitHub for this purpose.

If compiled with PACKAGEVERSION=3.1.6-abcde.debian, queries will be sent to “auth-3.1.6-abcde.debian.security-status.security-poll-suffix”.
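
Putting the pieces together, the name that gets polled can be sketched as below. The function is hypothetical and only mirrors the naming scheme described in this post.

```python
def secpoll_qname(product, package_version, suffix="secpoll.powerdns.com"):
    """Build the TXT name to poll, or None when polling is disabled.

    product          -- 'auth' or 'recursor'
    package_version  -- for example '3.6.2' or '3.1.6-abcde.debian'
    suffix           -- value of the security-poll-suffix setting
    """
    if not suffix:
        return None  # an empty suffix means nothing is polled
    return f"{product}-{package_version}.security-status.{suffix}"
```

For example, secpoll_qname("auth", "3.1.6-abcde.debian") yields ‘auth-3.1.6-abcde.debian.security-status.secpoll.powerdns.com’.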

Delegation

If a distribution wants to host its own file with version information, we can delegate dist.security-status.secpoll.powerdns.com to their nameservers directly.