PowerDNS Blog

Recursor: Extended DNS Errors Help You Troubleshoot

Written by Otto Moerbeek | Mar 12, 2024 10:31:37 AM

 

This is the seventh episode of a series of blog posts we are publishing, mostly around recent developments with respect to PowerDNS Recursor. The previous blog post is: ZONEMD, the missing validation.

This post is about how Extended DNS Errors (EDE) can help you diagnose DNS issues. EDEs are described in RFC 8914. An EDE has two parts: an error code and an optional error text. This information is sent to clients in an EDNS OPT record that is added to the answer. IANA maintains the most recent list of assigned EDE codes.
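To make the two parts of an EDE concrete, here is a minimal Python sketch of how a single EDE option is laid out inside the OPT record, following the wire format in RFC 8914 (option code 15, a 16-bit info code, then optional UTF-8 text). The function names are ours and purely illustrative; this is not code from the Recursor.

```python
import struct

EDE_OPTION_CODE = 15  # EDNS option code assigned to Extended DNS Errors (RFC 8914)


def encode_ede(info_code: int, extra_text: str = "") -> bytes:
    """Build one EDNS option: OPTION-CODE, OPTION-LENGTH, INFO-CODE, EXTRA-TEXT."""
    payload = struct.pack("!H", info_code) + extra_text.encode("utf-8")
    return struct.pack("!HH", EDE_OPTION_CODE, len(payload)) + payload


def decode_ede(option: bytes) -> tuple[int, str]:
    """Return (info_code, extra_text) from a single encoded EDE option."""
    option_code, length = struct.unpack_from("!HH", option, 0)
    if option_code != EDE_OPTION_CODE:
        raise ValueError(f"not an EDE option: code {option_code}")
    payload = option[4:4 + length]
    if length < 2 or len(payload) != length:
        raise ValueError("truncated EDE option")
    (info_code,) = struct.unpack_from("!H", payload, 0)
    # The extra text is optional, so an empty string is a valid result.
    return info_code, payload[2:].decode("utf-8", errors="replace")


wire = encode_ede(22, "delegation .")
print(wire.hex())
print(decode_ede(wire))
```

In practice a DNS library or dig does this decoding for you; the sketch only shows that an EDE is a small code plus free-form text.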

Since the release of version 4.5.0 in May 2021 PowerDNS Recursor can add EDEs to query results. This version can add EDEs in three different ways:

Version 4.9.0, released in June 2023, further extends the ability to add EDEs, for example:

  • when all authoritative servers for a zone are unreachable, or
  • when synthesizing answers by using the aggressive NSEC cache.
Below we present examples of cases where the Recursor adds EDEs because of DNSSEC validation or other reasons, demonstrating how EDEs can help you troubleshoot DNS issues. In these examples, the Recursor runs on the local LAN or on the local machine and is configured with extended-resolution-errors enabled for EDEs to be added. Starting with version 5.0.0 this setting is enabled by default.

DNSSEC Validation Error

$ dig brokendnssec.net

; <<>> DiG 9.18.24 <<>> brokendnssec.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 62907
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 10 (RRSIGs Missing)
;; QUESTION SECTION:
;brokendnssec.net.        IN    A

;; Query time: 57 msec
;; SERVER: 192.168.178.6#53(192.168.178.6) (UDP)
;; WHEN: Thu Mar 07 10:16:27 CET 2024
;; MSG SIZE  rcvd: 51

We use the dig command. Its output may be daunting at times, but other commands like host or nslookup hide too many details. Looking at the output, we see there are no ANSWER records: we did not receive an answer to our question. We also notice status: SERVFAIL. A resolver must return this error code if validation failed (or if some other condition prevented it from producing an answer). But why did DNSSEC validation fail? We did receive one ADDITIONAL record: an OPT record, listed by dig under the OPT PSEUDOSECTION. The OPT record contains the EDE information added by the Recursor. In this case, the Recursor did not receive RRSIG records from the authoritative server, as signaled by EDE: 10. The dig command prints the description of the EDE (RRSIGs Missing) next to the error code.

Other common causes of DNSSEC validation failures are signaled by EDE 7 (Signature Expired) or EDE 9 (DNSKEY Missing).
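If you want to collect EDEs from dig output in a script, something like the following Python sketch may be a starting point. It assumes the line formats seen in the dig 9.18.24 samples in this post (a code, an optional description, an optional text); other dig versions may print EDEs differently, so treat the regular expression as an assumption rather than a stable interface.

```python
import re
from typing import NamedTuple, Optional

# Matches the three shapes seen in this post:
#   ; EDE: 10 (RRSIGs Missing)
#   ; EDE: 22 (No Reachable Authority): (delegation .)
#   ; EDE: 29: (Result from negative cache for entire name)
EDE_LINE = re.compile(r"^; EDE: (\d+)(?: \(([^)]*)\))?(?:: \((.*)\))?\s*$")


class ExtendedError(NamedTuple):
    code: int
    name: Optional[str]        # description printed by dig, if it knows the code
    extra_text: Optional[str]  # free-form text added by the resolver, if any


def find_edes(dig_output: str) -> list[ExtendedError]:
    """Return every EDE found in the text output of one dig run."""
    found = []
    for line in dig_output.splitlines():
        match = EDE_LINE.match(line)
        if match:
            code, name, extra = match.groups()
            found.append(ExtendedError(int(code), name, extra))
    return found


sample = """\
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 10 (RRSIGs Missing)
;; QUESTION SECTION:
"""
print(find_edes(sample))
```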

Synthesized results producing an EDE

Starting with version 4.9, there are a few more cases where the Recursor adds an EDE to an answer. Let's look at a few examples:

$ dig doesnotexist.powerdns.com

; <<>> DiG 9.18.24 <<>> doesnotexist.powerdns.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 40432
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 29: (Result from negative cache for entire name)
;; QUESTION SECTION:
;doesnotexist.powerdns.com.    IN    A

;; AUTHORITY SECTION:
powerdns.com.        3597    IN    SOA    pdns-public-ns1.powerdns.com. peter\.van\.dijk.powerdns.com. 2024030702 10800 3600 604800 3600

;; Query time: 0 msec
;; SERVER: 192.168.178.6#53(192.168.178.6) (UDP)
;; WHEN: Thu Mar 07 10:14:19 CET 2024
;; MSG SIZE  rcvd: 169

In this case PowerDNS Recursor produces a synthesized result: the result is based on the contents of the caches it maintains, without further contacting the authoritative servers of the powerdns.com zone. The EDE is 29, used to signal a synthesized result (an addition to the list of EDEs made on our request), and the text explains that this result was based on the negative cache. Its contents allowed the Recursor to conclude that this name does not exist in the powerdns.com zone. This is possible when previous queries for this name (possibly using another query type) produced an NXDOMAIN result.

The Recursor can also decide a name does not exist because it knows the parent does not exist, resulting in another type of synthesized answer:

$ dig foo.doesnotexist.powerdns.com

; <<>> DiG 9.18.24 <<>> foo.doesnotexist.powerdns.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 23286
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 29: (Result synthesized by nothing-below-nxdomain (RFC8020))
;; QUESTION SECTION:
;foo.doesnotexist.powerdns.com.    IN    A

;; AUTHORITY SECTION:
powerdns.com.        3012    IN    SOA    pdns-public-ns1.powerdns.com. peter\.van\.dijk.powerdns.com. 2024030702 10800 3600 604800 3600

;; Query time: 0 msec
;; SERVER: 192.168.178.6#53(192.168.178.6) (UDP)
;; WHEN: Thu Mar 07 10:24:04 CET 2024
;; MSG SIZE  rcvd: 185

The above two examples help you understand why the Recursor says a name does not exist, which can help you diagnose why a name does not resolve. A common scenario is: a user tries to resolve a name, resulting in a negative answer. Then the user adds the same name to the zone and tries to resolve it again. If the relevant content of the Recursor caches has not expired yet, the user will still get a negative result. But with the EDE, you know the result is based on cached information and that it is a matter of time (or flushing caches) before the name will resolve.
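The scenario above can be illustrated with a toy negative cache in Python. This is a deliberately simplified model and not how the Recursor implements its caches: the class name, the reason strings and the TTL value are all made up for the illustration. It shows both kinds of synthesis (an entry for the entire name, and nothing-below-NXDOMAIN for an ancestor), and why a name added to the zone keeps getting a negative answer until the cached entry expires.

```python
from typing import Optional


class ToyNegativeCache:
    """Toy model: remembers NXDOMAIN results until their TTL runs out."""

    def __init__(self) -> None:
        self._expiry: dict[str, float] = {}  # name -> absolute expiry time

    def add_nxdomain(self, name: str, ttl: float, now: float) -> None:
        self._expiry[name.lower().rstrip(".")] = now + ttl

    def lookup(self, qname: str, now: float) -> Optional[str]:
        """Return a reason if the cache proves qname does not exist, else None."""
        labels = qname.lower().rstrip(".").split(".")
        # Try the full name first, then each ancestor (RFC 8020 style).
        for i in range(len(labels)):
            candidate = ".".join(labels[i:])
            expiry = self._expiry.get(candidate)
            if expiry is not None and now < expiry:
                if i == 0:
                    return "negative cache entry for entire name"
                return f"nothing below NXDOMAIN at {candidate}"
        return None


cache = ToyNegativeCache()
cache.add_nxdomain("doesnotexist.powerdns.com", ttl=3600, now=0)  # illustrative TTL
print(cache.lookup("doesnotexist.powerdns.com", now=10))
print(cache.lookup("foo.doesnotexist.powerdns.com", now=10))
print(cache.lookup("doesnotexist.powerdns.com", now=3601))  # entry expired
```

Once the entry has expired (or the cache has been flushed), the lookup no longer finds a reason to synthesize an answer, and a real resolver would go back to the authoritative servers.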

If DNSSEC is enabled on a zone, the aggressive NSEC cache allows for yet another type of synthesis: an NSEC or NSEC3 record might be known that denies the existence of the queried name (or a name and type combination).

$ dig www2.powerdns.com

; <<>> DiG 9.18.24 <<>> www2.powerdns.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 23719
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 29: (Result synthesized from aggressive NSEC cache (RFC8198))
;; QUESTION SECTION:
;www2.powerdns.com.        IN    A

;; AUTHORITY SECTION:
powerdns.com.        3596    IN    SOA    pdns-public-ns1.powerdns.com. peter\.van\.dijk.powerdns.com. 2024030702 10800 3600 604800 3600

;; Query time: 0 msec
;; SERVER: 192.168.178.6#53(192.168.178.6) (UDP)
;; WHEN: Thu Mar 07 10:26:58 CET 2024
;; MSG SIZE  rcvd: 174

Note that this result is also dependent on earlier query activity: the relevant NSEC record must be present in the aggressive NSEC cache. We have seen cases where an aggressive NSEC decision was not expected, not because the Recursor was wrong, but because the authoritative server did not provide the right NSEC records on earlier queries. Knowing the decision that led to the NXDOMAIN result helps a lot in cases like this.
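To give an idea of how a cached NSEC record can deny a name, here is a simplified Python sketch of the "covering" check: an NSEC record states that no names exist between its owner name and its next name in canonical DNS order (RFC 4034, section 6.1). The sketch ignores signature validation, wildcards, NSEC3, escaped dots in labels and the check that the name lies inside the zone. The owner and next names used below are invented for the example and do not reflect the actual contents of the powerdns.com zone.

```python
def canonical_key(name: str) -> list[bytes]:
    """Sort key for canonical DNS name order: compare labels right to left."""
    labels = name.lower().rstrip(".").split(".")
    return [label.encode("ascii") for label in reversed(labels)]


def nsec_covers(owner: str, next_name: str, qname: str) -> bool:
    """True if an NSEC record (owner -> next_name) proves qname does not exist."""
    o, n, q = canonical_key(owner), canonical_key(next_name), canonical_key(qname)
    if o < n:
        return o < q < n
    # Last NSEC of the zone: the next name wraps around to the zone apex,
    # so every name sorting after the owner is covered.
    return q > o


# Hypothetical NSEC record: www.powerdns.com -> xmpp.powerdns.com
print(nsec_covers("www.powerdns.com", "xmpp.powerdns.com", "www2.powerdns.com"))
print(nsec_covers("www.powerdns.com", "xmpp.powerdns.com", "mail.powerdns.com"))
```

If the authoritative server hands out an NSEC record whose interval is wider than it should be, a check like this denies names that do exist, which matches the kind of surprise described above.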

EDEs can also be added on non-failure results, for example when the aggressive NSEC cache is used to produce a NOERROR response without answer records (also known as NODATA), to signal that a particular name/type combination does not exist. This is in contrast to the case where a name does not exist at all.

$ dig powerdns.com AAAA

; <<>> DiG 9.18.24 <<>> +retry -p 5301 @127.0.0.1 powerdns.com AAAA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12372
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 29: (Result synthesized from aggressive NSEC cache (RFC8198))
;; QUESTION SECTION:
;powerdns.com.            IN    AAAA

;; AUTHORITY SECTION:
powerdns.com.        3598    IN    SOA    pdns-public-ns1.powerdns.com. peter\.van\.dijk.powerdns.com. 2024030702 10800 3600 604800 3600

;; Query time: 0 msec
;; SERVER: 127.0.0.1#5301(127.0.0.1) (UDP)
;; WHEN: Mon Mar 11 12:57:13 CET 2024
;; MSG SIZE  rcvd: 169
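The difference between NXDOMAIN and NODATA can be read from the status and the ANSWER count in the dig header. A small Python sketch of that reading, with a function name and wording of our own choosing, and assuming the response comes from a recursive resolver (so an empty NOERROR response is not a referral):

```python
def classify_response(rcode: str, answer_count: int) -> str:
    """Describe a resolver response based on its rcode and ANSWER count."""
    if rcode == "NXDOMAIN":
        return "name does not exist"
    if rcode == "NOERROR" and answer_count == 0:
        return "name exists, but has no records of the requested type (NODATA)"
    if rcode == "NOERROR":
        return "positive answer"
    return f"failure ({rcode})"


print(classify_response("NXDOMAIN", 0))  # like www2.powerdns.com above
print(classify_response("NOERROR", 0))   # like powerdns.com AAAA above
```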

EDE on authoritative server failure

The next type of EDE we would like to show signals a failure to contact an authoritative server needed to resolve a name. Let's first take a look at the case of complete network failure (due to a network cable being unplugged).

$ dig www.powerdns.com

; <<>> DiG 9.18.24 <<>> +retry -p 5301 @127.0.0.1 www.powerdns.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 41854
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 22 (No Reachable Authority): (delegation .)
;; QUESTION SECTION:
;www.powerdns.com.        IN    A

;; Query time: 1 msec
;; SERVER: 127.0.0.1#5301(127.0.0.1) (UDP)
;; WHEN: Thu Mar 07 10:32:00 CET 2024
;; MSG SIZE  rcvd: 63

We see that not even the root (.) servers could be contacted, a strong indication we are looking at a general connectivity issue. Another example, where no authoritative server (of a specific zone) could be reached:

$ dig www.powerdns.com

; <<>> DiG 9.18.24 <<>> +retry -p 5301 @127.0.0.1 www.powerdns.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 26622
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 22 (No Reachable Authority): (delegation powerdns.com)
;; QUESTION SECTION:
;www.powerdns.com.        IN    A

;; Query time: 0 msec
;; SERVER: 127.0.0.1#5301(127.0.0.1) (UDP)
;; WHEN: Thu Mar 07 10:33:55 CET 2024
;; MSG SIZE  rcvd: 74

The delegation information shows for which zone no name server could be reached (here in a simulated test environment). This EDE could be the result of a specific connectivity issue or a misconfigured zone.
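The reasoning in the two examples above can be written down as a small triage helper in Python. It assumes the extra text has the form "delegation <zone>" as in the samples in this post; that text is meant for humans and may change between versions, so the parsing below is an assumption and the function names are hypothetical.

```python
from typing import Optional


def unreachable_zone(extra_text: str) -> Optional[str]:
    """Extract the zone from an EDE 22 extra text such as 'delegation powerdns.com'."""
    prefix = "delegation "
    if not extra_text.startswith(prefix):
        return None
    return extra_text[len(prefix):].strip()


def triage_no_reachable_authority(extra_text: str) -> str:
    zone = unreachable_zone(extra_text)
    if zone is None:
        return "no delegation information available"
    if zone == ".":
        return "no root server reachable: suspect a general connectivity issue"
    return f"no server for {zone} reachable: suspect a zone-specific issue"


print(triage_no_reachable_authority("delegation ."))
print(triage_no_reachable_authority("delegation powerdns.com"))
```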

Using Lua scripting to set an EDE

The last example shows a unique feature of PowerDNS Recursor: the ability to modify query results using Lua scripting. The example uses a lua-dns-script having the following contents:

local suffixMatchGroup = newDS()
suffixMatchGroup:add({'example.com', 'example.net'})

function preresolve(dnsQuestion)
  if (dnsQuestion.qtype == pdns.A or dnsQuestion.qtype == pdns.AAAA) and
      suffixMatchGroup:check(dnsQuestion.qname) then
    dnsQuestion.extendedErrorCode = 17 -- Filtered
    dnsQuestion.extendedErrorExtra = "Result modified by example Lua script"
    dnsQuestion:setRecords({})
    return true -- we provided an answer
  end
  return false -- let regular processing continue
end

The preresolve function will be called by the Recursor before processing a query. When the conditions match, the script will provide an empty answer including EDE code and text. Querying for a name in one of the two example domains now results in:

$ dig www.example.com

; <<>> DiG 9.18.24 <<>> +retry -p 5301 @127.0.0.1 www.example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45638
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 17 (Filtered): (Result modified by example Lua script)
;; QUESTION SECTION:
;www.example.com.        IN    A

;; Query time: 0 msec
;; SERVER: 127.0.0.1#5301(127.0.0.1) (UDP)
;; WHEN: Tue Mar 12 10:32:58 CET 2024
;; MSG SIZE  rcvd: 87
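The suffix match group in the Lua script matches names on label boundaries: a suffix matches the name itself and everything below it. The following Python sketch models that behaviour as we understand it; the class is a toy of our own and not the Recursor's implementation.

```python
def to_labels(name: str) -> tuple[str, ...]:
    """Split a DNS name into lowercase labels, ignoring a trailing dot."""
    return tuple(name.lower().rstrip(".").split("."))


class ToySuffixMatchGroup:
    """Toy model of a suffix match group working on whole labels."""

    def __init__(self) -> None:
        self._suffixes: set[tuple[str, ...]] = set()

    def add(self, names: list[str]) -> None:
        for name in names:
            self._suffixes.add(to_labels(name))

    def check(self, qname: str) -> bool:
        labels = to_labels(qname)
        # Compare every trailing run of labels against the stored suffixes.
        return any(labels[i:] in self._suffixes for i in range(len(labels)))


group = ToySuffixMatchGroup()
group.add(["example.com", "example.net"])
print(group.check("www.example.com"))  # below example.com
print(group.check("notexample.com"))   # shares characters, but not a label boundary
```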

See malware protection for much more elaborate and scalable filtering solutions.

Conclusion

The Extended DNS Error codes the Recursor adds to results help with the diagnosis of DNS issues. We are looking at more cases where we can add EDEs to answers, to make troubleshooting easier in even more situations.