Operational Challenges when Implementing DNSSEC
As a reader of this article, you are probably familiar with the DNS cache poisoning techniques discovered a few years ago. And you have most likely heard that DNSSEC is the long-term cure. But you might not know exactly what challenges are involved with DNSSEC and what experience the early adopters have gathered and documented. Perhaps you have held off on your own rollout until you could gather more documentation about the operational experience of rolling out DNSSEC.
We are DNS architects with significant DNSSEC experience. Torbjörn Eklöv lives in Sweden and has helped several municipalities as well as other organizations to sign their zones. Stephan Lagerholm lives in Dallas, Texas, and has been involved in implementing DNSSEC at several U.S. federal agencies. This article summarizes our experience and the lessons we learned from implementing the technology in production environments, and it discusses the associated operational issues.
There is a plethora of information available on the Internet about DNSSEC and cache poisoning attacks, so we are not going to repeat it. However, we feel it’s important to state where DNSSEC is today.
During the last few years, the number of deployments as well as the size and importance of the signed domains have increased significantly. One of the main reasons for the DNSSEC uptake during the past year was that the U.S. Office of Management and Budget (OMB) issued a mandate requiring the signing of the .GOV domain in the beginning of the year. U.S. federal agencies were mandated to sign their domains by the end of 2009. Some agencies have already implemented the technology while others are still working on it.
Acceptance of DNSSEC technology is also reaching outside of the U.S. government. Top Level Domains (TLDs) around the globe have announced DNSSEC initiatives. To mention a few, Afilias signed .ORG and Neustar recently announced signing of .US. Several ccTLDs, including .NL and .DE, announced that DNSSEC implementation is a work in progress. VeriSign announced that it is working on signing the largest TLDs, namely .COM and .NET. Finally, ICANN along with VeriSign released a time plan for signing the root zone. And of course, the poster child, .SE, is in its fourth year as a signed TLD.
Several vendors have released software and products to support and make the signing of zones easier. A range of different products is now available on the market. DNS professionals now have a broad choice of technology—from collections of open-source signing scripts to advanced systems with full automation and support for FIPS certified cryptography.
DNSSEC might have a high operational impact unless it is carefully implemented. The reason for this is that DNSSEC requires some changes to the underlying DNS protocol. Those changes are in fact the first significant changes that have been made to the DNS protocol since it was invented. They might sometimes fool old systems into believing that the packets are illegal. DNSSEC also introduces new operational tasks such as rolling the keys and re-signing the zone. Such tasks must be performed at regular intervals. Furthermore, as with any new technology, there are misconceptions about how to interpret the RFC standards.
The first issue reported
In late summer 2007, Torbjörn Eklöv convinced the municipality of Gavle in Sweden of the benefits of DNSSEC. He proudly signed what is believed to be the first municipality zone in the world, gavle.se. At first, everything worked fine. A week or so later, Gavle received reports from citizens who couldn’t reach the municipality’s websites. It turned out that a new version of BIND was rolled out by a large service provider and that this version of BIND introduced a rather odd bug that affected DNSSEC. The result of the bug was that home users with some home routers/firewalls couldn’t reach any signed domains.
Some people who heard about the issue at gavle.se wrongly concluded that DNSSEC caused the problem and that DNSSEC is broken. This is not true: DNSSEC worked as expected, but a bug in a particular version of BIND caused the problem.
The issue triggered some research on how home routers handle DNSSEC. IIS, which runs the .SE TLD, issued a report describing how commonly used home routers and firewalls handled the new protocol changes in DNS. Later, Nominet, which administers the .UK TLD, issued a similar report. In addition, DENIC, which administers the .DE TLD, researched the same issue. The results are all discouraging; only 9 out of 38 tested home gateways supported DNSSEC correctly in the most recent report.
A BoF session was held at the latest IETF meeting to discuss the issues involving home gateways. We look forward to seeing progress in this area.
Preparing your firewall for DNSSEC
Most problems with DNSSEC are related to firewalls. Make sure to involve your security and networking administrators so that they can make the required changes before taking DNSSEC into production.
Two types of firewall issues are most common:
The first issue involves TCP. There is a misconception among firewall vendors and security administrators that DNS queries use only UDP and that only zone transfers use TCP. Unfortunately, this is not entirely true. DNS queries first try UDP but fall back to TCP when the UDP response comes back truncated because it does not fit in the allowed message size. The likelihood of truncation, or of something in the path blocking a large UDP response, is much higher with DNSSEC because of the increased size of the responses.
For DNSSEC to work correctly, it is mandatory that you open your firewall for both TCP and UDP over port 53.
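One way to check this rule from a client's point of view is sketched below in Python. The sketch sends the same hand-built query to port 53 over UDP and over TCP and reports which transports produced an answer. The function names (`build_query`, `frame_for_tcp`, `probe`) and the address in the usage note are illustrative assumptions; the sketch speaks IPv4 only and does not interpret the answer.

```python
import socket
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query (class IN, recursion desired, no EDNS0)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
        if label
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes every message with a two-byte length field."""
    return struct.pack("!H", len(message)) + message


def probe(server: str, name: str, timeout: float = 3.0) -> dict:
    """Report whether the server answered on port 53 over UDP and over TCP."""
    query = build_query(name)
    result = {"udp": False, "tcp": False}
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(query, (server, 53))
            data, _ = sock.recvfrom(4096)
            # A reply carrying our query ID counts as an answer.
            result["udp"] = data[:2] == query[:2]
    except OSError:
        pass  # timeout, ICMP unreachable, or filtered
    try:
        with socket.create_connection((server, 53), timeout=timeout) as sock:
            sock.sendall(frame_for_tcp(query))
            # Receiving the length prefix means the server replied over TCP.
            result["tcp"] = len(sock.recv(2)) == 2
    except OSError:
        pass  # connection refused, reset, or timed out
    return result
```

Calling `probe("192.0.2.53", "example.com")` (a placeholder address from the documentation range) returns a dictionary such as `{"udp": ..., "tcp": ...}`. A `False` for TCP on a server that answers UDP is a hint that something in the path filters TCP port 53.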
The second issue is related to the packet size. The authors of the DNSSEC standard realized that there might be a potential problem with TCP queries. TCP puts a higher burden on the DNS servers. (TCP is much more expensive to process than UDP.) To avoid too much TCP traffic, the authors made the EDNS0 extension mandatory for DNSSEC. EDNS0 is a standard that, among other things, allows a client to signal that it is capable of receiving DNS replies over UDP that are larger than the previous limit of 512 bytes. Some firewalls are not aware that the EDNS0 standard allows for larger packets, and they block any DNS packet larger than the previous limit. Other firewalls allow the large packets by default, whereas a few vendors require the firewall to be manually configured to do so. Any device in the path that does packet inspection in the application layer must be aware of the EDNS0 standard to be able to make a correct decision whether to forward the packet or not. ICANN has summarized the status of EDNS0 support in some commonly used firewalls.
Note that it is not enough to test that your firewall allows large incoming DNS replies by sending DNS queries to the Internet. You must also test that an external source can receive large DNS replies that your DNS server is sending. One way of doing so is to use an open DNSSEC aware resolver.
Test and configure your firewall to allow for packets larger than 512 bytes over UDP.
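To make the EDNS0 signalling concrete, the following Python sketch builds a query that carries an OPT pseudo-record advertising a larger UDP payload size with the DNSSEC OK (DO) bit set, and then inspects the reply. The 4096-byte payload size, the choice of a DNSKEY query, and the function names are assumptions made for illustration; the sketch is IPv4 only.

```python
import socket
import struct


def build_edns_query(name: str, qtype: int = 48, payload_size: int = 4096,
                     query_id: int = 0x4242) -> bytes:
    """Build a DNS query with an EDNS0 OPT record and the DO bit set.

    qtype 48 is DNSKEY, which usually produces a large signed answer.
    """
    # ARCOUNT is 1 because the OPT record lives in the additional section.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
        if label
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)
    # OPT record: root name, type 41, class = UDP payload size,
    # TTL = extended RCODE / version / flags (0x8000 is the DO bit), no RDATA.
    opt = b"\x00" + struct.pack("!HHIH", 41, payload_size, 0x00008000, 0)
    return header + question + opt


def is_truncated(response: bytes) -> bool:
    """Return True if the TC bit is set in the DNS header."""
    flags = struct.unpack("!H", response[2:4])[0]
    return bool(flags & 0x0200)


def udp_exchange(server: str, query: bytes, timeout: float = 3.0) -> bytes:
    """Send one query over UDP and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        data, _ = sock.recvfrom(65535)
    return data
```

A reply longer than 512 bytes with `is_truncated(reply)` returning `False` suggests that large UDP answers pass in that direction. A timeout, or a truncated reply where a large one was expected, is worth investigating at the firewall. Remember that this only tests one direction; the outbound path from your own server needs a test from an external vantage point.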
Preparing your slaves
Setting up DNSSEC involves substantial changes to the master name server so it can sign and serve the signed data. However, it is easy to overlook that the slaves must be upgraded, too. The slaves are much easier to upgrade and operate because they never produce signatures. They are secondary systems that transfer data from the primary server and respond to DNS queries. But the slaves must understand how to respond to queries requesting signed data.
Slaves must be upgraded to BIND 9.3 or better to understand the NSEC standard. The newer NSEC3 standard introduces some additional requirements for the slaves. If NSEC3 is being used, the slaves must be upgraded to BIND 9.6 or better. Version 3 of NSD and any version of Secure64 DNS Authority/Signer can do both NSEC and NSEC3. Windows Server 2008 R2 for the x86-64 architecture supports DNSSEC as a master, slave, and validating resolver. However, we recommend limiting the use of the Windows platform to slaves and for domains using NSEC. Our opinion is that it is very hard to implement DNSSEC on Windows and we suggest that you wait until there is a sensible GUI and support for NSEC3. Note that the Itanium version of Windows 2008 R2 doesn’t have support for DNS and DNSSEC.
Make sure your slaves can handle the version of DNSSEC you intend to use.
If the slaves are administered by another party, contact the administrator ahead of time. Make sure the slaves are running a version capable of DNSSEC. Stephan helped a large U.S. federal agency sign their domains. They used one of the major federal contractors to run their slave servers. After multiple attempts to reach somebody that understood DNS and DNSSEC, Stephan finally learned that the slaves were running BIND 9.2.3 and that the contractor had no plans to upgrade. The only alternative for the agency was to in-source the slaves and run them themselves.
If your slaves are administered by another party, make sure you know if and what version of DNSSEC they are supporting before you start implementing.
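The BIND version thresholds mentioned above (9.3 for NSEC, 9.6 for NSEC3) can be turned into a small check. The Python sketch below is a hypothetical helper, not part of BIND; it assumes you have already obtained the version string from the slave operator or from the server itself, and it only looks at the major and minor numbers.

```python
import re

# Thresholds taken from the text: NSEC needs BIND 9.3, NSEC3 needs BIND 9.6.
NSEC_MIN = (9, 3)
NSEC3_MIN = (9, 6)


def parse_bind_version(text: str) -> tuple:
    """Extract (major, minor) from a version string such as '9.6.1-P1'."""
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if not match:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def dnssec_support(version_text: str) -> str:
    """Classify a BIND slave by the denial-of-existence types it can serve."""
    version = parse_bind_version(version_text)
    if version >= NSEC3_MIN:
        return "NSEC and NSEC3"
    if version >= NSEC_MIN:
        return "NSEC only"
    return "none"
```

For the contractor's slaves in the story above, `dnssec_support("9.2.3")` yields `"none"`, which is exactly the answer you want to have before signing rather than after.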
Communicate with your parent
There are two models for how TLDs allow you to communicate with them:
- Registrant – registrar – registry model. This is the most common model. The registrant (example.org) does not communicate directly with the registry (.ORG). Instead, all communication related to DNS and DNSSEC is handled through a third-party registrar. This model is used, for example, by the .SE and .ORG TLDs.
- Registrant – registry model. This model is normally used by smaller TLDs such as .GOV. It allows direct communication between the registrant (agency.GOV) and the registry (.GOV). The TLD acts as both a registrar and a registry in this model.
Most issues described below apply to both models, but issues involving multiple registries are obviously only applicable to the registrant – registrar – registry model.
Establishing a chain of trust in DNSSEC involves uploading one or more public keys to the parent. Ultimately a DS record is published by the parent. The DS record is a smaller fingerprint that can be constructed from the DNSKEY record. To upload your keys, you must use a registrar that supports DNSSEC. If your registrar doesn’t support DNSSEC, you need to move your domains to another registrar (or convince your current registrar to start supporting DNSSEC). It usually takes a few days or up to a week to move a domain from one registrar to another.
Make sure that your registrar supports DNSSEC. If it does not, move your domain to a registrar that supports DNSSEC prior to starting to sign your zone.
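To show what "a smaller fingerprint constructed from the DNSKEY record" means in practice, here is a minimal Python sketch of the DS digest calculation: the hash is taken over the owner name in lowercase wire format followed by the DNSKEY RDATA. The function names are illustrative, only digest types 1 (SHA-1) and 2 (SHA-256) are handled, and the key material in the usage note is a dummy value, not a real key.

```python
import base64
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Encode a domain name in canonical wire format (lowercase labels)."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    ) + b"\x00"


def dnskey_rdata(flags: int, protocol: int, algorithm: int, key_b64: str) -> bytes:
    """Assemble DNSKEY RDATA: flags, protocol, algorithm, public key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + base64.b64decode(key_b64)


def ds_digest(owner: str, flags: int, protocol: int, algorithm: int,
              key_b64: str, digest_type: int = 2) -> str:
    """Return the DS digest as upper-case hex for the given DNSKEY."""
    hashers = {1: hashlib.sha1, 2: hashlib.sha256}
    if digest_type not in hashers:
        raise ValueError(f"unsupported digest type: {digest_type}")
    data = name_to_wire(owner) + dnskey_rdata(flags, protocol, algorithm, key_b64)
    return hashers[digest_type](data).hexdigest().upper()
```

For example, `ds_digest("example.com.", 257, 3, 8, "AQAB")` returns a 64-character hex string. Because the owner name is lowercased before hashing, `Example.COM.` and `example.com.` give the same digest, and comparing your own result with what the parent publishes is a cheap sanity check.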
Some registrars allow registration under multiple TLDs. However, just because a registrar handles DNSSEC for one TLD doesn’t mean that it handles DNSSEC for all TLDs it serves. There are several examples of registrars in Sweden that support DNSSEC for .SE but not for .ORG or .US.
Make sure that your registrar handles DNSSEC under the TLD in question.
Most registrars offer you the opportunity to use their name server instead of your own. The service is either offered for free or for an additional cost. The registrar typically provides a web interface where you can change your zone data. This is a good service and a useful choice if your domains are uncomplicated and small. Larger and more complex domains are better operated on your own servers.
Some registrars that provide this type of service can only handle DNSSEC if you use their name servers and not your own name servers. The registrar can establish the chain of trust with the parent only if the zone is under their control. They lack a user interface for uploading a DS record that you generate from keys on your own name servers.
If you intend to use your own name servers, make sure that your registrar supports this and allows you to upload a DS record for further distribution to the registry.
In theory, the child zone system should create the DS record fingerprint and upload it to the parent. In practice, some registrars require you to upload the DNSKEY record to them. They then create the DS record for you. (This is bad practice because the registrar must know the hash algorithm used to construct the DS record, which it might not know.) The DNSKEY record comes in several different formats, depending on the platform you used to create the keys (BIND, Microsoft, NSD, Secure64, etc.). The formats have minor differences, and you might have to convert the DNSKEY into a format that the registrar accepts.
Not everything works smoothly, even with the correct DNSKEY format. The logic at one registrar’s website was to deny uploading of DNSKEYs unless the optional TTL field existed. (The TTL value is completely useless in the DNSKEY context as the parent overrides this value with its own TTL.) You may have to manually change your DNSKEY before uploading it to comply with the checks that the registrar is performing.
If your registrar requires you to upload the DNSKEY, make sure that your solution can generate the requested format. If not, you need to manually change the fields with a text editor.
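Instead of editing by hand, the small format differences described above can be smoothed over with a script. The Python sketch below is one hypothetical approach: it takes a DNSKEY record in zone-file presentation format, with or without TTL, class, parentheses, and trailing comments, and rewrites it as a single line that always includes a TTL. The default TTL of 3600 is a placeholder assumption; use whatever your registrar's form expects.

```python
def normalize_dnskey(line: str, default_ttl: int = 3600) -> str:
    """Rewrite a DNSKEY record as 'owner TTL IN DNSKEY flags proto alg key'."""
    # Drop a trailing comment and the parentheses used for multi-line records.
    text = line.split(";", 1)[0].replace("(", " ").replace(")", " ")
    tokens = text.split()
    upper = [token.upper() for token in tokens]
    if "DNSKEY" not in upper:
        raise ValueError("not a DNSKEY record")
    pos = upper.index("DNSKEY")
    if pos < 1 or len(tokens) < pos + 5:
        raise ValueError("owner name or key fields are missing")
    # DNS names are case-insensitive, so a lowercase owner is always safe.
    owner = tokens[0].lower()
    ttl = default_ttl
    for token in tokens[1:pos]:
        if token.isdigit():  # an explicit TTL, if present, wins
            ttl = int(token)
    flags, protocol, algorithm = tokens[pos + 1:pos + 4]
    key = "".join(tokens[pos + 4:])  # join base64 chunks split by whitespace
    return f"{owner} {ttl} IN DNSKEY {flags} {protocol} {algorithm} {key}"
```

Given `example.com. IN DNSKEY 257 3 8 AwEA AQ==`, the function returns `example.com. 3600 IN DNSKEY 257 3 8 AwEAAQ==`. The key material itself is never altered, only the whitespace around it.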
As noted above, some registrars are performing too many checks and irrelevant checks before accepting and creating the secure delegation. Other registrars do not check at all or have limited checks that don’t work as expected. For example, some registrars assume that your key is created using a certain algorithm, and they do not double check it prior to creating a DS record. One registrar created a bogus DS record if you uploaded a DNSKEY with upper-case characters in the domain name. The bogus DS record looked valid, and troubleshooting to find this error took hours.
Another example is keys created with the tool Webmin. Webmin is a graphical tool that can be used for signing zones. Webmin defaults to using the less common DSA algorithm for its DNSKEYs. The registrar did not complain when uploading the Webmin key, and it created a bogus DS record under the assumption that it was an RSA key.
It is hard for a registrant to do anything about errors at the registrars. The best you can do is to make sure that you upload the correct key with the correct parameters such as algorithm, key length, key-id, etc. If something goes wrong, you might have to change the keys in production. Rolling the keys to the same algorithm and key length is relatively easy. But changing your keys to another algorithm adds extra complexity. It is an interesting exercise to change to another algorithm in production, but it is something we recommend avoiding if possible.
Double check the DNSKEY/DS so that it is created with the correct parameters prior to uploading it.
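A quick way to double check the parameters is to decode the DNSKEY yourself before uploading it. The Python sketch below reports the role, algorithm, key tag (key-id), and, for RSA keys, the modulus size. The algorithm table is a partial list of IANA-assigned numbers, the KSK/ZSK label simply follows the SEP flag convention, and the function names are illustrative assumptions.

```python
import base64
import struct

# Partial table of IANA DNSSEC algorithm numbers.
ALGORITHMS = {
    3: "DSA/SHA-1",
    5: "RSA/SHA-1",
    7: "RSASHA1-NSEC3-SHA1",
    8: "RSA/SHA-256",
    10: "RSA/SHA-512",
}
RSA_ALGORITHMS = {5, 7, 8, 10}


def key_tag(rdata: bytes) -> int:
    """Compute the key tag over the DNSKEY RDATA (RFC 4034, Appendix B)."""
    acc = 0
    for index, byte in enumerate(rdata):
        acc += byte << 8 if index % 2 == 0 else byte
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def rsa_modulus_bits(public_key: bytes) -> int:
    """Return the RSA modulus size; layout is exponent length, exponent, modulus."""
    if public_key[0] != 0:
        exponent_length, offset = public_key[0], 1
    else:
        # A zero first byte means the length is in the next two bytes.
        exponent_length, offset = struct.unpack("!H", public_key[1:3])[0], 3
    modulus = public_key[offset + exponent_length:]
    return int.from_bytes(modulus, "big").bit_length()


def describe_dnskey(flags: int, protocol: int, algorithm: int, key_b64: str) -> dict:
    """Summarize the parameters a registrar is likely to ask about."""
    key = base64.b64decode(key_b64)
    rdata = struct.pack("!HBB", flags, protocol, algorithm) + key
    return {
        # By convention the SEP bit (0x0001) marks a key signing key.
        "role": "KSK" if flags & 0x0001 else "ZSK",
        "algorithm": ALGORITHMS.get(algorithm, f"unknown ({algorithm})"),
        "key_tag": key_tag(rdata),
        "rsa_bits": rsa_modulus_bits(key) if algorithm in RSA_ALGORITHMS else None,
    }
```

If the summary says `DSA/SHA-1` while the registrar's form silently assumes RSA, you have found the Webmin problem described above before it reaches production.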
Communicate with your children
If you have sub-domains in your domain, you must make sure that you can accept and publish the DS records that your children upload to you. This is not a problem if you are using zone files in text format; you can simply insert the DS record using your favorite editor. But this might be a problem if you are using an IPAM system. In that case, make sure that it can insert DS records into the zones that are managed by the system. Some IPAM systems do not support insertion of DS records correctly.
Make sure that your IPAM system can insert DS records into your zones.
A common strategy among organizations with high availability requirements for their critical servers is to use a global load balancer. The global load balancer is basically a DNS server that responds differently depending on the status of the service in question. For example, assume a load balancer can respond to a question for www.example.com with 192.0.2.1 and 192.0.2.2 if both web servers are up. If .1 becomes unavailable, the load balancer notices a failure and only responds with .2. To be able to do this, the global load balancer requires you to delegate www as a sub-domain to the load balancer’s own DNS process. When DNSSEC is implemented, you must make sure that the load balancer can handle DNSSEC (and not that many do); otherwise it is impossible to sign the responses for those resources. Unfortunately, these resources are the most critical resources for your environment and would benefit the most from DNSSEC signing.
Make sure that your load balancers support DNSSEC. If they don’t, have an alternative strategy.
Rolling the keys
The DNSKEYs should be changed on a regular basis and when the keys are believed to be compromised. The process of doing so is called rolling the keys. There are normally two different keys in DNSSEC, the key signing keys (KSKs) and the zone signing keys (ZSKs). Rolling the ZSK is an internal process and doesn’t require communication with the parent. Rolling the KSK on the other hand, requires the parent to publish a new DS record.
There is no standard yet that describes how the communication between the parent and the child should occur when a key is rolled. Early DNSSEC-capable registrars used a web interface that allowed their registrants to upload and manipulate the DNSSEC information. With a web interface, each domain must be handled separately and there is no easy way to automate the interaction.
The web interface works for a handful of domains but becomes very cumbersome when you have many domains. For organizations with many domains, it is important to make sure that there is some kind of API or script access to the registrar. This allows the organization to upload new DS records during the rollover in a convenient way.
Make sure that your registrar supports automation via an API if you have many domains.
Scripting with an API as described above is one way of communicating with the registrar. Another way of achieving the same type of automation is for the parent (or registrar) to monitor the child for any changes to the DNSKEY records. Note that the chain of trust is still intact during a non-emergency rollover. The parent can in a secure way poll the child and grab the new DNSKEY records and convert them into DS records. The polling from the parent to each signed child needs to occur on a regular basis so that a rollover is picked up quickly. This makes the scheme best for domains with fewer delegations (on the order of thousands, not millions; consider how much bandwidth an hourly polling of 15 million children would require).
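The bandwidth question can be answered with back-of-the-envelope arithmetic. The Python sketch below does the calculation; the figure of 1,500 bytes per poll (query plus signed DNSKEY response) is our assumption for illustration, and real responses vary with key sizes and the number of keys.

```python
def polling_load(children: int, interval_seconds: int, bytes_per_poll: int) -> tuple:
    """Return (queries per second, megabits per second) for periodic polling."""
    queries_per_second = children / interval_seconds
    megabits_per_second = queries_per_second * bytes_per_poll * 8 / 1_000_000
    return queries_per_second, megabits_per_second


# Hourly polling of 15 million children, assuming 1,500 bytes per exchange.
large_qps, large_mbps = polling_load(15_000_000, 3600, 1500)

# The same schedule for a parent with 5,000 signed delegations.
small_qps, small_mbps = polling_load(5_000, 3600, 1500)
```

Under these assumptions the large parent needs to sustain roughly 4,200 polls per second and about 50 Mbit/s around the clock, whereas the small parent needs fewer than two polls per second. The absolute numbers matter less than the ratio: the cost grows linearly with the number of delegations and with the polling frequency.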
Automation is a good thing, but make sure you understand the implications when opting in for automatic detection of key rollovers. The automation scripts are not bullet proof. It has been reported that early versions of such scripts under some circumstances wrongly assumed that a key rollover occurred and deleted the DS record, thus breaking the chain of trust.
Understand the implication when opting in for automatic detection, addition, and deletion of DS records.
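The failure mode described above, a script deleting a DS record because it wrongly concluded that a rollover took place, suggests a conservative rule: change nothing unless the child's answer was validated and actually contains keys. The Python sketch below illustrates that rule. It is a hypothetical design, not how any particular registry works; the record tuples, the `response_validated` flag, and the function name are all assumptions.

```python
from typing import NamedTuple, Set, Tuple

# (key tag, algorithm, digest type, digest in hex)
DsRecord = Tuple[int, int, int, str]


class DsPlan(NamedTuple):
    add: Set[DsRecord]
    remove: Set[DsRecord]
    reason: str


def plan_ds_changes(parent_ds: Set[DsRecord], child_ds: Set[DsRecord],
                    response_validated: bool) -> DsPlan:
    """Decide which DS records to add or remove after polling a child.

    parent_ds: DS records currently published by the parent.
    child_ds: DS records derived from the DNSKEYs seen at the child.
    response_validated: whether the poll validated via the existing chain of trust.
    """
    if not response_validated:
        return DsPlan(set(), set(), "child response did not validate; keeping current DS set")
    if not child_ds:
        # An empty answer is more likely a lookup failure than a rollover.
        return DsPlan(set(), set(), "no keys seen at child; keeping current DS set")
    return DsPlan(child_ds - parent_ds, parent_ds - child_ds, "ok")
```

With this rule, a timeout or an unsigned answer never removes the last DS record. The price is that an intentional move to an unsigned zone has to be requested through another channel.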
Management of DNSSEC
Without DNSSEC, you are not bound to any particular registrar. You can fairly easily switch to a new registrar. With DNSSEC, this changes. First of all, if you let the registrar sign the zone on your behalf, the registrar will be in charge of the key used to sign your zone. Extracting your key so that it can be imported to another registrar is not always straightforward (also remember that there is really no incentive for your previous registrar to help you, as you just discontinued its service). An alternative is to unsign the zone before you change registrar; however, that might not always be a viable option. The lack of standards makes it hard to change registrars on a signed domain that is in production.
You must tell your new registrar that you are using DNSSEC, and you must make sure that the registrar supports it. If not, the registrar might accept the transfer but will not be able to publish the DNSKEY records. This would result in a DS record published by the registry but no corresponding DNSKEY records at the child. This makes the zone “security lame” and validation will fail.
The same types of problems exist if you are running your own name servers. If you change your master server, make sure that you transfer the secret keys as well. Signing with new keys will not work unless you flush out the old keys with rollovers and upload a new DS record to your parent.
Have a plan ready for how to transfer your keys to a new master server.
It is important to adjust your signature validity periods and the SOA timers so that they match your organizational requirements and operational practices. It is all too often that SOAs expire and signature validity periods are too short. Unless you are restricted by guidelines saying otherwise, you should strive to set the timers reasonably high. Set the timers so that your zones can cope with an outage as long as the longest period that the system might be unattended. For example, if you know that your top DNS administrator usually has three weeks of vacation in July, you could consider setting the times so that the zone can survive four weeks of downtime. If you are confident in your signing solution and are monitoring your signatures carefully, you might set it a little bit lower.
Signature lifetime is a tradeoff between security (low signature lifetimes) and convenience (high signature lifetime). Setting a really high signature lifetime is convenient from an operational perspective but is less secure. Some organizations such as the IETF use an obscene signature lifetime of one year (dig ietf.org DNSKEY +dnssec | grep RRSIG). This is clearly not recommended, and they should know better!
Carefully set your signature lifetimes and SOA times to reflect your organization’s operational requirements and practices.
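To compare a zone's signatures with your operational practice, you can read the expiration and inception timestamps from an RRSIG record and check how long the zone would survive unattended. The Python sketch below assumes the RRSIG RDATA is given as text in the usual presentation format (as printed by dig), with timestamps in YYYYMMDDHHMMSS form; the function names and the record in the usage note are illustrative.

```python
from datetime import datetime, timedelta, timezone


def parse_rrsig_time(stamp: str) -> datetime:
    """Parse an RRSIG timestamp (YYYYMMDDHHMMSS, always UTC)."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def rrsig_window(rdata: str) -> tuple:
    """Return (inception, expiration) from RRSIG RDATA in presentation format.

    Field order: type covered, algorithm, labels, original TTL,
    expiration, inception, key tag, signer name, signature.
    """
    fields = rdata.split()
    expiration = parse_rrsig_time(fields[4])
    inception = parse_rrsig_time(fields[5])
    return inception, expiration


def survives_unattended(expiration: datetime, now: datetime,
                        unattended: timedelta) -> bool:
    """Return True if the signature outlives the given unattended period."""
    return expiration - now >= unattended
```

For a record such as `DNSKEY 8 2 3600 20100201000000 20100101000000 12345 example.com. ...`, the window is 31 days. Calling `survives_unattended(expiration, datetime.now(timezone.utc), timedelta(weeks=4))` then tells you whether the signatures would last through the four-week vacation scenario described above if signing stopped today.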
A note on validation
This article has focused on the authoritative part of DNSSEC. That part includes signing resource records and serving DNS data. The operational challenges with signing data are much greater than the challenges of validating data. To validate data, the only thing you need to do on a regular basis is update your trust anchor file. Make sure to do so. Torbjörn reports several outages when the .SE DNSKEY used in .SE’s trust anchor expired in Jan-01. We look forward to the work being done in this area to automate the process.
DNSSEC has been deployed and taken into production for several large and critical domains.
It is not hard to implement DNSSEC but doing so introduces some operational challenges. Those challenges exist both during the implementation phase when the zone is being signed for the first time as well as during the operation of the zone. Make sure you understand the impact and plan ahead.
The list below is a checklist that summarizes the most important pitfalls with DNSSEC:
- Open your firewall for both large UDP packets and TCP over port 53.
- Check the DNSSEC capabilities of all your master and slave servers.
- Check the DNSSEC capabilities of your registrar and understand their requirements for the public key you are uploading.
- Make sure your IPAM system can handle secure delegations.
- Plan how to handle load balancers.
- Develop an automation strategy if you have a lot of zones.
- Have a plan for how to transfer your keys to a new master server in case of disaster.
- Implement a policy for DNSSEC timer settings.