Application-layer DoS attacks abuse a higher-level protocol (in our context, DNS) and tie up the resources of other participants of the same protocol. This distinguishes application-layer attacks from other forms of DoS attacks, e.g., volumetric attacks, which are agnostic to protocol and application but relatively easy to filter and defend against. Application-layer attacks can target a wider range of resources, such as CPU time and upstream bandwidth, whereas volumetric attacks can only consume downstream bandwidth; this makes application-layer attacks attractive in many scenarios. In this section, we first introduce DNS water torture attacks, an emerging application-layer DoS technique that has already severely threatened the DNS infrastructure. We then show that a smart attacker can craft delicate chains of DNS records to leverage resolvers for even more powerful attacks than those possible with DNS water torture.
3.1 DNS Water Torture
DNS water torture attacks, also known as random prefix attacks, flood the victim’s DNS servers with requests such that the servers run out of resources to respond to benign queries. Such attacks typically target the authoritative name server (ANS) hosting the victim’s domain, such that domains hosted at the target server become unreachable. Resolvers typically cache the responses for queried domains and therefore mitigate naïve floods, as they refrain from identical follow-up queries. To counter this, attackers evade caching by using a unique domain name for each query, forcing resolvers to forward all queries to the target ANS. A common way is to prepend a unique sequence, the random prefix, to the domain. In practice, attackers either use monotonically increasing counters, hash such a counter, or use a dictionary to create prefixes. The DNS infrastructure, on the other hand, relies heavily on caching at multiple layers of the DNS hierarchy, so ANSs are typically not provisioned to withstand many unique and thus non-cached requests. This leaves ANSs vulnerable to water torture attacks.
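The three prefix strategies can be sketched as Python generators. This is an illustrative sketch, not code taken from observed attacks; the zone name `victim.example.`, the generator names, and the 12-character label length are hypothetical choices. Every generated name is distinct, so a resolver never finds it in its cache.

```python
import hashlib
import itertools

VICTIM_ZONE = "victim.example."  # hypothetical victim zone, for illustration only

def counter_prefixes(zone):
    """Monotonically increasing counter: 0.<zone>, 1.<zone>, ..."""
    for i in itertools.count():
        yield f"{i}.{zone}"

def hashed_prefixes(zone, length=12):
    """Hash the counter so that labels look random (length is an arbitrary choice)."""
    for i in itertools.count():
        label = hashlib.sha256(str(i).encode()).hexdigest()[:length]
        yield f"{label}.{zone}"

def dictionary_prefixes(zone, words):
    """Combine dictionary words pairwise to obtain many unique labels."""
    for first, second in itertools.product(words, repeat=2):
        yield f"{first}-{second}.{zone}"

print(list(itertools.islice(counter_prefixes(VICTIM_ZONE), 2)))
# ['0.victim.example.', '1.victim.example.']
```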
Water torture attacks were first observed in early 2014 [1, 41, 51] and have been launched repeatedly since. The main ingredient for this attack is sufficient attack bandwidth to overload the target ANS with “too many” requests. As this does not require IP spoofing, attackers can easily leverage botnets to maximize their attack bandwidth. In fact, several large DDoS botnets (e.g., Mirai [2] or Elknot [28]) support DNS water torture.
While water torture attacks have been fairly effective, their naïve concept has noticeable limitations:
1. Water torture attacks can usually be detected easily, because the attack traffic shows exceptionally high failure rates for particular domains: none of the requested (random-looking) domain names actually exists. NXDOMAIN responses are normally caused by configuration errors and are therefore often monitored.
2. Water torture attacks provide no amplification, as every query by the attacker eventually results in only a single query to the target ANS (unless queries are resent in case of packet loss). The victim-facing attack traffic is thus bounded by the number of queries that the attacker can send. This is in stark contrast to volumetric attacks, which offer more than tenfold amplification [44].
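The first limitation can be illustrated with a minimal monitoring sketch that flags zones whose share of NXDOMAIN responses is unusually high. The log format (pairs of zone and response code), the function name `suspicious_zones`, and both threshold values are assumptions for illustration, not a deployed detector.

```python
from collections import Counter

def suspicious_zones(log, threshold=0.5, min_queries=100):
    """Flag zones whose NXDOMAIN share exceeds a (hypothetical) threshold.

    log: iterable of (zone, rcode) pairs, e.g. ("victim.example.", "NXDOMAIN").
    min_queries avoids flagging zones that saw only a handful of queries.
    """
    total, failed = Counter(), Counter()
    for zone, rcode in log:
        total[zone] += 1
        if rcode == "NXDOMAIN":
            failed[zone] += 1
    return sorted(
        zone for zone in total
        if total[zone] >= min_queries and failed[zone] / total[zone] > threshold
    )

# Ordinary traffic with a few typos vs. a random-prefix flood.
log = [("shop.example.", "NOERROR")] * 500 + [("shop.example.", "NXDOMAIN")] * 5
log += [("victim.example.", "NXDOMAIN")] * 1000 + [("victim.example.", "NOERROR")] * 10
print(suspicious_zones(log))  # ['victim.example.']
```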
3.2 Chaining-Based DNS DoS Attack
We now propose a novel type of DNS application-layer attack that abuses chains in DNS to overcome the aforementioned limitations of water torture, yet stays within a similar threat model (Sect. 2). The main intuition of our attack is that an attacker can utilize request chains that amplify the attack volume towards a target ANS. This is achieved via aliases, a popular feature defined in the DNS specification and frequently used in practice.
CNAME Records. DNS request chains exist due to the functionality of creating aliases in DNS, e.g., using standard CNAME resource records (RRs) [31, 32]. A CNAME RR, short for canonical name, works similarly to pointers in programming languages. Instead of providing the desired data to a resolver, a CNAME specifies a different DNS location from which to request the RR. One common use is to share the same RRs between a domain and its “www” subdomain. In this case, a CNAME entry for “www.example.com.” points to “example.com.”. When a client asks the resolver for the RRs of a certain type and domain, the resolver recursively queries the ANS for the RRs, resulting in three cases to consider:
- Domain Does Not Exist or No Data. The domain does not exist (NXDOMAIN status), or no matching resource record (including CNAME records) was found (NODATA status). The ANS returns this status.
- Resource Record Exists. The desired resource record’s data is immediately returned by the ANS. The DNS specification mandates that either data or an alias (i.e., a CNAME) may exist for a domain, but never both; hence, in this case there is no CNAME record for the requested domain.
- Domain Exists and Contains a CNAME. The resolver must follow the CNAME regardless of the requested record type. This may cause the resolver to send new queries, potentially even to different ANSs.
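The three cases can be modeled with a toy in-memory zone. This is a sketch under simplifying assumptions: a dictionary stands in for the ANS, NXDOMAIN and NODATA are kept as two outcomes of the first case, and `ZONE`, `Outcome`, and `authoritative_answer` are hypothetical names.

```python
from enum import Enum

class Outcome(Enum):
    NXDOMAIN = "domain does not exist"
    NODATA = "no matching record"
    ANSWER = "record exists"
    CNAME = "alias to follow"

# Hypothetical zone data: owner name -> {record type: [rdata, ...]}
ZONE = {
    "example.com.": {"A": ["192.0.2.1"], "TXT": ["v=demo"]},
    "www.example.com.": {"CNAME": ["example.com."]},
}

def authoritative_answer(zone, qname, qtype):
    """Map a query onto the three cases described in the text."""
    rrsets = zone.get(qname)
    if rrsets is None:
        return Outcome.NXDOMAIN, []
    if "CNAME" in rrsets:
        # A CNAME owner holds no other data, so the alias applies to every qtype.
        return Outcome.CNAME, rrsets["CNAME"]
    if qtype in rrsets:
        return Outcome.ANSWER, rrsets[qtype]
    return Outcome.NODATA, []

outcome, data = authoritative_answer(ZONE, "www.example.com.", "TXT")
print(outcome.name, data)  # CNAME ['example.com.']
```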
The last case allows chaining of several requests. For CNAME records, resolvers have to perform multiple lookups to load the data (unless the records are cached). CNAME records can also be chained, meaning that the target of a CNAME record is itself another CNAME record. This increases the number of lookups per initial query. There is no strict limit on the length of chains. However, resolvers typically enforce a limit to prevent loops of CNAME records. After reaching this limit, resolvers either provide a partial answer or respond with an error message.
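The following sketch shows how a resolver might follow a CNAME chain while enforcing a length limit. The record map, the function name `follow_chain`, and the default limit of 8 are hypothetical; real resolvers choose their own limits and differ in whether they return a partial chain or an error. This sketch does both: it reports `SERVFAIL` together with the partial chain it followed.

```python
# Hypothetical records: name -> (type, rdata); CNAME rdata is the next name.
RECORDS = {
    "a.example.": ("CNAME", "b.example."),
    "b.example.": ("CNAME", "c.example."),
    "c.example.": ("TXT", "final data"),
    "loop1.example.": ("CNAME", "loop2.example."),
    "loop2.example.": ("CNAME", "loop1.example."),
}

def follow_chain(records, qname, max_cnames=8):
    """Follow CNAMEs from qname; give up after max_cnames links."""
    followed = []  # (owner, target) pairs seen so far
    name = qname
    while True:
        entry = records.get(name)
        if entry is None:
            return "NXDOMAIN", followed, None
        rtype, rdata = entry
        if rtype != "CNAME":
            return "NOERROR", followed, rdata
        if len(followed) == max_cnames:
            # Limit reached (e.g., a loop): report an error plus the partial chain.
            return "SERVFAIL", followed, None
        followed.append((name, rdata))
        name = rdata

print(follow_chain(RECORDS, "a.example.")[0])           # NOERROR
print(len(follow_chain(RECORDS, "loop1.example.")[1]))  # 8
```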
Note that CNAME records provide delegation between arbitrary domains, i.e., also to domains in unrelated zones. If all CNAME records are hosted in the same zone, the ANS can return multiple CNAMEs in one answer by already including the next records in the chain. If the chain instead alternates between two different ANSs, each ANS knows only the next CNAME entry in the chain and can therefore return only one link per answer.
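Which links an ANS can place into a single answer can be sketched as follows: the ANS keeps appending CNAMEs only while the next target is one of its own names. The zone contents and the function `ans_response` are hypothetical, and real servers differ in whether they chase in-zone CNAMEs at all.

```python
def ans_response(own_cnames, qname):
    """Return the CNAME records this ANS can put into one answer for qname.

    own_cnames maps names this ANS is authoritative for to their CNAME target.
    Links are appended only while the next target is one of the ANS's own names.
    """
    answer = []
    seen = set()  # guards against CNAME loops inside the zone
    name = qname
    while name in own_cnames and name not in seen:
        seen.add(name)
        target = own_cnames[name]
        answer.append((name, "CNAME", target))
        name = target
    return answer

# Whole chain in one zone: a single answer carries every link.
same_zone = {"1.a.example.": "2.a.example.", "2.a.example.": "3.a.example.", "3.a.example.": "4.a.example."}
# Alternating chain: the target ANS only holds the odd elements.
target_ans = {"1.target-ans.com.": "2.intermediary.org.", "3.target-ans.com.": "4.intermediary.org."}

print(len(ans_response(same_zone, "1.a.example.")))        # 3
print(len(ans_response(target_ans, "1.target-ans.com.")))  # 1
```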
DNS Chaining Attack. The possibility to chain DNS queries via CNAME RRs opens up a new form of application-layer DoS attack. Let an attacker set up two domains on different ANSs. The first domain is hosted by the target ANS, and the second (or optionally further) domain(s) by some intermediary ANS(s). The zones are configured to contain long CNAME chains alternating between both domains. An example can be found in Listing 1, where a chain ping-pongs between the target and an intermediary ANS until the record with prefix i. If an attacker now sends a single name lookup for the record at the start of the chain, the resolver has to follow all chain elements to retrieve the final RR. A large final RR, such as a TXT record, additionally targets the ANS’s upstream bandwidth. Figure 1 shows the queries sent between the attacker A, a resolver R, and both ANSs. The dashed arrows represent the CNAME pointers between the different domains, while the circled numbers represent the order in which they are resolved. The attacker queries the first chain element and forces the resolver to query the target ANS repeatedly.
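To make the per-ANS query counts concrete, the sketch below builds an alternating chain in memory and follows it once without any caching. The zone names `target-ans.com.` and `intermediary.org.` (borrowed from the DNAME example in Sect. 3.4), the numeric labels, and the 200-byte TXT payload are assumptions; this is a pure simulation and sends nothing over the network.

```python
from collections import Counter

TARGET = "target-ans.com."          # hypothetical zone hosted on the target ANS
INTERMEDIARY = "intermediary.org."  # hypothetical zone hosted on the intermediary ANS

def element(i):
    """Chain element i: odd indices live on the target ANS, even ones on the intermediary."""
    return f"{i}.{TARGET if i % 2 else INTERMEDIARY}"

def build_chain(length):
    """Return the first name and a record map; the last element carries a large TXT RR."""
    records = {element(i): ("CNAME", element(i + 1)) for i in range(1, length)}
    records[element(length)] = ("TXT", "x" * 200)
    return element(1), records

def resolve_without_cache(start, records):
    """Follow the chain once, counting one query per element at the ANS serving it."""
    queries = Counter()
    name = start
    while True:
        queries["target" if name.endswith(TARGET) else "intermediary"] += 1
        rtype, rdata = records[name]
        if rtype != "CNAME":
            return queries
        name = rdata

start, records = build_chain(10)
q = resolve_without_cache(start, records)
print(q["target"], q["intermediary"])  # 5 5
```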
This provides severe amplification, as a single request by the attacker results in several requests towards the target ANS. For each query by the attacker, $N$ queries are sent by the resolver, where $N$ is the minimum of the chain length and a resolver-dependent limit. The chain length is controllable by the attacker and effectively unlimited, but resolver implementations limit the maximum recursion depth (see Sect. 4.2). The amplification, as observed by the target ANS, is $\frac{N}{2}$, as every second chain record is served by the target ANS.
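The amplification argument reduces to a few lines of arithmetic. This sketch simply encodes the relation from the text, N = min(chain length, resolver limit) with N/2 target queries per attacker query, and contrasts it with the one-to-one ratio of water torture; the function names and the resolver limit used in the example are placeholders, not measured values (see Sect. 4.2 for those).

```python
def chaining_target_queries(attacker_queries, chain_length, resolver_limit):
    """Target-ANS queries caused by the chaining attack: N/2 per attacker query."""
    n = min(chain_length, resolver_limit)
    return attacker_queries * n / 2

def water_torture_target_queries(attacker_queries):
    """Water torture: each attacker query yields one target query (no amplification)."""
    return attacker_queries

# Hypothetical numbers: 1,000 attacker queries, a 100-element chain, a resolver limit of 20.
print(water_torture_target_queries(1_000))      # 1000
print(chaining_target_queries(1_000, 100, 20))  # 10000.0
```

Because the attacker controls the chain length, the resolver limit is what caps N in practice.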
For illustrative purposes, Fig. 1 shows just a single resolver. In practice, an attacker would likely spread the attack requests over thousands of resolvers so as not to overload any single resolver; recall that in our threat model the ANS (not the resolver) is the victim. Furthermore, given two domains, an attacker can easily create multiple chains, e.g., by using distinct subdomains for each chain. The number of chains is bounded (if at all) only by the number of subdomains supported by the target ANS.
There are no strict requirements for the intermediary ANS. In general, intermediary ANSs can be hosted by a hosting provider, self-hosted by the attacker, or even distributed across multiple hosters. The only constraint is that the intermediary and target ANS should not be the same server, because some ANSs follow CNAME chains if they are authoritative for all domains in the chain. Requiring at least one dedicated intermediary ANS ensures that each answer contains only one chain element. If the ANS is configured to return only one CNAME record, the same ANS can be used, doubling the amplification achieved with this attack. At the other extreme, it is perfectly possible to use multiple intermediary ANSs, as long as every second element in the chain still points to the target ANS. Distributing the intermediary ANSs increases reliability, reduces the load on each intermediary ANS, and makes the attack harder to prevent.
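The effect of spreading the intermediary elements over several servers can be sketched by counting, per server, how many chain elements it serves in one uncached resolution. The round-robin assignment and the server labels are assumptions; any assignment works as long as every second element stays on the target ANS.

```python
from collections import Counter

def per_server_load(chain_length, intermediaries):
    """Queries per server for one uncached resolution of an alternating chain.

    Odd elements stay on the target ANS; even elements are assigned round-robin
    to the given intermediary servers (a hypothetical assignment policy).
    """
    load = Counter()
    for i in range(1, chain_length + 1):
        if i % 2 == 1:
            load["target"] += 1
        else:
            load[intermediaries[(i // 2 - 1) % len(intermediaries)]] += 1
    return load

one = per_server_load(12, ["i1"])
three = per_server_load(12, ["i1", "i2", "i3"])
print(one["target"], one["i1"])                                # 6 6
print(three["target"], three["i1"], three["i2"], three["i3"])  # 6 2 2 2
```

The load on the target ANS is unchanged; only each intermediary's share shrinks.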
While the requirements for this attack may seem high, we note that attackers are already known to use complex setups for their operations. One DNS-related example is fast-flux networks [16], which provide resilience against law-enforcement takedowns and work similarly to CDNs. Attackers use rapidly changing DNS entries to distribute traffic across sometimes hundreds of machines.
3.3 Leveraging DNS Caching
DNS resolvers rely on record caching, such that queries for the same domain do not require additional recursive resolution if the resolver has those records cached. Technically, each resource record contains a Time-to-Live (TTL) value, which specifies how long it may be cached by a resolver, i.e., be answered without querying the ANS. Caching has a large influence on the DNS chaining attack, as it determines how frequently resolvers query the target and intermediary ANSs.
An attacker would aim for two compatible goals. On the one hand, given an attack time span, the target ANS should receive as many queries as possible. This means that caching for those records that are delivered by the target ANS should be ideally avoided. On the other hand, an attacker wants to minimize the number of queries sent to the intermediary ANS, as they would otherwise slow down the overall attack. We discuss both parts individually in the following.
Avoiding Caching at Target ANS: Determining the overall impact on the target ANS requires an understanding of how often each resolver can be used by the attacker during an attack. If all records of a chain are cached, the resolver does not query the target ANS. To solve this problem, attackers can disable caching for records hosted by the target ANS. A TTL value of zero indicates that the resource should never be cached [32, Sect. 3.2.1]. We assume that resolvers honor a TTL of zero, i.e., do not cache such entries. We evaluate this assumption in Sect. 4.1.
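A minimal sketch of a resolver cache that honors a TTL of zero, i.e., the behavior assumed above. The class name `TTLCache`, the `(name, type)` key, and the injectable clock (used only to make the example deterministic) are illustrative assumptions, not any resolver's actual implementation.

```python
import time

class TTLCache:
    """Toy resolver cache keyed by (name, type); entries expire after their TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the example is deterministic
        self._entries = {}

    def put(self, key, value, ttl):
        if ttl <= 0:
            return  # TTL 0: usable for the current answer, but never stored
        self._entries[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: the next lookup must go to the ANS
            return None
        return value

now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put(("1.target-ans.com.", "CNAME"), "2.intermediary.org.", ttl=0)
cache.put(("2.intermediary.org.", "CNAME"), "3.target-ans.com.", ttl=300)
print(cache.get(("1.target-ans.com.", "CNAME")))    # None
print(cache.get(("2.intermediary.org.", "CNAME")))  # 3.target-ans.com.
```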
However, we have observed that resolvers implement additional micro-caching strategies to further reduce the number of outgoing queries. A common strategy we observed is that resolvers coalesce multiple identical incoming or outgoing requests. If a resolver detects that a given RR is not in the cache, it starts requesting the data from the ANS. Queries by other clients for the same RR may arrive in the meantime. A micro-caching resolver can answer all outstanding client queries at once when the authoritative answer arrives, even if the RR would not normally be cached (i.e., TTL = 0). In our context, such micro-caching might occur if the resolver receives a query for a CNAME record whose target is not cached, but another query for the same target is already outstanding. Coalescing identical queries thus results in fewer outgoing queries to the ANSs, because a single authoritative reply is used to answer multiple client queries. This reduces the amplification caused by the resolver. Micro-caching is a defense mechanism against cache poisoning attacks that make use of the “birthday attack”, such as the Kaminsky attack [7, 18].
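Query coalescing can be sketched with `asyncio`: the first query for a name creates a future that later identical queries await, and the entry is dropped as soon as the answer arrives, so nothing outlives the request (as with TTL = 0). The class `CoalescingResolver`, the simulated 50 ms upstream delay, and the query name are assumptions; real resolvers implement this differently.

```python
import asyncio

class CoalescingResolver:
    """Concurrent identical queries share one upstream request; nothing is cached."""

    def __init__(self, upstream):
        self._upstream = upstream  # async callable: qname -> answer
        self._inflight = {}        # qname -> asyncio.Future for the outstanding request
        self.upstream_queries = 0

    async def query(self, qname):
        pending = self._inflight.get(qname)
        if pending is not None:
            return await pending   # piggyback on the outstanding request
        fut = asyncio.get_running_loop().create_future()
        self._inflight[qname] = fut
        self.upstream_queries += 1
        try:
            answer = await self._upstream(qname)
            fut.set_result(answer)
            return answer
        except Exception as exc:
            fut.set_exception(exc)
            raise
        finally:
            del self._inflight[qname]  # answer is not retained once delivered

async def slow_upstream(qname):
    await asyncio.sleep(0.05)  # simulated RTT to the ANS
    return f"answer for {qname}"

async def main():
    resolver = CoalescingResolver(slow_upstream)
    # Ten concurrent client queries for the same name trigger one upstream query.
    answers = await asyncio.gather(*(resolver.query("3.target-ans.com.") for _ in range(10)))
    # A later query, after the first one completed, goes upstream again.
    await resolver.query("3.target-ans.com.")
    return resolver.upstream_queries, answers

upstream, answers = asyncio.run(main())
print(upstream)  # 2
```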
We thus define the per-resolver query frequency as the maximum number of queries per second an attacker can send to a given resolver without any query being answered by caching or micro-caching. It equals the optimal attack speed: Fewer queries would not use the resolver’s full amplification potential, more queries would waste attack bandwidth.
Leveraging Caching at Intermediary ANSs: Recall that every other chain element points to a record hosted by an intermediary ANS. In principle, this would require resolvers to query the intermediary ANS for every second step in the chain, which significantly reduces the frequency with which the target ANS receives queries. However, those records do not change, so we can leverage caching to increase this frequency. By setting a non-zero TTL for the records hosted by the intermediary ANSs, resolvers only have to fetch these records on the first query of the chain. After the caches are “warmed up”, the resolvers only fetch records from the target ANS. The frequency of attack queries is thus largely determined by the round-trip time (RTT) between resolver and target ANS, whereas the RTT between resolver and intermediary ANS is irrelevant.
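The warm-up effect can be sketched by resolving the same alternating chain repeatedly, caching only the intermediary elements (non-zero TTL) and never the target elements (TTL = 0). The zone names, the chain length, and the assumption that cached entries do not expire during the run are hypothetical simplifications.

```python
from collections import Counter

TARGET = "target-ans.com."          # hypothetical zone on the target ANS, TTL 0
INTERMEDIARY = "intermediary.org."  # hypothetical zone on the intermediary ANS, TTL > 0

def chain_names(length):
    """Element i lives in the target zone for odd i and in the intermediary zone for even i."""
    return [f"{i}.{TARGET if i % 2 else INTERMEDIARY}" for i in range(1, length + 1)]

def simulate(resolutions, length, cache_intermediary=True):
    """Count queries per ANS when one resolver resolves the chain `resolutions` times."""
    cached = set()  # names whose record is in the resolver cache (no expiry in this run)
    queries = Counter()
    for _ in range(resolutions):
        for name in chain_names(length):
            if name in cached:
                continue  # answered from cache, no query sent
            server = "target" if name.endswith(TARGET) else "intermediary"
            queries[server] += 1
            if server == "intermediary" and cache_intermediary:
                cached.add(name)  # non-zero TTL: keep it
            # target records have TTL 0 and are never added to the cache
    return queries

warm = simulate(100, 10)
cold = simulate(100, 10, cache_intermediary=False)
print(warm["target"], warm["intermediary"])  # 500 5
print(cold["target"], cold["intermediary"])  # 500 500
```

The target receives the same number of queries per resolution in both runs, but with warm caches each resolution only waits on round trips to the target ANS, so resolutions can be repeated more often.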
3.4 Attack Variant with DNAME Resource Records
One drawback of the CNAME-based attack is that it requires records to be defined per chain. If an attacker aims to abuse multiple chains in parallel (e.g., to increase the per-resolver query frequency), they have to define dozens of CNAME records. A slight variation of the CNAME-based attack therefore uses DNAME records. DNAME resource records [5, 43] allow arbitrarily many subdomains for the chain with only a single entry. Conceptually, DNAMEs are similar to CNAMEs and are created like CNAME records, e.g., “www.target-ans.com. IN DNAME intermediary.org.”. The difference is that a DNAME record lets the ANS replace the owner (left-hand side) with the target (right-hand side) in all queries for a subdomain of the owner. For example, a query for “a.www.target-ans.com.” would be rewritten to “a.intermediary.org.” by the given rule.
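The rewriting rule can be sketched as a suffix substitution on domain names. `dname_rewrite` is a hypothetical helper. It lowercases names because DNS names compare case-insensitively, and it rewrites only names strictly below the owner, not the owner itself, since a DNAME redirects a subtree rather than its owner name.

```python
def dname_rewrite(qname, owner, target):
    """Rewrite qname if it lies strictly below the DNAME owner; otherwise return None."""
    qname, owner, target = qname.lower(), owner.lower(), target.lower()
    suffix = "." + owner
    if not qname.endswith(suffix):
        return None  # the owner itself and unrelated names are not rewritten
    return qname[: -len(suffix)] + "." + target

print(dname_rewrite("a.www.target-ans.com.", "www.target-ans.com.", "intermediary.org."))
# a.intermediary.org.
```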
Technically, the answer for a DNAME resource record does not only contain the DNAME resource record. For backwards compatibility, ANSs also create a synthetic CNAME resource record for the exact query domain. Resolvers can also support DNAME resource records directly, providing a better user experience. However, resolvers that lack support for DNAME records fall back to using the CNAME records. An attacker can abuse such resolvers to query chains defined with DNAME entries, simulating an arbitrary number of chains and avoiding caching. These resolvers have to use the synthetic CNAME records to follow the chain. Because the records are synthesized for the exact query domain, they are indistinguishable from “normal” CNAME records in a zone. This forces the resolver to query the ANS for each newly observed subdomain.
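What an ANS returns for a name below a DNAME owner can be sketched as a pair of records: the DNAME itself and a CNAME synthesized for the exact query name. The tuple layout `(owner, ttl, type, rdata)` and the function `synthesize_answer` are hypothetical, and reusing the DNAME's TTL for the synthetic CNAME is a simplifying assumption of this sketch.

```python
def synthesize_answer(qname, owner, target, ttl):
    """Records an ANS might return for qname below a DNAME owner.

    Tuples are (owner name, TTL, type, rdata). The synthetic CNAME is created
    for the exact query name; reusing the DNAME TTL for it is an assumption.
    """
    suffix = "." + owner
    if not qname.endswith(suffix):
        return []
    rewritten = qname[: -len(suffix)] + "." + target
    return [
        (owner, ttl, "DNAME", target),
        (qname, ttl, "CNAME", rewritten),  # synthetic CNAME for the exact query name
    ]

for rr in synthesize_answer("a.www.target-ans.com.", "www.target-ans.com.", "intermediary.org.", 0):
    print(rr)
```

A resolver without DNAME support only uses the CNAME for `a.www.target-ans.com.`, so a query for `b.www.target-ans.com.` looks like an unrelated name and triggers a fresh query to the ANS.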
Resolvers that support DNAMEs can use a cached entry to answer queries for all subdomains directly, even if the exact subdomain has never been observed. This improves the resolver’s performance, as only one cache entry has to be stored (compared to many CNAMEs) and authoritative queries only need to be issued if the DNAME entry expires (compared to once for each new subdomain). This effectively limits the number of simulated chains to one, which reduces to the same properties as the classic CNAME-based chain. Resolvers without DNAME support can be queried as often as the resolver’s resources permit, without regard to macro- or micro-caching. Furthermore, handling DNAME queries consumes more resources at the target ANS, as ANSs usually create and send synthetic CNAME records in addition to DNAME records.
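The difference between the two resolver types can be sketched by counting upstream queries and cache entries for many distinct subdomains below one DNAME owner. To isolate the effect described above, the sketch assumes a positive TTL that does not expire during the run; `simulate_resolver` and the domain names are hypothetical.

```python
def simulate_resolver(labels, owner, target, dname_aware):
    """Return (upstream queries, cache entries, rewritten names) for queries <label>.<owner>."""
    suffix = "." + owner
    cache = {}
    upstream = 0
    rewritten = []
    for label in labels:
        qname = f"{label}{suffix}"
        if dname_aware:
            # One cached DNAME covers every subdomain of the owner.
            if owner not in cache:
                upstream += 1
                cache[owner] = target
            rewritten.append(f"{label}.{cache[owner]}")
        else:
            # Only the synthetic CNAME for this exact name can be cached.
            if qname not in cache:
                upstream += 1
                cache[qname] = f"{label}.{target}"
            rewritten.append(cache[qname])
    return upstream, len(cache), rewritten

labels = [f"c{i}" for i in range(1000)]
aware = simulate_resolver(labels, "www.target-ans.com.", "intermediary.org.", True)
legacy = simulate_resolver(labels, "www.target-ans.com.", "intermediary.org.", False)
print(aware[:2], legacy[:2])  # (1, 1) (1000, 1000)
```

Both resolvers reach the same rewritten names, but only the resolver without DNAME support has to ask the ANS once per subdomain.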