Before the Internet became a global network connecting millions of devices, it was a simple research experiment connecting a handful of institutions. In the beginning, the number of unique internet addresses could be measured in the tens. As the network expanded, that number quickly grew into the hundreds and thousands, and it became difficult to remember and type in the IP addresses of all these hosts.

To manage the growing number of network hosts, a simple text file called HOSTS.txt recorded each host and its IP address. To add your name to the hosts file, you needed to send an e-mail describing the changes you wanted to apply. The authority for the HOSTS.txt file would apply these changes once or twice a week, and anyone who wanted the updated list would periodically FTP to the canonical source, grab the latest file, and update their own list of hosts. Naturally, as this small network expanded into, and was eventually replaced by, the Internet, this solution became untenable: there were too many hosts to keep track of, keep consistent, and serve from a single canonical file using FTP and manual updates. HOSTS.txt did not scale.

The Domain Name System (DNS) was developed to scale the HOSTS.txt model to the global Internet. The goals for the system were to allow for local administration of portions of the data set while also making changes and updates to local data available to the global Internet. The result is a globally distributed hierarchical database that maps domain names to Internet hosts throughout the world.

The Domain Namespace

The DNS distributed database is an inverted tree indexed by domain names. Taken together, the entire tree is called the domain namespace and represents the entire set of Internet domain names. Like a file system, the tree begins at a root node; inner nodes organize hosts into domains, and leaf nodes provide information on a single host. Each node in the tree has a text label describing its portion of a fully qualified domain name. The full domain name for any node is the sequence of labels on the path from that node up to the root of the tree, with a dot separating the labels along the path. The only restriction on node labels is that siblings in the tree must have unique names, which guarantees that a domain name identifies exactly one node in the tree.

An example will help illustrate the concept.

A simplified domain namespace

Remember that domain names are just indexes into the DNS database. For leaf nodes, the data at the node represents an individual host on the network, with information like network addresses, mail-routing information, or hardware information. Nodes inside the tree can represent both a domain and a particular host. In our example above, the sookocheff.com node represents both the sookocheff domain and the hosts that serve the sookocheff.com site you are currently looking at.
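The way a domain name falls out of the tree structure can be sketched in a few lines of Python. This is only an illustration: the Node class and its method names are made up for this example and are not part of any DNS library.

```python
class Node:
    """One node in the namespace tree; the root node has an empty label."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = {}

    def add_child(self, label):
        # Siblings must have unique labels so that a name maps to one node.
        if label in self.children:
            raise ValueError(f"duplicate sibling label: {label!r}")
        child = Node(label, parent=self)
        self.children[label] = child
        return child

    def domain_name(self):
        # Collect the labels on the path from this node up to the root.
        labels = []
        node = self
        while node is not None:
            labels.append(node.label)
            node = node.parent
        # The root's empty label is what produces the trailing dot.
        return ".".join(labels) or "."


root = Node("")
com = root.add_child("com")
sookocheff = com.add_child("sookocheff")
www = sookocheff.add_child("www")

print(www.domain_name())         # www.sookocheff.com.
print(sookocheff.domain_name())  # sookocheff.com.
```

Note that sookocheff is an inner node here, yet it has a full domain name of its own, which is why it can stand for both a domain and a host.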

Resource Records and Zone Files

The data indexed by a domain name is called a resource record. There are several types of records for different types of data. For example, there are distinct resource record types for mail routing and for host address information. Each record type specifies its own syntax and semantic rules.

The collection of resource records held by a host is stored in zone files. Every domain that a host knows about is stored in a zone file, and it is these zone files that are distributed across the Internet to form the global distributed DNS database. A zone file is a simple text file that contains the mappings between domain names and IP addresses. DNS nameservers use the zone file to find out which IP address should be contacted when a user requests a particular domain name.

The zone file contains different classes of DNS records. For our purposes, we will focus on the IN record class that defines the set of DNS records for the Internet. All resource records use the following format, regardless of class or type.

          host label     ttl   record class   record type   record data
Example   example.com.   60    IN             A             104.255.228.125
  • Host Label. A host label defines the hostname of a record and whether the $ORIGIN hostname will be appended to the label. Fully qualified hostnames terminated by a period will not append the origin.
  • TTL. TTL is the amount of time in seconds that a DNS record will be cached by an outside DNS server or resolver.
  • Record Class. There are three classes of DNS records: IN (Internet), CH (Chaosnet), and HS (Hesiod). The IN class is used by the Internet; the other classes are used for alternate networks we won’t discuss here.
  • Record Type. Defines the syntax and semantics for this record.
  • Record Data. The actual data for the record, such as an IP address, hostname, or other information. Different record types will contain different types of record data.
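As a rough sketch, the five fields can be pulled out of a record line like this. The ResourceRecord type and parse_record function are hypothetical names for this example, and the parser assumes all five fields are present on one line, which real zone files do not require.

```python
from typing import NamedTuple


class ResourceRecord(NamedTuple):
    host_label: str
    ttl: int
    record_class: str
    record_type: str
    record_data: str


def parse_record(line: str) -> ResourceRecord:
    # Split on whitespace into at most five fields. The record data is kept
    # whole because it may contain spaces (an MX preference plus a host).
    host_label, ttl, record_class, record_type, record_data = line.split(None, 4)
    return ResourceRecord(host_label, int(ttl), record_class, record_type, record_data)


record = parse_record("example.com. 60 IN A 104.255.228.125")
print(record.record_type)  # A
print(record.ttl)          # 60
```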

A and AAAA Records

A and AAAA both map a domain to an IP address, with the A record used to map a host to an IPv4 IP address, and an AAAA record used to map a host to an IPv6 address.

The general format of these records is this:

sookocheff  IN      A       IPv4_address
sookocheff  IN      AAAA    IPv6_address
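The choice between the two record types depends only on the address family. A small sketch using Python's standard ipaddress module (the function name is made up for this example):

```python
import ipaddress


def address_record_type(address: str) -> str:
    """Return "A" for an IPv4 address and "AAAA" for an IPv6 address."""
    parsed = ipaddress.ip_address(address)  # raises ValueError if invalid
    return "A" if parsed.version == 4 else "AAAA"


print(address_record_type("192.0.2.44"))   # A
print(address_record_type("2001:db8::1"))  # AAAA
```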

CNAME Records

CNAME records define an alias for an A or AAAA record. For instance, we could have an A record defining the “sookocheff” host and then use “www” as an alias for this host:

sookocheff  IN  A       111.111.111.111
www         IN  CNAME   sookocheff
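Looking up an alias means following the CNAME to the name it points at and then reading the address there. The sketch below models the two records above as a plain dictionary; the RECORDS table and resolve_address function are assumptions for illustration, not how a real nameserver stores its data.

```python
# Hypothetical in-memory records: name -> (record type, record data).
RECORDS = {
    "sookocheff": ("A", "111.111.111.111"),
    "www": ("CNAME", "sookocheff"),
}


def resolve_address(name, records, max_hops=8):
    """Follow CNAME aliases until an A record is found."""
    for _ in range(max_hops):
        record_type, data = records[name]
        if record_type == "A":
            return data
        if record_type != "CNAME":
            raise ValueError(f"unsupported record type: {record_type}")
        name = data  # continue the lookup with the aliased name
    raise RuntimeError("too many CNAME hops (possible alias loop)")


print(resolve_address("www", RECORDS))  # 111.111.111.111
```

The hop limit is there so that two aliases pointing at each other cannot make the lookup loop forever.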

MX Records

MX records define the mail exchanges for a domain, so that email addressed to the domain is routed to the appropriate mail server. Unlike many other record types, mail records generally don’t map a host to something, because they apply to the entire zone. As such, MX records are usually defined with no host name at the beginning:

        IN  MX  10   mail.sookocheff.com.

Also note that there is an extra number in the record (10). This is the preference number that helps computers decide which server to send mail to if there are multiple mail servers defined. Lower numbers have a higher priority.
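The preference rule amounts to sorting the mail servers by their number and trying them in that order. A minimal sketch, where the second server (backup.sookocheff.com.) is a made-up name added only to show the ordering:

```python
def order_mail_servers(mx_records):
    """Return mail server names, most preferred (lowest number) first."""
    return [host for _, host in sorted(mx_records, key=lambda record: record[0])]


# (preference, mail server) pairs; the backup host is hypothetical.
mx_records = [
    (20, "backup.sookocheff.com."),
    (10, "mail.sookocheff.com."),
]

print(order_mail_servers(mx_records))
# ['mail.sookocheff.com.', 'backup.sookocheff.com.']
```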

NS Records

This record type defines the name servers that are used for this zone.

You may be wondering, “if a nameserver manages the zone file, why do we need to specify a nameserver in the zone file?” To answer this, we need to think about what makes DNS so successful: it’s a distributed database with multiple levels of caching. Defining nameservers within the zone file is necessary because the zone file may be served from a cached or slave copy of the file on another name server. In that case, the zone file needs to reference the master nameserver so that it can be consulted when the cached copy is old or out of date.

Like the MX records, these are zone-wide parameters, so they do not specify hosts. NS records look like:

        IN  NS     ns1.sookocheff.com.
        IN  NS     ns2.sookocheff.com.

An Example Zone File

The following file provides a full example of a zone file:

$ORIGIN sookocheff.com.
@                      3600 SOA   ns1.p30.dynect.net. (
                              zone-admin.dyndns.com.     ; address of responsible party
                              2016072701                 ; serial number
                              3600                       ; refresh period
                              600                        ; retry period
                              604800                     ; expire time
                              1800                     ) ; minimum ttl
                      86400 NS    ns1.p30.dynect.net.
                      86400 NS    ns2.p30.dynect.net.
                      86400 NS    ns3.p30.dynect.net.
                      86400 NS    ns4.p30.dynect.net.
                       3600 MX    10 mail.example.com.
                       3600 MX    20 vpn.example.com.
                       3600 MX    30 mail.example.com.
                         60 A     204.13.248.106
                       3600 TXT   "v=spf1 include:spf.dynect.net ~all"
mail                  14400 A     204.13.248.106
vpn                      60 A     216.146.45.240
webapp                   60 A     216.146.46.10
webapp                   60 A     216.146.46.11
www                   43200 CNAME example.com.

In a zone file, $ORIGIN indicates a node in the DNS domain namespace tree. Any label below the $ORIGIN line that is not fully qualified has the origin hostname appended to it to form a fully qualified hostname. Any label that is already a fully qualified domain name, terminated by a period, is left as is. For example, after the line $ORIGIN sookocheff.com., any record whose host label does not end with a period will have sookocheff.com. appended to it, so the label mail is interpreted as the fully qualified name mail.sookocheff.com. (with the trailing period).

The @ symbol is a special label that is simply shorthand for $ORIGIN. When the zone file is loaded, the @ symbol is replaced by the origin, which in this file is sookocheff.com. (including the trailing period).
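The label-expansion rules for $ORIGIN and @ are small enough to write down directly. The qualify function below is a hypothetical helper that sketches those rules; it is not taken from any nameserver implementation.

```python
def qualify(label: str, origin: str) -> str:
    """Expand a zone-file label into a fully qualified domain name."""
    if label == "@":
        return origin             # "@" stands for the origin itself
    if label.endswith("."):
        return label              # already fully qualified, leave as is
    return f"{label}.{origin}"    # relative label: append the origin


ORIGIN = "sookocheff.com."

print(qualify("mail", ORIGIN))               # mail.sookocheff.com.
print(qualify("@", ORIGIN))                  # sookocheff.com.
print(qualify("mail.example.com.", ORIGIN))  # mail.example.com.
```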

The $ORIGIN is followed by the zone’s Start Of Authority (SOA) record. A Start Of Authority record is required for each zone. It starts with the primary nameserver of the zone, and is followed by a block of metadata including the e-mail address of the party responsible for administering the domain’s zone file, the current serial number of the zone which should be modified whenever data in the zone file changes, and various timing elements for caching, refresh, and retry.

After the SOA portion of the zone file come the resource records this nameserver knows about, defined using the record types described in the previous section.

Nameservers and Zones

Each domain namespace is served by a program called a nameserver. Nameservers generally have complete information about some part of the domain namespace, called a zone. The nameserver with this complete information is called the authority for that zone.

The difference between a domain and a zone is subtle but important. Whereas a domain is a strict labeling of a portion of the namespace (a node and everything below it), each domain can be broken up by delegation into smaller units called zones. For example, the .ca domain for Canada can be broken up into different zones for each province: qc.ca, ab.ca, on.ca, and so on. Each of these provincial zones can be administered by the provinces using authoritative nameservers, while the .ca zone would contain the delegation information pointing to the nameservers of each delegated provincial zone. The .ca zone does not have to delegate. In some cases, the nameservers for the top-level zone may also be authoritative for some of the lower-level subdomains. The following figure, from the 5th Edition of DNS and BIND, shows an example division of the .ca domain into multiple zones, where some of the zones are delegated to provinces and others are handled by the .ca zone itself.

Sample zones for the .ca domain
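One way to picture delegation is as a lookup for the most specific zone that contains a name. In the sketch below, the zone table and the nameserver names are invented for illustration: ab.ca and on.ca are treated as delegated, and everything else under .ca stays in the .ca zone.

```python
# Hypothetical delegation table: zone name -> authoritative nameservers.
ZONES = {
    "ca.": ["ns1.ca-registry.example."],
    "ab.ca.": ["ns1.ab.example."],
    "on.ca.": ["ns1.on.example."],
}


def find_zone(name, zones):
    """Return the most specific zone that contains the given domain name."""
    labels = name.rstrip(".").split(".")
    # Try the full name first, then drop one label at a time from the left.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:]) + "."
        if candidate in zones:
            return candidate
    raise LookupError(f"no zone found for {name}")


print(find_zone("www.gov.ab.ca.", ZONES))  # ab.ca. (delegated zone)
print(find_zone("bc.ca.", ZONES))          # ca.    (not delegated here)
```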

There are two types of nameservers in DNS: master (or primary) servers, which read zone data from a datafile on the host, and slave (or secondary) servers, which read zone data from master or other slave servers. Whenever slave servers start up, and periodically afterwards, they contact their master server to fetch updated data for their zone. The master server and any secondaries are all considered authoritative for a zone. The data on the servers are simply the resource records that describe the zone, stored in a zone file. These records describe all the hosts in the zone and record any delegation points that direct to subdomains.
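The periodic refresh can be sketched as a comparison of the serial number from the zone's SOA record: the secondary copies the zone only when the master holds a newer serial. The classes below are hypothetical, and the plain integer comparison is a simplification that ignores serial number wraparound.

```python
class Primary:
    """Holds the zone data as read from the zone file."""

    def __init__(self, serial, records):
        self.serial = serial
        self.records = records


class Secondary:
    """Holds a copy of the zone fetched from a primary."""

    def __init__(self):
        self.serial = 0
        self.records = {}

    def refresh(self, primary):
        """Copy the zone if the primary's serial is newer; report whether it did."""
        if primary.serial <= self.serial:
            return False                      # copy is already up to date
        self.records = dict(primary.records)  # simplified "zone transfer"
        self.serial = primary.serial
        return True


primary = Primary(2016072701, {"mail": "204.13.248.106"})
secondary = Secondary()

print(secondary.refresh(primary))  # True, first fetch on start-up
print(secondary.refresh(primary))  # False, nothing has changed
```

This is also why the serial number should be changed whenever the zone data changes: without a newer serial, a secondary in this model would never pick up the update.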

Resolvers

DNS resolvers are the clients that query for DNS information from a nameserver. These programs run on a host to query a DNS nameserver, interpret the response, and return the information to the programs that request it. In DNS, the resolver implements the recursive query algorithm that traverses the inverted namespace tree until it finds the result for a query (or an error).

Resolvers are only useful when doing DNS resolution, which we cover next.

Resolution: Putting it all Together

As we’ve discussed, the domain namespace is structured as an inverted tree. This structure allows a nameserver to use a single piece of information — the location of the root nameservers — to find any other domain in the tree.

The root nameservers are the authoritative nameservers for all top-level domains. That is, given a query about any domain name, the root nameservers can provide the names and addresses of the authoritative nameservers for the top-level domains. In turn, the top-level nameservers can provide the list of authoritative nameservers for the second-level domains, and so on. In this recursive fashion, every time a nameserver is queried, it will either return the data for the domains it is authoritative for, or it will return information that is closer to the correct answer.

The following diagram from Amazon’s Route 53 documentation gives an overview of how recursive and authoritative DNS services work together to route an end user to your website or application.

An example DNS resolution
  1. A user opens a web browser, enters www.example.com in the address bar, and presses Enter.
  2. The request for www.example.com is routed to a DNS resolver, which is typically managed by the user’s Internet service provider (ISP), such as a cable Internet provider, a DSL broadband provider, or a corporate network.
  3. The DNS resolver for the ISP forwards the request for www.example.com to a DNS root name server. The root name server responds with the authoritative nameservers for the .com top-level domain (TLD).
  4. The DNS resolver for the ISP forwards the request for www.example.com again, this time to one of the TLD name servers for .com domains. The name server for .com domains responds to the request with the names of the nameservers that are associated with the example.com domain. In this example, those nameservers are implemented using Amazon Route 53.
  5. The DNS resolver for the ISP chooses an Amazon Route 53 name server and forwards the request for www.example.com to that name server.
  6. The Amazon Route 53 name server looks in the example.com hosted zone datafile for the www.example.com record, gets the associated value, such as the IP address for a web server, 192.0.2.44, and returns the IP address to the DNS resolver.
  7. The DNS resolver for the ISP finally has the IP address that the user needs. The resolver returns that value to the web browser. The DNS resolver also caches (stores) the IP address for example.com so that it can respond more quickly the next time someone browses to example.com.
  8. The web browser sends a request to the IP address that it got from the DNS resolver.
  9. The web server or other resource at 192.0.2.44 returns the web page for www.example.com to the web browser, and the web browser displays the page.
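Steps 3 through 6 can be simulated without any network access. In this sketch the three "servers" are just dictionaries with invented names (root, com-tld, example-auth); each one either answers from its own records or refers the resolver to a server that is closer to the answer.

```python
# Hypothetical nameserver data, keyed by a made-up server name.
SERVERS = {
    "root": {"referrals": {"com.": "com-tld"}},
    "com-tld": {"referrals": {"example.com.": "example-auth"}},
    "example-auth": {"answers": {"www.example.com.": "192.0.2.44"}},
}


def query(server, name):
    """Ask one server; return ("answer", ip), ("referral", server) or ("error", None)."""
    data = SERVERS[server]
    if name in data.get("answers", {}):
        return "answer", data["answers"][name]
    for zone, next_server in data.get("referrals", {}).items():
        if name == zone or name.endswith("." + zone):
            return "referral", next_server
    return "error", None


def resolve(name, max_steps=10):
    """Start at the root and follow referrals until a server answers."""
    server, path = "root", []
    for _ in range(max_steps):
        path.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            return value, path
        if kind == "error":
            raise LookupError(f"{name} not found")
        server = value  # follow the referral to the next server
    raise RuntimeError("too many referrals")


address, path = resolve("www.example.com.")
print(address)  # 192.0.2.44
print(path)     # ['root', 'com-tld', 'example-auth']
```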

The example resolution we’ve used to convert the www.example.com domain into the 192.0.2.44 IP address is fairly convoluted. To improve access speeds, nameservers typically cache query results so that successive queries can be answered more quickly.
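A cache of this kind can be sketched as a dictionary whose entries expire after the record's TTL. The DnsCache class is a hypothetical illustration; the clock is passed in so that expiry can be demonstrated without actually waiting.

```python
import time


class DnsCache:
    """Cache answers until their TTL (in seconds) runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl):
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: force a fresh lookup
            return None
        return address


# A fake clock makes the example deterministic.
now = [0.0]
cache = DnsCache(clock=lambda: now[0])
cache.put("www.example.com.", "192.0.2.44", ttl=60)

print(cache.get("www.example.com."))  # 192.0.2.44
now[0] = 61.0                         # 61 seconds later the entry has expired
print(cache.get("www.example.com."))  # None
```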

References

This article provides an introduction to DNS. If you want to learn more, there are several great resources to choose from: