Before the Internet became a global network connecting millions of devices, it was a simple research experiment connecting a handful of institutions. In the beginning, the number of unique internet addresses could be measured in the tens. As the network expanded that number quickly grew into the hundreds and thousands and it became difficult to remember and type in IP addresses for each of these hosts.
To manage the growing number of network hosts, a simple text file called
HOSTS.txt recorded each host and its IP address. To add your name to the
hosts file, you needed to send an e-mail describing the changes you wanted to
apply. The authority for the
HOSTS.txt file would apply these changes once or
twice a week and anyone who wanted to grab the updated list would periodically
FTP to the canonical source, grab the latest file, and update their own list of
hosts. Naturally, as this small network expanded into, and was eventually
replaced by, the Internet, this solution became untenable: there were just too
many hosts to track, keep consistent, and serve from a single
canonical file using FTP and manual updates.
HOSTS.txt did not scale.
The Domain Name System (DNS) was developed to scale the
HOSTS.txt model to the
global Internet. The goals for the system were to allow for local administration
of portions of the data set while also making changes and updates to local data
available to the global Internet. The result is a globally distributed
hierarchical database that maps domain names to Internet hosts.
The Domain Namespace
The DNS distributed database is an inverted tree indexed by domain names. Taken together, the entire tree is called the domain namespace and represents the entire set of Internet domain names. Like a file system, the tree begins at a root node; inner nodes help organize hosts into domains, and leaf nodes provide information on a single host. Each node in the tree has a text label describing its portion of a fully qualified domain name. The full domain name for any node is the sequence of labels on the path from that node up to the root of the tree, with a dot separating the labels along the path. The only restriction on node labels is that siblings in the tree must have unique labels, which guarantees that a domain name uniquely identifies a single node in the tree.
An example will help illustrate the concept.
Remember that domain names are just indexes into the DNS database. For leaf
nodes, the data at the node represents an individual host on the network with
information like the network addresses, mail-routing information, or hardware
information. Nodes inside the tree can represent both a domain and a particular
host. In our example above, the
sookocheff.com node represents both the sookocheff
domain and the hosts that serve the
sookocheff.com site you are
currently looking at.
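The rule for building a domain name from the tree can be sketched in a few lines of Python. This is only an illustration of the idea, not how any real nameserver stores its data: the `Node` class and its method names are hypothetical, and the root is modelled as a node with an empty label.

```python
class Node:
    """One node in the namespace tree; the root has an empty label."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = {}

    def add_child(self, label):
        # Sibling labels must be unique so a name identifies exactly one node.
        if label in self.children:
            raise ValueError(f"duplicate sibling label: {label!r}")
        child = Node(label, parent=self)
        self.children[label] = child
        return child

    def domain_name(self):
        # Collect labels on the path from this node up to the root,
        # then join them with dots. The root's empty label yields the
        # trailing dot of a fully qualified name.
        labels = []
        node = self
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return ".".join(labels) or "."


root = Node("")
com = root.add_child("com")
site = com.add_child("sookocheff")
www = site.add_child("www")
print(www.domain_name())
```

Here `site` is an inner node that names a domain, while `www` is a leaf naming a single host under it.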
Resource Records and Zone Files
The data indexed by a domain name is called a resource record. There are several types of records for different types of data. For example, there are distinct resource records for mail routing and for host address information. Each record type specifies its own syntax and semantic rules to follow.
The resource records held by a host are stored in zone files. Every domain that a host knows about is stored in a zone file, and it is these zone files that get distributed across the Internet to form the global distributed DNS database. A zone file is a simple text file that contains the mappings between domain names and IP addresses. DNS nameservers use this zone file to find out which IP address should be contacted when a user requests a particular domain name.
The zone file contains different classes of DNS records. For our purposes, we
will focus on the IN record class that defines the set of DNS records for the
Internet. All resource records use the following format, regardless of class or
type:
| host label | ttl | record class | record type | record data |
- Host Label. A host label defines the hostname of a record and whether the $ORIGIN hostname will be appended to the label. Fully qualified hostnames terminated by a period do not have the origin appended.
- TTL. TTL is the amount of time in seconds that a DNS record will be cached by an outside DNS server or resolver.
- Record Class. There are three classes of DNS records: IN (Internet), CH (Chaosnet), and HS (Hesiod). The IN class is used by the Internet; the other classes are used for alternate networks we won’t discuss here.
- Record Type. Defines the syntax and semantics for this record.
- Record Data. The actual data for the record, such as an IP address, hostname, or other information. Different record types will contain different types of record data.
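To make the five fields concrete, here is a minimal Python sketch that splits a single-line record into them. It is a simplification under stated assumptions, not a full zone-file parser: it assumes a record with no host label starts with whitespace, that a purely numeric token in the TTL position is the TTL, and that a missing class means IN. The `ResourceRecord` and `parse_record` names are made up for this example.

```python
from typing import NamedTuple, Optional

CLASSES = {"IN", "CH", "HS"}


class ResourceRecord(NamedTuple):
    host_label: Optional[str]   # None when the record has no label
    ttl: Optional[int]          # None when the record has no explicit TTL
    record_class: str
    record_type: str
    record_data: str


def parse_record(line: str) -> ResourceRecord:
    """Split one single-line record into its five fields."""
    # Assumption: a line that begins with whitespace has no host label.
    has_label = not line[:1].isspace()
    tokens = line.split()
    host_label = tokens.pop(0) if has_label else None
    ttl = int(tokens.pop(0)) if tokens[0].isdigit() else None
    # Assumption: an omitted class is treated as IN in this sketch.
    record_class = tokens.pop(0).upper() if tokens[0].upper() in CLASSES else "IN"
    record_type = tokens.pop(0).upper()
    return ResourceRecord(host_label, ttl, record_class, record_type, " ".join(tokens))


print(parse_record("www 3600 IN A 192.0.2.44"))
```

Everything after the record type is kept as one string because, as the list above notes, the shape of the record data depends on the record type.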
A and AAAA Records
A and AAAA records both map a domain to an IP address, with the A record used to map a host to an IPv4 address, and an AAAA record used to map a host to an IPv6 address.
The general format of these records is this:
    sookocheff IN A IPv4_address
    sookocheff IN AAAA IPv6_address
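The choice between the two record types follows directly from the address family. As a small sketch, Python's standard `ipaddress` module can make that decision; the `address_record` helper is a hypothetical name, and the addresses used are reserved documentation addresses rather than real hosts.

```python
import ipaddress


def address_record(host: str, address: str) -> str:
    """Build an A or AAAA record line depending on the address family."""
    ip = ipaddress.ip_address(address)  # raises ValueError on malformed input
    record_type = "A" if ip.version == 4 else "AAAA"
    return f"{host} IN {record_type} {ip}"


print(address_record("sookocheff", "192.0.2.44"))
print(address_record("sookocheff", "2001:db8::1"))
```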
CNAME Records
CNAME records define an alias for an A or AAAA record. For instance, we could have an A record defining the “sookocheff” host and then use “www” as an alias for this host:
    sookocheff IN A 22.214.171.124
    www IN CNAME sookocheff
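Following an alias to its address can be sketched as a loop over a small record table. This is an illustrative model only: the `records` dictionary, the `resolve_alias` name, and the hop limit of 8 are assumptions made for the example, and the address is a reserved documentation address.

```python
def resolve_alias(records, name, max_hops=8):
    """Follow CNAME records until a non-CNAME record is reached.

    `records` maps a name to a (record_type, record_data) tuple.
    """
    for _ in range(max_hops):
        record_type, data = records[name]
        if record_type != "CNAME":
            return record_type, data
        name = data  # the CNAME data is the name to look up next
    # The hop limit guards against aliases that point at each other.
    raise RuntimeError("CNAME chain too long or looping")


records = {
    "sookocheff": ("A", "192.0.2.44"),
    "www": ("CNAME", "sookocheff"),
}
print(resolve_alias(records, "www"))
```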
MX Records
MX records define the mail exchanges that route email messages addressed to this domain to the appropriate mail server. Unlike many other record types, mail records generally don’t map a host to something, because they apply to the entire zone. As such, MX records are usually defined with no host name at the beginning:
    IN MX 10 mail.sookocheff.com.
Also note that there is an extra number in the record (10). This is the preference number that helps computers decide which server to send mail to if there are multiple mail servers defined. Lower numbers have a higher priority.
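The effect of the preference number can be shown with a short Python sketch that orders mail servers from most to least preferred. The host names and the `mail_servers_in_order` helper are hypothetical, and ties simply keep their input order here.

```python
def mail_servers_in_order(mx_records):
    """Return mail hosts sorted so the lowest preference number comes first.

    `mx_records` is a list of (preference, host) tuples.
    """
    return [host for _, host in sorted(mx_records, key=lambda record: record[0])]


mx_records = [
    (20, "backup.sookocheff.com."),
    (10, "mail.sookocheff.com."),
]
print(mail_servers_in_order(mx_records))
```

A sending server would try the first host in the returned list and fall back to the next one only if delivery fails.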
NS Records
NS records define the name servers that are used for this zone.
You may be wondering, “if a nameserver manages the zone file, why do we need to specify a nameserver in the zone file?” To answer this, we need to think about what makes DNS so successful: it is a distributed database with multiple levels of caching. Defining nameservers within the zone file is necessary because the zone file may be served from a cached or slave copy of the file on another name server. In that case, the zone file needs to reference the master nameserver so that it can be consulted when the cached copy is old or out of date.
Like MX records, these are zone-wide parameters, so they do not specify a host name. NS records look like:
    IN NS ns1.sookocheff.com.
    IN NS ns2.sookocheff.com.
An Example Zone File
The following file provides a full example of a zone file:
    $ORIGIN sookocheff.com.
    @       3600  SOA   ns1.p30.dynect.net. (
                        zone-admin.dyndns.com. ; address of responsible party
                        2016072701             ; serial number
                        3600                   ; refresh period
                        600                    ; retry period
                        604800                 ; expire time
                        1800 )                 ; minimum ttl
            86400 NS    ns1.p30.dynect.net.
            86400 NS    ns2.p30.dynect.net.
            86400 NS    ns3.p30.dynect.net.
            86400 NS    ns4.p30.dynect.net.
            3600  MX    10 mail.example.com.
            3600  MX    20 vpn.example.com.
            3600  MX    30 mail.example.com.
            60    A     126.96.36.199
            3600  TXT   "v=spf1 includespf.dynect.net ~all"
    mail    14400 A     188.8.131.52
    vpn     60    A     184.108.40.206
    webapp  60    A     220.127.116.11
    webapp  60    A     18.104.22.168
    www     43200 CNAME example.com.
In a zone file, $ORIGIN indicates a node in the DNS domain namespace tree. Any label below the origin has the origin hostname appended to assemble a fully qualified hostname. Any label within a record that is a fully qualified domain name terminated by a period does not have the origin hostname appended. For example, by stating $ORIGIN sookocheff.com., any record whose host label field does not end with a period will have sookocheff.com. appended to it. This means that a label such as mail expands to mail.sookocheff.com.
The @ symbol is a special label that is simply a short-hand for $ORIGIN. During resolution, the @ symbol will be replaced by sookocheff.com.
The $ORIGIN is followed by the zone’s Start Of Authority (SOA) record. A Start Of Authority record is required for each zone. It starts with the primary nameserver of the zone, and is followed by a block of metadata including the e-mail address of the party responsible for administering the domain’s zone file, the current serial number of the zone, which should be modified whenever data in the zone file changes, and various timing values for refresh, retry, expiry, and caching.
After the SOA portion of the zone file come the resource records this nameserver knows about, defined using the record types listed in the previous section.
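The origin rules are easy to express in code. The following Python sketch covers the three cases for a host label: the `@` shorthand, an already fully qualified name, and a relative label. The `qualify` function is a hypothetical helper, and it assumes the origin itself is given in fully qualified form with a trailing period.

```python
def qualify(label: str, origin: str) -> str:
    """Expand a zone-file host label into a fully qualified name."""
    if label == "@":
        # "@" is shorthand for the origin itself.
        return origin
    if label.endswith("."):
        # Already fully qualified: the origin is not appended.
        return label
    # Relative label: append the origin.
    return f"{label}.{origin}"


origin = "sookocheff.com."
for label in ("@", "www", "mail.example.com."):
    print(label, "->", qualify(label, origin))
```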
Nameservers and Zones
Each domain namespace is served by a program called a nameserver. Nameservers generally have complete information about some part of the domain namespace, called a zone. The nameserver with this complete information is called the authority for that zone.
The difference between a domain and a zone is subtle but important. Whereas a domain is the strict labeling of a portion of the namespace, each domain can be broken up into smaller units called zones by delegation. For example, the .ca domain for Canada can be broken up into different zones for each province, such as on.ca. Each of these provincial zones can be administered by the provinces using authoritative nameservers, while the .ca zone would contain the delegation information pointing to the nameservers of each of the delegated provincial zones. The .ca zone does not have to delegate. In some cases, the top-level zone may itself be authoritative for some of the lower-level domains. The following figure, from the 5th Edition of DNS and BIND, shows an example division of the .ca domain into multiple zones where some of the zones are delegated to provinces and others are handled by the .ca zone itself.
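One way to picture delegation is as a longest-suffix match: the zone responsible for a name is the most specific zone whose name is a suffix of it. The Python sketch below illustrates that idea with a hypothetical set of zones; `closest_zone` is a made-up helper, and the `bc.ca.` name stands in for a province that has not been delegated in this toy data.

```python
def closest_zone(name: str, zones: set) -> str:
    """Return the most specific zone in `zones` that contains `name`.

    Both `name` and the zone names are fully qualified (trailing period).
    """
    labels = name.rstrip(".").lower().split(".")
    # Try the longest suffix first, then progressively shorter ones.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:]) + "."
        if candidate in zones:
            return candidate
    if "." in zones:
        return "."
    raise LookupError(f"no zone contains {name!r}")


zones = {"ca.", "on.ca."}
print(closest_zone("www.toronto.on.ca.", zones))  # delegated provincial zone
print(closest_zone("example.bc.ca.", zones))      # handled by the parent zone
```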
There are two types of nameservers in DNS: master (or primary) servers that read zone data from a datafile on the host, and slave (or secondary) servers that read zone data from master or other slave servers. Whenever slave servers start up, and periodically afterwards, they contact their master server to fetch updated data for their zone. The master server and any secondaries are all considered authoritative for a zone. The data on the servers are simply the resource records that describe the zone stored in a zone file. These records describe all the hosts in the zone and record any delegation points that direct to subdomains.
DNS resolvers are the clients that query for DNS information from a nameserver. These programs run on a host to query a DNS nameserver, interpret the response, and return the information to the programs that request it. In DNS, the resolver implements the recursive query algorithm that traverses the inverted namespace tree until it finds the result for a query (or an error).
Resolvers are only useful when doing DNS resolution, which we cover next.
Resolution: Putting it all Together
As we’ve discussed, the domain namespace is structured as an inverted tree. This structure allows a nameserver to use a single piece of information — the location of the root nameservers — to find any other domain in the tree.
The root nameservers know the authoritative nameservers for all top-level domains. That is, given a query about any domain name, the root nameservers can provide the names and addresses of the authoritative nameservers for the top-level domain that name falls under. In turn, the top-level nameservers can provide the list of authoritative nameservers for the second-level domains, and so on. In this fashion, every time a nameserver is queried, it will either return the data for the domains it is authoritative for, or it will return a referral to nameservers that are closer to the correct answer.
The following diagram from Amazon’s Route 53 documentation gives an overview of how recursive and authoritative DNS services work together to route an end user to your website or application.
- A user opens a web browser, enters www.example.com in the address bar, and presses Enter.
- The request for www.example.com is routed to a DNS resolver, which is typically managed by the user’s Internet service provider (ISP), such as a cable Internet provider, a DSL broadband provider, or a corporate network.
- The DNS resolver for the ISP forwards the request for www.example.com to a DNS root name server. The root name server responds with the authoritative nameservers for the .com top-level domain (TLD).
- The DNS resolver for the ISP forwards the request for www.example.com again, this time to one of the TLD name servers for .com domains. The name server for .com domains responds to the request with the names of the nameservers that are associated with the example.com domain. In this example, those nameservers are implemented using Amazon Route 53.
- The DNS resolver for the ISP chooses an Amazon Route 53 name server and forwards the request for www.example.com to that name server.
- The Amazon Route 53 name server looks in the example.com hosted zone datafile for the www.example.com record, gets the associated value, such as the IP address for a web server, 192.0.2.44, and returns the IP address to the DNS resolver.
- The DNS resolver for the ISP finally has the IP address that the user needs. The resolver returns that value to the web browser. The DNS resolver also caches (stores) the IP address for example.com so that it can respond more quickly the next time someone browses to example.com.
- The web browser sends a request to the IP address that it got from the DNS resolver.
- The web server or other resource at 192.0.2.44 returns the web page for www.example.com to the web browser, and the web browser displays the page.
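The steps above can be sketched as a loop that starts at the root and follows referrals until some server returns an answer. This is a toy model under loud assumptions: the `SERVERS` table stands in for real nameservers, the server names (`root`, `tld-com`, `ns-example`) are invented, and no network traffic or real DNS message format is involved.

```python
# Hypothetical in-memory "nameservers". Each maps a name to either an
# answer or a referral to a server that is closer to the answer.
SERVERS = {
    "root": {"com.": ("REFERRAL", "tld-com")},
    "tld-com": {"example.com.": ("REFERRAL", "ns-example")},
    "ns-example": {"www.example.com.": ("ANSWER", "192.0.2.44")},
}


def query(server, name):
    """Return the most specific entry `server` holds for `name`."""
    table = SERVERS[server]
    labels = name.rstrip(".").split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:]) + "."
        entry = table.get(suffix)
        # An answer must match the full name; a referral may match any suffix.
        if entry and (i == 0 or entry[0] == "REFERRAL"):
            return entry
    return ("NXDOMAIN", None)


def resolve(name, max_referrals=10):
    """Start at the root and follow referrals until an answer is found."""
    server = "root"
    path = [server]
    for _ in range(max_referrals):
        kind, value = query(server, name)
        if kind == "ANSWER":
            return value, path
        if kind == "NXDOMAIN":
            raise LookupError(f"{name} does not exist")
        server = value  # follow the referral to the next server
        path.append(server)
    raise RuntimeError("too many referrals")


address, path = resolve("www.example.com.")
print(address, "via", " -> ".join(path))
```

The `path` list mirrors the walk in the list above: root name server, then the .com TLD name server, then the name server for example.com.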
The example resolution we’ve used to convert the
www.example.com domain into the
192.0.2.44 IP address is fairly convoluted. To avoid repeating this work,
nameservers typically cache query results, which speeds up successive queries.
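A cache of this kind only needs to remember each answer together with the time at which its TTL runs out. The following Python sketch shows the idea; the `TtlCache` class is a hypothetical illustration, and the injectable `clock` parameter exists only so the expiry behaviour can be demonstrated without waiting.

```python
import time


class TtlCache:
    """Cache answers until their TTL (in seconds) has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def put(self, name, value, ttl):
        # Store the value together with its absolute expiry time.
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # The record has expired, so drop it and report a miss.
            del self._entries[name]
            return None
        return value


now = [0.0]  # fake clock so the example does not need to sleep
cache = TtlCache(clock=lambda: now[0])
cache.put("www.example.com.", "192.0.2.44", ttl=60)
print(cache.get("www.example.com."))  # still cached
now[0] = 61.0
print(cache.get("www.example.com."))  # expired, so a new lookup is needed
```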
This article provides an introduction to DNS. If you want to learn more, there are several great resources to choose from:
- What is DNS?
- DNS and BIND, 5th Edition
- An Introduction to DNS Terminology, Components, and Concepts
- RFC 1035 Domain Names - Implementation and Specification