HTTP is a protocol for sending data on the web. Cookies are those things that make websites ask you to click “Accept” before accessing the site content. Sessions are why you probably haven’t seen the Facebook homepage in a long long time.
Even after a few years of making websites professionally, this was my rough level of understanding of these three important tenets of modern internet infrastructure. Eventually I decided it was time to learn about them more deeply. So I began writing.
We’ll explore each one of these in context within the process that unfolds every time you visit a website.
The internet is a set of rules: standards that every computer participating on the internet must follow to be understood by any other computer.
In some ways, it’s like a language, a set of rules allowing disparate people to communicate. The internet isn’t headquartered anywhere nor owned by anyone. It’s an idea. Hackers couldn’t shut down the internet any more than terrorists could shut down the Spanish language. The internet we know and love isn’t even the only network like this, but the others are beyond the scope of this essay.
Computers on the internet play one of two roles: client or server.
- The servers hold data.
- The clients access it.
The access is mediated through a series of interactions called requests and responses. Clients make requests and servers issue responses.
These requests and responses are formatted according to the Hypertext Transfer Protocol (HTTP).
HTTP is simple: Each request and response is a piece of data divided into a head and body.
- The head contains pieces of data called headers.
- The body contains data to render or modify the view in the browser.
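To make the head/body split concrete, here is a minimal Python sketch that pulls a raw HTTP message apart. It relies on one detail not spelled out above: in HTTP/1.1 the head ends at the first blank line. The `split_message` helper and the sample message are made up for illustration.

```python
# A minimal sketch: split a raw HTTP message into its head and body.
# The helper name and the sample message are hypothetical.

def split_message(raw: str):
    """Return (start_line, headers, body) for a raw HTTP message."""
    # The head ends at the first blank line; everything after it is the body.
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        # Each header is a "Name: value" pair on its own line.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body


sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Content-Length: 15\r\n"
    "\r\n"
    "<!DOCTYPE html>"
)

start_line, headers, body = split_message(sample)
print(start_line)               # HTTP/1.1 200 OK
print(headers["Content-Type"])  # text/html; charset=UTF-8
print(body)                     # <!DOCTYPE html>
```

A real parser has to handle more cases (repeated headers, binary bodies), so treat this as a picture of the layout rather than production code.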
In some ways this simplicity is good. It allows very primitive devices to connect to the web. This is why refrigerators can send and receive tweets.
One feature of the protocol is that HTTP is “stateless”. What this means is that the protocol has no memory, no native way to refer to a previous request. Servers spend no resources remembering who you are. The only request the server needs to worry about is the last one received.
This is efficient for high-throughput systems like a popular internet application. So how do sites remember you without asking for your password on every page? HTTP alone can’t do it. You need some simple way to store data between requests. You need HTTP and Cookies.
With Cookies built into HTTP, web developers can implement a data structure called a Session. Sessions let web servers and web browsers establish identity on the web, and they underpin secured behavior on the web including communication, banking, and social media.
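Here is a rough Python sketch of that idea, using only the standard library. The in-memory `SESSIONS` dictionary, the `session_id` cookie name, and the `handle_request` function are all assumptions made for illustration; real frameworks do this differently and with more care.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session store: session id -> data about that visitor.
SESSIONS = {}


def handle_request(cookie_header):
    """Return (greeting, Set-Cookie value or None) for a single request."""
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("session_id")
    if morsel is not None and morsel.value in SESSIONS:
        # The browser sent back an id we issued earlier: we know this visitor.
        session = SESSIONS[morsel.value]
        session["visits"] += 1
        return f"Welcome back, visit #{session['visits']}", None
    # Unknown visitor: create a session and ask the browser to remember its id.
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"visits": 1}
    return "Hello, stranger", f"session_id={session_id}; HttpOnly"


# First request: the browser has no cookie yet.
greeting1, set_cookie = handle_request(None)

# The browser stores the cookie and sends the name=value pair back next time.
cookie_pair = set_cookie.split(";")[0]
greeting2, set_cookie2 = handle_request(cookie_pair)

print(greeting1)  # Hello, stranger
print(greeting2)  # Welcome back, visit #2
```

The server itself stays stateless at the HTTP level. All of the memory lives in the session store, and the cookie is just the key into it.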
Let’s start with HTTP.
The Hypertext Transfer Protocol (HTTP) defines how computers on the internet talk to each other. It describes a system of clients and servers exchanging messages.
HTTP follows a client-server model: clients ask servers for resources and servers reply. It is a text-based protocol, so messages are readily human readable. It is also highly flexible to developer needs: if you need to relay extra data, just add another header line.
One important distinction about HTTP is that it is fundamentally “stateless”: each HTTP message is completely ignorant of earlier messages. As you interact with a site on the web, the server only hears a stream of independent HTTP requests.
But our experience on the web is decidedly “stateful”. We don’t have to log in on every new page we reach on Twitter. To make this possible, engineers construct stateful mechanisms on top of HTTP.
Let’s dissect a request. You can fire one of these from a bash terminal; the request itself looks like this:

    GET / HTTP/1.1
    Host: twitter.com
    User-Agent: Mozilla/5.0 ...
The first line defines how the request is routed.
- The GET is the HTTP method
- The / is the path
- The HTTP/1.1 is the protocol version.
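As a small illustration, the three parts of that first line can be separated with a single split on spaces. The `parse_request_line` helper below is a hypothetical name, and the sketch assumes a well-formed line.

```python
def parse_request_line(line: str) -> dict:
    """Split an HTTP request line into method, path, and protocol version."""
    # A well-formed request line has exactly three space-separated parts.
    method, path, version = line.split(" ", 2)
    return {"method": method, "path": path, "version": version}


request_line = parse_request_line("GET / HTTP/1.1")
print(request_line)
# {'method': 'GET', 'path': '/', 'version': 'HTTP/1.1'}
```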
The lines beyond the first line are called “HTTP Headers”. Headers let the client and the server pass additional information within the request and response. These are very important in web development and we’ll talk more about headers later.
Once the request is received and interpreted, the server will (hopefully promptly) respond. The response might look like this:

    HTTP/1.1 200 OK
    Content-Length: 9001
    Content-Type: text/html; charset=UTF-8
    Date: Tue, 24 Sep 2019 20:30:00 GMT

    <!DOCTYPE html ...
In the response, the first line is now a protocol and a status. The status is a number and a message from a discrete list of HTTP statuses that encode how the server interpreted the request.
- 1xx - Informational (“Hold on”)
- 2xx - Success (“Here you go”)
- 3xx - Redirection (“Go away”)
- 4xx - Client error (“You messed up”)
- 5xx - Server error (“I messed up”)
Success and Redirection Codes
- 200 OK - Request succeeded
- 206 Partial Content - Request for specific byte range succeeded
- 301 Moved Permanently - Resource has a new permanent URL
- 302 Found - Resource temporarily resides at a different URL
- 304 Not Modified - Resource has not been modified since last cached
Client Error Codes
- 400 Bad Request - Malformed request
- 401 Unauthorized - Resource is protected, need to authorize
- 403 Forbidden - Resource is protected, denying access
- 404 Not Found - Y’all know this one
Server Error Codes
- 500 Internal Server Error - Generic server error
- 502 Bad Gateway - Server is a proxy; backend server is unreachable
- 503 Service Unavailable - Server is overloaded or down for maintenance
- 504 Gateway Timeout - Server is a proxy; backend server responded too slowly
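Python’s standard library ships this list as `http.HTTPStatus`, which makes it easy to poke at. The sketch below looks up a code’s message and derives its class from the first digit; the `CLASSES` table and `describe` function are my own names, not part of any library.

```python
from http import HTTPStatus

# The first digit of a status code tells you its class.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe(code: int) -> str:
    """Return a one-line description such as '404 Not Found (Client error)'."""
    status = HTTPStatus(code)  # raises ValueError for codes the stdlib doesn't know
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"


print(describe(200))  # 200 OK (Success)
print(describe(404))  # 404 Not Found (Client error)
print(describe(503))  # 503 Service Unavailable (Server error)
```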
Headers are key-value pairs included in the HTTP message. Some keys have standard meanings but the protocol allows developers to add any data desired. The extensibility allows engineers to implement more complex patterns as well as experimental spec changes without requiring changes to the core protocol.
Useful HTTP request headers
- Host - The domain name of the server (e.g. example.com)
- User-Agent - The name of your browser and operating system
- Referer - The webpage which led you to this page (the header name is a long-standing misspelling of “referrer”)
- Cookie - The cookie the server gave you earlier; keeps you logged in
- Range - Specifies a subset of bytes to fetch
- Cache-Control - Specifies if you want a cached response or not
- If-Modified-Since - Only send the resource if it has changed since the given date
- Connection - Control TCP socket (e.g. keep-alive or close)
- Accept - Which type of content we want (e.g. text/html)
- Accept-Encoding - Encoding algorithms we understand (e.g. gzip)
- Accept-Language - What language we want (e.g. es)
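To see a few of these in action, here is a sketch that attaches request headers using Python’s `urllib.request.Request`. Nothing is sent over the network; the object is only built and inspected. The `my-sketch/0.1` User-Agent value is made up.

```python
from urllib.request import Request

# Build (but do not send) a request carrying a few of the headers listed above.
req = Request(
    "http://example.com/",
    headers={
        "User-Agent": "my-sketch/0.1",  # hypothetical client name
        "Accept": "text/html",
        "Accept-Language": "es",
        "Accept-Encoding": "gzip",
    },
)

print(req.get_method())  # GET
print(req.host)          # example.com

# urllib stores header names with only the first letter capitalized,
# so look them up as "Accept-language" rather than "Accept-Language".
print(req.get_header("Accept-language"))  # es
```

Header names are case-insensitive in HTTP, which is why the library is free to normalize them this way.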
Useful HTTP response headers
- Date - When response was sent
- Last-Modified - When content was last modified
- Cache-Control - Specifies whether to cache response or not
- Expires - Discard response from cache after this date
- Vary - List of headers which affect response; used by cache
- Set-Cookie - Set a cookie on the client
- Location - URL to redirect the client to (used with 3xx responses)
- Connection - Control TCP socket (e.g. keep-alive or close)
- Content-Type - Type of content in response (e.g. text/html)
- Content-Encoding - Encoding of the response (e.g. gzip)
- Content-Language - Language of the response (e.g. ar)
- Content-Length - Length of the response in bytes
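On the receiving side, a client reads these headers to decide what to do with the body. The sketch below parses the headers from the sample response shown earlier, using `parse_headers` from Python’s `http.client` module. The raw bytes are typed in by hand here rather than read from a real socket.

```python
import io
from email.utils import parsedate_to_datetime
from http.client import parse_headers

# Header block from the sample response, ending with the usual blank line.
raw_headers = (
    b"Content-Length: 9001\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Date: Tue, 24 Sep 2019 20:30:00 GMT\r\n"
    b"\r\n"
)

headers = parse_headers(io.BytesIO(raw_headers))

# Lookups are case-insensitive, and values always arrive as strings.
content_length = int(headers["content-length"])
content_type = headers.get_content_type()
charset = headers.get_content_charset()
sent_at = parsedate_to_datetime(headers["Date"])

print(content_length)  # 9001
print(content_type)    # text/html
print(charset)         # utf-8
print(sent_at.year)    # 2019
```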
Though not directly related to the implementation of a session, there are a few pieces of infrastructure that are critical to understanding how websites work.
TCP (Transmission Control Protocol) is another set of rules involved in computer connections over the internet. It is designed for reliable, ordered, error-checked exchanges of data.
TCP is foundational to internet communication. It typically sits well beneath application code (the kind that web developers like me write), so I won’t focus much on how it works.
The browser parses www.twitter.com in the address bar. Off the bat, the browser has no idea what twitter.com means. Computers connected to the web are officially identified by an IP address. So the first step is a DNS lookup to find the IP address registered for the domain name twitter.com.
The Domain Name System (DNS) is an internet phonebook: a registry mapping human-friendly domain names to IP addresses. The lookup works through the domain name one level at a time, asking the nameserver responsible for each level where to go next.
- The client asks a DNS Recursive Resolver to look up a hostname, “twitter.com”
- The DNS Recursive Resolver sends DNS query to Root Nameserver
- Root Nameserver responds with IP address of TLD Nameserver (the “.com” Nameserver)
- DNS Recursive Resolver sends DNS query to TLD Nameserver
- TLD Nameserver responds with IP address of Domain Nameserver (“twitter.com” Nameserver)
- DNS Recursive Resolver sends DNS query to Domain Nameserver
- Domain Nameserver is authoritative, so replies with server IP address.
- DNS Recursive Resolver finally responds to Client, sending server IP address
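The chain of referrals above can be mimicked with a toy resolver. Everything in this sketch is invented for illustration: the nameserver names, the `NAMESERVERS` table, and the IP address, which comes from the 192.0.2.0/24 range reserved for documentation. Real DNS is a network protocol, not a dictionary lookup.

```python
# Each toy "nameserver" maps a name suffix to either the next nameserver to ask
# (a referral) or a final IP address (an answer). All data here is made up.
NAMESERVERS = {
    "root": {"com": ("referral", "tld-com")},
    "tld-com": {"twitter.com": ("referral", "ns-twitter")},
    "ns-twitter": {"twitter.com": ("answer", "192.0.2.10")},
}


def resolve(hostname: str):
    """Follow referrals from the root until a nameserver gives an answer.

    Returns (ip_address, list of nameservers asked along the way).
    """
    labels = hostname.split(".")
    server = "root"
    trail = []
    while True:
        trail.append(server)
        records = NAMESERVERS[server]
        # Find the longest suffix of the hostname that this server knows about.
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            if suffix in records:
                kind, value = records[suffix]
                break
        else:
            raise LookupError(f"{server} has no record for {hostname}")
        if kind == "answer":
            return value, trail
        server = value  # a referral: ask the next nameserver down the chain


ip, trail = resolve("twitter.com")
print(ip)     # 192.0.2.10
print(trail)  # ['root', 'tld-com', 'ns-twitter']
```

In real code you would not write any of this yourself. The operating system’s resolver does the work, and Python exposes it through functions such as `socket.getaddrinfo`.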
In practice, this lookup doesn’t happen every time. The result will typically be cached locally in your browser or operating system.
DNS is coordinated by ICANN, a nonprofit organization that oversees a lot of critical internet infrastructure, including the DNS root zone. However, the remote calls to nameservers in a DNS lookup can be subverted. A common cyber security attack involves an attacker inserting a rogue step into this lookup process. It’s called a DNS hijack.
In a DNS hijacking, an attacker changes the DNS records of a target to point to an unintended IP address. This has the potential to harm many users because the corrupted record will send every visitor to the attacker’s desired IP address.
The DNS lookup can be subverted at many points. Attack vectors include malware on your own machine that changes your local DNS settings, as well as a compromised DNS resolver, router, or nameserver.
Typically the motivation is phishing for private information or revenue through ad traffic. Recently, this technique has been used to exploit cryptocurrency users.
We have described HTTP, the protocol that allows computers on the web to understand each other. We see how clients and servers send HTTP messages across the web. These messages have headers and bodies, bodies containing data and headers containing data about the body. Cookies are designated headers containing unique strings of data. Servers and browsers use these data to establish identity.
This process is built on top of TCP and made accessible by DNS, systems that enable our browsing experience.
So, what happens when you type a URL and press enter?
1. Perform a DNS lookup on the hostname (example.com) to get an IP address (e.g. 220.127.116.11)
2. Open a TCP socket to that IP address on port 80 (the HTTP port)
3. Send an HTTP request that includes the desired path (/)
4. Read the HTTP response from the socket
5. Parse the HTML into the DOM
6. Render the page based on the DOM
7. Repeat until all external resources are loaded:
   - If there are pending external resources, make HTTP requests for these (run steps 1-4)
   - Render the resources into the page
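The first four steps (DNS lookup, TCP socket, request, response) fit in a short Python sketch built on the standard `socket` module. The `build_request` and `fetch` names are my own, and the sketch assumes plain HTTP on port 80 with no redirects, HTTPS, or error handling. Parsing and rendering are the browser’s job and are left out.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Compose a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close the socket when done
        "",                   # blank line marks the end of the head
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Run the DNS, TCP, request, and response steps and return the raw reply."""
    # DNS lookup: turn the hostname into an IP address.
    ip_address = socket.gethostbyname(host)
    # Open a TCP socket to that address.
    with socket.create_connection((ip_address, port), timeout=10) as sock:
        # Send the HTTP request.
        sock.sendall(build_request(host, path))
        # Read the response until the server closes the connection.
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


print(build_request("example.com").decode("ascii"))
```

Calling `fetch("example.com")` on a machine with network access should return the raw response, status line and headers included, ready to be split into head and body.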