How Does the Internet Work?

Very well.

When I was a kid, there was an old man in our town who was a magician. He sometimes did shows for kids at a local church. One trick I remember was when he poured water into a small paper cup. Then he poked a long, pointy sword (a foil) through the cup sideways. He held out the sword, very slowly tipped the cup over… and the water was gone! Whenever he finished a show, he would ask “Would you like to know how I did that?” and the kids would all yell Yeah! Tell us! And his answer was “Very well, thank you.”

The internet is like a good magic trick. You’d probably never guess how it works unless somebody told you, and when it works well you should never accidentally see behind the scenes. Also like magic, it took decades of work and effort to get it polished and perfected. The only difference between the people who build the internet and you is practice (and having somebody who tells you the secrets).

What is it, sort of?

The internet is a network of networks. A network is computers sharing information. Computers can be connected together with a switch or hub to make a local network (also called a local area network or LAN). The switch can send information to a router, then the router can send it to another faraway router which is connected to a computer or another local network. The router in your house usually has the switch built in.

To be able to find the remote computer, your router needs at minimum to know the IP address of the other router. An IP address is just a string of numbers divided by a separator. Since numbers like that are hard to remember, the other computer is usually referred to by a domain name, like www.wikipedia.org. When you type a domain name in a browser, your computer still needs to know the IP address it represents. To find it, your computer contacts a service called the Domain Name System, or DNS, to get the IP address corresponding to the domain name. The browser caches it for later so it doesn’t have to look it up every time.

The information usually doesn’t travel straight from your computer to the remote one; it has to pass through a network of routers to the destination and back. It doesn’t always follow the same path.

There are still many steps and many layers of protocol involved before you can share useful information.

Parts of a URL

The complete recipe for a URL is: scheme + domain + port + path + query string.

A URL is usually made up of two to five parts. The first part is the scheme. This tells the browser what protocol to use, usually HTTP or HTTPS. A protocol is an agreed-on standard for interacting. The second part is the domain. This is the address of the other computer you’re trying to connect to – for example, www.wikipedia.org. The third part is the port number. It isn’t always needed; if it isn’t included then a default value is determined by which scheme is used (80 for HTTP, 443 for HTTPS). The port is added to the end of the domain, like this: www.wikipedia.org:80. The next part is the path. It specifies the resource on the remote computer that you want to access. For example, in https://www.google.com/maps, the /maps is the path. It looks like a file directory but works differently. The last part is the query string. One of the uses for this is sending data entered into a form on a web page. The query string starts with a ?, followed by one or more key/value pairs. Each key is separated from its value by a =, and pairs are separated from each other by a &. For example, in https://www.google.com/search?q=launchschool, the ?q=launchschool is the query string, where q is the key (here, the search query) and launchschool is its value. Some characters are invalid or unsafe to use in a query string, such as a space or & or =. They need to be URL-encoded using valid characters. For example, if spaces are typed in an internet search, the spaces are converted to either + or %20 in the query string. Any ASCII character can be URL-encoded.
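Here is a small sketch of pulling a URL apart with Python’s standard urllib.parse module. The URL and the search phrase are made up for illustration; only the library functions are real.

```python
from urllib.parse import urlsplit, parse_qs, quote, quote_plus

# A made-up URL that happens to contain all five parts.
url = "https://www.example.com:8443/maps/search?q=launch+school&lang=en"

parts = urlsplit(url)
print(parts.scheme)    # the scheme
print(parts.hostname)  # the domain
print(parts.port)      # the port
print(parts.path)      # the path
print(parts.query)     # the query string, without the leading "?"

# parse_qs splits the query string into key/value pairs and decodes them,
# so the "+" in the query turns back into a space.
params = parse_qs(parts.query)

# URL-encoding a phrase that contains unsafe characters (spaces and "&").
encoded_plus = quote_plus("cats & dogs")  # spaces become "+"
encoded_pct = quote("cats & dogs")        # spaces become "%20"
print(params, encoded_plus, encoded_pct)
```

Both encodings are valid in a query string; which one you see mostly depends on the tool that built the URL.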

More about IP addresses

An IP address is the unique address that every computer connected to the internet is assigned. For one computer to talk to another across a network, the other computer’s IP address must be known. IP addresses currently come in two formats: IPv4 and IPv6. In IPv4, an address is a 32-bit value divided into four 8-bit parts. In decimal it’s represented by four numbers 0-255, like this: 192.168.0.1. In the newer IPv6, it’s a 128-bit value, divided into eight groups of 4-digit hexadecimal numbers, like this: 1050:0:0:0:5:600:300c:326b.
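To make the two formats a bit more concrete, here is a short sketch using Python’s standard ipaddress module with the two example addresses from above. The variable names are just for illustration.

```python
import ipaddress

v4 = ipaddress.ip_address("192.168.0.1")
v6 = ipaddress.ip_address("1050:0:0:0:5:600:300c:326b")

# An IPv4 address is really one 32-bit number; the dotted form is for humans.
v4_as_int = int(v4)
v4_as_bits = format(v4_as_int, "032b")  # the same number as 32 binary digits

# An IPv6 address is 128 bits. "exploded" writes out all eight 4-digit groups;
# "compressed" collapses the run of all-zero groups into "::".
v6_full = v6.exploded
v6_short = v6.compressed

print(v4_as_int, v4_as_bits)
print(v6_full, v6_short)
```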

An IP address only tells you what machine to look for. If you want to send data to a specific process or multiple processes on the same machine, you need a way to specify them. This is done with a port number. Ports are represented by a 16-bit number, and so are numbered 0-65535. They’re appended to an IP address like this: 192.168.0.1:80. The IP address and port together are called a socket. Programs or processes can create socket objects which are used to communicate with other processes, on either a remote computer or the same one.

More about DNS

When you type a domain name in your browser, it needs to look up the corresponding IP address. First it checks the browser cache; if it isn’t there, it contacts the domain name system. The DNS is a distributed database used to look up which IP address corresponds to which domain names. If the DNS can’t find the IP address, it returns a DNS error. This could be because you typed the wrong domain name or it doesn’t exist.

The first step in the DNS is the DNS recursive resolver. It might be hosted by your internet provider or wireless carrier. The resolver contacts a DNS root server; there are many around the world. The root server returns the address of the top-level domain, or TLD, server for that domain. The TLD server is responsible for domains ending with a given extension, such as .com, .net, .org or .edu. Then the resolver contacts the TLD server to get the address of the domain’s nameserver. The nameserver has the IP addresses of many domains. Then the resolver contacts the nameserver to get the IP address of the domain. Lastly, the resolver returns the IP address to the browser. Then the browser can access the site.
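The lookup chain can be sketched as a toy program. Nothing here touches a real network: the dictionaries stand in for the root, TLD and nameserver machines, and every server name and IP address in them is invented for the example.

```python
# Toy stand-ins for the real servers. Every name and address is invented.
ROOT_SERVER = {"org": "tld-server-for-org", "com": "tld-server-for-com"}
TLD_SERVERS = {
    "tld-server-for-org": {"example.org": "ns1.example.org"},
    "tld-server-for-com": {"example.com": "ns1.example.com"},
}
NAMESERVERS = {
    "ns1.example.org": {"www.example.org": "203.0.113.10"},
    "ns1.example.com": {"www.example.com": "203.0.113.20"},
}

cache = {}  # plays the role of the browser cache


def resolve(domain):
    """Return the IP address for domain, or None to signal a DNS error."""
    if domain in cache:
        return cache[domain]

    labels = domain.split(".")
    tld = labels[-1]                    # e.g. "org"
    registered = ".".join(labels[-2:])  # e.g. "example.org"

    # Step 1: the root server points to the TLD server.
    tld_server = ROOT_SERVER.get(tld)
    # Step 2: the TLD server points to the domain's nameserver.
    nameserver = TLD_SERVERS.get(tld_server, {}).get(registered)
    # Step 3: the nameserver knows the IP address.
    ip = NAMESERVERS.get(nameserver, {}).get(domain)

    if ip is not None:
        cache[domain] = ip
    return ip


print(resolve("www.example.org"))  # found by walking the chain
print(resolve("www.example.org"))  # answered from the cache this time
print(resolve("www.exmaple.org"))  # a typo, so no address is found
```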

Not so fast

The physical network is the system of routers and cables and the electrical, optical or radio signals they transmit. The physical network is concerned with identifying and addressing other devices and sending binary data from one device to another.

Several layers of communication protocols are built on top of this system. Each of these layers takes time to process and transmit the data being sent. Latency and bandwidth are two measures of that time. 

The first, latency, is the amount of time data takes to be transmitted from one device to another. There are several types of latency:

  • Propagation – The time it takes the data to physically travel across the network. It’s ultimately limited by the speed of light.
  • Transmission – The amount of time it takes to push the data onto the link. It depends on the size of the data and the bandwidth of the link.
  • Processing – The routers along the network don’t just bounce data from one to another – they do a certain amount of processing on each unit of data they receive.
  • Queuing – If any device along the network path receives more data than it can process, the extra data has to wait in a queue (a buffer) until the device can handle it. This waiting is what congestion looks like.

Latency is usually measured in milliseconds.
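Two of these delays are easy to estimate with arithmetic. The sketch below takes propagation delay as distance divided by signal speed, and transmission delay as the time to push the bits onto the link (size divided by bandwidth). The signal speed, link length, bandwidth and packet size are all assumptions picked for illustration; processing and queuing delays are left out because they depend on the devices involved.

```python
# Assumption: signals in fiber travel at roughly two-thirds the speed of light.
SPEED_IN_FIBER_M_PER_S = 2e8


def propagation_delay_ms(distance_m):
    """Time for a signal to physically cross the given distance."""
    return distance_m / SPEED_IN_FIBER_M_PER_S * 1000


def transmission_delay_ms(size_bytes, bandwidth_bits_per_s):
    """Time to push all the bits of the data onto the link."""
    return size_bytes * 8 / bandwidth_bits_per_s * 1000


# Hypothetical link: 4,000 km long, 100 Mbit/s, carrying one 1,500-byte packet.
prop = propagation_delay_ms(4_000_000)       # about 20 ms
trans = transmission_delay_ms(1500, 100e6)   # about 0.12 ms
one_way = prop + trans

print(f"propagation {prop:.2f} ms, transmission {trans:.2f} ms")
```

For a small packet over a long distance, the propagation delay dominates, which is why a faster connection doesn’t make a faraway server feel much closer.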

The other important characteristic of the physical network is bandwidth. Bandwidth is the amount of data a device can send or receive in a unit of time. The bandwidth is determined by the types of devices and the types of infrastructure, such as copper cables, fiber optics and radio/cellular. Of those three, fiber optics has the most bandwidth because it transmits signals using much higher frequencies than the others. 

Depending on the application, minimizing latency or maximizing bandwidth might be more important.

How do your computer and one where a website lives agree to talk to each other?

Once your browser requests a website, it and the server need to agree how to share information. The general framework for this agreement is called the client-server model. In the client-server model, the server is a computer that stores web pages or other information and the client requests that information, usually over a network. (A client and server can exist on the same machine, but that’s for another article.) The client and server also need to agree on many other things like the protocol and security methods.

First we’ll look at what usually goes on in a server. 

A server has three primary components:

  • Web server – this responds to requests for data that don’t need to be processed – images, text, HTML, CSS, JS, etc.
  • Application server – this responds to requests that require processing, like dynamically building web pages. This is where the business logic lives.
  • Data store – this could be a database or document or photo archive. The application server can pull information from here to serve to the client.

There can be (and usually are) several other components, such as load balancers and caches, in addition to these three. There are also usually multiple copies of each server in case one fails. All these servers can be separate physical machines or can be software on the same machine, depending on many factors.

Next is the client. It can be a browser or a terminal using a program like curl. There are many protocols that the client and server can use to communicate, but the most common for accessing websites and web-based apps is Hypertext Transfer Protocol, or HTTP. HTTP is a text-based protocol and, by itself, doesn’t provide encryption or other security.

Request and Response

Each exchange of information by HTTP is made up of a request and response. The request is sent by a client, and the server replies with a response. When a browser requests a web page, for example, it sends a request to the server, which specifies the web page or other resource. It can also send authentication info. When the server receives the request, it replies with an HTTP response. The response body will include the website data the browser requested.

What does an HTTP request look like?

An HTTP request is a message in plain text and is made of several parts – the request line, the headers and the body. The request line and headers are required but the body is optional. The request line is made up of the request method, the URI and the HTTP version. The headers define things like the character set and language to use, the host name and port number of the server being contacted, and any session ids or cookies being sent.

The request method is a verb and defines the purpose of the request. For example, the request could be to only access something on the server or to send or change something. There are currently nine possible request methods. If the client wants to access data but change nothing, it usually sends a GET request. If the client wants to send data, for example in a web form, it can use a POST request. Updating existing data is usually done with a PUT request.
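Because a request is just text, you can build one and take it apart with ordinary string handling. This is a minimal sketch assuming HTTP/1.1; the helper names build_request and parse_request, the host and the form data are all made up for the example.

```python
def build_request(method, path, host, body=""):
    """Assemble the text of an HTTP/1.1 request."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append(f"Content-Length: {len(body.encode())}")
    # A blank line separates the headers from the (optional) body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_request(text):
    """Split request text back into its parts."""
    head, _, body = text.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, uri, version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, uri, version, headers, body


get_request = build_request("GET", "/maps", "www.example.com")
post_request = build_request("POST", "/login", "www.example.com", "user=ada")

print(get_request)
print(parse_request(post_request))
```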

What does an HTTP response look like?

When the server receives the request, it replies with an HTTP response. The first part of the response is the status line. It has a status code and the HTTP version. The status code is a three-digit number. The response can also include headers, which define info like the date, server name and connection type, and a body, which is any data (such as a web page) requested. The status line is required but the rest is optional. 

The first digit of the status code is a number 1-5 and represents the category of the response, such as ‘successful’ or ‘server error.’ A status code of 200 means the request was successful. The code 404 means the requested resource wasn’t found – for example, because of a typo in the path of a URL. 302 means the resource has moved, usually resulting in a re-direct. 500 represents a generic server error – this could be from server code causing an exception. Other things that can be in the response header are the name of the server, content type and content length.
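As a rough sketch, here is how a status line could be read in Python. The function name describe_status_line and the category labels are my own; HTTPStatus comes from Python’s standard http module and supplies the usual phrase for each code.

```python
from http import HTTPStatus

# The first digit of the status code picks the category.
CATEGORIES = {
    1: "informational",
    2: "successful",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status_line(status_line):
    """Split a status line into version, code, category and standard phrase."""
    version, code_text, _reason = status_line.split(" ", 2)
    code = int(code_text)
    return version, code, CATEGORIES[code // 100], HTTPStatus(code).phrase


for line in ("HTTP/1.1 200 OK", "HTTP/1.1 404 Not Found", "HTTP/1.1 302 Found"):
    print(describe_status_line(line))
```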

HTTP has no memory

An HTTP request knows nothing about any previous request, so HTTP is referred to as being ‘stateless.’ ‘State’ roughly means that an application remembers what happened in previous requests and uses that to inform future responses. A stateless server can only respond to exactly the request it receives. This means that every HTTP request has to include all the information necessary to process the request, such as session info, logins and security. So how does your online banking know that you typed in your username and password earlier when you click to see your statement? Why don’t you have to type your credentials every time you click something?

This model of stateless requests and responses using request methods to define specific actions is called REST, for REpresentational State Transfer. In a stateless architecture like REST, the server application is stateless and the client stores all the information needed to simulate state. On the client side, state can be simulated using session ids, cookies and AJAX. 

A session identifier, or session id, is a piece of data that the server sends to the client as a way to keep track of each client and simulate a persistent connection. The client sends the session id with each request, and the server checks each request to see if it includes a session id and whether it’s valid.

A cookie is a common way to carry a session id; it’s a small piece of data sent by the server as part of its response and stored by the browser. It stores the session and authentication info for the client. The client includes the cookie with each HTTP request. This lets the website remember things like login status or items in a shopping cart, for example. A cookie used for authentication usually expires after a set amount of time.
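Here is a minimal sketch of the server’s side of that bargain. It is not how any particular framework does it: the function names, the cookie name session_id and the 15-minute lifetime are all assumptions for the example, and the session store is just a dictionary in memory.

```python
import secrets
import time

SESSION_LIFETIME_S = 15 * 60  # hypothetical 15-minute expiry
sessions = {}                 # server-side store: session id -> session data


def log_in(username, now=None):
    """Create a session and return the Set-Cookie header for the response."""
    now = time.time() if now is None else now
    session_id = secrets.token_hex(16)  # long random value, hard to guess
    sessions[session_id] = {"user": username, "expires": now + SESSION_LIFETIME_S}
    return f"Set-Cookie: session_id={session_id}; HttpOnly; Secure"


def user_for_request(cookie_header, now=None):
    """Look up the user from the Cookie header value of a later request."""
    now = time.time() if now is None else now
    cookies = dict(pair.split("=", 1) for pair in cookie_header.split("; "))
    session = sessions.get(cookies.get("session_id"))
    if session is None or session["expires"] < now:
        return None  # unknown or expired session: treat as logged out
    return session["user"]
```

The request itself is still stateless. All the remembering happens because the browser sends the cookie back every time and the server looks it up.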

AJAX stands for Asynchronous JavaScript and XML. This involves one kind of code, JavaScript, sending HTTP requests and receiving responses, then updating other code, the HTML making up the webpage. This lets a website send requests and receive responses without needing to refresh the page. Getting notifications on your Facebook page would be an example of AJAX.

The next sections dig a little deeper. 

HTTP is how your browser talks to a web server. But how does your laptop or phone talk to your (it is yours, right?) router? How does your router talk to other routers? How do you prove who you are and not let somebody else spend your money?

Protocols within protocols

HTTP is a higher-level protocol that makes it easy to send text, HTML and photos. Lower-level protocols, though, are the ones closer to the physical devices. They’re how the hardware talks to each other.

An analogy for thinking about this is to think about buying a small item online. It will be shipped in a small box with your name and address on it. Many of those small boxes will be put into bigger boxes with your post office’s address on it. Then many of those big boxes are put into a truck with your town’s name on it. Each of those services can operate independently – the trucking company, the post office and your online store are distinct services, but the products of one service are carried in another.

In the same way, units of information in one protocol are bundled, or encapsulated, into the next protocol layer. 

In the Internet Protocol Suite model, also called the TCP/IP model, there are four protocol layers: Application, Transport, Internet, and Link.

The highest level, application, is the level of HTTP and other protocols. It carries data which programs use to communicate. When you’re using a browser or most apps, you’re interacting with the application layer.

The next layer is the transport layer. The common protocols here are TCP and UDP. This layer establishes the connection between computers which the application layer uses.

The next layer down is the internet layer. This layer is responsible for sending data between networks. It’s the layer of IP addresses. It establishes which other machines can be connected to and it hides them and their details away behind an IP address. This layer provides the addresses that the transport layer’s sockets build on.

The lowest layer is the link layer, also called the data link layer. At this layer the physical devices are talking to each other. The most important protocols at this layer are Ethernet and Wi-Fi. These are the protocols which devices, like your phone or laptop, use to communicate with switches or routers.

The unit of data a given protocol uses is called a protocol data unit, or PDU. A PDU in the internet layer is called a packet.
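The boxes-inside-boxes idea can be sketched in a few lines. The headers below are readable placeholders I made up, not real binary header formats, and the addresses and ports are invented; the point is only that each layer wraps the PDU of the layer above without looking inside it.

```python
def encapsulate(header, payload):
    # Each layer treats whatever it is handed as opaque payload bytes.
    return header + payload


# Application layer: an HTTP message.
http_message = b"GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n"

# Transport layer: the HTTP message goes inside a TCP segment.
segment = encapsulate(b"[TCP src=50000 dst=80]", http_message)

# Internet layer: the segment goes inside an IP packet.
packet = encapsulate(b"[IP src=192.168.0.2 dst=203.0.113.10]", segment)

# Link layer: the packet goes inside an Ethernet frame.
frame = encapsulate(b"[ETH src=aa:aa dst=bb:bb]", packet)

print(frame)
```

On the receiving side the same thing happens in reverse: each layer strips its own header and hands the rest up.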

Let’s look at what goes on in these lower layers. We’ll start at the transport layer since the application layer has been covered in the HTTP sections.

The transport layer

TCP and UDP are the protocols used in the transport layer. UDP is simple and can be used for transmitting lots of data when perfect integrity and delivery aren’t needed. It’s better when being delivered fast is more important than fidelity. In contrast is TCP, which provides many mechanisms for reliability, in-order delivery, retrying and more.

More about TCP

Some features of TCP are being connection-based, having flow control, congestion avoidance, and retransmission.

A PDU in TCP is called a segment. Similar to an HTTP request, a TCP segment has headers which specify things like source and destination port, flags, checksum and many others.

TCP is a connection-based protocol. This means there’s a process for the sender and receiver to exchange a request and acknowledgement to establish a connection before data is shared. This is called the three-way handshake. It’s called this because the sender first sends a SYN message, the receiver responds with a SYN ACK, then the sender responds with an ACK message (SYN and ACK are flags in the TCP segments). The sender application can start sending data after the ACK is sent. The receiver application can send data after the ACK is received. A similar exchange using the FIN flag, usually four steps rather than three, is used to end a connection.
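The three messages can be written out as a toy simulation. Your operating system does this for real whenever a program opens a TCP connection; the function below only models the messages. The sequence and acknowledgement numbers are a detail not covered above: each side picks a starting sequence number, and the other side acknowledges it by replying with that number plus one.

```python
import random


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three handshake messages as dictionaries."""
    wrap = 2**32  # sequence numbers are 32-bit and wrap around
    client_isn = random.randrange(wrap) if client_isn is None else client_isn
    server_isn = random.randrange(wrap) if server_isn is None else server_isn

    # 1. Client -> server: "I want to connect, my numbering starts here."
    syn = {"from": "client", "flags": {"SYN"}, "seq": client_isn}

    # 2. Server -> client: acknowledges the SYN and sends its own.
    syn_ack = {
        "from": "server",
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (syn["seq"] + 1) % wrap,
    }

    # 3. Client -> server: acknowledges the server's SYN.
    ack = {
        "from": "client",
        "flags": {"ACK"},
        "seq": syn_ack["ack"],
        "ack": (syn_ack["seq"] + 1) % wrap,
    }
    return [syn, syn_ack, ack]


for message in three_way_handshake(client_isn=100, server_isn=300):
    print(message)
```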

The next thing TCP provides is flow control. Flow control is meant to keep the sender from sending more data than the receiver can process. The receiver and sender each indicate how much data they’re able to accept using the WINDOW field in the TCP header.

While flow control keeps the sender and receiver from overwhelming each other, it doesn’t keep them from overwhelming the network between them. This is handled by TCP’s congestion avoidance mechanism. Since routers have to process all the IP layer packets they relay, such as to make a checksum, get the source & destination address and calculate the route, they can only handle so much data at once. If the router’s buffer is full, then any more packets sent to it will be lost. If packets are being lost because of congestion and data is being retransmitted then TCP slows down the transmission rate.

While the lower layers (link and internet) have checks, such as Ethernet’s CRC (cyclic redundancy check), to detect when PDUs get corrupted, they don’t have ways to recover lost data. TCP introduces ways to ensure that all the sent segments have arrived; they’re retransmitted if they get lost.
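Retransmission can be sketched with a deliberately simplified model: send one segment, wait for an acknowledgement, and send it again if none arrives. Real TCP keeps many segments in flight at once, so treat this as an illustration of the idea only. The function name and the way losses are scripted are assumptions made for the example.

```python
def send_reliably(segments, drop_first_attempt_of=frozenset(), max_tries=5):
    """Deliver segments in order over a toy channel that loses some of them.

    drop_first_attempt_of holds the indexes whose first transmission is lost.
    Returns (received_data, number_of_transmissions).
    """
    received = []
    transmissions = 0
    for index, data in enumerate(segments):
        for attempt in range(1, max_tries + 1):
            transmissions += 1
            lost = index in drop_first_attempt_of and attempt == 1
            if not lost:
                received.append(data)  # receiver got it and acknowledges
                break
            # No acknowledgement arrived: the sender times out and retries.
    return received, transmissions


data, count = send_reliably(["a", "b", "c"], drop_first_attempt_of={1})
print(data, count)  # everything arrives, at the cost of one extra send
```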

A drawback of TCP is the latency involved in establishing a connection. If the connection is broken, a new connection has to be established before more data is sent. Another issue, not unique to TCP, is head-of-line blocking. Since TCP segments are sent in order, if one segment is delayed then the ones after it will be also.

More about UDP

While TCP provides flow control, congestion avoidance, retransmission and in-order delivery, UDP is much simpler. It doesn’t have any of those. 

The PDU in UDP is called a datagram. The UDP datagram still encapsulates data similar to TCP, but the header is only made up of source and destination port, length and checksum.
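The UDP header is small enough to build by hand. This sketch packs the four fields as 16-bit numbers using Python’s standard struct module. The helper names and port numbers are made up, and the checksum is left at 0 as a placeholder rather than actually calculated.

```python
import struct

# Four unsigned 16-bit fields in network byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_datagram(src_port, dst_port, payload, checksum=0):
    """Prefix the payload with a UDP header (checksum is a placeholder)."""
    length = UDP_HEADER.size + len(payload)  # header plus data, in bytes
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_datagram(datagram):
    """Split a datagram back into its header fields and payload."""
    header = datagram[:UDP_HEADER.size]
    src_port, dst_port, length, checksum = UDP_HEADER.unpack(header)
    return src_port, dst_port, length, checksum, datagram[UDP_HEADER.size:length]


datagram = build_datagram(50000, 53, b"hello")
print(len(datagram), parse_datagram(datagram))
```

The whole header is 8 bytes, which is part of why UDP adds so little overhead.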

Unlike TCP, UDP is a connectionless protocol. A connection doesn’t need to be established before the sender can start sending data. It avoids the latency of the three-way handshake. Head-of-line blocking is avoided since UDP doesn’t provide for in-order delivery. When speed and low latency are needed but perfect data transmission isn’t, for example in video or audio streams or video games, UDP might be a better choice.

Depending on the application, it might be necessary to implement some of the features UDP doesn’t provide.

The Link layer

At the lowest layer is the link layer. The protocol used at this layer, for wired networks, is Ethernet. 

The important aspects of the Ethernet protocol are switching and framing.

While computers on a network are assigned an IP address which can be changed, every network-enabled device has a hard-wired address, called the media access control or MAC address, that is unique to it. Like an IP address, the MAC address is a string of numbers (usually written in hexadecimal). The switch keeps track of which MAC addresses are assigned to each physical Ethernet port.

In the Ethernet protocol, a PDU is called a frame. An Ethernet frame is an encapsulation of the PDUs of the next higher layer, the internet or network layer. A frame includes a header which stores the source and destination MAC addresses. The switch sends the frames to the correct devices according to their destination MAC address.
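A switch’s bookkeeping can be sketched as a small class. This is a toy model, not real switch firmware: the class name, the MAC addresses and the port numbers are invented. One behavior here goes beyond the description above: when the switch hasn’t yet seen the destination address, it sends the frame out of every other port.

```python
class Switch:
    def __init__(self, port_count):
        self.port_count = port_count
        self.mac_table = {}  # MAC address -> physical port number

    def receive(self, in_port, src_mac, dst_mac):
        """Handle one frame; return the list of ports it is sent out of."""
        # Learn which port the sender is plugged into.
        self.mac_table[src_mac] = in_port
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        # Unknown destination: send out of every port except the incoming one.
        return [p for p in range(self.port_count) if p != in_port]


switch = Switch(port_count=4)
laptop, printer = "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb"

print(switch.receive(0, laptop, printer))  # printer not known yet
print(switch.receive(2, printer, laptop))  # laptop was learned on port 0
print(switch.receive(0, laptop, printer))  # printer is now known on port 2
```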

Ethernet frames have error checking for each frame, but no checking for lost frames. 

Security

This last section is about security. 

HTTP is inherently insecure. That’s partly because messages are sent in plain text. Unless they’re protected somehow, there would be no way to prevent somebody from intercepting and reading them. 

The main way to handle this is by encrypting the messages. For web traffic, this is done by a protocol that sits between HTTP and the transport layer, called Transport Layer Security, or TLS. TLS supersedes SSL, or Secure Sockets Layer, but the terms tend to be used interchangeably.

TLS provides three security services to HTTP: encryption, authentication, and integrity.

Encryption 

To establish an encrypted connection, TLS uses a handshake procedure after the TCP three-way handshake is finished. In the classic RSA-based form of this handshake, the client first requests a secure connection to the server and includes a cipher suite list, which is a list of encryption methods it can use, and a random string of bits (called the client random). The server responds with its chosen encryption method, its certificate, and a different random string of bits (called the server random). The client checks that the server’s certificate was issued by a trusted certificate authority (explained later). Then the client extracts the server’s public key from the certificate it received, generates a new random string of bits (called the premaster secret), encrypts that with the server’s public key, then sends it to the server. The server decrypts it with its private key.

At this point, the client and server both have the client random, server random, and premaster secret. The client and server each use those to calculate a session key. Then the client and server send each other a message encrypted using the session key. If the server can decrypt the client’s message, and vice versa, then the handshake was successful and they can now freely communicate.

Since the session key is shared by the client and server, they can both use it to encrypt and decrypt messages. This is called symmetric key encryption.

When the client first requests to establish a secure connection, and the server responds with its certificate containing its public key, the client uses the server’s public key to send it a message, which only the server can decrypt. This is called asymmetric encryption because it uses two keys: the public key, which anyone can use to encrypt a message, and the private key, which is kept secret and is needed to decrypt and read the message. The public and private keys are generated as a pair and have to be used together.

Thus the client and server use an asymmetric cipher to establish a symmetric cipher.
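Here is a toy version of that idea. It uses textbook RSA with tiny primes so the numbers fit on a page, and a plain SHA-256 hash as a stand-in for the real TLS key calculation. Both are assumptions for illustration and neither is secure; real TLS uses far larger keys and a different derivation.

```python
import hashlib
import secrets

# Textbook RSA key pair built from the tiny primes 61 and 53. Illustration only.
N = 3233  # public modulus, 61 * 53
E = 17    # public exponent  -> the public key is (N, E)
D = 2753  # private exponent -> the private key is (N, D)


def rsa_encrypt(number, public_key=(N, E)):
    n, e = public_key
    return pow(number, e, n)


def rsa_decrypt(number, private_key=(N, D)):
    n, d = private_key
    return pow(number, d, n)


def session_key(client_random, server_random, premaster):
    # Stand-in for the real key derivation: hash the three values together.
    material = client_random + server_random + premaster.to_bytes(2, "big")
    return hashlib.sha256(material).hexdigest()


# The two random strings travel in the clear during the handshake.
client_random = secrets.token_bytes(32)
server_random = secrets.token_bytes(32)

# The client picks the premaster secret and encrypts it with the public key.
premaster = secrets.randbelow(N - 2) + 2
sent_over_network = rsa_encrypt(premaster)

# Only the holder of the private key (the server) can recover it.
premaster_at_server = rsa_decrypt(sent_over_network)

# Both sides now compute the same symmetric session key independently.
client_key = session_key(client_random, server_random, premaster)
server_key = session_key(client_random, server_random, premaster_at_server)
print(client_key == server_key)
```

The slow asymmetric step is used once, for a tiny secret. Everything after that uses the shared key.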

Authentication

When an encrypted connection is established, the server sends the client a certificate as part of the TLS handshake. The certificate contains the server’s public key and is digitally signed. A signature is made by encrypting a hash of the message with the signer’s private key. The receiver verifies the signature by decrypting the hash with the signer’s public key. If the receiver’s own hash of the message matches the decrypted one, then the message is authentic.

But what if you’re connected to a fake website, like some people click on in a phishing email? 

The solution is to use a trusted chain of authority to establish authenticity. The server’s certificate has to be issued and signed by an intermediary, called a certificate authority, or CA, which is an organization that verifies that the server is who it says it is.

How do you know you can trust the CA? 

The intermediate CA’s own certificate in turn has to be signed by a root CA. There are a small number of root CAs who closely guard their root keys, and can revoke the certificates they’ve signed if they’re misused. The browser or operating system has a stored list of certificates from root or trusted CAs. When a browser receives a certificate from a server, it checks that it was signed at some point by one of the trusted CAs. If so, it establishes a secure connection. If not, it might show a message about the connection not being secure.

Integrity

Similar to the checksums used for PDUs in other protocols, TLS provides a message authentication code, or MAC (not to be confused with a MAC address), as part of its encapsulation. The purpose of the MAC is to ensure that the message hasn’t been tampered with or altered. The MAC is made by hashing the message together with the symmetric key being used.
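One common way to build a MAC is HMAC, which Python ships in its standard hmac module. The sketch below is an illustration of the idea rather than what TLS does byte for byte; the key and the messages are made up.

```python
import hashlib
import hmac


def make_mac(key, message):
    """Compute a MAC from the message and the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_untampered(key, message, received_mac):
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(make_mac(key, message), received_mac)


shared_key = b"hypothetical-session-key"
message = b"transfer 10 dollars to Bob"
tag = make_mac(shared_key, message)

print(is_untampered(shared_key, message, tag))                        # unchanged
print(is_untampered(shared_key, b"transfer 99 dollars to Bob", tag))  # altered
```

Without the key, somebody who changes the message can’t produce a matching MAC, so the receiver notices.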

It doesn’t stop there, though

Since HTTP authentication data is stored on the client, for example as a session id in a cookie, that makes it prone to other security threats. For example, if your session id was copied, somebody else might be able to read your messages or log in to your account. Stealing somebody’s session id is called session hijacking.

There are several effective ways to prevent this. Having the session id expire after a set amount of time is one. Another is invalidating the session id when a new one is issued. Another thing that can be done is generating a session id only after establishing a secure connection. That way the session id will be encrypted.

Another important way to prevent session hijacking is to enforce a same-origin policy. This allows unlimited requests coming from the same origin, meaning the same scheme, hostname and port, but restricts access to a different origin. What it mainly restricts is API calls (requests to other servers) to other origins. Things like links, redirects, and embedded scripts and images are generally allowed. An example of what this can prevent is if a webpage is hacked and contains malicious code, when you load it, the malicious code can’t contact a server other than the one it was loaded from.
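The “same scheme, hostname and port” test is simple enough to sketch. Browsers do this check themselves; the function below just shows the comparison. The function names and the URLs are made up, and the default ports are filled in for HTTP and HTTPS only.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, hostname, port) triple that defines an origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    return origin(url_a) == origin(url_b)


page = "https://www.example.com/account"
print(same_origin(page, "https://www.example.com:443/api/balance"))  # same origin
print(same_origin(page, "http://www.example.com/api/balance"))       # scheme differs
print(same_origin(page, "https://api.example.com/balance"))          # hostname differs
```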

A last vulnerability is cross-site scripting, or XSS. This is when HTML, JavaScript, or other code is entered into a text box on a web page. The browser might later interpret the code and execute it. One way to prevent this is by sanitizing input, meaning removing or rejecting code entered into a text field. Another way is escaping the input, meaning converting it so that it’s displayed as plain text and not run as code. That’s similar to what this comic refers to.
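Escaping is easy to try out with Python’s standard html module. The render_comment function and the malicious input are made up for the example; html.escape is the real library call.

```python
import html


def render_comment(user_text):
    # Escape the input so the browser shows it as text instead of running it.
    return f"<p>{html.escape(user_text)}</p>"


malicious = '<script>alert("gotcha")</script>'
safe_html = render_comment(malicious)
print(safe_html)
```

The angle brackets and quotes are replaced with harmless stand-ins, so the browser displays the text of the script without executing it.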

If you made it to here, you either thought this was very interesting or you like looking at words you don’t understand. This was a high-level overview with some deeper nosedives here and there. Each of the concepts mentioned has books’ worth of detail written about it. This should have at least given you a vague idea of how the magic behind the internet works. It’s all devices and code built by people who spent a long time learning how it works.

It’s also much more complicated than a sponge.