Part 1: Overview of the Internet and Networking
In the previous article, we learned about the hybrid consensus mechanism of AuxPoW + DPoS employed by Elastos and showed how it secures the Elastos Blockchain. We will now move on to the next pillar of the Elastos ecosystem: the Elastos Carrier. The Elastos Carrier is a completely decentralized peer-to-peer network. The Carrier network is serverless, provides end-to-end encryption, and uses Blockchain-issued DIDs to facilitate authenticated connections. All data transfer between participants in the Elastos ecosystem is conveyed through the Carrier network. The Elastos Carrier is vital to the Elastos ecosystem and can be viewed as the Web component of the Smart-Web. In a fully adopted Smart-Web, the Carrier network will act as the “workhorse,” ensuring connectivity between thousands of decentralized applications and the self-contained networks of sidechain projects. To fully grasp and appreciate Elastos Carrier, a general understanding of computer networking and the Internet is required. The purpose of this article is to give you the information and tools to fully conceptualize Elastos Carrier and realize its technological advantages.
Disclaimer: Not every piece of information provided in this article is directly related to the Elastos Carrier. However, having a broad understanding of the Internet is useful in realizing the many benefits of the Carrier and it makes for more fruitful and informed discussions.
A computer network is a group of two or more computers linked together via a medium, such as ethernet cables or WiFi. These links allow for data exchange, and resource-sharing between connected devices. Resource-sharing could be anything from sharing documents to sharing peripherals such as printers. Over time, a variety of computer networks have been developed to meet specific needs, or have evolved due to the release of newer technologies. As such, a framework for categorizing computer networks was established.
Examples of Classifications
To outline the classification process, let’s take a brief look at a few networks and how they’ve gained their titles.
Local Area Networks
A local area network (LAN) is a group of computers and associated devices that share a common communications line or wireless link to a server or router. Typically, a LAN encompasses computers and peripherals within a small geographic area, such as a home, office, or commercial establishment. Computers or other devices connected to a LAN can share resources such as a printer or network storage.
Here are two more common network classifications:
1. Metropolitan Area Network (MAN)
MANs are made up of multiple interconnected LANs and span entire geographic areas, such as towns or cities. Ownership and maintenance are typically handled by a company or local council.
2. Wide Area Network (WAN)
A WAN connects LANs across long physical distances. WANs allow computers to communicate over one large network even when they are many miles apart. Because of a WAN’s vast reach, it is typically owned and maintained by multiple administrators or the public.
This provides a perfect segue into understanding what the Internet is. When we think of the Internet, we sometimes imagine some arcane source of energy in the sky that magically communicates what we want displayed on our devices. However, the Internet is nothing more than a combination of LANs, WANs, MANs, and other network classifications working together to seamlessly transmit and share data. In a phrase, the Internet is a network of networks. In fact, the term “Internet” is shorthand for internetworking.
Remember the Local Area Network? Well without the internet, devices in a Local Area Network would only be able to communicate with devices within that network. What allows a user from within one LAN to communicate with the various other networks throughout the world are routers, switches, and various other networking devices that understand and speak the language of the TCP/IP protocol.
Now that we have established a general picture of what the internet is, we can move on to understanding some of the nuts and bolts that make it work.
The TCP/IP Protocol
There are two primary protocols within TCP/IP that allow for communication across networks. These protocols should be easy to remember, as the suite is named after them!
Transmission Control Protocol (TCP) and Internet Protocol (IP) combine to make “TCP/IP.” Another name for TCP/IP is the “Internet Protocol Suite,” which is the phrase used in official Internet standards documents.
The goal of the TCP/IP suite is simple: provide methods and standards for exchanging data from a host in one network to a host in another network. A host is any computer or device connected to the Internet. Data is sent over the internet by dividing it into small segments called network packets. The TCP/IP stack consists of four layers which ultimately act to segment data into tiny packets and subsequently transport them from one host to another.
The layers of the TCP/IP stack are as follows:
- Application layer
- Transport Layer
- Internetwork Layer (Network Layer for short)
- Network Interface & Hardware Layer (sometimes called Physical Layer)
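As a rough sketch, each layer can be pictured as wrapping the payload handed down from the layer above with its own header. The Python below is purely illustrative: the header fields and the addresses are simplified placeholders, not real wire formats.

```python
# Illustrative sketch of TCP/IP encapsulation: each layer wraps the
# payload handed down from the layer above with its own header.
# Fields and addresses are simplified placeholders, not wire formats.

def application_layer(message: str) -> bytes:
    # The application produces the raw data to send.
    return message.encode("utf-8")

def transport_layer(payload: bytes, dst_port: int) -> dict:
    # The transport layer adds a destination port so the receiving
    # host knows which process the data is for.
    return {"dst_port": dst_port, "payload": payload}

def network_layer(segment: dict, dst_ip: str) -> dict:
    # The network layer adds the destination IP address used for routing.
    return {"dst_ip": dst_ip, "segment": segment}

# 192.0.2.34 is from the reserved documentation address range.
packet = network_layer(transport_layer(application_layer("hello"), 80), "192.0.2.34")
print(packet["dst_ip"])               # routing info lives at the network layer
print(packet["segment"]["dst_port"])  # the port travels in the transport header
```

Intermediate routers only ever need to look at the outermost (network-layer) wrapper; the transport header and the data inside it are opaque to them.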
Before we go on to explain the layers, it is useful to think of the stack through an analogy with a familiar protocol. The Postal Service “protocol” is in many ways very similar to the Internet Protocol suite. Let’s detail the journey of an item that is created, packaged, and sent from one person to another through the Postal Service. Also assume that the recipient lives in an apartment complex.
Postal Service Protocol:
- Contents (Application Layer): The first thing you need to send a package is something to send (who would have thought?)
- Packaging (Transport Layer): Once you have something to send, you package the item and label it with a destination address and apartment number. You might also put a return address on the package. Then the package is put into a Postal Service mailbox.
- Pick-up/Routing (Network + Hardware Layer): A Postal Service worker will then pick up the package and drive it to a local distribution center. Throughout its journey to the destination address, the package may go through many intermediate facilities that route it to the next until it reaches the local Post Office of the destination address. The package will then be delivered to the destination address and the Postal Service worker will put it into the mailbox with the correct apartment number.
- Receiving/Unpackaging (Transport Layer): The recipient will receive the package from the mailbox and unpackage it.
- Receive and Use the item (Application Layer): The recipient will receive and enjoy the shipped item.
Now let’s detail the journey of data from a process running in host A to a process running in host B:
- Contents (Application Layer): The data is first created by some process running in the application layer of host A.
- Packaging (Transport Layer): The data is then “passed down” to the transport layer, where it is broken into segmented transport packets. There are various transport layer protocols, each with its own rules. Each packet carries a destination port number, which identifies the process the data is meant for on the receiving host. The port number is similar to the apartment number in the previous analogy.
- Pick-up/Routing (Network+Physical Layer): The transport packets are then passed down to the network layer, where they are wrapped in network layer packets. These packets contain the IP address of the destination host, which is similar to the destination address on a piece of mail. Using this address, the packets are routed and forwarded through a network of routers, switches, and links that make up the network and physical layers. This is analogous to the routing of a package through various distribution centers, where the physical means of transportation are cars, trucks, and planes. The packets will then arrive at the destination IP address, at which point the packets are passed up to the transport layer at the end host.
- Unpackaging (Transport Layer): Packets will then be “unwrapped.” Using the destination port contained in the transport layer packets, the data can be entered into the correct destination port. The destination port is like the numbered mailbox of the recipient in the apartment complex.
- Content (Application Layer): The data is then delivered to the correct process on the end host associated with the destination port.
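The final delivery step above can be sketched in a few lines of Python. The port numbers and payloads here are hypothetical; the point is only that the destination port selects the process, just as the apartment number selects the mailbox.

```python
# Toy demultiplexing sketch: at the receiving host, the transport layer
# uses the destination port in each packet to hand data to the right
# process, like an apartment number selecting the right mailbox.
# Ports 80 and 443 stand in for two hypothetical listening processes.
processes = {80: [], 443: []}

packets = [
    {"dst_port": 80, "payload": b"GET /"},
    {"dst_port": 443, "payload": b"\x16\x03\x01"},
]

for pkt in packets:
    # Deliver each payload to the process bound to the packet's port.
    processes[pkt["dst_port"]].append(pkt["payload"])

print(processes[80])   # only the data addressed to port 80
```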
The journey data takes from host A to host B looks roughly like the following:
Each layer is abstracted from the ones surrounding it. Each end host must have all the protocols at its disposal to create and interpret any packet sent using the TCP/IP suite. The devices at intermediate destinations that forward and route packets do not need to contain transport or application layer protocols. A packet leaving a host would look something like this:
Assume that you are trying to send a file to a friend. How exactly would this work under the hood? From you and your friend’s point of view, the file is sent over the internet, and arrives at its destination in its entirety. The details are all abstracted and to the user it seems very simple. In reality, there are many complex protocols all working together to get that file to your friend. In the TCP/IP Suite, the Transport Layer is responsible for receiving data from the application layer, breaking it into small packets, packaging it, and then passing it down to the network layer. So the file would come from some process running in the application layer such as FTP (file transfer protocol), at which point a transport protocol would break it down into many small packets. Each transport layer packet has a header which contains important information responsible for getting data to the correct process once it reaches the destination host. Exactly how these packets are packaged depends upon the specific transport protocol used. The two main transport layer protocols are the Transmission Control Protocol (TCP) and User Datagram Protocol (UDP).
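A minimal sketch of that segmentation step might look like the following. The 1000-byte segment size, the header fields, and port 21 are assumptions for illustration; real TCP and UDP headers carry more fields (checksums, flags, and so on).

```python
# Minimal sketch of transport-layer segmentation: break application
# data into fixed-size chunks and prepend a simple header. The segment
# size and header fields are illustrative, not real protocol values.
SEGMENT_SIZE = 1000

def segment(data: bytes, dst_port: int):
    packets = []
    for seq, start in enumerate(range(0, len(data), SEGMENT_SIZE)):
        packets.append({
            "dst_port": dst_port,   # which process on the receiving host
            "seq": seq,             # position of this chunk in the stream
            "payload": data[start:start + SEGMENT_SIZE],
        })
    return packets

file_data = b"x" * 2500            # a 2.5 kB stand-in for a real file
pkts = segment(file_data, dst_port=21)
print(len(pkts))                   # 3 segments: 1000 + 1000 + 500 bytes
```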
Transmission Control Protocol (TCP)
TCP is what is called a connection-oriented protocol. With the TCP protocol, before any data can be transferred between two hosts, a stable connection must first be established. TCP provides reliable data transfer, congestion and flow control, and ordered delivery. It is for these reasons that TCP is the most widely used transport protocol.
The TCP “Three Way Handshake”
As stated, TCP is a connection-oriented protocol. This means that before data is sent, a connection must be established between the two hosts that are trying to communicate. In the case of sending a file to your friend, before you can begin transmitting the file, a stable connection must first be established. TCP creates stable connections by doing what’s called a “Three Way Handshake.” Hosts wishing to connect with one another must go through a process of sending synchronization (SYN) and acknowledgement (ACK) messages to each other.
- First, host A will send a synchronization message. This message can be likened to saying, “Hey, you there?”
- Once host B receives the SYN message, it will send both an ACK and SYN message, which can be likened to saying, “Hey, I got your SYN packet and I’m ready for the next one!”
- Finally, host A sends its own acknowledgement message back to host B, which, if successful, means the connection has been established.
After the connection has been established, the data transfer phase begins. After all the data is transmitted, the connection is terminated. With TCP, two hosts stay connected for the duration of the transfer. This is known as opening and maintaining a “session.”
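To make the session concrete, here is a minimal loopback example using Python's standard socket module. The operating system performs the three-way handshake for us when connect() and accept() complete; the two ends then exchange data over the established session and close it.

```python
# A minimal TCP session over the loopback interface. The OS carries out
# the three-way handshake during connect()/accept(); afterwards the two
# ends exchange data over the open session.
import socket
import threading

def server(listener: socket.socket):
    conn, _ = listener.accept()        # handshake completes here
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())     # echo back, uppercased

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

t = threading.Thread(target=server, args=(listener,))
t.start()

with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello over tcp")
    reply = client.recv(1024)

t.join()
listener.close()
print(reply)                           # b'HELLO OVER TCP'
```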
Packet Sequencing and Retransmission
When packets from the transport layer are “shipped” by passing them to the network layer, there is no guarantee these packets will ever reach their destination. If they do, they can still arrive out of order. TCP uses packet sequence numbers to identify the correct ordering of the packets being sent so that they can be reordered correctly if they arrive out of order. Any unreceived packets will be retransmitted by the sender.
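The reordering idea is simple to sketch. In this toy example, packets are (sequence number, payload) pairs that arrive scrambled, and sorting by sequence number recovers the original byte stream:

```python
# Sketch of how sequence numbers let a receiver reassemble data that
# arrives out of order, as TCP does. Packets are (seq, payload) pairs.
received = [(2, b"world"), (0, b"hello "), (1, b"there ")]  # arrival order

# Sorting by sequence number recovers the original byte stream.
stream = b"".join(payload for _, payload in sorted(received))
print(stream)   # b'hello there world'
```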
TCP Pros and Cons
TCP is a very useful protocol. It guarantees that all packets are delivered to applications in the correct order, and with minimal loss. However, there are some cons. While transferring data over TCP is very reliable, it puts a heavier load on host devices and takes up more bandwidth because the protocol forces hosts to constantly monitor the connection and the data going across. The connection process itself introduces latency and bandwidth costs. The downsides of TCP can be summarized as follows:
- Network Intensive Protocol – Because TCP must constantly monitor the status of a session, it can be quite stressful on a network. This is especially true in applications that rely on quickly sending data packets over somewhat unreliable connections, such as streaming services. Luckily, the rate at which data can be transferred has increased exponentially since the inception of the internet, so this is not as big of an issue as it used to be. However, with increasing internet speeds come larger, more demanding applications. Thus, TCP will likely continue to have limitations with various applications regardless of increases in bandwidth. This is an inherent tradeoff in the design of the protocol.
- Too Many Rules – “Windowing” is the process by which TCP gauges how much data it can send at a time to a destination host. If the connection between the source host and the destination host is stable, TCP will continue to send a continuous stream of data to the destination host without issues or breakages. However, because TCP needs to keep an open session between two devices to continue sending TCP packets, it must restart the three-way handshake to resume sending packets if the session is interrupted in any way (e.g., if the internet connection drops for a short while). This introduces issues for video streaming and other services where any breakage in the data stream can reduce the quality of service (QoS).
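The windowing idea mentioned above can be sketched as follows. This is a deliberately simplified model: real TCP windows are measured in bytes, adapt to network conditions, and handle lost ACKs, none of which is modeled here.

```python
# Toy sliding-window sketch: the sender may have at most WINDOW
# unacknowledged packets "in flight" at once; each ACK slides the
# window forward. Real TCP windows are byte-based and adaptive.
WINDOW = 3

def send_with_window(num_packets: int):
    sent, acked, log = 0, 0, []
    while acked < num_packets:
        # Fill the window: send until WINDOW packets are unacknowledged.
        while sent < num_packets and sent - acked < WINDOW:
            log.append(f"send {sent}")
            sent += 1
        # Assume ACKs arrive one at a time, in order.
        log.append(f"ack {acked}")
        acked += 1
    return log

log = send_with_window(5)
print(log[:4])   # ['send 0', 'send 1', 'send 2', 'ack 0']
```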
While all the rules of TCP allow for reliable connections between hosts, it is clear that TCP is not ideal for every application. There is a need for a protocol that can distribute data packets as fast as possible, without worrying as much about stability and packet loss. As it relates to packet distribution, the fewer rules that need to be checked off, the faster packets can be sent from host to host. This should already give good insight into why UDP is a good alternative to TCP for particular use cases.
User Datagram Protocol
Unlike the connection-oriented TCP, UDP is a connectionless protocol. It is often referred to as a “bare-bones” transport protocol, as it provides only a minimal set of features. UDP is essentially a less cumbersome, but also less reliable, version of TCP. UDP does not sequence the data packets, and thus does not care about the order in which segments arrive at the destination host. UDP also makes no attempt to retransmit any packets that are dropped in the network layer. Consequently, UDP is considered an unreliable, connectionless, and stateless protocol. There is no handshaking process, and hosts do not need to keep track of any session information. However, because UDP skips all the extra rules and processes, it has significantly lower overhead: UDP packets can be constructed and transmitted at significantly faster rates than TCP packets. It is important to know that this method of transmission provides no guarantee that the data you send will ever reach its destination. Returning to the Postal Service analogy, UDP is like placing mail in your mailbox and blindly hoping the Postal Service gets it to the proper location. Most of the time it does, but sometimes mail gets lost along the way. If it does get lost, you’re out of luck and it’s gone forever.
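The contrast with the TCP example is visible in code: a minimal UDP exchange over loopback needs no connection setup at all. (On the loopback interface a datagram will normally arrive, but nothing in the protocol guarantees it.)

```python
# A minimal UDP exchange over loopback. There is no connection setup:
# sendto() simply fires a datagram at an address and port, and delivery
# is not guaranteed (though on loopback it will normally arrive).
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"fire and forget", addr)   # no handshake, no session

data, _ = receiver.recvfrom(1024)
print(data)                               # b'fire and forget'

sender.close()
receiver.close()
```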
UDP Flow Control & Packet Segmentation
As we have learned, as a result of UDP’s absence of any flow control, handshaking, packet sequencing, or error checking, it is able to disperse packets at much higher speeds than TCP. However, one might wonder how we can still use UDP without such seemingly vital features. The answer is that innovations upon the UDP protocol are made at the application layer. Developers can implement their own methods of sequencing and flow control at the application layer instead of at the transport layer. This can give UDP some of the reliability of TCP while maintaining the low overhead and speed of the User Datagram Protocol. The benefit is that developers can take the bare-bones UDP protocol and tailor it to the needs of their specific application. This is the case for Elastos Carrier, as we will see later on.
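One such application-level technique can be sketched as a reorder buffer: the sender tags each datagram with a sequence number, and the receiver holds out-of-order arrivals until the gap is filled. This is only an illustration of the idea, not the Carrier's actual mechanism.

```python
# Sketch of application-level sequencing on top of UDP: out-of-order
# datagrams are buffered until the missing sequence number arrives,
# mimicking one piece of TCP's reliability without connection state.
def deliver_in_order(arrivals):
    buffer, next_seq, out = {}, 0, []
    for seq, payload in arrivals:
        buffer[seq] = payload
        while next_seq in buffer:          # flush any contiguous run
            out.append(buffer.pop(next_seq))
            next_seq += 1
    return out

# Datagrams 1 and 2 arrive before 0; nothing is delivered until 0 shows up.
print(deliver_in_order([(1, "b"), (2, "c"), (0, "a")]))   # ['a', 'b', 'c']
```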
UDP’s main limitations can be summarized as follows:
- Connectionless – UDP is a connectionless and unreliable transport protocol. There is no default function to ensure dropped packets are retransmitted, or that data is received in the same order as it was sent. These features must be added at the application level, which can be cumbersome to implement and manage.
- Error Control – UDP does not use any error control. If UDP detects any error in the received packet, it silently drops it.
In general, there is a trade-off between reliability and speed with regard to TCP and UDP. TCP offers reliability at the cost of speed, while UDP offers speed at the cost of reliability. Whether TCP or UDP is used depends upon the needs of the application. As of now, more than 90% of applications on the internet use TCP, as it is easier to work with due to its reliability. However, this does not mean that UDP is not important, and in fact, UDP is gaining ground on TCP. Google’s UDP-based protocol QUIC now accounts for roughly 35% of its outbound traffic, and that share will probably grow in the coming years. The increase in UDP’s popularity is likely due to its flexibility, in that it can be augmented at the application layer to fit certain needs while maintaining a higher transmission speed. The Elastos Carrier runs mostly atop UDP for its secure end-to-end encrypted communications (~80% UDP, ~20% TCP). As we will see, the Carrier protocol uses its own methods for reliable data transfer over UDP.
Internetwork and Hardware Layer
In our analogy with the Postal Service, the Transport Layer is where packaging takes place, and the Internetwork Layer is where shipping occurs. The Internetwork Layer is what allows hosts within different networks to communicate with each other. The Network Layer is analogous to the various distribution centers that the Postal Service uses to route packages from one point to another. The Physical Layer can be likened to the roads, trucks, and planes that represent the physical means of getting a package from one distribution center to the next. In place of roads, trucks, and planes, the underlying hardware-based infrastructure of the internet is composed of many interconnected links, routers, and switches: the cables and links are the roads and trucks, while the routers and switches are the distribution centers.
The Internet Protocol (IP)
When transport packets are passed down to the Network Layer, the IP determines how these packets are routed to the destination host. Without the Internet Protocol, hosts would not be able to communicate with external networks, which is the essence of the Internet. The IP protocol allows hosts in LANs, MANs, and WANs to communicate with other hosts in completely different networks, both logically and geographically. Fundamental to the IP protocol is the IP address.
What is an IP Address?
Every machine on the Internet is assigned a unique number called an IP address. Without a unique IP address assigned to your machine, you will not be able to communicate with other devices on the Internet. An IP address can be considered a unique badge number that identifies a device on the internet. In our Postal Service analogy, the IP address is like the address of a home, office, or commercial establishment. After the transport protocol breaks data into small packets, they are put into IP datagrams, which contain the IP address of the destination host, among other details. Each IP datagram is passed to the nearest router within the host’s own network, called the gateway router. This router can then forward the packet to the other routers it is connected to. We won’t dive into the specifics of the protocol here, but the packet is more or less continuously forwarded across various routers and switches based on the destination IP address. Each packet can take a completely different route and have a unique arrival time. As such, packets can arrive out of order. Eventually, the packets will reach the gateway router of the destination network, at which point they will be delivered to the end host.
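Python's standard ipaddress module makes it easy to play with addresses and the networks they belong to. The address below is from a range reserved for documentation, not a real host.

```python
# Inspecting IP addresses with Python's standard ipaddress module.
# 192.0.2.0/24 is a range reserved for documentation and examples.
import ipaddress

addr = ipaddress.ip_address("192.0.2.17")      # one host address
net = ipaddress.ip_network("192.0.2.0/24")     # its surrounding network

print(addr in net)         # True: the host belongs to this /24 network
print(net.num_addresses)   # 256 addresses in a /24
```

Membership tests like `addr in net` mirror what a router does when it matches a packet's destination address against the network prefixes in its forwarding table.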
The IP protocol is a best-effort delivery protocol, meaning that it gives no guarantee that a packet will not be lost along the way. It is also similar to UDP in the sense that it is connectionless. The Internet Protocol does not care about the order in which these packets arrive at the end host. All it cares about is getting the packets from one host to another through the optimal path. Some networking technicians refer to TCP and UDP packets as “riding” the IP protocol. It is up to the transport protocols of the internet to account for out of order delivery, packet loss, and flow control.
To be continued in part 2…
Charles Coombs-Esmail[u/C00mbsie on reddit]
Amos Thomas[Famous Amos on youtube]
Michael Ekpo[adeshino on discord]
Eric Coombs Esmail