Protocols & Client-Server Architecture
Structured Overview of Protocols and Client-Server Interaction.
Computer Networks
Computer networks are systems of connected computers that are used to share information. Most modern computer networks¹ are classified as packet-switched networks.
In such networks, the physical path between two hosts is not established in advance, and the transmitted data is grouped into packets. Each packet can take its own route to reach the destination.
Besides packet switching, there are other switching techniques: circuit switching and message switching.
Circuit-switching networks provide a temporary direct circuit between two devices through switching centers. After the dedicated path is established, the data is transmitted, and nobody else can use the circuit until the connection has been closed. This is useful for connections that must stay stable for a long time without delays, such as real-time voice and data transmission.
Message switching solves the problem of idle resources when the source station doesn’t have enough data to transmit continuously. In this switching method, no dedicated physical path is established in advance, and nodes are equipped with buffers to hold incoming messages. At first, a message is stored in the first node. As soon as a free channel is seized, the first node sends a copy of the stored message over that channel to the next node on the path. The message hops from one node to another until it reaches the destination.
The first electromechanical telecommunication system used message switching for telegrams.
With packets, the bandwidth of the transmission medium can be better shared among users than if the network were circuit-switched. When one user is not sending packets, the link can be filled with packets from other users, and so the cost can be shared.
Network Packet
A network packet is a formatted unit of data carried by a packet-switched network.
A packet consists of two types of data:
- Control information
- User data (payload)
The control information provides the data the network needs to deliver the user data, for example, source and destination network addresses, error detection codes, and so on. Typically, control information is found in packet headers and trailers, with payload data in between.
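To make the header / payload / trailer split concrete, here is a minimal sketch in Python. The field layout (two 4-byte addresses, a sequence number, a payload length, and a CRC-32 trailer) is a made-up format for illustration only, not the layout of any real protocol.

```python
import struct
import zlib

# Hypothetical header layout (not a real protocol): source address,
# destination address, sequence number, payload length. "!" = network byte order.
HEADER = struct.Struct("!4s4sHH")
# Hypothetical trailer: a CRC-32 error detection code.
TRAILER = struct.Struct("!I")


def build_packet(src: bytes, dst: bytes, seq: int, payload: bytes) -> bytes:
    """Wrap the payload in control information: a header and a trailer."""
    header = HEADER.pack(src, dst, seq, len(payload))
    trailer = TRAILER.pack(zlib.crc32(header + payload))
    return header + payload + trailer


def parse_packet(packet: bytes):
    """Split a packet back into its fields and verify the checksum."""
    header = packet[:HEADER.size]
    src, dst, seq, length = HEADER.unpack(header)
    payload = packet[HEADER.size:HEADER.size + length]
    (crc,) = TRAILER.unpack(packet[HEADER.size + length:])
    if crc != zlib.crc32(header + payload):
        raise ValueError("checksum mismatch: packet was corrupted")
    return src, dst, seq, payload
```

Calling `parse_packet(build_packet(...))` should give back the original fields, while a packet whose payload bytes were changed in transit should be rejected by the checksum check.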
Transferring different types of data between different types of devices is a challenge. Historically, it has been solved by applying the principle of layering.
Each layer is responsible for only one task.
The best-known set of layers is the Internet protocol suite. It is commonly known as TCP/IP because the foundational protocols in the suite are TCP and IP.
- The application layer is the top layer. It is the scope within which applications create user data and communicate this data to other applications on another or the same host. Applications are addressed via ports, which essentially identify them.
Protocols working on this layer: SMTP, FTP, SSH, HTTP, Telnet, DNS.
- The transport layer performs host-to-host communication services for applications, such as connection-oriented communication, reliability, flow control, and multiplexing.
Protocols working on this layer: TCP, UDP.
- The internet layer exchanges datagrams across network boundaries. It provides a uniform networking interface that hides the actual layout of the underlying network connections.
Protocols working on this layer: IP.
- The link layer defines the networking methods within the scope of the local network link on which hosts communicate. This layer includes the protocols used to describe the local network topology and the interfaces needed to effect the transmission of internet layer datagrams to next-neighbor hosts.
Protocols working on this layer: ARP, RARP, NDP.
The technical standards underlying the Internet protocol suite and its constituent protocols have been delegated to the Internet Engineering Task Force (IETF). The defining specification of the suite is RFC 1122, which broadly outlines four layers. These have stood the test of time, as the IETF has never modified this structure.
IP protocol
- One-way data transfer between two hosts. Every host has an IP address that differentiates it from other computers in the network.
- Information is transmitted in the form of datagrams. In other words, it’s transmitted in small pieces of data.
- The protocol routes each packet toward its destination address on a best-effort basis: it is not responsible for the order in which packets are delivered and does not guarantee that the data was received.
Currently, there are two IP versions in use: IPv4 and IPv6.
IPv4
Address structure: [0–255].[0–255].[0–255].[0–255], where 2⁸ - 1 = 255
Address example: 208.80.153.224, the IP address of the host where Wikipedia.org is located
IPv4 has a capacity of just 4.3 billion (256⁴ = 4'294'967'296) IP addresses, of which only about 3.7 billion are usable by ordinary Internet access devices. The others are reserved for special purposes such as IP multicasting.
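The arithmetic behind these numbers can be checked with Python's standard `ipaddress` module. This is only a small sketch; the multicast address 224.0.0.1 is picked as an illustration and does not come from the article.

```python
import ipaddress

addr = ipaddress.IPv4Address("208.80.153.224")

# Each of the four octets is one byte, so the whole address is a 32-bit number.
octets = list(addr.packed)
as_int = int(addr)  # 208*256**3 + 80*256**2 + 153*256 + 224

# Four octets with 256 values each give the full IPv4 address space.
total_addresses = 256 ** 4

# Some ranges are reserved; 224.0.0.0/4, for instance, is used for multicast.
example_multicast = ipaddress.IPv4Address("224.0.0.1")
is_multicast = example_multicast.is_multicast
```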
IPv6
IPv6 was created as a solution to the inevitable threat posed by the exhaustion of IPv4 addresses.
Address structure: [0–ffff]:[0–ffff]:[0–ffff]:[0–ffff]:[0–ffff]:[0–ffff]:[0–ffff]:[0–ffff], where 16⁴ - 1 = ffff
Address example: 2001:0db8:0000:0000:0000:ff00:0042:8329
After removing all leading zeros: 2001:db8:0:0:0:ff00:42:8329
After omitting consecutive sections of zeros: 2001:db8::ff00:42:8329
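The same shortening rules are implemented in Python's standard `ipaddress` module, so the example address can be used as a quick check. A minimal sketch:

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:0db8:0000:0000:0000:ff00:0042:8329")

# Leading zeros dropped and the longest run of zero groups replaced by "::".
short_form = addr.compressed

# The full eight-group notation with all leading zeros restored.
full_form = addr.exploded

# Both notations describe the same 128-bit address.
same_address = ipaddress.IPv6Address(short_form) == addr
```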
Networks using IPv6 cannot communicate directly with networks using IPv4, which are still dominant today.
Domain name conversion to IP address
TCP protocol
- Works over IP protocol.
- Guarantees data delivery.
- Duplex data transfer.
- Connects programs through TCP ports.
Initially, TCP managed both datagram transmissions and routing, but as experience with the protocol grew, collaborators recommended division of functionality into layers of distinct protocols. One of the principal designers of TCP/IP, Jon Postel, wrote: “We are screwing up in our design of Internet protocols by violating the principle of layering.”
A TCP port is a number from 0 to 65535 that serves as an “address” of a network connection within a single host. Thus, you can support many open connections on one machine. A well-known port number usually tells you which protocol the program uses for data transfer. For example:
20,21 — FTP
22 — SSH
25 — SMTP
80 — HTTP
443 — HTTPS
On Unix-like systems, binding to ports below 1024 usually requires superuser privileges.
When a TCP connection is established, the client and the server exchange SYN, SYN-ACK, and ACK packets (the three-way handshake). Thus, one RTT is required to establish a TCP connection before the client can send data.
RTT (round-trip time) is the time it takes data to travel from the client to the server and back.
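Here is a small sketch of these ideas using Python's `socket` module on the loopback interface: a server bound to a port, a client that connects to it, and data flowing in both directions. The function names are made up for this example, and timing `connect` on localhost only gives a rough idea of the handshake cost, not a real network RTT.

```python
import socket
import threading
import time


def run_echo_once(server_sock: socket.socket) -> None:
    """Accept one client and send back whatever it sends (duplex transfer)."""
    conn, _ = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def echo_roundtrip(message: bytes):
    """Return (reply, seconds spent connecting) for one echo exchange."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.settimeout(5)               # avoid hanging forever in accept
        server_sock.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        server_sock.listen(1)
        host, port = server_sock.getsockname()
        worker = threading.Thread(target=run_echo_once, args=(server_sock,))
        worker.start()

        started = time.perf_counter()
        # create_connection returns once the client side of the handshake is done.
        with socket.create_connection((host, port), timeout=5) as client:
            connect_time = time.perf_counter() - started
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply, connect_time
```

For a short message such as `b"ping"`, the reply should equal what was sent; over a real network, the measured connect time would be roughly one RTT.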
TLS protocol
TLS (formerly SSL) is a cryptographic protocol that provides secure data transfer between hosts. It solves a problem of plain TCP, in which data is transmitted in clear form and attackers can intercept it. When HTTPS is used, the browser establishes a TLS connection over TCP.
- Server authentication. It is used to protect against server address spoofing, for example, when an attacker tampers with the tsp-hosts file on the client computer.
- The client signs its message.
- Encryption and compression of transmitted information.
- Anti-spoofing and message integrity checking.
- (Optional) Client Authentication. It is usually used in two-way verification when the client verifies that the server is really who it claims to be and the server does the same in response. Two-way authentication improves security and can also replace authentication using a username and password.
- ClientHello is a message in which the client indicates the connection options it would like to use: encryption, type of encryption, compression.
- ServerHello is a message in which the server confirms that it agrees with the client and can work with the specified encryption and compression parameters.
- Certificate contains a public key for server authentication. The certificate is signed by another certificate belonging to a certificate authority.
After the client has verified the certificate, the client and server agree on a common key for symmetric encryption.
Thus, a full TLS handshake requires a minimum of 2 round trips (TLS 1.3 reduces this to 1). Because this is expensive, a new TLS connection is not established for each request.
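As a rough illustration, this is how a client could set up TLS over TCP with Python's standard `ssl` module. The helper names `open_tls` and `setup_round_trips` are invented for this sketch, and the round-trip count is a simplified model with adjustable assumptions, not a measurement.

```python
import socket
import ssl

# Default client settings: the server must present a certificate that chains
# to a trusted certificate authority and matches the requested host name.
context = ssl.create_default_context()
certificate_required = context.verify_mode == ssl.CERT_REQUIRED
hostname_checked = context.check_hostname


def open_tls(host: str, port: int = 443) -> ssl.SSLSocket:
    """TCP handshake first, then the TLS handshake on top of it."""
    sock = socket.create_connection((host, port), timeout=5)
    try:
        return context.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise


def setup_round_trips(tcp: int = 1, tls: int = 2) -> int:
    """Simplified model: round trips spent before the first request is sent."""
    return tcp + tls
```

With an RTT of 50 ms, this model gives about 150 ms of setup time before any application data is exchanged, which is why connections are worth reusing.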
HTTP protocol
- Transfer of documents (structure) and meta-information.
- Authentication (RFC 7235).
- Session support. An HTTP session is a sequence of network request-response transactions. An HTTP client initiates a request by establishing a TCP connection. An HTTP server listens on a port and waits for a client’s request message. Upon receiving the request, the server sends back a status line and a message of its own.
- Caching (RFC 7234).
- Conditional Requests (RFC 7232).
- Connection management. As we saw earlier, establishing a connection is an expensive procedure. The solution to that issue is HTTP persistent connection, also called HTTP keep-alive. It is the idea of using a single TCP connection to send and receive multiple HTTP requests/responses, as opposed to opening a new connection for every single request/response pair.
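Keep-alive is easy to observe with Python's standard library. The sketch below starts a throwaway local server (the `HelloHandler` class and the function name are made up for this example), sends two requests through one `HTTPConnection`, and records the client-side port each time. If the TCP connection is really reused, both ports should be the same.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"      # enables persistent connections

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        # Content-Length tells the client where the response ends,
        # so the connection can stay open for the next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):      # keep the demo quiet
        pass


def two_requests_one_connection():
    """Send two GET requests and report the client port used for each."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    ports = []
    try:
        for _ in range(2):
            conn.request("GET", "/")
            response = conn.getresponse()
            response.read()            # drain the body before reusing the socket
            ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports
```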
Structure of HTTP protocol
- Starting line
Client request:
Method URI HTTP/version
Host: name_of_host
Server response:
HTTP/version STATUS
Client request example:
GET https://printbox.io/api/scanners/ HTTP/2.0
Host: printbox.io
Server response examples:
HTTP/1.1 200 OK
HTTP/1.1 400 Bad Request
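Since HTTP/1.1 messages are plain text, the starting line is easy to build and to take apart by hand. The two helpers below are a minimal sketch with made-up names; a real client would normally rely on a library such as `http.client` instead.

```python
def build_request(method: str, target: str, host: str) -> bytes:
    """Compose an HTTP/1.1 request: starting line, Host header, empty line."""
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(line: str):
    """Split a status line such as 'HTTP/1.1 400 Bad Request' into its parts."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason
```

For example, `build_request("GET", "/api/scanners/", "printbox.io")` yields the request line, the `Host` header, and the empty line that ends the header section, each terminated by CRLF.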
Request methods
The HTTP protocol does not limit the client and server to a closed set of methods. If the server doesn’t recognize the request method, it has to return status 501 (Not Implemented). However, it is recommended to use the methods described in the HTTP specification.
The GET method requests a representation of the specified resource. Requests using GET should only retrieve data and should have no other effect.
The HEAD method asks for a response identical to that of a GET request, but without the response body. It’s usually used to extract metadata or to check whether a resource exists.
The OPTIONS method returns the HTTP methods that the server supports for the specified URL. To find out all methods supported by the server, you can send OPTIONS * HTTP/1.1
The POST method requests that the server accept the entity enclosed in the request as a new subordinate of the web resource identified by the URI.
The PUT method requests that the enclosed entity be stored under the supplied URI. If the URI refers to an already existing resource, it is modified; if the URI does not point to an existing resource, then the server can create the resource with that URI.
The PATCH method applies partial modifications to a resource.
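The behaviour of GET, HEAD, and an unsupported method can be tried out against a throwaway local server. In this sketch, `ReadOnlyHandler` and `probe` are names invented for the example; the handler implements only GET and HEAD, and Python's `BaseHTTPRequestHandler` answers any other method with 501 on its own.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ReadOnlyHandler(BaseHTTPRequestHandler):
    """Implements GET and HEAD only; other methods are left undefined."""

    def _send_headers(self, body: bytes):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()

    def do_GET(self):
        body = b"resource body"
        self._send_headers(body)
        self.wfile.write(body)

    def do_HEAD(self):
        self._send_headers(b"resource body")   # same headers, but no body is sent

    def log_message(self, *args):              # keep the demo quiet
        pass


def probe(methods):
    """Return {method: (status, body_length)} for each method sent to '/'."""
    server = HTTPServer(("127.0.0.1", 0), ReadOnlyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = {}
    try:
        for method in methods:
            conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
            conn.request(method, "/")
            response = conn.getresponse()
            results[method] = (response.status, len(response.read()))
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results
```

Calling `probe(["GET", "HEAD", "PATCH"])` should show a 200 with a body for GET, a 200 with an empty body for HEAD, and a 501 for PATCH, because this particular handler does not implement it.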
Headers
HTTP headers let the client and the server pass additional information with an HTTP request or response. An HTTP header consists of its case-insensitive name followed by a colon (:), then by its value. Whitespace before the value is ignored.
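These rules translate into very little code. The function below is a simplified sketch (the name `parse_headers` is made up, and repeated headers are not handled): it lower-cases header names to make lookups case-insensitive and strips the whitespace around values.

```python
def parse_headers(raw: str) -> dict:
    """Parse 'Name: value' lines into a dict keyed by lower-cased name."""
    headers = {}
    for line in raw.splitlines():
        if not line.strip():
            continue                            # skip empty lines
        # Split on the first colon only: values such as URLs contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers
```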
Custom proprietary headers have historically been used with an X- prefix, but this convention was deprecated in June 2012 by RFC 6648 because of the inconveniences it caused when nonstandard fields became standard.
Main client headers:
Connection: keep-alive
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0)...
Accept: application/json
Referer: https://printbox.io/ //URL from which the client made a request
Accept-Encoding: gzip, deflate, br
Accept-Language: ru,en
Cookie: ...
Main server headers:
Server: nginx/1.10.3 (Ubuntu)
Date: Sun, 17 Nov 2019 09:07:15 GMT
Content-Type: application/json;charset=utf_8
Transfer-Encoding: chunked
Connection: keep-alive
Allow: GET, HEAD, OPTIONS
Content-Encoding: gzip
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers
A cool article about HTTP and vulnerability protection.👮
Client-Server architecture
The server is a program that performs functions at the request of a client and provides it with access to certain resources. The interaction between the client and the server is carried out using network protocols.
The client works on the user’s computer. The client’s task is to download the document over the network and display it.
Web servers
The web server is a server that receives HTTP requests from clients (usually from browsers) and returns documents to them. Usually, the server consists of one master process and many worker processes with reduced privileges that process incoming requests in a loop. The master process is a root daemon* process that:
- Reads and validates a config.
- Opens sockets and logs.
- Launches and manages child processes (workers).
- Maintains the required number of workers.
*A daemon is a program that is not attached to a console or graphical interface and stays resident in the memory of the OS.
Suppose we have no workers and everything is done in one thread.
How the server works without workers
- The server accepts the incoming connection.
- For part of the time, the program code is executed and variables are created for sockets.
- Reading from the socket: recv transfers control to the operating system.
- The operating system also does its job. In particular, if there is no data in the socket yet, the process goes into standby mode and falls asleep. Therefore, the server can’t serve other clients at this time. This is called blocking input/output.
- When data appears, the process wakes up and returns from the system call.
Blocking I/O Solutions
Variant 1: Many processes — prefork, a pool of workers.
One process = one client.
A process is an instance of a program to which system resources are allocated (processor time, memory). Each process runs in a separate address space: one process cannot access the variables and data structures of another.
- Requires a lot of memory
+ Ease of development - the code differs little from single-threaded
Variant 2: multithreading
A thread runs in the same address space as its process (each thread has only its own stack), and many threads share the process state data. As a rule, each thread can work (read and write) with the same memory area.
+ Threads are more lightweight structures
- Shared memory - which means harder development and working with locks.
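The need for locks shows up as soon as several threads touch the same variable. In this minimal sketch (the function name and the counter are made up for illustration), each worker thread increments a shared counter, and the lock makes every read-modify-write step atomic.

```python
import threading


def count_requests(workers: int, requests_per_worker: int) -> int:
    """Let several threads update one shared counter under a lock."""
    served = 0
    lock = threading.Lock()

    def worker():
        nonlocal served
        for _ in range(requests_per_worker):
            with lock:             # without the lock, updates could be lost
                served += 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return served
```

With the lock in place, four workers handling 1000 requests each always add up to 4000. In the process-based variant, the same counter would need some form of inter-process communication, because processes do not share memory by default.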
Variant 3: Non-Blocking I/O
In this case, the socket is created with the non-blocking input/output option. Therefore, instead of putting the process to sleep, the kernel of the operating system returns control to the server with the special code EAGAIN/EWOULDBLOCK. Now, instead of waiting, the server can serve another client. Later, the server checks the socket again to see whether data has appeared.
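In Python, this kernel response surfaces as a `BlockingIOError`. The sketch below assumes a Unix-like system and uses a connected socket pair in place of a real client and server; `try_read` and `demo` are names made up for the example.

```python
import errno
import socket


def try_read(sock: socket.socket):
    """Return data if available, or None when the read would block."""
    try:
        return sock.recv(1024)
    except BlockingIOError as exc:
        # The kernel answered EAGAIN / EWOULDBLOCK instead of sleeping.
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None
        raise


def demo():
    """Read before and after the peer sends data on a non-blocking socket."""
    client, server = socket.socketpair()
    server.setblocking(False)
    first = try_read(server)       # nothing has been sent yet
    client.sendall(b"ping")
    second = try_read(server)      # now there is data to read
    client.close()
    server.close()
    return first, second
```

A real server would not poll sockets in a loop like this; it would typically ask the operating system which sockets are ready, for example through the `selectors` module.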
Existing servers
Apache is the oldest one. It works in prefork mode, starting 10–100 workers, and it has since learned how to start threads as well. It is written in C and is therefore quick.
Gunicorn is a server written in Python, with a pre-fork worker model. It’s slow but easy to integrate with Python code. It is used for business logic and does not serve static files.
Nginx is an asynchronous server written in C that uses non-blocking input/output.
Node.js is an asynchronous (non-blocking I/O) runtime for servers written in JavaScript.
Backend/Frontend architecture
Types of requests
- Requests for static files. Static files have a permanent address and persistent content.
- Requests for dynamic files. Dynamic files are created for each request, the content depends on the time and user.
- Requests to the site API — an interface for interacting with other systems, such as a database.
- AJAX requests. AJAX (Asynchronous JavaScript and XML) is an approach based on the background exchange of data between the browser and the web server. JSON / XML is used as a data transfer format.
Of all these, the web server can independently serve only requests for static files. All other types of requests must somehow use the application logic (business logic). Therefore, the server is usually divided into two:
Frontend server
- Serves static files from disk.
- Sends requests (proxies) to backend servers.
- Balances the load: selects the backend server that is less loaded.
- Caches backend documents.
- Authorizes the client.
- Supports encryption, image slicing, gzip compression.
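The load-balancing duty from the list above can be sketched in a few lines. This is only one possible strategy ("least connections"), the backend names are hypothetical, and real frontend servers implement this with their own configuration rather than code like this.

```python
def pick_backend(active_requests: dict) -> str:
    """Return the backend with the fewest requests currently in flight."""
    return min(active_requests, key=active_requests.get)


def dispatch(active_requests: dict) -> str:
    """Choose a backend and record that it received one more request."""
    backend = pick_backend(active_requests)
    active_requests[backend] += 1
    return backend
```

Given `{"backend-1": 3, "backend-2": 1, "backend-3": 2}`, the next request goes to `backend-2`, whose counter then grows by one. A complete balancer would also decrement the counter when a request finishes.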
Backend server (or application server)
It deals with the business logic of the application: generating HTML, data in JSON format, updating SQL and so on.
[1] Original design for the ARPANET was Circuit Switching, but not packet switching.
I hope you found this helpful.