A WebSocket is a persistent connection between a client and server. WebSockets provide a bidirectional, full-duplex communications channel that operates over HTTP through a single TCP/IP socket connection. At its core, the WebSocket protocol facilitates message passing between a client and server. This article provides an introduction to the WebSocket protocol, including what problem WebSockets solve, and an overview of how WebSockets are described at the protocol level.
The idea of WebSockets was born out of the limitations of HTTP-based technology. With HTTP, a client requests a resource, and the server responds with the requested data. HTTP is a strictly unidirectional protocol: any data sent from the server to the client must be first requested by the client. Long-polling has traditionally acted as a workaround for this limitation. With long-polling, a client makes an HTTP request with a long timeout period, and the server uses that long timeout to push data to the client. Long-polling works, but comes with a drawback: resources on the server are tied up throughout the length of the long-poll, even when no data is available to send.
WebSockets, on the other hand, allow for sending message-based data, similar to UDP, but with the reliability of TCP. WebSocket uses HTTP as the initial transport mechanism, but keeps the TCP connection alive after the HTTP response is received so that it can be used for sending messages between client and server. WebSockets allow us to build “real-time” applications without the use of long-polling.
The protocol consists of an opening handshake followed by basic message framing, layered over TCP.
WebSockets begin life as a standard HTTP request and response. Within that request/response chain, the client asks to open a WebSocket connection, and the server responds (if it's able to). If this initial handshake is successful, the client and server have agreed to use the existing TCP/IP connection that was established for the HTTP request as a WebSocket connection. Data can now flow over this connection using a basic framed message protocol. Once both parties acknowledge that the WebSocket connection should be closed, the TCP connection is torn down.
Establishing a WebSocket connection — The WebSocket Open Handshake
WebSockets do not use the http:// or https:// scheme (because they do
not follow the HTTP protocol). Rather, WebSocket URIs use a new scheme,
ws: (or wss: for a secure WebSocket). The remainder of the URI is the
same as an HTTP URI: a host, port, path and any query parameters.
"ws:" "//" host [ ":" port ] path [ "?" query ]
"wss:" "//" host [ ":" port ] path [ "?" query ]
WebSocket connections can only be established to URIs that follow this
scheme. That is, if you see a URI with a scheme of ws:// (or wss://),
then both the client and the server MUST follow the WebSocket connection
protocol to comply with the WebSocket specification.
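To make the URI grammar concrete, here is a small sketch that splits a WebSocket URI into its parts. The helper name parse_ws_uri is hypothetical, and the default ports (80 for ws, 443 for wss) come from RFC 6455 rather than from the text above.

```python
from urllib.parse import urlsplit

# Default ports per RFC 6455 (an addition, not stated in the article).
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def parse_ws_uri(uri: str):
    """Return (secure, host, port, resource) for a ws:// or wss:// URI."""
    parts = urlsplit(uri)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not a WebSocket URI: {uri!r}")
    if not parts.hostname:
        raise ValueError("a WebSocket URI must include a host")
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    # The resource is the path plus the optional query string.
    resource = parts.path or "/"
    if parts.query:
        resource += "?" + parts.query
    return parts.scheme == "wss", parts.hostname, port, resource


secure, host, port, resource = parse_ws_uri("ws://example.com:8181/chat?room=1")
```

Anything that is not ws or wss is rejected up front, which mirrors the rule that connections can only be made to URIs in this scheme.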
WebSocket connections are established by upgrading an HTTP request/response pair. A client that supports WebSockets and wants to establish a connection will send an HTTP request that includes a few required headers:
- The Connection header generally controls whether or not the network connection stays open after the current transaction finishes. A common value for this header is keep-alive, which keeps the connection persistent to allow for subsequent requests to the same server. During the WebSocket opening handshake we set the header to Upgrade, signaling that we want to keep the connection alive and use it for non-HTTP requests.
- The Upgrade header is used by clients to ask the server to switch to one of the listed protocols, in descending preference order. We specify websocket here to signal that the client wants to establish a WebSocket connection.
- Sec-WebSocket-Key is a one-time random value (a nonce) generated by the client. The value is a randomly selected 16-byte value that has been base64-encoded.
- Sec-WebSocket-Version: the only accepted version of the WebSocket protocol is 13. Any other version listed in this header is invalid.
Together, these headers would result in an HTTP GET request from the
client to a ws:// URI like in the following example:
GET ws://example.com:8181/ HTTP/1.1
Host: localhost:8181
Connection: Upgrade
Pragma: no-cache
Cache-Control: no-cache
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: q4xkcO32u266gldTuKaSOw==
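A request like this one can be assembled in a few lines of Python. This is only a sketch: make_websocket_key and build_handshake_request are hypothetical helper names, and the host, port, and path are placeholder values. It generates the nonce as 16 random bytes, base64-encoded, and includes only the headers the handshake requires.

```python
import base64
import os


def make_websocket_key() -> str:
    """Return a fresh nonce: 16 random bytes, base64-encoded."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def build_handshake_request(host: str, port: int, path: str, key: str) -> bytes:
    """Assemble the HTTP/1.1 upgrade request as raw bytes (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}:{port}",
        "Connection: Upgrade",
        "Upgrade: websocket",
        "Sec-WebSocket-Version: 13",
        f"Sec-WebSocket-Key: {key}",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


key = make_websocket_key()
request = build_handshake_request("example.com", 8181, "/", key)
```

A new key should be generated for every connection attempt, since the value is meant to be a one-time nonce.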
Once a client sends the initial request to open a WebSocket connection, it
waits for the server's reply. The reply must have an HTTP 101 Switching
Protocols response code. The HTTP 101 Switching Protocols response
indicates that the server is switching to the protocol that the client
requested in its Upgrade request header. In addition, the server must
include HTTP headers that validate the connection was successfully
upgraded:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: fA9dggdnMPU79lJgAE3W4TRnyDM=
- Upgrade: Confirms that the connection has been upgraded.
- Connection: Confirms that the connection has been upgraded.
- Sec-WebSocket-Accept is a base64-encoded, SHA-1 hashed value. You generate this value by concatenating the client's Sec-WebSocket-Key nonce and the static value 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 defined in RFC 6455, then taking the SHA-1 hash of the result and base64-encoding it. Although Sec-WebSocket-Key and Sec-WebSocket-Accept seem complicated, they exist so that both the client and the server can know that their counterpart supports WebSockets. Since WebSocket reuses the HTTP connection, there are potential security concerns if either side interprets WebSocket data as an HTTP request.
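The accept computation is short enough to sketch directly. The function name compute_accept is hypothetical. The sample nonce and expected result are the ones given in RFC 6455 itself, not the values from the example response above.

```python
import base64
import hashlib

# Fixed GUID defined in RFC 6455.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Concatenate key and GUID, SHA-1 hash the result, then base64-encode the digest."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample nonce taken from RFC 6455, section 1.3.
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client can run the same computation on the key it sent and compare the result with the Sec-WebSocket-Accept header it received. A mismatch means the peer did not perform a WebSocket handshake.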
After the client receives the server response, the WebSocket connection is open to start transmitting data.
The WebSocket Protocol
WebSocket is a framed protocol, meaning that a chunk of data (a message) is divided into a number of discrete chunks, with the size of the chunk encoded in the frame. The frame includes a frame type, a payload length, and a data portion. An overview of the frame is given in RFC 6455 and reproduced here.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+
I won’t cover every piece of the frame protocol here. Refer to RFC 6455 for full details. Rather, I will cover the most important bits so that we can gain an understanding of the WebSocket protocol.
The first bit of the WebSocket header is the Fin bit. This bit is set if this frame is the last data to complete this message.
RSV1, RSV2, RSV3 Bits
These bits are reserved for future use.
Every frame has an opcode that determines how to interpret this frame’s payload data.
| Opcode value | Description |
|---|---|
| 0x00 | This frame continues the payload from the previous frame. |
| 0x01 | Denotes a text frame. Text frames carry UTF-8 encoded data, which the receiver decodes as text. |
| 0x02 | Denotes a binary frame. Binary frames are delivered unchanged to the receiver. |
| 0x03-0x07 | Reserved for future non-control frames. |
| 0x08 | Denotes that the sender wishes to close the connection. |
| 0x09 | A ping frame. Serves as a heartbeat mechanism ensuring the connection is still alive. The receiver must respond with a pong. |
| 0x0a | A pong frame. Serves as a heartbeat mechanism ensuring the connection is still alive. It is sent in response to a ping, and the receiver does not reply to it. |
| 0x0b-0x0f | Reserved for future control frames. |
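The first byte of a frame packs the FIN bit, the three reserved bits, and the opcode together. The following sketch shows one way to pull them apart. The Opcode enum and the helper names parse_first_byte and is_control are hypothetical, chosen for illustration.

```python
from enum import IntEnum


class Opcode(IntEnum):
    CONTINUATION = 0x0
    TEXT = 0x1
    BINARY = 0x2
    CLOSE = 0x8
    PING = 0x9
    PONG = 0xA


def parse_first_byte(byte: int) -> dict:
    """Split the first header byte into FIN, RSV1-3, and the 4-bit opcode."""
    return {
        "fin": bool(byte & 0x80),
        "rsv1": bool(byte & 0x40),
        "rsv2": bool(byte & 0x20),
        "rsv3": bool(byte & 0x10),
        "opcode": Opcode(byte & 0x0F),  # raises ValueError for reserved opcodes
    }


def is_control(opcode: Opcode) -> bool:
    """Control frames (close, ping, pong) have the high bit of the opcode set."""
    return bool(opcode & 0x8)


header = parse_first_byte(0x81)  # 1000 0001: FIN set, opcode 0x1 (text)
```

Rejecting reserved opcodes with an exception is a design choice of this sketch. A real implementation would fail the connection instead.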
Setting the MASK bit to 1 enables masking. WebSockets require that every payload sent from the client to the server be obfuscated using a random key (the mask) chosen by the client. The masking key is combined with the payload data using an XOR operation before the data is sent. This masking prevents caches from misinterpreting WebSocket frames as cacheable data. Why should we prevent caching of WebSocket data? Security.
During development of the WebSocket protocol, it was shown that if a compromised server is deployed, and clients connect to that server, it is possible to have intermediate proxies or infrastructure cache the responses of the compromised server so that future clients requesting that data receive the incorrect response. This attack is called cache poisoning, and results from the fact that we cannot control how misbehaving proxies behave in the wild. This is especially problematic when introducing a new protocol like WebSocket that has to interact with the existing infrastructure of the internet.
The Payload len field and the Extended payload length field are used to
encode the total length of the payload data for this frame. If the payload
data is small (under 126 bytes), the length is encoded in the Payload
len field. As the payload data grows, we use the additional fields to
encode the length of the payload: a Payload len of 126 means the next
16 bits hold the length, and a Payload len of 127 means the next 64 bits do.
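The three length encodings can be sketched as a pair of helpers. The names encode_length and decode_length are hypothetical. Both work on the part of the header that starts at the second byte, which holds the MASK bit and the 7-bit Payload len. The extended lengths are unsigned integers in network byte order.

```python
import struct
from typing import Tuple


def encode_length(length: int, masked: bool) -> bytes:
    """Encode the MASK bit plus the payload length bytes of a frame header."""
    mask_bit = 0x80 if masked else 0x00
    if length < 126:
        return bytes([mask_bit | length])
    if length < 2**16:
        # 126 signals that a 16-bit extended length follows.
        return bytes([mask_bit | 126]) + struct.pack("!H", length)
    # 127 signals that a 64-bit extended length follows.
    return bytes([mask_bit | 127]) + struct.pack("!Q", length)


def decode_length(data: bytes) -> Tuple[int, int]:
    """Return (payload_length, bytes_consumed), starting at the second header byte."""
    short = data[0] & 0x7F
    if short < 126:
        return short, 1
    if short == 126:
        return struct.unpack("!H", data[1:3])[0], 3
    return struct.unpack("!Q", data[1:9])[0], 9
```

For example, a 300-byte payload does not fit in 7 bits, so it is written as 126 followed by the two bytes 0x01 0x2C.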
As discussed with the MASK bit, all frames sent from the client to the
server are masked by a 32-bit value that is contained within the frame.
This Masking-key field is present if the mask bit is set to 1 and is
absent if the mask bit is set to 0.
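The XOR masking step can be sketched as follows. The helper name apply_mask is hypothetical. Each payload byte is XORed with the key byte at the same position modulo 4, so the same function both masks and unmasks.

```python
import os


def apply_mask(payload: bytes, masking_key: bytes) -> bytes:
    """XOR each payload byte with the key byte at position i mod 4."""
    if len(masking_key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))


masking_key = os.urandom(4)  # the client picks a fresh random key for every frame
masked = apply_mask(b"Hello", masking_key)
unmasked = apply_mask(masked, masking_key)  # applying the same key again restores the payload
```

Because XOR is its own inverse, the server recovers the original payload by running the identical operation with the key it read from the frame.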
Payload data includes arbitrary application data and any extension
data that has been negotiated between the client and the server.
Extensions are negotiated during the initial handshake and allow you to
extend the WebSocket protocol for additional uses.
Closing a WebSocket connection — The WebSocket Close Handshake
To close a WebSocket connection, a closing frame is sent (opcode 0x08).
In addition to the opcode, the close frame may contain a body that
indicates the reason for closing. If either side of a connection receives
a close frame, it must send a close frame in response, and no more data
should be sent over the connection. Once the close frame has been received
by both parties, the TCP connection is torn down. In normal operation, the
server is the side that initiates closing the TCP connection.
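As a rough illustration of the close frame body, the sketch below builds and parses one. The helper names are hypothetical. The layout (a 2-byte status code followed by a UTF-8 reason), the codes 1000 and 1005, and the 125-byte limit on control frame payloads all come from RFC 6455, not from the text above.

```python
import struct
from typing import Tuple


def build_close_payload(code: int = 1000, reason: str = "") -> bytes:
    """Close body: 2-byte big-endian status code followed by a UTF-8 reason."""
    body = struct.pack("!H", code) + reason.encode("utf-8")
    if len(body) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    return body


def parse_close_payload(body: bytes) -> Tuple[int, str]:
    """Return (status_code, reason); an empty body carries no status code."""
    if not body:
        return 1005, ""  # 1005 is reserved to mean "no status code was present"
    (code,) = struct.unpack("!H", body[:2])
    return code, body[2:].decode("utf-8")


# An unmasked server-to-client close frame: FIN + opcode 0x8, then length, then body.
body = build_close_payload(1000, "done")
frame = bytes([0x88, len(body)]) + body
```

Status code 1000 indicates a normal closure. A frame sent by a client would also need the MASK bit set and a masking key, which this sketch leaves out.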
This article provides an introduction to the WebSocket protocol, and covers a lot of ground. However, the full protocol has more detail than what I could fit into this blog post. If you want to learn more, there are several great resources to choose from: