An Introduction to WebSockets

A WebSocket is a persistent connection between a client and server. WebSockets provide a bidirectional, full-duplex communications channel that operates over HTTP through a single TCP/IP socket connection. At its core, the WebSocket protocol facilitates message passing between a client and server. This article provides an introduction to the WebSocket protocol, including what problem WebSockets solve, and an overview of how WebSockets are described at the protocol level.

Why WebSocket?

The idea of WebSockets was borne out of the limitations of HTTP-based technology. With HTTP, a client requests a resource, and the server responds with the requested data. HTTP is a strictly unidirectional protocol — any data sent from the server to the client must be first requested by the client. Long-polling has traditionally acted as a workaround for this limitation. With long-polling, a client makes an HTTP request with a long timeout period, and the server uses that long timeout to push data to the client. Long-polling works, but comes with a drawback — resources on the server are tied up throughout the length of the long-poll, even when no data is available to send.

WebSockets, on the other hand, allow for sending message-based data, similar to UDP, but with the reliability of TCP. WebSocket uses HTTP as the initial transport mechanism, but keeps the TCP connection alive after the HTTP response is received so that it can be used for sending messages between client and server. WebSockets allow us to build “real-time” applications without the use of long-polling.

Protocol Overview

The protocol consists of an opening handshake followed by basic message framing, layered over TCP.

RFC 6455 - The WebSocket Protocol

WebSockets begin life as a standard HTTP request and response. Within that request response chain, the client asks to open a WebSocket connection, and the server responds (if its able to). If this initial handshake is successful, the client and server have agreed to use the existing TCP/IP connection that was established for the HTTP request as a WebSocket connection. Data can now flow over this connection using a basic framed message protocol. Once both parties acknowledge that the WebSocket connection should be closed, the TCP connection is torn down.

Establishing a WebSocket connection — The WebSocket Open Handshake

WebSockets do not use the http:// or https:// scheme (because they do not follow the HTTP protocol). Rather, WebSocket URIs use a new scheme ws: (or wss: for a secure WebSocket). The remainder of the URI is the same as an HTTP URI: a host, port, path and any query parameters.

"ws:" "//" host [ ":" port ] path [ "?" query ]
"wss:" "//" host [ ":" port ] path [ "?" query ]

WebSocket connections can only be established to URIs that follow this scheme. That is, if you see a URI with a scheme of ws:// (or wss://), then both the client and the server MUST follow the WebSocket connection protocol to follow the WebSocket specification.

WebSocket connections are established by upgrading an HTTP request/response pair. A client that supports WebSockets and wants to establish a connection will send an HTTP request that includes a few required headers:

  • Connection: Upgrade
    • The Connection header generally controls whether or not the network connection stays open after the current transaction finishes. A common value for this header is keep-alive to make sure the connection is persistent to allow for subsequent requests to the same server. During the WebSocket opening handshake we set to header to Upgrade, signaling that we want to keep the connection alive, and use it for non-HTTP requests.
  • Upgrade: websocket
    • The Upgrade header is used by clients to ask the server to switch to one of the listed protocols, in descending preference order. We specify websocket here to signal that the client wants to establish a WebSocket connection.
  • Sec-WebSocket-Key: q4xkcO32u266gldTuKaSOw==
    • The Sec-WebSocket-Key is a one-time random value (a nonce) generated by the client. The value is a randomly selected 16-byte value that has been base64-encoded.
  • Sec-WebSocket-Version: 13
    • The only accepted version of the WebSocket protocol is 13. Any other version listed in this header is invalid.

Together, these headers would result in an HTTP GET request from the client to a ws:// URI like in the following example:

GET ws://example.com:8181/ HTTP/1.1
Host: localhost:8181
Connection: Upgrade
Pragma: no-cache
Cache-Control: no-cache
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: q4xkcO32u266gldTuKaSOw==

Once a client sends the initial request to open a WebSocket connection, it waits for the server’s reply. The reply must have an HTTP 101 Switching Protocols response code. The HTTP 101 Switching Protocols response indicates that the server is switching to the protocol that the client requested in its Upgrade request header. In addition, the server must include HTTP headers that validate the connection was successfully upgraded:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: fA9dggdnMPU79lJgAE3W4TRnyDM=
  • Connection: Upgrade
    • Confirms that the connection has been upgraded.
  • Upgrade: websocket
    • Confirms that the connection has been upgraded.
  • Sec-WebSocket-Accept: fA9dggdnMPU79lJgAE3W4TRnyDM=`
    • Sec-WebSocket-Accept is base64 encoded, SHA-1 hashed value. You generate this value by concatenating the clients Sec-WebSocket-Key nonce and the static value 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 defined in RFC 6455. Although the Sec-WebSocket-Key and Sec-WebSocket-Accept` seem complicated, they exist so that both the client and the server can know that their counterpart supports WebSockets. Since the WebSocket re-uses the HTTP connection, there are potential security concerns if either side interprets WebSocket data as an HTTP request.

After the client receives the server response, the WebSocket connection is open to start transmitting data.

The WebSocket Protocol

WebSocket is a framed protocol, meaning that a chunk of data (a message) is divided into a number of discrete chunks, with the size of the chunk encoded in the frame. The frame includes a frame type, a payload length, and a data portion. An overview of the frame is given in RFC 6455 and reproduced here.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+

I won’t cover every piece of the frame protocol here. Refer to RFC 6455 for full details. Rather, I will cover the most important bits so that we can gain an understanding of the WebSocket protocol.

Fin Bit

The first bit of the WebSocket header is the Fin bit. This bit is set if this frame is the last data to complete this message.

RSV1, RSV2, RSV3 Bits

These bits are reserved for future use.

opcode

Every frame has an opcode that determines how to interpret this frame’s payload data.

Opcode value Description
0x00 This frame continues the payload from the previous frame.
0x01 Denotes a text frame. Text frames are UTF-8 decoded by the server.
0x02 Denotes a binary frame. Binary frames are delivered unchanged by the server.
0x03-0x07 Reserved for future use.
0x08 Denotes the client wishes to close the connection.
0x09 A ping frame. Serves as a heartbeat mechanism ensuring the connection is still alive. The receiver must respond with a pong.
0x0a A pong frame. Serves as a heartbeat mechanism ensuring the connection is still alive. The receiver must respond with a ping frame.
0x0b-0x0f Reserved for future use.

Mask

Setting this bit to 1 enables masking. WebSockets require that all payload be obfuscated using a random key (the mask) chosen by the client. The masking key is combined with the payload data using an XOR operation before sending data to the payload. This masking prevents caches from misinterpreting WebSocket frames as cacheable data. Why should we prevent caching of WebSocket data? Security.

During development of the WebSocket protocol, it was shown that if a compromised server is deployed, and clients connect to that server, it is possible to have intermediate proxies or infrastructure cache the responses of the compromised server so that future clients requesting that data receive the incorrect response. This attack is called cache poisoning, and results from the fact that we cannot control how misbehaving proxies behave in the wild. This is especially problematic when introducing a new protocol like WebSocket that has to interact with the existing infrastructure of the internet.

Payload len

The Payload len field and Extended payload length field are used to encode the total length of the payload data for this frame. If the payload data is small (under 126 bytes), the length is encoded in the Payload len field. As the payload data grows, we use the additional fields to encode the length of the payload.

Masking-key

As discussed with the MASK bit, all frames sent from the client to the server are masked by a 32-bit value that is contained within the frame. This field is present if the mask bit is set to 1 and is absent if the mask bit is set to 0.

Payload data

The Payload data includes arbitrary application data and any extension data that has been negotiated between the client and the server. Extensions are negotiated during the initial handshake and allow you to extend the WebSocket protocol for additional uses.

Closing a WebSocket connection — The WebSocket Close Handshake

To close a WebSocket connection, a closing frame is sent (opcode 0x08). In addition to the opcode, the close frame may contain a body that indicates the reason for closing. If either side of a connection receives a close frame, it must send a close frame in response, and no more data should be sent over the connection. Once the close frame has been received by both parties, the TCP connection is torn down. The server always initiates closing the TCP connection.

More References

This article provides an introduction to the WebSocket protocol, and covers a lot of ground. However, the full protocol has more detail than what I could fit in to this blog post. If you want to learn more, there are several great resources to choose from:

See also

comments powered by Disqus