The scalability problem with using HTTPS for M2M

In the grand scale of Internet of Things (IoT) and Machine-to-Machine (M2M) communication, it is tempting to use HTTP, and specifically HTTPS, as the communication protocol between devices and servers. The most prominent reasons for choosing the HTTPS protocol are:

  • It is common to use web/application servers for the backend IoT infrastructure, and such servers are designed to speak HTTPS; the clients are therefore typically required to use HTTPS messages for communication.
  • The HTTPS protocol is designed to traverse firewalls and proxies. Devices using HTTPS can piggyback on the HTTPS infrastructure and easily bypass such obstacles.
  • HTTP server infrastructure typically includes SSL as a component, together with ready-to-use HTTP authentication mechanisms, making it easy to design secure IoT solutions.

A device can use a small HTTP client library together with a small SSL stack, making it convenient for device programmers to design an HTTPS based IoT protocol that can communicate with any type of backend application/web server. Although a web server can handle HTTPS requests, it typically cannot handle the business logic required for managing the data sent to and from the connected devices, so in most cases an application server is required for the backend infrastructure.

You may think that sending a simple HTTPS message from the device and receiving a simple HTTPS response from the server does not impose a lot of protocol overhead. However, what appears to be a simple HTTPS request sent from the client and a simple HTTPS response from the server is actually a long sequence of messages exchanged between the client and server. Let's take a look at what goes on behind the scenes in a typical HTTPS client request and server response:

HTTPS Handshake

Figure 1 shows the handshaking sequence for the three protocols: TCP/IP (blue), SSL (red), and HTTPS (green).

As you can see from Figure 1, what appears from a higher level perspective to be a single HTTPS request/response is actually a set of several messages sent on the wire. This poses a problem for HTTPS based systems in which data must be sent from the server to the device client, since such systems require the client to poll the server for updates. The more frequent the polls, the more load is placed on the server. As an example, assume we design a remote control system for opening and closing garage doors.

IoT Garage Door Controller

Here's how a poll based HTTPS system works: only the client can connect and send asynchronous messages to the server. The server cannot send an unsolicited (asynchronous) message to the client; it must wait for the client to connect. In our garage door system, the client opens a connection and asks the server for a garage door command, the server responds, and the client closes the connection. This sequence is repeated indefinitely -- i.e. the client repeatedly asks if it should close the garage door, and the server responds yes or no. The server can only respond when the client asks.

Figure 2 shows how the microcontroller based garage door controller continuously sends HTTPS poll requests to the server.
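The following is a minimal sketch, in C, of such a device side poll cycle. It is not taken from any demo code: the /door/poll endpoint, the "close" response, and the transport helpers are invented for illustration, and the helpers are stubbed out so the example is self-contained.

```c
/* Minimal sketch of a device side HTTPS poll cycle (illustration only).
 * The transport helpers below are stubs standing in for a small HTTP
 * client library + SSL stack; the /door/poll endpoint and the "close"
 * response are invented for this example. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int https_connect(const char* host)  /* TCP connect + SSL handshake */
{
    printf("connect + SSL handshake with %s\n", host);
    return 1;  /* pretend the connection succeeded */
}

static int https_get(const char* path, char* rsp, size_t len)
{
    printf("GET %s\n", path);
    snprintf(rsp, len, "none");  /* pretend no command is pending */
    return (int)strlen(rsp);
}

static void https_close(void) { printf("close connection\n"); }
static void closeGarageDoor(void) { printf("closing garage door\n"); }

int main(void)
{
    char rsp[64];
    for(;;)  /* the device polls forever */
    {
        if(https_connect("server.example.com"))
        {
            /* Ask the server whether a command is pending */
            if(https_get("/door/poll", rsp, sizeof rsp) > 0 && !strcmp(rsp, "close"))
                closeGarageDoor();
            https_close();
        }
        sleep(60);  /* one minute poll interval, as discussed below */
    }
}
```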

A simple web based phone app, connected to the online server, lets the subscriber remotely manage his/her own garage door. Say you forget to close your garage door. You open your phone app (a client) and click the close door button. The message is immediately sent to the online server. The server finds the corresponding garage door IoT device from the connected phone app's user credentials. However, the server cannot immediately send the close door control message to the garage door IoT device client. Instead, it saves the control command message that it received from the phone and waits for the next poll request from the garage door IoT device client. The close garage door control command is then sent as the response message to the client poll request.

How long are you prepared to wait for the garage door to close after you press the close button? Probably not that long, and probably not longer than one minute. This means the device must poll the server at least once every minute.

HTTP is a stateless protocol in which the connection is initiated by the client, and the client closes the connection after it receives the response. Persistent HTTP/1.1 connections do not change this behavior, since the server cannot push multiple responses for the same request. The server has no option but to wait for the device client to reconnect as part of its poll cycle before it can send a message to the device client.

Now, say that the online server manages 250,000 garage doors, each polling the server for data every minute. The server then needs to handle 250,000 HTTPS request/response pairs every minute, or roughly 4 request/response pairs every millisecond. This obviously creates a huge network load, but the big question is whether the server can handle that many connection requests. Recall that each HTTPS request/response pair requires, at a bare minimum, 9 messages sent on the wire. This means the server must be able to handle roughly 36 low level messages every millisecond. In addition, the computationally expensive asymmetric key cryptography places a huge load on the server: each HTTPS handshake requires the server to perform asymmetric key operations for that particular connection.
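A quick back-of-the-envelope calculation makes these numbers concrete. The figures below assume the 9 on-the-wire messages from Figure 1 and a one minute poll interval; the small program simply prints the resulting rates.

```c
/* Back-of-the-envelope load calculation for the polling design above.
 * Assumes 250,000 devices, a 60 second poll interval, and 9 on-the-wire
 * messages per HTTPS request/response pair (see Figure 1). */
#include <stdio.h>

int main(void)
{
    const double devices     = 250000.0;
    const double pollSeconds = 60.0;
    const double msgsPerPoll = 9.0;

    double pairsPerSec = devices / pollSeconds;    /* ~4,167 request/response pairs per second */
    double pairsPerMs  = pairsPerSec / 1000.0;     /* ~4.2 pairs per millisecond               */
    double msgsPerMs   = pairsPerMs * msgsPerPoll; /* ~37.5 low level messages per millisecond */

    printf("request/response pairs per second: %.0f\n", pairsPerSec);
    printf("request/response pairs per ms:     %.1f\n", pairsPerMs);
    printf("low level messages per ms:         %.1f\n", msgsPerMs);
    return 0;
}
```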

Circumventing the Polling Problem by Using Persistent Connections

64K Connection Myth

It's a common misconception that a TCP/IP stack can only handle 64K connections because the port number is 16 bits. This is simply not the case: a connection is identified by the complete TCP/IP address, known as the 5-tuple, which enables a server to uniquely identify a virtually unlimited number of connected clients. The 5-tuple is made up of the source IP address, destination IP address, source port number, destination port number, and the protocol in use.
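As a minimal illustration, the 5-tuple can be represented as a small C structure. The names below are chosen for this example and do not come from any particular TCP/IP stack:

```c
/* Illustration of the connection 5-tuple. On the server, the destination
 * address and port are typically fixed (e.g. port 443), while every remote
 * client contributes a unique source address/port combination. */
#include <stdint.h>

typedef struct {
    uint32_t srcAddr;   /* source IP address (IPv4 for brevity)  */
    uint32_t dstAddr;   /* destination IP address                */
    uint16_t srcPort;   /* source port number                    */
    uint16_t dstPort;   /* destination port number, e.g. 443     */
    uint8_t  protocol;  /* IP protocol number, e.g. 6 for TCP    */
} Tuple5;
```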

The obvious solution to the problem is to use persistent connections. A high end TCP/IP stack easily handles 250K, 500K, or even more persistent socket connections. The question is whether your web/application server solution can handle that many connections, and how much memory the server uses when it does. For this reason, selecting a good web/application server solution is very important. We will discuss this in greater detail later.

A standard TCP/IP socket connection is by definition persistent, and it can carry messages in both directions at the same time. However, creating a custom listen server may not be supported by your web/application server infrastructure/API, and a custom, non HTTPS based protocol prevents the client from traversing firewalls and proxies. What we need is a protocol that starts out as HTTPS and then morphs into a secure persistent socket connection, keeping all the benefits of HTTPS.

The WebSocket Protocol

The SharkSSL stack includes a secure WebSocket client library that enables real-time communication between a device and any WebSocket enabled web/application server solution.

The WebSocket protocol, defined in RFC 6455, specifies how a standard HTTP request/response pair can be upgraded to a persistent full duplex connection. When used over SSL, it lets you morph an HTTPS connection into a secure persistent socket connection. WebSocket-based applications enable real-time communication just like a regular socket connection. What makes WebSocket unique is that it inherits all of the benefits of HTTPS, since it initially starts as an HTTPS request/response pair. This means that you can bypass firewalls/proxies, communicate over SSL, and easily provide authentication.
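For reference, the client side of the RFC 6455 upgrade is nothing more than a specially crafted HTTP GET request sent over the (secure) connection. The sketch below only formats the request into a buffer; it does not open a socket or run the SSL handshake, and the host name, path, and key value are placeholders.

```c
/* Format an RFC 6455 client upgrade request (illustration only).
 * The host, path, and Sec-WebSocket-Key values are placeholders; a real
 * client generates the key from 16 random bytes, base64 encoded, and
 * verifies the Sec-WebSocket-Accept header in the server's 101 response. */
#include <stdio.h>

int main(void)
{
    const char* host = "server.example.com";       /* placeholder */
    const char* path = "/my-server-service";       /* placeholder */
    const char* key  = "dGhlIHNhbXBsZSBub25jZQ=="; /* placeholder key-data */
    char req[512];

    int len = snprintf(req, sizeof req,
        "GET %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        "Sec-WebSocket-Key: %s\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n",
        path, host, key);

    fwrite(req, 1, (size_t)len, stdout); /* a real client writes this to the SSL connection */
    return 0;
}
```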

The History and Complexity of the WebSocket Protocol

When using WebSocket for IoT, the communication is between a device and the server, not between a browser and a server. In other words, no humans are directly involved and no presentation logic is needed; only control data is sent on the wire.

The WebSocket protocol was designed as a means for HTML5 JavaScript web browser applications to escape the polling problem found in many web applications. It enables browser-based JavaScript applications to open a persistent connection to the server and communicate with the server in real-time. However, the WebSocket protocol implements many security mechanisms designed specifically for JavaScript, and these security mechanisms serve no purpose for non browser based WebSocket clients such as IoT clients.

Designing an Alternative Protocol to WebSocket that is More Suitable for IoT Communication

Since many of the security features in the WebSocket protocol are designed exclusively for JavaScript clients, we can implement a simplified version of the WebSocket protocol for IoT communication that provides the same transport features. These security features are simply unnecessary in IoT communication, and a simplified WebSocket protocol helps us save precious flash memory in microcontroller based IoT solutions.

Our raw TCP/IP IoT demos can be downloaded from our web site and the demos include the C source code for the simplified WebSocket protocol.

Our IoT demos use a simplified version of the WebSocket protocol. To understand how this works, we need to look at how the HTTPS upgrade sequence works for both protocols. Both the WebSocket protocol and the simplified WebSocket protocol upgrade an HTTPS connection to a secure, persistent, asynchronous, and bi-directional connection. Figure 3 below shows the HTTP request and response headers used by the WebSocket protocol and by the simplified WebSocket protocol used in the IoT demos.

WebSocket Protocol (RFC 6455):

  Client HTTPS Request:

    GET /my-server-service HTTP/1.1
    Host: server.example.com
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Key: key-data
    Sec-WebSocket-Version: 13
    Origin: http://example.com

  Server HTTPS Response:

    HTTP/1.1 101 Switching Protocols
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Accept: key-data

  The HTTPS connection then switches to a secure bi-directional socket connection.

Protocol Used in IoT Demos:

  Client HTTPS Request:

    GET /my-server-service HTTP/1.1
    Host: server.example.com

  Server HTTPS Response:

    None. The server side logic skips this step and goes directly to the secure bi-directional socket connection; thus an HTTPS client library is not required by the device.

Figure 3 shows the HTTP request/response for upgrading an HTTPS connection to a secure, persistent, asynchronous, and bi-directional connection for the WebSocket protocol and the simplified WebSocket protocol used in the IoT demos.

After upgrading, the WebSocket protocol switches to a frame based protocol. The simplified WebSocket IoT protocol functions like a standard socket connection, except that it is secure -- i.e. it uses SSL. A unique feature of the IoT protocol is that it also behaves as a frame based protocol, but without using a frame header as the standard WebSocket protocol does. The reason for this is that the most common symmetric encryption algorithms, such as AES, are block based, and reading from the socket stream returns data in chunks of the same size as sent from the peer side. SharkSSL manages the block/packet reading. The benefit of using a frame based protocol is that it simplifies reading control messages on the wire. For this reason, both the WebSocket protocol and our simplified WebSocket protocol used in the IoT demos behave identically when reading data from the socket stream with block based symmetric encryption.
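The sketch below illustrates this frame style read loop. The secureRead() helper is a stand-in for the SSL stack's read function and is assumed, per the description above, to return one complete message per call; here it is stubbed to return a single canned "close" command so the example is self-contained.

```c
/* Sketch of frame style reading when the secure transport delivers data in
 * the same chunks the peer sent (as described above). secureRead() is a
 * stand-in for the SSL stack's read function; this stub returns one canned
 * "close" command and then simulates the peer closing the connection. */
#include <stdio.h>
#include <string.h>

static int secureRead(unsigned char* buf, int maxLen)
{
    static int calls = 0;
    const char* msg = "close";
    if(calls++ > 0 || (int)strlen(msg) > maxLen)
        return 0;                    /* simulate end of stream */
    memcpy(buf, msg, strlen(msg));
    return (int)strlen(msg);         /* one complete control message (frame) */
}

int main(void)
{
    unsigned char buf[256];
    int len;
    /* Each successful read yields one complete control message */
    while((len = secureRead(buf, sizeof buf)) > 0)
    {
        if(len == 5 && !memcmp(buf, "close", 5))
            printf("closing garage door\n");
    }
    return 0;
}
```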

As we mentioned above, SharkSSL includes a WebSocket client library. You can use this library as a foundation for designing WebSocket based IoT solutions, or you can use the simplified WebSocket library found in our IoT demos. The decision is entirely yours. The benefit of the simplified WebSocket library is that it requires less code and processing in the device. The source code for the IoT demos is also included in the SharkSSL delivery.

Selecting a Low Overhead and Low Memory IoT Backend Application Server

So far, we have covered how to upgrade an HTTPS request/response pair to a secure, persistent, asynchronous, and bi-directional connection on the device side, i.e. the client side. The device clients require that you have a backend server infrastructure that can handle all the connected clients. When using a high number of persistent connections, it is important to select an application server backend infrastructure that can handle a high number of concurrent connections while using little memory and processing overhead per connected client.
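An event driven, single threaded design is one common way to meet this requirement, since the per-connection cost is then essentially one socket plus a small amount of state. The following is a generic Linux sketch using epoll; it illustrates the pattern only and is not a description of how any particular server product is implemented (error handling is omitted for brevity, and the echo handler is a placeholder for real application logic).

```c
/* Generic sketch of an event driven server loop that scales to a large
 * number of persistent connections (Linux epoll; illustration only). */
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int listenFd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);               /* example port */
    bind(listenFd, (struct sockaddr*)&addr, sizeof addr);
    listen(listenFd, SOMAXCONN);

    int epFd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listenFd };
    epoll_ctl(epFd, EPOLL_CTL_ADD, listenFd, &ev);

    struct epoll_event events[64];
    for(;;)
    {
        int n = epoll_wait(epFd, events, 64, -1);
        for(int i = 0; i < n; i++)
        {
            int fd = events[i].data.fd;
            if(fd == listenFd)
            {   /* New device connection: register it and keep it open */
                int clientFd = accept(listenFd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = clientFd };
                epoll_ctl(epFd, EPOLL_CTL_ADD, clientFd, &cev);
            }
            else
            {   /* Data (or disconnect) from an already connected device */
                char buf[512];
                ssize_t len = read(fd, buf, sizeof buf);
                if(len <= 0)
                    close(fd);                        /* drop disconnected client */
                else
                    write(fd, buf, (size_t)len);      /* echo as a placeholder */
            }
        }
    }
}
```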

Our Barracuda Application Server, when compiled for Linux, handles virtually an unlimited number of persistent connections and is thus an ideal back-end application server for device solutions based on the IoT protocol. The Mako Server, a derivative of the Barracuda Application Server, is a standalone application server product. The high end version of the Mako Server allows for a virtually unlimited number of connected device clients. See the Mako Server for details.

The following video shows how to use the Mako Server for setting up a secure IoT solution.
