The scalability problem with using HTTPS for M2M

(The scalability problem with using REST for IoT)

See the live example at the end of this article.

In the world of Internet of Things (IoT) and Machine-to-Machine (M2M) communication, it is tempting to use HTTP, and specifically HTTPS, as the communication protocol between devices and servers. The most prominent reasons for choosing the HTTPS protocol are:

  • It is common to use web/application servers for the backend IoT infrastructure, and such servers are designed to speak HTTPS; thus the clients are typically required to use HTTPS messages for communication.
  • The infrastructure for the HTTPS protocol is designed to traverse firewalls and proxies. Devices using HTTPS can piggyback on this infrastructure and easily bypass such obstacles.
  • HTTP server infrastructure typically includes SSL as a component, together with ready-to-use HTTP authentication mechanisms, making it easy to design secure IoT solutions.

A device can use a small HTTP client library together with a small SSL stack, making it very convenient for device firmware programmers to design an HTTPS-based IoT protocol that can communicate with any type of backend application/web server. Although a web server can handle HTTPS requests, it typically cannot handle the business logic required for managing the data sent to and from the connected devices, so in most cases an application server is required for the backend infrastructure.

You may think that sending a simple HTTPS message from the device and receiving a simple HTTPS response from the server does not impose a lot of protocol overhead. However, what appears to be a simple HTTPS request sent from the client and a simple HTTPS response from the server is actually a long sequence of commands sent between the client and server. Let's take a look at what goes on behind the scenes in a typical HTTPS client request and server response:

HTTPS Handshake

Figure 1 shows the handshaking sequence for the three protocols: TCP/IP (blue), SSL (red), and HTTPS (green).

As you can see from Figure 1, what appears from a higher-level perspective to be a single HTTPS request/response is actually a set of several messages sent on the wire. This poses a problem in HTTPS-based systems that require data to be sent from the server to the device client, since such systems require that the client polls the server for updates. The more frequent the polls, the more load is placed on the server. As an example, assume we design a remote control system for opening and closing garage doors.

IoT Garage Door Controller

Here's how a poll-based HTTPS system works: only the client can connect and send asynchronous messages to the server. The server cannot send an unsolicited (asynchronous) message to the client; it must wait for the client to connect. In our garage door system, the client opens a connection and asks the server for a garage door command, the server responds, and the client closes the connection. This sequence is repeated indefinitely -- i.e., the client will repeatedly ask if it should close the garage door, and the server will respond yes or no. The server can only respond when the client asks.
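The poll cycle above can be sketched as a small Python loop. The endpoint path and the JSON shape of the response are hypothetical illustrations, not part of any real API; the `fetch` callable stands in for the full HTTPS request/response round trip shown in Figure 1.

```python
import json

def poll_for_command(fetch):
    """One poll cycle: open a connection, ask the server for a pending
    command, then close the connection. fetch() models the complete
    HTTPS request/response round trip."""
    response = fetch("/garagedoor/poll")           # hypothetical endpoint
    return json.loads(response).get("command")     # e.g. "close", or None

# Stub server for illustration: the first poll has nothing pending,
# the second poll returns a queued "close" command.
pending = [json.dumps({}), json.dumps({"command": "close"})]

def stub_fetch(path):
    return pending.pop(0)
```

In a real device this loop would run forever with a sleep between iterations; the key point is that the "close" command sits queued on the server until the device happens to ask for it.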

Figure 2 shows how the microcontroller-based garage door controller continuously sends HTTPS poll requests to the server.

A simple web-based phone app, connected to the online server, lets the subscriber remotely manage his/her own garage door. Say you forget to close your garage door. You open your phone app (a client) and click the close door button. The message is immediately sent to the online server. The server finds the corresponding garage door IoT device from the connected phone app's user credentials. The server cannot immediately send the close door control message to the garage door IoT device client. Instead it saves the control command message that it received from the phone and waits for the next poll request from the garage door IoT device client. The close garage door control command will then be sent as the response message to the client poll request.

How long are you prepared to wait for the garage door to close after you press the close button? Probably not that long, and probably not longer than one minute. This means the device must poll at least once every minute.

HTTP is a stateless protocol: the connection is initiated by the client, and the client closes the connection after it receives the response. Persistent HTTP/1.1 connections do not change this behavior, since the server cannot push multiple responses for the same request. The server has no option but to wait for the device client to reconnect as part of its poll cycle before it can send a message to the device client.

Now, say that the online server manages 250,000 garage doors, each polling the server for data every minute. The server then needs to handle 250,000 HTTPS request/response pairs every minute, or roughly four request/response pairs every millisecond. This obviously creates a huge network load, but the big question is whether the server can handle that many connection requests. Recall that each HTTPS request/response pair consists of, at a bare minimum, 9 messages sent on the wire. This means the server must be able to handle roughly 36 low-level messages every millisecond. In addition, the computationally expensive asymmetric key cryptography places a huge load on the server: each HTTPS handshake requires the server to perform asymmetric key operations for that particular connection.
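The back-of-the-envelope numbers above round slightly; worked out exactly, the load is a little higher than the rounded figures suggest:

```python
doors = 250_000          # connected garage door devices
poll_interval_ms = 60_000  # one poll per device per minute
msgs_per_pair = 9        # bare-minimum wire messages per HTTPS poll (Figure 1)

polls_per_ms = doors / poll_interval_ms       # ~4.17 request/response pairs/ms
msgs_per_ms = polls_per_ms * msgs_per_pair    # ~37.5 low-level messages/ms

print(f"{polls_per_ms:.2f} polls/ms, {msgs_per_ms:.1f} wire messages/ms")
```

And this counts only the message framing; it ignores the per-handshake asymmetric cryptography, which dominates the CPU cost.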

Circumventing the Polling Problem by Using Persistent Connections

64K Connection Myth

It's a common misconception that a TCP/IP stack can only handle 64K connections since the port number is 16 bits. This is simply not the case: the complete TCP/IP address, known as a 5-tuple, enables a server to uniquely identify a virtually unlimited number of connected clients. The 5-tuple is made up of the source IP address, destination IP address, source port number, destination port number, and the protocol in use.
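A small sketch makes the point concrete: if a server indexes per-connection state by the full 5-tuple, two clients can even use the same source port and still be distinguished, because their source IP addresses differ. The IP addresses below are documentation examples, not real hosts.

```python
connections = {}

def register(src_ip, src_port, dst_ip, dst_port, proto="TCP"):
    """Index per-connection state by the full 5-tuple. The 16-bit
    source port only needs to be unique per client IP, so the server
    is not limited to 64K concurrent connections."""
    key = (src_ip, src_port, dst_ip, dst_port, proto)
    connections[key] = {"state": "ESTABLISHED"}
    return key

# Two clients reusing the same source port toward the same server
# endpoint: the 5-tuples still differ because the source IPs differ.
k1 = register("203.0.113.10", 52000, "192.0.2.5", 443)
k2 = register("203.0.113.11", 52000, "192.0.2.5", 443)
```

The practical limit is therefore memory and CPU per connection, not the port-number space.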

The obvious solution to the problem is to use persistent connections. A high-end TCP/IP stack easily handles 250K, 500K, or even more persistent socket connections. The question is whether your web/application server solution can handle as many connections, and how much memory the server uses if it does. For this reason, selecting a good web/application server solution is very important. We will discuss this in greater detail later.

A standard TCP/IP socket connection is by definition persistent, and it can carry messages in both directions at the same time. However, a custom listening server may not be part of your web/application server infrastructure/API, and a custom non-HTTPS-based protocol will prevent the client from traversing firewalls and proxies. What we need is a protocol that starts out as HTTPS and then morphs into a secure persistent socket connection, keeping all the benefits of HTTPS.

The WebSocket Protocol

The SharkSSL stack includes a secure WebSocket client library that enables real-time communication between a device and any WebSocket enabled web/application server solution.

The WebSocket protocol, defined in RFC 6455, specifies how a standard HTTP request/response pair can be upgraded to a persistent full-duplex connection. When using SSL, it lets you morph an HTTPS connection into a secure persistent socket connection. WebSocket-based applications enable real-time communication just like a regular socket connection. What makes WebSocket unique is that it inherits all of the benefits of HTTPS, since it initially starts as an HTTPS request/response pair. This means that you can traverse firewalls/proxies, communicate over SSL, and easily provide authentication.
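The heart of the upgrade is a challenge/response inside ordinary HTTP headers: the client sends a random `Sec-WebSocket-Key`, and the server proves it understood the upgrade by hashing that key with a fixed GUID and returning the result in `Sec-WebSocket-Accept`. This sketch computes the server side of that exchange exactly as RFC 6455 defines it, using the sample key from the RFC itself:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value the server must
    return to complete the HTTP -> WebSocket upgrade handshake:
    base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# Sample key from RFC 6455 section 1.3:
accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
print(accept)  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After the server responds with `101 Switching Protocols` and this header, the same TCP (or SSL) connection stays open and carries WebSocket frames in both directions.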

Live Real-Time WebSockets Example:

The following embedded iframe shows our online real-time LED demo. Controlling LEDs (lights) via an online server requires real-time features similar to those needed for remotely controlling a garage door. The browser-server communication uses the SMQ pub/sub protocol, which uses WebSockets as the transport. The server acts as a proxy for communication between the user interfaces (browsers) and devices.

Downloading and connecting your own device to the online proxy gives a better understanding of how this works. We provide a pre-compiled simulated device for Windows. Download the pre-compiled device simulator and start the executable by double-clicking the file. The simulated device, running in a console window, connects to the online server and shows up in the above user interface. See the SMQ source code page for more download options, including source code.

See also: A Modern Approach to Embedding a Web Server in a Device

Posted in Whitepapers