MQTT Tutorial for Arduino, ESP8266 and ESP32

In this article you will learn what MQTT is and how this message protocol works.

This tutorial covers the following parts:

  • Sequence of MQTT
  • Message Protocol
  • Message Formats
  • Security of the MQTT Protocol.

MQTT stands for Message Queuing Telemetry Transport and was invented by Andy Stanford-Clark of IBM and Arlen Nipper of Cirrus Link in 1999. Over the last years the “Internet of Things” (IoT) became very popular. The core component of the IoT is the communication and interaction of different physical devices, either directly or over the internet. Therefore a machine-to-machine (M2M) communication protocol is needed, and that is what MQTT is: a lightweight messaging transport protocol based on publish/subscribe messaging that works on top of TCP/IP.

Because the protocol is so lightweight, it is suitable for microcontrollers like the Arduino, ESP8266 and ESP32 as well as for single-board computers like the Raspberry Pi. I personally use MQTT for sending data from my weather stations, built with a NodeMCU, to my Raspberry Pi, which is the central control unit for my smart home. If you are interested in a practical example of the MQTT connection, read the microcontroller to Raspberry Pi WiFi MQTT communication article.

There are also industry applications based on MQTT. For example, Facebook Messenger is based on MQTT.

The payload that is sent via the MQTT protocol is a plain sequence of bytes, usually plain text, without any metadata. Therefore the corresponding unit has to be added by the subscriber. The minimal message length is 2 bytes and the maximal message length is 256 megabytes.

MQTT: How it works

As mentioned, MQTT is based on a publish and subscribe pattern. Therefore a message broker, often called server, is needed to manage the connection between the publisher and the subscriber. A network is not limited to a single broker. Publisher and subscriber are also called clients.

Information is organized in a hierarchy of topics. This means that every piece of information has a topic under which it is published.

In total there are 3 different roles in an MQTT interaction:

  1. Publisher
    The publisher sends information to the broker. In a smart home use case a weather station would be a publisher because it sends temperature information. One advantage of the MQTT protocol is that a publisher does not need any information about the subscribers, neither how many there are nor how they are connected.

  2. Subscriber
    The subscriber gets information from the broker. A laptop with a dashboard that shows graphs of the temperature from the weather station would be a subscriber. Like the publisher, the subscriber does not need any information about the publisher's connection.

  3. Broker
    The machine that runs the message broker can also be publisher or subscriber at the same time. I use a Raspberry Pi as server, which is at the same time subscribed to all publishers to show me a dashboard of the current smart home status.
    If a publisher sets the RETAIN flag, the broker saves the last message of that topic even if there is no subscriber for the topic. Therefore, if a new client subscribes to the topic, the subscriber gets the last message immediately instead of waiting for the next time a publisher sends data to the broker.
    There are different forms of brokers. We distinguish between self-hosted brokers like Mosquitto or HiveMQ and cloud-based brokers like the ones from IBM or Microsoft (Azure).

Sequence of MQTT

  1. Publisher and subscriber connect to the broker.
  2. When the publisher gets new data to distribute, it sends a message including the topic and the data to the broker.
  3. The broker distributes the incoming data to all clients which are subscribed to the topic.
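
The three steps can be sketched with a tiny in-memory model. This is only an illustration of the publish/subscribe pattern, not a real broker: there is no networking, topics are matched exactly (no wildcards), and the class name `MiniBroker` and the topic names are made up for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A subscriber is modelled as a callback that receives (topic, payload).
Callback = Callable[[str, bytes], None]


class MiniBroker:
    """In-memory stand-in for a broker: no network, exact topic match only."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        # Step 1: a client registers its interest in a topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: bytes) -> None:
        # Steps 2 and 3: the broker receives topic + data and forwards
        # the data to every client subscribed to exactly that topic.
        for callback in self._subscribers.get(topic, []):
            callback(topic, payload)


received = []
broker = MiniBroker()
broker.subscribe("home/garden/temperature", lambda t, p: received.append((t, p)))
broker.publish("home/garden/temperature", b"21.5")
broker.publish("home/garden/humidity", b"40")  # no subscriber, nothing is delivered
print(received)
```

Note that the publisher never learns who received the value; it only talks to the broker.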


MQTT Message Protocol

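| Byte | Content | Part |
|---|---|---|
| Byte 1 | Bits 7-4: Message Type = 8; bit 3: DUP; bits 2-1: QoS Level; bit 0: not used | MQTT fixed header |
| Byte 2 | Remaining Length | MQTT fixed header |
| Byte 3 | Message ID (MSB) | MQTT variable header |
| Byte 4 | Message ID (LSB) | MQTT variable header |
| Byte 5 | Topic Name String Length (MSB) | List of topics |
| Byte 6 | Topic Name String Length (LSB) | List of topics |
| Byte 7 ... | Topic Name | List of topics |
| Byte n | Bits 7-2: Reserved (not used); bits 1-0: QoS Level | List of topics |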

The MQTT protocol works by exchanging a series of MQTT Control Packets. An MQTT Control Packet consists of up to three parts, always in the following order:

  • Fixed header, present in all MQTT Control Packets
  • Variable header, present in some MQTT Control Packets
  • Payload, present in some MQTT Control Packets

Fixed header

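| Byte | Bits 7-4 | Bits 3-0 | Part |
|---|---|---|---|
| Byte 1 | MQTT Control Packet type (for example Message Type = 1) | Flags specific to the MQTT Control Packet type | MQTT fixed header |
| Byte 2 | Remaining Length (all 8 bits) | | MQTT fixed header |

The first byte of the fixed header is split in two parts: bits 7 to 4 contain the MQTT Control Packet type and bits 3 to 0 contain flags specific to each MQTT Control Packet type. The second part of the fixed header is the Remaining Length, which takes 1 to 4 bytes. The following table shows the 14 different MQTT Control Packet types and their corresponding flags. The abbreviations used in the table are: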
  • C = Client
  • S = Server
  • DUP = Duplicate message flag. Indicates to the receiver that this message may have already been received.
    • = 1: Client or server re-delivers a PUBLISH, PUBREL, SUBSCRIBE or UNSUBSCRIBE message.
  • QoS = PUBLISH Quality of Service
    • = 0: At-most-once delivery (fire and forget): There is no guarantee that the messages are delivered. MQTT depends on the delivery guarantees of the underlying network (TCP/IP).
    • = 1: At-least-once delivery: Messages are guaranteed to arrive, but there may be duplicates.
    • = 2: Exactly-once delivery: This is the highest level. It also incurs the most overhead in terms of control messages and the need to store the messages locally.
  • RETAIN = 1: Instructs the server to retain the last received PUBLISH message and deliver it as the first message to new subscriptions.
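
| Name | Value | Direction of flow | Description | Fixed header flags | Bit 0 | Bit 1 | Bit 2 | Bit 3 |
|---|---|---|---|---|---|---|---|---|
| Reserved | 0 | Forbidden | Reserved | | | | | |
| CONNECT | 1 | C → S | Client request to connect to Server | Reserved | 0 | 0 | 0 | 0 |
| CONNACK | 2 | S → C | Connect acknowledgment | Reserved | 0 | 0 | 0 | 0 |
| PUBLISH | 3 | C → S, S → C | Publish message | Used | RETAIN | QoS | QoS | DUP |
| PUBACK | 4 | C → S, S → C | Publish acknowledgment | Reserved | 0 | 0 | 0 | 0 |
| PUBREC | 5 | C → S, S → C | Publish received (assured delivery part 1) | Reserved | 0 | 0 | 0 | 0 |
| PUBREL | 6 | C → S, S → C | Publish release (assured delivery part 2) | Reserved | 0 | 1 | 0 | 0 |
| PUBCOMP | 7 | C → S, S → C | Publish complete (assured delivery part 3) | Reserved | 0 | 0 | 0 | 0 |
| SUBSCRIBE | 8 | C → S | Client subscribe request | Reserved | 0 | 1 | 0 | 0 |
| SUBACK | 9 | S → C | Subscribe acknowledgment | Reserved | 0 | 0 | 0 | 0 |
| UNSUBSCRIBE | 10 | C → S | Unsubscribe request | Reserved | 0 | 1 | 0 | 0 |
| UNSUBACK | 11 | S → C | Unsubscribe acknowledgment | Reserved | 0 | 0 | 0 | 0 |
| PINGREQ | 12 | C → S | PING request | Reserved | 0 | 0 | 0 | 0 |
| PINGRESP | 13 | S → C | PING response | Reserved | 0 | 0 | 0 | 0 |
| DISCONNECT | 14 | C → S | Client is disconnecting | Reserved | 0 | 0 | 0 | 0 |
| Reserved | 15 | Forbidden | Reserved | | | | | |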
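
As a small illustration of the bit layout, the following sketch packs and unpacks the first byte of the fixed header. The helper names `pack_first_byte` and `unpack_first_byte` are made up for this example, and reading the lower four bits as DUP, QoS and RETAIN is only meaningful for PUBLISH packets.

```python
from typing import NamedTuple

PUBLISH = 3
SUBSCRIBE = 8


class FirstByte(NamedTuple):
    packet_type: int  # bits 7-4
    dup: bool         # bit 3
    qos: int          # bits 2-1
    retain: bool      # bit 0


def pack_first_byte(packet_type: int, dup: bool = False,
                    qos: int = 0, retain: bool = False) -> int:
    """Combine packet type and flags into the first byte of the fixed header."""
    if not 1 <= packet_type <= 14:
        raise ValueError("packet type must be between 1 and 14")
    if not 0 <= qos <= 2:
        raise ValueError("QoS must be 0, 1 or 2")
    return (packet_type << 4) | (int(dup) << 3) | (qos << 1) | int(retain)


def unpack_first_byte(value: int) -> FirstByte:
    """Split the first byte into packet type and PUBLISH-style flags."""
    return FirstByte(
        packet_type=value >> 4,
        dup=bool(value & 0x08),
        qos=(value >> 1) & 0x03,
        retain=bool(value & 0x01),
    )


print(hex(pack_first_byte(PUBLISH, qos=1, retain=True)))  # 0x33
print(hex(pack_first_byte(SUBSCRIBE, qos=1)))             # 0x82
print(unpack_first_byte(0x3B))
```

The value 0x82 for SUBSCRIBE matches the table: type 8 in the upper four bits and only bit 1 set in the lower four bits.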

The following message sequence chart shows the CONNECT and SUBSCRIBE setup

Remaining Length

The Remaining Length is the number of bytes remaining within the current packet, including data in the variable header and the payload. The Remaining Length does not include the bytes used to encode the Remaining Length. The Remaining Length is encoded using a variable length encoding scheme which uses a single byte for values up to 127. Larger values are handled as follows.

The least significant seven bits of each byte encode the data, and the most significant bit is used to indicate that there are following bytes in the representation. Thus each byte encodes 128 values and a “continuation bit”. The maximum number of bytes in the Remaining Length field is four.

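The equation to calculate the remaining length is the following: Remaining Length = a*128^0 + b*128^1 + c*128^2 + d*128^3, where a, b, c and d are the 7-bit values (0 to 127) stored in Byte 0 to Byte 3.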

Example Remaining Length = 364

From the equation you first need to think: what is the highest power 128^x that is below 364? The answer is x = 1, because 128^1 is 128 and 128^2 would be 16,384. Therefore we know that Byte 1 is the last byte, so its continuation bit is CB1 = 0, and that we also need Byte 0, whose continuation bit is CB0 = 1.

Now we have to calculate b: what is the maximum factor b that can be multiplied by 128 without exceeding 364? The answer is b = 2, because 2*128 = 256 and 3*128 = 384. To fill the following table, we insert the number 2 in binary into the row of Byte 1. That is 0000010, which appears as 0100000 in the table because bit 0 is on the left side.

The remaining length for Byte 0 is therefore 364 - 2*128^1 = 108.

Now we do the same math for Byte 0, where 128 has a power of 0. Therefore we know that a = 108, which equals 1101100 in binary format (0011011 in the table, with bit 0 on the left).

| Byte | CBX | Bit 0 | Bit 1 | Bit 2 | Bit 3 | Bit 4 | Bit 5 | Bit 6 | DEC |
|---|---|---|---|---|---|---|---|---|---|
| Byte 0 (a*128^0) | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 108 |
| Byte 1 (b*128^1) | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 2 |

Example Remaining Length = 25897

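Now we do the second example a little faster with the remaining length of 25897.

  • 25897 -> c*128^2 -> c=1, CB2=0, CB1=1, CB0=1
  • 25897 - 16384 = 9513 -> b*128^1 -> b=74
  • 9513 - 9472 = 41 -> a*128^0 -> a=41

| Byte | CBX | Bit 0 | Bit 1 | Bit 2 | Bit 3 | Bit 4 | Bit 5 | Bit 6 | DEC |
|---|---|---|---|---|---|---|---|---|---|
| Byte 0 (a*128^0) | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 41 |
| Byte 1 (b*128^1) | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 74 |
| Byte 2 (c*128^2) | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |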

The Remaining Length field takes at most 4 bytes. The following table shows the value range for each size of the Remaining Length field.

| Bytes | From | To |
|---|---|---|
| 1 | 0 (0x00) | 127 (0x7F, binary 0111 1111) |
| 2 | 128 (0x80, 0x01) | 16 383 (0xFF, 0x7F) |
| 3 | 16 384 (0x80, 0x80, 0x01) | 2 097 151 (0xFF, 0xFF, 0x7F) |
| 4 | 2 097 152 (0x80, 0x80, 0x80, 0x01) | 268 435 455 (0xFF, 0xFF, 0xFF, 0x7F) |
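
The encoding scheme described in this section can be written down in a few lines. The following is a minimal sketch with made-up function names; it reproduces the two worked examples (364 and 25897) and rejects values that do not fit into four bytes.

```python
from typing import Tuple

MAX_REMAINING_LENGTH = 268_435_455  # largest value that fits into four bytes


def encode_remaining_length(length: int) -> bytes:
    """Encode a Remaining Length value into 1 to 4 bytes."""
    if not 0 <= length <= MAX_REMAINING_LENGTH:
        raise ValueError("Remaining Length out of range")
    encoded = bytearray()
    while True:
        digit = length % 128      # lower seven bits carry the data
        length //= 128
        if length > 0:
            digit |= 0x80         # continuation bit: more bytes follow
        encoded.append(digit)
        if length == 0:
            return bytes(encoded)


def decode_remaining_length(data: bytes) -> Tuple[int, int]:
    """Return (value, number of bytes used) for the field at the start of data."""
    value = 0
    for index, byte in enumerate(data[:4]):
        value += (byte & 0x7F) * 128 ** index
        if not byte & 0x80:       # continuation bit cleared: last byte
            return value, index + 1
    raise ValueError("malformed or truncated Remaining Length")


print(encode_remaining_length(364).hex())    # ec02
print(encode_remaining_length(25897).hex())  # a9ca01
print(decode_remaining_length(bytes([0xEC, 0x02])))  # (364, 2)
```

In the first example, 0xEC is 108 with the continuation bit set (108 + 128 = 236) and 0x02 is b = 2 without continuation bit, which matches the table for the remaining length of 364.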

Variable Header

Some types of MQTT Control Packets contain a variable header component. It resides between the fixed header and the payload. The content of the variable header varies depending on the packet type. A Packet Identifier field in the variable header is common to several packet types.

Packet Identifier Bytes

| Byte | Bits 7-0 |
|---|---|
| Byte 1 | Packet Identifier MSB (most significant byte) |
| Byte 2 | Packet Identifier LSB (least significant byte) |

The following table shows which packet types use a Packet Identifier.

| Control Packet | Packet Identifier field |
|---|---|
| CONNECT | No |
| CONNACK | No |
| PUBLISH | Yes (if QoS > 0) |
| PUBACK | Yes |
| PUBREC | Yes |
| PUBREL | Yes |
| PUBCOMP | Yes |
| SUBSCRIBE | Yes |
| SUBACK | Yes |
| UNSUBSCRIBE | Yes |
| UNSUBACK | Yes |
| PINGREQ | No |
| PINGRESP | No |
| DISCONNECT | No |

SUBSCRIBE, UNSUBSCRIBE, and PUBLISH (in cases where QoS > 0) Control Packets MUST contain a non-zero 16-bit Packet Identifier. Each time a Client sends a new packet of one of these types it MUST assign it a currently unused Packet Identifier.

If a Client re-sends a particular Control Packet, then it MUST use the same Packet Identifier in subsequent re-sends of that packet.

The Packet Identifier becomes available for reuse after the Client has processed the corresponding acknowledgment packet. The following table shows the corresponding acknowledgment packet for the packet types.

| Packet Type | Acknowledgment Packet |
|---|---|
| PUBLISH (QoS = 1) | PUBACK |
| PUBLISH (QoS = 2) | PUBCOMP |
| SUBSCRIBE | SUBACK |
| UNSUBSCRIBE | UNSUBACK |

A PUBLISH Packet MUST NOT contain a Packet Identifier if its QoS value is set to 0.
A PUBACK, PUBREC or PUBREL Packet MUST contain the same Packet Identifier as the PUBLISH Packet that was originally sent. Similarly SUBACK and UNSUBACK MUST contain the Packet Identifier that was used in the corresponding SUBSCRIBE and UNSUBSCRIBE Packet.
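
One possible way for a client to follow these rules is a small allocator that hands out unused, non-zero 16-bit identifiers and frees them again once the acknowledgment has been processed. The class below is a hypothetical sketch and not taken from any MQTT library.

```python
class PacketIdAllocator:
    """Hands out non-zero 16-bit Packet Identifiers that are not in flight."""

    def __init__(self) -> None:
        self._in_flight = set()
        self._next = 1  # 0 is never a valid Packet Identifier

    def acquire(self) -> int:
        """Return a currently unused identifier and mark it as in flight."""
        if len(self._in_flight) >= 0xFFFF:
            raise RuntimeError("all 65535 Packet Identifiers are in flight")
        while self._next in self._in_flight:
            self._next = self._next % 0xFFFF + 1  # wrap from 65535 back to 1
        packet_id = self._next
        self._in_flight.add(packet_id)
        self._next = self._next % 0xFFFF + 1
        return packet_id

    def release(self, packet_id: int) -> None:
        """Call after PUBACK, PUBCOMP, SUBACK or UNSUBACK has been processed."""
        self._in_flight.discard(packet_id)


ids = PacketIdAllocator()
first = ids.acquire()   # used for a SUBSCRIBE, for example
second = ids.acquire()  # used for a PUBLISH with QoS 1
ids.release(first)      # the SUBACK arrived, the identifier is free again
print(first, second)    # 1 2
```

A re-sent packet keeps its identifier, so `acquire` is only called for new packets and never for re-sends.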

Payload

Some MQTT Control Packets contain a payload as the final part of the packet. In the case of the PUBLISH packet this is the Application Message. The following table shows the Control Packets that contain a payload.

| Control Packet | Payload |
|---|---|
| CONNECT | Required |
| CONNACK | None |
| PUBLISH | Optional |
| PUBACK | None |
| PUBREC | None |
| PUBREL | None |
| PUBCOMP | None |
| SUBSCRIBE | Required |
| SUBACK | Required |
| UNSUBSCRIBE | Required |
| UNSUBACK | None |
| PINGREQ | None |
| PINGRESP | None |
| DISCONNECT | None |

MQTT Message Formats

CONNECT

The CONNECT message contains a lot of session-related information, much of it in optional fields.

| Byte | Content | Part |
|---|---|---|
| Byte 1 | Bits 7-4: Message Type = 1; bits 3-0: not used | MQTT fixed header |
| Byte 2 | Remaining Length | MQTT fixed header |
| Byte 3 ... | Protocol name, UTF-8 encoded (e.g. MQIsdp), prefixed with 2 bytes string length (MSB first) | MQTT variable header |
| Byte n | Protocol version (value 0x03 for MQTT version 3) | MQTT variable header |
| Byte n+1 | Bit 7: Username Flag; bit 6: Password Flag; bit 5: Will Retain; bits 4-3: Will QoS; bit 2: Will Flag; bit 1: Clean Session; bit 0: Reserved | MQTT variable header |
| Byte n+2 | Keep Alive Timer MSB | MQTT variable header |
| Byte n+3 | Keep Alive Timer LSB | MQTT variable header |
| Byte n+4 ... Byte m | Client Identifier, Will Topic, Will Message, Username, Password (in this order) | Payload (all fields except the Client Identifier are optional) |

  • Protocol Name: UTF-8 encoded protocol name string. The name is “MQIsdp” for MQTT V3.1 and “MQTT” for MQTT V3.1.1.
  • Protocol Version: Value 3 for MQTT V3.1 (value 4 for MQTT V3.1.1).
  • Username Flag: If set to 1, indicates that the payload contains a username.
  • Password Flag: If set to 1, indicates that the payload contains a password. The password flag may only be set if the username flag is set as well.
  • Will Retain: If set to 1, indicates to the server that it should retain the Will message, which is published in case the client disconnects unexpectedly.
  • Will QoS: Specifies the QoS level for a Will message.
  • Will Flag: Indicates that the payload contains a Will topic and a Will message and that the Will Retain and Will QoS flags are used.
  • Clean Session: If set to 1, the server discards any previous information about the (re)connecting client (clean new session). If set to 0, the server keeps the subscriptions of a disconnecting client and stores QoS level 1 and 2 messages for this client. When the client reconnects, the server publishes the stored messages to the client.
  • Keep Alive Timer: Used by the server to detect broken connections to the client.
  • Client Identifier: The client identifier (between 1 and 23 characters) uniquely identifies the client to the server. The client identifier must be unique across all clients connecting to a server.
  • Will Topic: Topic to which the Will message is published if the Will flag is set.
  • Will Message: Message to be published if the Will flag is set.
  • Username and Password: Username and password if the corresponding flags are set.
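
To see how these fields end up on the wire, here is a minimal sketch that assembles a CONNECT packet. It makes a few assumptions that go beyond the text above: it uses the MQTT V3.1.1 values (protocol name “MQTT”, version 4), it leaves out the Will fields, and it only supports packets with a Remaining Length below 128 so that a single length byte is enough. The function names are made up for this example.

```python
import struct
from typing import Optional


def encode_string(text: str) -> bytes:
    """UTF-8 string prefixed with its 2-byte length (MSB first)."""
    data = text.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def build_connect(client_id: str, keep_alive: int = 60, clean_session: bool = True,
                  username: Optional[str] = None,
                  password: Optional[str] = None) -> bytes:
    """Assemble a CONNECT packet without Will fields (sketch, MQTT V3.1.1 values)."""
    if password is not None and username is None:
        raise ValueError("a password requires a username")
    flags = 0x02 if clean_session else 0x00   # bit 1: Clean Session
    payload = encode_string(client_id)
    if username is not None:
        flags |= 0x80                          # bit 7: Username Flag
        payload += encode_string(username)
    if password is not None:
        flags |= 0x40                          # bit 6: Password Flag
        payload += encode_string(password)
    variable_header = (encode_string("MQTT")   # protocol name
                       + bytes([4, flags])     # protocol version, connect flags
                       + struct.pack("!H", keep_alive))
    remaining_length = len(variable_header) + len(payload)
    if remaining_length > 127:
        raise ValueError("this sketch only supports a single length byte")
    return bytes([0x10, remaining_length]) + variable_header + payload


packet = build_connect("weather-1")  # hypothetical client identifier
print(packet.hex(" "))
```

For the hypothetical client identifier `weather-1` the packet is 23 bytes long: 2 bytes fixed header, 10 bytes variable header and 11 bytes payload.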

CONNACK

| Byte | Content | Part |
|---|---|---|
| Byte 1 | Bits 7-4: Message Type = 2; bits 3-0: not used | MQTT fixed header |
| Byte 2 | Remaining Length = 2 | MQTT fixed header |
| Byte 3 | Reserved (not used) | MQTT variable header |
| Byte 4 | Connect Return Code | MQTT variable header |

  • Reserved: Reserved field for future use.
  • Connect Return Code:
    • 0: Connection Accepted
    • 1: Connection Refused, reason = unacceptable protocol version
    • 2: Connection Refused, reason = identifier rejected
    • 3: Connection Refused, reason = server unavailable
    • 4: Connection Refused, reason = bad user name or password
    • 5: Connection Refused, reason = not authorized
    • 6-255: Reserved for future use
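
A client typically checks the return code right after connecting. The following sketch parses the four bytes of a CONNACK packet as laid out above and translates the return code into the text from the list; the function names are hypothetical.

```python
CONNECT_RETURN_CODES = {
    0: "Connection Accepted",
    1: "Connection Refused, reason = unacceptable protocol version",
    2: "Connection Refused, reason = identifier rejected",
    3: "Connection Refused, reason = server unavailable",
    4: "Connection Refused, reason = bad user name or password",
    5: "Connection Refused, reason = not authorized",
}


def parse_connack(packet: bytes) -> int:
    """Validate the fixed header of a CONNACK packet and return the return code."""
    if len(packet) != 4 or packet[0] >> 4 != 2 or packet[1] != 2:
        raise ValueError("not a CONNACK packet")
    return packet[3]  # byte 3 (index 2) is reserved, byte 4 is the return code


def describe_return_code(code: int) -> str:
    return CONNECT_RETURN_CODES.get(code, "Reserved for future use")


code = parse_connack(bytes([0x20, 0x02, 0x00, 0x05]))
print(code, describe_return_code(code))
```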

PUBLISH

| Byte | Content | Part |
|---|---|---|
| Byte 1 | Bits 7-4: Message Type = 3; bit 3: DUP; bits 2-1: QoS Level; bit 0: RETAIN | MQTT fixed header |
| Byte 2 | Remaining Length | MQTT fixed header |
| Byte 3 | Topic Name String Length (MSB) | MQTT variable header |
| Byte 4 | Topic Name String Length (LSB) | MQTT variable header |
| Byte 5 ... Byte n | Topic Name | MQTT variable header |
| Byte n+1 | Message ID (MSB) | MQTT variable header |
| Byte n+2 | Message ID (LSB) | MQTT variable header |
| Byte n+3 ... Byte m | Publish Message | Payload |

  • Topic Name with Topic Name String Length: Name of the topic to which the message is published. The first 2 bytes of the topic name field indicate the topic name string length.
  • Message ID: The message ID is only present if the QoS level is 1 (at-least-once delivery, acknowledged delivery) or 2 (exactly-once delivery).
  • Publish Message: Message as an array of bytes. The structure of the published message is application-specific.
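
The layout can be illustrated with a short sketch that assembles a PUBLISH packet. It is deliberately limited: it only handles packets with a Remaining Length below 128, so one length byte is enough, and the function name and topic are made up for this example.

```python
import struct
from typing import Optional


def build_publish(topic: str, message: bytes, qos: int = 0, retain: bool = False,
                  dup: bool = False, message_id: Optional[int] = None) -> bytes:
    """Assemble a small PUBLISH packet (sketch, single-byte Remaining Length)."""
    if qos not in (0, 1, 2):
        raise ValueError("QoS must be 0, 1 or 2")
    if qos == 0 and message_id is not None:
        raise ValueError("QoS 0 messages carry no message ID")
    if qos > 0 and (message_id is None or not 1 <= message_id <= 0xFFFF):
        raise ValueError("QoS 1 and 2 need a non-zero 16-bit message ID")

    topic_bytes = topic.encode("utf-8")
    body = struct.pack("!H", len(topic_bytes)) + topic_bytes  # length prefix + topic
    if qos > 0:
        body += struct.pack("!H", message_id)                 # only present for QoS > 0
    body += message                                           # application payload

    if len(body) > 127:
        raise ValueError("this sketch only supports a single length byte")
    first_byte = (3 << 4) | (int(dup) << 3) | (qos << 1) | int(retain)
    return bytes([first_byte, len(body)]) + body


print(build_publish("a/b", b"21.5").hex(" "))
print(build_publish("a/b", b"21.5", qos=1, message_id=10).hex(" "))
```

The second packet is two bytes longer than the first one because of the message ID that follows the topic name.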

SUBSCRIBE

| Byte | Content | Part |
|---|---|---|
| Byte 1 | Bits 7-4: Message Type = 8; bit 3: DUP; bits 2-1: QoS Level; bit 0: not used | MQTT fixed header |
| Byte 2 | Remaining Length | MQTT fixed header |
| Byte 3 | Message ID (MSB) | MQTT variable header |
| Byte 4 | Message ID (LSB) | MQTT variable header |
| Byte 5 | Topic Name String Length (MSB) | List of topics |
| Byte 6 | Topic Name String Length (LSB) | List of topics |
| Byte 7 ... | Topic Name | List of topics |
| Byte n | Bits 7-2: Reserved (not in use); bits 1-0: QoS Level | List of topics |

  • Message ID: The message ID field is used for the acknowledgment of the SUBSCRIBE message, since SUBSCRIBE messages have a QoS level of 1.
  • Topic Name with Topic Name String Length: Name of the topic to which the client subscribes. The first 2 bytes of the topic name field indicate the topic name string length. Topic name strings can contain wildcard characters. Multiple topic names along with their requested QoS level may appear in a SUBSCRIBE message.
  • QoS Level: QoS level at which the client wants to receive messages from the given topics.
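
A SUBSCRIBE packet with one or more topics could be assembled as in the sketch below. As before, it is limited to a Remaining Length below 128 and the function name is made up. The topic `home/#` uses the multi-level wildcard `#`, so it stands for all topics below `home/`.

```python
import struct
from typing import List, Tuple


def build_subscribe(message_id: int, topics: List[Tuple[str, int]]) -> bytes:
    """Assemble a SUBSCRIBE packet from (topic, requested QoS) pairs (sketch)."""
    if not 1 <= message_id <= 0xFFFF:
        raise ValueError("message ID must be a non-zero 16-bit value")
    if not topics:
        raise ValueError("at least one topic is required")

    body = struct.pack("!H", message_id)            # variable header
    for topic, qos in topics:
        if qos not in (0, 1, 2):
            raise ValueError("QoS must be 0, 1 or 2")
        name = topic.encode("utf-8")
        # length prefix, topic name, then one byte with the requested QoS
        body += struct.pack("!H", len(name)) + name + bytes([qos])

    if len(body) > 127:
        raise ValueError("this sketch only supports a single length byte")
    # 0x82: Message Type 8 in bits 7-4, QoS level 1 in bits 2-1
    return bytes([0x82, len(body)]) + body


print(build_subscribe(1, [("home/#", 1)]).hex(" "))
```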

If you are interested in the full message format description, you can find the whole specification here.

MQTT security (example of Mosquitto as broker)

As with every other connection between different devices, the level of security you need depends on your use case. I would recommend at least a basic level of security because the effort required is close to zero. Our objective is to protect the data which is transferred between publisher, broker and subscriber.

The MQTT protocol defines that the security mechanisms are initiated by the broker and applied by the clients. In total there are 3 ways for the broker to verify the identity of a client.

Identify a client via the client ID

The prerequisite for identification via client ID is that every MQTT client has to provide a client ID. When a client subscribes to one or more topics, the client ID is linked to the topics and to the TCP connection. With a persistent session, the broker remembers the client ID and the corresponding subscribed topics.

Username and password

Security by username and password is the most commonly used mechanism in an MQTT connection because it is easy to implement. The broker requests a valid username and password from the client before a connection is permitted. The client transmits the username and password as plain text. If the username and the password are valid, the connection between client and server is established. If they are invalid, the connection is refused by the server.

The downside is that the transmission of the username and password is not secured without additional transport encryption such as TLS. An additional benefit of this security mechanism is that the username can also be used to authorize access to topics.

It is also possible to access the server as an anonymous client. Whether access is restricted by the broker therefore depends on the anonymous access option and on the username and password file. The following table shows all combinations and whether access is restricted or not.

| Anonymous access | Password file specified | Access restricted |
|---|---|---|
| True | No | No |
| True | Yes | If the client sends a username/password, it must be valid, otherwise an authentication error is returned. If it does not send one, none is required and a normal connection results. |
| False | No | The client must send a username and password, but they are not checked. If the client does not send a username/password, an authentication error code is generated. |
| False | Yes | Yes |
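
The four rows of the table can also be read as a small decision function. The sketch below only models the table; it is not how Mosquitto is implemented. In particular, the password file is represented as a plain dictionary of usernames and passwords, which is a simplification made for this example.

```python
from typing import Dict, Optional


def connection_allowed(allow_anonymous: bool,
                       password_file: Optional[Dict[str, str]],
                       username: Optional[str] = None,
                       password: Optional[str] = None) -> bool:
    """Return True if the broker would accept the connection (model of the table)."""
    sends_credentials = username is not None
    if password_file is None:
        # No password file: credentials are never checked. Without anonymous
        # access the client still has to send some username/password.
        return allow_anonymous or sends_credentials
    if sends_credentials:
        # Password file present: credentials that are sent must be valid.
        return username in password_file and password_file[username] == password
    # No credentials sent: only accepted if anonymous access is allowed.
    return allow_anonymous


users = {"weather": "secret"}  # hypothetical password file content
print(connection_allowed(True, users))                       # True
print(connection_allowed(True, users, "weather", "wrong"))   # False
print(connection_allowed(False, None, "anyone", "anything")) # True
print(connection_allowed(False, users))                      # False
```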

Certificates

It is also possible to secure the connection with certificates. This is the most secure method of client authentication but also the most difficult one because of the certificate management.

There are two main cryptographic protocols which you can use to secure your MQTT connection:

  • Transport Layer Security (TLS)
  • Secure Sockets Layer (SSL)

Both provide a secure communication channel between a client and a server. TLS is the successor of the deprecated SSL and should be preferred. A handshake mechanism is used to negotiate various parameters to create a secure connection. After the handshake is complete, the communication between client and server is encrypted and an attacker cannot eavesdrop on any part of it. There is a drawback to using MQTT over TLS: security comes at a cost in terms of CPU usage and communication overhead.
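
On the client side, the TLS settings are usually collected in a context object before the connection is opened. The following sketch uses only Python's standard `ssl` module. The function name and the file parameters are assumptions for this example; which certificate files you need depends on how your broker is configured.

```python
import ssl
from typing import Optional


def make_tls_context(ca_file: Optional[str] = None,
                     client_cert: Optional[str] = None,
                     client_key: Optional[str] = None) -> ssl.SSLContext:
    """Create a client-side TLS context that verifies the broker certificate."""
    # With ca_file=None the system's default CA certificates are used.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse SSL and old TLS versions
    if client_cert is not None:
        # Optional client certificate for certificate-based client authentication.
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return context


context = make_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)  # True True
```

Such a context can then be used to wrap the TCP socket to the broker with `context.wrap_socket(sock, server_hostname=...)`. Port 8883 is the port conventionally used for MQTT over TLS.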
