QoS in Plain Terms
MQTT defines three delivery guarantees between a client and a broker. The right choice depends on what the data is and how expensive a missed or duplicate message is.
- QoS 0: At most once. Fire and forget. The client publishes; the broker may or may not receive it. No acknowledgement, no retry. Best for high-frequency telemetry (temperature every second) where occasional loss is acceptable and bandwidth is constrained.
- QoS 1: At least once. The broker acknowledges each PUBLISH with a PUBACK. The client keeps the message and resends it with the DUP flag set (in practice, after reconnecting) until a PUBACK arrives. The message is guaranteed to arrive, but it may arrive more than once, so your subscriber must handle duplicates, for example with a timestamp or sequence ID in the payload.
- QoS 2: Exactly once. A four-packet handshake (PUBLISH → PUBREC → PUBREL → PUBCOMP) guarantees delivery with no duplicates. Correct for commands and financial events. Too expensive for sensor telemetry on LTE-M: every message needs two full round trips (QoS 1 needs one), and the extra radio-on time adds latency and drains the battery.
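A minimal sketch of duplicate handling on the subscriber side. It assumes every payload carries hypothetical "device" and "seq" fields, because MQTT itself does not provide a stable end-to-end message ID (packet identifiers are per hop and get reused). The filter remembers a bounded window of recently seen (device, seq) pairs, so QoS 1 redeliveries are dropped even if they arrive out of order.

```python
import json
from collections import OrderedDict


class DuplicateFilter:
    """Drop QoS 1 redeliveries by remembering recently seen message IDs.

    Assumes (hypothetically) that each JSON payload has "device" and "seq"
    fields; this is an application convention, not part of MQTT.
    """

    def __init__(self, capacity: int = 10_000):
        self.capacity = capacity
        self._seen = OrderedDict()  # (device, seq) -> None, oldest first

    def is_new(self, payload: bytes) -> bool:
        msg = json.loads(payload)
        key = (msg["device"], msg["seq"])
        if key in self._seen:
            # Refresh the entry so frequently redelivered IDs stay remembered
            self._seen.move_to_end(key)
            return False
        self._seen[key] = None
        if len(self._seen) > self.capacity:
            # Evict the oldest entry to keep memory bounded
            self._seen.popitem(last=False)
        return True
```

The bounded window keeps memory flat on long-running subscribers. The trade-off is that a duplicate arriving after its ID has been evicted slips through, so size capacity well above the number of messages you expect between an original delivery and its redelivery.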
Last Will and Testament — The Underused Feature
Last Will and Testament (LWT) is one of the most useful MQTT features for industrial monitoring, and one of the most consistently overlooked. When a client registers an LWT in its CONNECT packet, the broker stores the will message. If the client disconnects ungracefully (network failure, power loss, firmware crash), the broker publishes the will message to the specified topic on the client's behalf. A clean DISCONNECT discards the will without publishing it.
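Without LWT, a dashboard or monitoring system has no way to distinguish between "device is alive and quiet" and "device is dead." With LWT, a retained online: false message appears on the status topic as soon as the broker detects that the device is gone, and any subscriber or alerting system knows without polling. Detection is not instant: if the connection dies without the TCP socket being closed, the broker only gives up after one and a half keepalive intervals pass with no traffic, which is 90 seconds with a 60-second keepalive.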
```python
import json
import ssl
import time

import paho.mqtt.client as mqtt

BROKER = "your-broker.example.com"
PORT = 8883  # TLS
CLIENT_ID = "sensor-node-001"
STATUS_TOPIC = f"devices/{CLIENT_ID}/status"
TELEMETRY_TOPIC = f"devices/{CLIENT_ID}/telemetry"


def on_connect(client, userdata, flags, rc):
    # paho-mqtt 1.x callback signature; on paho-mqtt 2.x this is the
    # (deprecated) CallbackAPIVersion.VERSION1 form
    if rc == 0:
        # Publish online status with retain so dashboards pick it up immediately
        online_payload = json.dumps({"online": True, "ts": int(time.time())})
        client.publish(STATUS_TOPIC, online_payload, qos=1, retain=True)
    else:
        print(f"Connection failed, rc={rc}")


def build_client() -> mqtt.Client:
    client = mqtt.Client(client_id=CLIENT_ID,
                         clean_session=False)  # persistent session
    client.on_connect = on_connect  # without this, the online status is never sent

    # Last Will and Testament: the broker publishes this on ungraceful disconnect.
    # The payload is fixed at connect time, so "ts" cannot record when the device died.
    will_payload = json.dumps({"online": False, "ts": 0, "reason": "lost"})
    client.will_set(STATUS_TOPIC,
                    payload=will_payload,
                    qos=1,
                    retain=True)  # retained: new subscribers see last state immediately

    # TLS: verify the broker identity against the system CA bundle.
    # PROTOCOL_TLS_CLIENT negotiates the highest version both sides support;
    # version-pinned constants such as PROTOCOL_TLSv1_2 are deprecated.
    client.tls_set(ca_certs="/etc/ssl/certs/ca-certificates.crt",
                   tls_version=ssl.PROTOCOL_TLS_CLIENT)
    client.username_pw_set(username="device-user",
                           password="device-secret")  # placeholder credentials
    return client


def publish_telemetry(client: mqtt.Client, reading: dict):
    payload = json.dumps(reading)
    # Periodic sensor data: QoS 0, low overhead
    client.publish(TELEMETRY_TOPIC, payload, qos=0)


def publish_alert(client: mqtt.Client, alert: dict):
    payload = json.dumps(alert)
    # Alerts must arrive: QoS 1
    client.publish(f"devices/{CLIENT_ID}/alerts", payload, qos=1)
```
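On the monitoring side, a subscriber to devices/+/status receives the retained status of every device as soon as it subscribes, followed by live updates, including will messages. Below is a minimal sketch of a status board, assuming the topic layout and {"online": ...} payload used in the device code above; DeviceStatusBoard is a hypothetical name and is not part of paho.

```python
import json


class DeviceStatusBoard:
    """Track online/offline state from devices/<id>/status messages.

    Topic layout and payload shape are this article's conventions, not MQTT rules.
    """

    def __init__(self) -> None:
        self.online = {}  # device id -> bool

    def handle(self, topic: str, payload: bytes) -> None:
        parts = topic.split("/")
        # Only react to devices/<client_id>/status; ignore telemetry and alerts
        if len(parts) != 3 or parts[0] != "devices" or parts[2] != "status":
            return
        device_id = parts[1]
        if not payload:
            # An empty retained payload clears the retained status: forget the device
            self.online.pop(device_id, None)
            return
        self.online[device_id] = bool(json.loads(payload).get("online", False))

    def offline_devices(self) -> list:
        # Sorted so alert output is stable between runs
        return sorted(d for d, up in self.online.items() if not up)
```

To wire it into paho, subscribe to devices/+/status at QoS 1 and call board.handle(message.topic, message.payload) from on_message.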
Persistent Sessions — Broker-Side Queuing
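When clean_session=False, the broker keeps the client's subscriptions between connections and queues matching QoS 1 and QoS 2 messages for delivery when the client reconnects. A message is delivered at the lower of its publish QoS and the subscription QoS, so the subscription itself must be QoS 1 or 2; queuing QoS 0 messages is optional in MQTT 3.1.1 and usually off. This is essential for devices that go offline regularly: solar-powered field sensors, devices in coverage-poor areas, anything that sleeps between readings.
The trade-off: under MQTT 3.1.1 the broker keeps per-client state indefinitely unless it is configured to expire sessions, and that state consumes memory proportional to the number of devices, their message rate, and how long they stay offline. On self-hosted brokers (Mosquitto), set a sensible max_queued_messages to prevent runaway memory use, and persistent_client_expiration to discard the sessions of devices that never come back. Managed MQTT services typically apply their own per-client queue and session limits, so check those before relying on long offline periods.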
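A back-of-the-envelope sketch of how queue memory scales. It assumes payload bytes dominate; real brokers add per-message bookkeeping, so treat the result as a lower bound. The max_queued_messages parameter models a per-client cap like Mosquitto's, where 0 means unlimited. The default of 1000 used here is an assumption to check against your broker's documentation.

```python
def estimate_queued_bytes(devices: int,
                          msgs_per_hour: float,
                          offline_hours: float,
                          avg_payload_bytes: int,
                          max_queued_messages: int = 1000) -> int:
    """Rough lower bound on broker memory held for offline persistent sessions.

    Counts payload bytes only. max_queued_messages caps the per-client queue;
    0 means no cap (assumed to mirror Mosquitto's semantics).
    """
    per_device = msgs_per_hour * offline_hours
    if max_queued_messages > 0:
        # Messages beyond the cap are not held, so they cost no memory
        per_device = min(per_device, max_queued_messages)
    return int(devices * per_device * avg_payload_bytes)
```

For example, take 500 devices that stay offline for a day while one 200-byte QoS 1 message per minute is published to each of them. That comes to about 144 MB of payload uncapped, and about 100 MB with a 1000-message cap. Without a cap, the figure keeps growing with offline time.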
TLS Overhead on Constrained Devices
TLS is non-negotiable for industrial MQTT — unencrypted connections expose device credentials and sensor data. The question is how much overhead to expect.
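- TLS handshake: 3–8 KB of data exchange, typically most of it the broker's certificate chain. On LTE-M with a typical 50 ms round trip, the round trips alone (one for TCP, two for a full TLS 1.2 handshake) account for only about 150 ms. Transferring the certificates at LTE-M data rates and running public-key operations on a low-power MCU bring the total to roughly 400–800 ms of added connection time. This is a one-time cost per connection, not per message, but a device that disconnects between readings pays it again on every reconnect.
- Per-message overhead: TLS record framing (a 5-byte header plus the cipher's authentication tag and, for some cipher suites, an explicit nonce) adds ~25 bytes per message. At QoS 0 that is the only extra TLS cost. At QoS 1 the 4-byte PUBACK travels in its own TLS record and pays the same framing cost, which is still well within LTE-M capabilities.
- LoRaWAN gateways: End nodes talk to the gateway over the LoRa radio, and MQTT to the cloud starts on the gateway's Linux side (or further upstream). TLS therefore runs on the gateway, not the end node, so the constrained radio link is unaffected.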
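A rough sketch of what these figures mean over a day. It uses the top of the handshake range (8 KB) and ~25 bytes per TLS record as assumed defaults, counts one TLS record per MQTT packet in each QoS flow, and ignores TCP/IP headers and MQTT framing.

```python
# Records exchanged per PUBLISH in each QoS flow, assuming one TLS record per
# MQTT packet: QoS 0 = PUBLISH; QoS 1 = PUBLISH + PUBACK;
# QoS 2 = PUBLISH + PUBREC + PUBREL + PUBCOMP.
RECORDS_PER_MESSAGE = {0: 1, 1: 2, 2: 4}


def tls_overhead_per_day(msgs_per_day: int,
                         connections_per_day: int,
                         handshake_bytes: int = 8_000,
                         record_overhead: int = 25,
                         qos: int = 0) -> int:
    """Estimate daily TLS bytes on top of MQTT traffic (TCP/IP headers ignored).

    Defaults are assumptions: the top of the 3-8 KB handshake range and
    ~25 bytes of framing per TLS record.
    """
    if qos not in RECORDS_PER_MESSAGE:
        raise ValueError(f"qos must be 0, 1 or 2, not {qos}")
    handshakes = connections_per_day * handshake_bytes
    framing = msgs_per_day * RECORDS_PER_MESSAGE[qos] * record_overhead
    return handshakes + framing
```

Take one reading per minute (1440 messages a day) at QoS 0. A device that keeps a single connection open all day spends about 44 KB on TLS. One that reconnects for every reading spends about 11.6 MB, almost all of it in handshakes. That is the practical case for long-lived connections on battery-powered LTE-M nodes.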
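```python
import random
import time

import paho.mqtt.client as mqtt

MAX_BACKOFF = 120  # seconds


def connect_with_backoff(client: mqtt.Client, host: str, port: int):
    """
    Initial connect with exponential backoff and jitter.
    On LTE-M the TLS handshake takes 400–800 ms, so do not hammer the broker.
    Once connected, paho's network loop reconnects on its own, using the
    delay range set with reconnect_delay_set().
    """
    client.reconnect_delay_set(min_delay=1, max_delay=MAX_BACKOFF)
    delay = 1
    while True:
        try:
            # connect() raises only on network/TLS errors; a refused CONNACK
            # (bad credentials, etc.) is reported to on_connect via rc instead
            client.connect(host, port, keepalive=60)
            client.loop_start()
            return
        except OSError as e:  # covers ConnectionRefusedError, timeouts, ssl.SSLError
            jitter = random.uniform(0, delay * 0.2)
            print(f"Connect failed ({e}). Retry in {delay + jitter:.1f}s")
            time.sleep(delay + jitter)
            delay = min(delay * 2, MAX_BACKOFF)
```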
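To see how long the loop above actually waits, here is a small sketch that replays its delay math without touching the network: start at 1 s, double after each failure, cap at MAX_BACKOFF, and add up to 20% jitter. backoff_schedule is a hypothetical helper; the optional seeded random.Random makes runs reproducible.

```python
import random
from typing import Optional


def backoff_schedule(failures: int,
                     max_backoff: float = 120,
                     jitter_frac: float = 0.2,
                     rng: Optional[random.Random] = None) -> list:
    """Return the sleep before each retry after `failures` consecutive failures.

    Mirrors connect_with_backoff: delay starts at 1 s, doubles per failure,
    is capped at max_backoff, and gets up to jitter_frac * delay of extra jitter.
    """
    if rng is None:
        rng = random.Random()
    delays, delay = [], 1.0
    for _ in range(failures):
        delays.append(delay + rng.uniform(0, delay * jitter_frac))
        delay = min(delay * 2, max_backoff)
    return delays
```

With the defaults, the base delay reaches the 120 s cap on the eighth consecutive failure, and nine failures add up to 367 s (just over six minutes) before jitter.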
Broker Selection at a Glance
- Mosquitto: Lightweight, single-binary, excellent for under 500 concurrent devices. No native clustering. Good starting point for on-premise industrial deployments.
- EMQX: Clustering, rule engine for routing messages to databases or HTTP endpoints, Sparkplug B support. Better suited to large-scale industrial deployments.
- HiveMQ: Enterprise features, Sparkplug B native, good documentation. Higher cost. Often seen in automotive and manufacturing OEM integrations.
- AWS IoT Core / Azure IoT Hub: Managed, no operational overhead, scales automatically. MQTT with additional proprietary topic conventions. Good choice when your data pipeline is already cloud-native.