Ryan Malloy d5d2ea3d32 Diagrams: five hand-crafted SVGs explaining the protocol + architecture
The auto-extracted manual SVGs were unusable PDF text-glyph soup. These
are fresh, theme-aware (currentColor everywhere, accent via the
--sl-color-accent CSS var), and built to teach.

src/assets/diagrams/handshake-sequence.svg
  Sequence diagram with CLIENT and CONTROLLER swim lanes, five steps:
  ClientRequestNewSession -> ControllerAckNewSession (carries SessionID)
  -> derive SessionKey (inline note) -> ClientRequestSecureSession
  (encrypted, accent-coloured) -> ControllerAckSecureSession (encrypted)
  -> first OmniLink2Message. Plaintext arrows in currentColor, encrypted
  arrows in accent.

src/assets/diagrams/packet-structure.svg
  Bytes-on-the-wire box diagram: outer Packet header (seq u16 + type +
  reserved + encrypted payload) decomposed below into the inner Message
  (start byte 0x21, length, opcode, data, CRC u16 LE). Plain vs encrypted
  fields colour-coded with a legend.

src/assets/diagrams/session-key-derivation.svg
  Quirk #1 visual. Three rows of byte cells: ControllerKey (16 bytes,
  with bytes 0..10 in plain colour and 11..15 highlighted), SessionID
  (5 bytes), and the resulting SessionKey with the XOR boundary
  visible. XOR operator in the accent colour to draw the eye.

src/assets/diagrams/per-block-whitening.svg
  Quirk #2 visual. seq pill at the top, three blocks below (block 1,
  block 2, block N) each showing 16 byte cells with the first two
  highlighted in accent and labelled with the seq XOR mask. Drives home
  that it's the SAME mask on EVERY block.

src/assets/diagrams/architecture.svg
  Three groups (LIBRARY, HA INTEGRATION, TEST SURFACE) with boxes
  inside. Library shows the four protocol-layer modules + connection +
  client + models + events. HA shows coordinator + 8 platforms. Test
  surface shows MockPanel (accent-coloured), HA test harness, e2e tests,
  unit tests. One accent-coloured arrow runs from OmniConnection across
  to MockPanel labelled 'TCP/4369 (encrypted)'.

src/assets/diagrams/pca-file-format.svg
  Key chain: hardcoded keyPC01 -> decrypts PCA01.CFG (boxes for the
  CFG fields including the highlighted pca_key) -> arrow showing the
  extracted pca_key -> decrypts the .pca file (boxes for PCA03 magic,
  account info, model byte, body, and the highlighted ControllerKey)
  -> caption 'feeds session-key derivation (quirk #1)'.

Wired in via inline-SVG-via-?raw-import + set:html (so currentColor
adapts to the theme). Required converting four pages to .mdx:
  reference/protocol.mdx        + handshake + packet diagrams
  reference/file-format.mdx     + pca-file-format diagram
  explanation/quirks.mdx        + session-key + whitening diagrams
  explanation/architecture.mdx  + architecture diagram

Two MDX paper cuts during conversion: bare '<100ms' and '<50ms' in
architecture.mdx confused the JSX parser; wrapped them in backticks as inline code.

Build: 23 pages clean. Verified inline SVG ships in the rendered HTML
(grep for SVG title IDs returns 2/2 hits per relevant page). Container
rebuilt + redeployed. Protocol page is now 92750 bytes (was ~63000),
quirks page 84156 (was ~63000).
2026-05-10 17:32:49 -06:00


---
title: The two non-public quirks
description: Why public Omni-Link clients silently fail on the first encrypted message — session key XOR mix and per-block pre-whitening before AES.
---
import SessionKey from '../../../assets/diagrams/session-key-derivation.svg?raw';
import Whitening from '../../../assets/diagrams/per-block-whitening.svg?raw';

The Omni-Link II protocol, as documented in the publicly available spec, looks
like a textbook AES-128-ECB session over TCP: handshake, derive a key, encrypt
everything from then on. As implemented by HAI's PC Access 3.17, it isn't.

There are two quirks in the way the session key is derived and the way payload
blocks are encrypted that are not in any third-party Omni-Link writeup we
could find. Both are unambiguous in the decompiled C# (`clsOmniLinkConnection.cs`).

Both are load-bearing: if a client skips either, the panel accepts the
connection, completes the unencrypted handshake, and then drops the session
on the first encrypted message with `ControllerSessionTerminated`: no
diagnostic, no log.

## Why these quirks exist (informed speculation)

Both quirks have the texture of *defense by inconvenience*. Neither makes the
protocol meaningfully harder to attack — anyone with a packet capture and the
`ControllerKey` can reproduce both transformations in a few lines of code.
But both add just enough complexity that a casual reverse engineer reading
the public spec will write a client that doesn't work, and won't have an
obvious explanation for why.

It looks like the kind of thing where someone on the original team said
"let's not make it trivial for the obvious clones," and the implementation
has the slight inelegance of cargo-culted-from-one-block-to-all-blocks that
suggests it was added by hand rather than designed in. The first quirk may
also have been an attempt at session-key freshness — mix a controller-supplied
nonce so that two sessions with the same `ControllerKey` don't use literally
the same AES key. That's a reasonable goal; a 5-byte XOR is just an unusual
way to achieve it.

Whatever the origin, both quirks are stable across the firmware versions PC
Access 3.17 supports (the v2-on-TCP path), and both must be implemented
exactly to talk to the panel.
## Quirk #1 — session key XOR mix
<div style="margin: 1rem 0 1.5rem;" set:html={SessionKey} />

The `ControllerKey` is the 16-byte AES-128 key that lives in the panel's
NVRAM and inside the encrypted `.pca` config file. The naive expectation is
that this key is what AES uses for the session. It isn't.

From `clsOmniLinkConnection.cs:1886-1892` (the TCP path):

```csharp
SessionKey = new byte[16];
ControllerKey.CopyTo(SessionKey, 0);
for (int j = 0; j < 5; j++)
{
    SessionKey[11 + j] = (byte)(ControllerKey[11 + j] ^ SessionID[j]);
}
AES = new clsAES(SessionKey);
```
The first 11 bytes of the session key are the `ControllerKey` verbatim. The
last 5 bytes are the `ControllerKey` XORed with a 5-byte `SessionID` nonce
that the controller sent in the unencrypted `ControllerAckNewSession` packet.

That's the entire key derivation. No PBKDF2, no HKDF, no PIN, no salt. Five
bytes of XOR.

The same loop appears at `:1423-1429` for the UDP path. Identical.

The Python equivalent:

```python
def derive_session_key(controller_key: bytes, session_id: bytes) -> bytes:
    # ControllerKey is 16 bytes; SessionID is the 5-byte nonce from the panel.
    assert len(controller_key) == 16
    assert len(session_id) == 5
    sk = bytearray(controller_key)
    # XOR the SessionID into bytes 11..15; bytes 0..10 stay unchanged.
    for j in range(5):
        sk[11 + j] ^= session_id[j]
    return bytes(sk)
```
A naive client that uses `ControllerKey` directly as the AES key will
encrypt `ClientRequestSecureSession` (the first encrypted packet) with the
wrong key. The panel decrypts it to garbage (ECB has no integrity check, so
no exception fires; the panel just sees that the SessionID echo doesn't match
what it sent) and drops the session with `ControllerSessionTerminated`.

PC Access surfaces this as `InvalidEncryptionKey`, which sounds like "your
ControllerKey is wrong" but really means "your *derived* key is wrong, which
in practice is always because you didn't apply the XOR mix."
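
To make the difference between the two keys concrete, here is a small worked example of the derivation. The key and SessionID values are made up for illustration (they are not captured from a real panel), and the function simply restates the Python equivalent above with explicit errors instead of asserts.

```python
def derive_session_key(controller_key: bytes, session_id: bytes) -> bytes:
    """XOR the 5-byte SessionID into bytes 11..15 of the 16-byte ControllerKey."""
    if len(controller_key) != 16 or len(session_id) != 5:
        raise ValueError("expected a 16-byte key and a 5-byte session ID")
    sk = bytearray(controller_key)
    for j in range(5):
        sk[11 + j] ^= session_id[j]
    return bytes(sk)


# Made-up example values, NOT taken from a real panel or capture.
controller_key = bytes(range(0x10, 0x20))            # 10 11 12 ... 1f
session_id = bytes([0xA1, 0xB2, 0xC3, 0xD4, 0xE5])

session_key = derive_session_key(controller_key, session_id)

print("ControllerKey:", controller_key.hex(" "))
print("SessionKey:   ", session_key.hex(" "))
# Only the last five bytes differ between the two lines.
```

With these inputs the first 11 bytes print identically and the last five become `ba ae de ca fa`. A client that skips this step is encrypting with a key that is wrong in exactly those five positions.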
## Quirk #2 — per-block XOR pre-whitening before AES
<div style="margin: 1rem 0 1.5rem;" set:html={Whitening} />

This is the headline.

Before AES-encrypting any payload block, the *first two bytes of every
16-byte block* get XORed with the packet's 16-bit sequence number. Same XOR
mask, every block of the packet. From `clsOmniLinkConnection.cs:396-401`:

```csharp
for (num = 0; num < PKT.Data.Length; num += 16)
{
    PKT.Data[num] = (byte)(PKT.Data[num] ^ ((PKT.SequenceNumber & 0xFF00) >> 8));
    PKT.Data[num + 1] = (byte)(PKT.Data[num + 1] ^ (PKT.SequenceNumber & 0xFF));
}
PKT.Data = AES.Encrypt(PKT.Data);
```

And the inverse on receive (`:413-417`):

```csharp
PKT.Data = AES.Decrypt(PKT.Data);
for (int i = 0; i < PKT.Data.Length; i += 16)
{
    PKT.Data[i] = (byte)(PKT.Data[i] ^ ((PKT.SequenceNumber & 0xFF00) >> 8));
    PKT.Data[i + 1] = (byte)(PKT.Data[i + 1] ^ (PKT.SequenceNumber & 0xFF));
}
```
So the on-the-wire encryption is "AES-128-ECB of (payload XOR-prewhitened
with the seq number, two bytes per block)". This is *not* CBC. It is *not*
CTR. It is an outer transformation applied to the plaintext before AES
sees it (and reversed after AES decryption on receive), independent of
AES's mode.

The Python equivalent:
```python
def whiten(data: bytes, seq: int) -> bytes:
    # XOR the first two bytes of every 16-byte block with seq (high byte, low byte).
    out = bytearray(data)
    seq_hi = (seq >> 8) & 0xFF
    seq_lo = seq & 0xFF
    for i in range(0, len(out), 16):
        out[i] ^= seq_hi
        out[i + 1] ^= seq_lo
    return bytes(out)


# aes_ecb_encrypt / aes_ecb_decrypt stand for raw AES-128-ECB helpers
# (no padding); they are not defined in this snippet.
def encrypt_payload(payload: bytes, seq: int, session_key: bytes) -> bytes:
    # payload is already zero-padded to a 16-byte multiple by the caller.
    return aes_ecb_encrypt(whiten(payload, seq), session_key)


def decrypt_payload(ciphertext: bytes, seq: int, session_key: bytes) -> bytes:
    return whiten(aes_ecb_decrypt(ciphertext, session_key), seq)
```
The `whiten` function is its own inverse — XOR is symmetric — so the same
helper works both directions.
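
The self-inverse property, and exactly which bytes the mask touches, can be checked without any AES code at all. The sketch below is illustrative only: `zero_pad` is a hypothetical helper name for the caller-side zero padding mentioned in the comment above, and the payload and sequence number are made up.

```python
BLOCK = 16


def zero_pad(payload: bytes) -> bytes:
    """Pad with 0x00 up to the next 16-byte boundary (no-op if already aligned)."""
    return payload + bytes(-len(payload) % BLOCK)


def whiten(data: bytes, seq: int) -> bytes:
    """XOR the first two bytes of every 16-byte block with the 16-bit seq (hi, lo)."""
    if len(data) % BLOCK:
        raise ValueError("data must be a multiple of 16 bytes")
    out = bytearray(data)
    for i in range(0, len(out), BLOCK):
        out[i] ^= (seq >> 8) & 0xFF
        out[i + 1] ^= seq & 0xFF
    return bytes(out)


payload = zero_pad(bytes(range(40)))   # made-up 40-byte payload, padded to 48
seq = 0x1234                           # made-up sequence number
whitened = whiten(payload, seq)

# Offsets where the whitened data differs from the original payload.
changed = [i for i in range(len(payload)) if payload[i] != whitened[i]]
print(changed)

# Applying the same mask again restores the payload; a different seq does not.
print(whiten(whitened, seq) == payload)
print(whiten(whitened, seq + 1) == payload)
```

The changed offsets are the first two bytes of each of the three blocks and nothing else. The last line is the failure case: if the two sides disagree about the sequence number, un-whitening silently yields the wrong bytes.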
Cryptographically this is weak. An attacker with known plaintext for one
block can recover both bytes of the seq XOR mask by XORing that plaintext
against the AES-decrypted block. Once the mask is known, the whitening adds
no protection on top of AES. It feels like the original intent might have
been nonce-mixing (use the seq as a per-packet salt to defeat ECB's
identical-block-equals-identical-ciphertext property), and the implementation
got cargo-culted from one block (where it would have been roughly
defensible) to every block of the packet (where it isn't doing useful work
beyond the first one). Doesn't matter. It's the protocol. Implement it. Move on.
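
The ECB point can be illustrated with a toy model. In the sketch below, `toy_block_cipher` is a stand-in and not AES: it is a keyed, deterministic function built from SHA-256, which is all that is needed to show when two ciphertext blocks come out equal. The key, payload, and sequence numbers are made up.

```python
import hashlib

BLOCK = 16


def whiten(data: bytes, seq: int) -> bytes:
    """XOR the first two bytes of every 16-byte block with the 16-bit seq."""
    out = bytearray(data)
    for i in range(0, len(out), BLOCK):
        out[i] ^= (seq >> 8) & 0xFF
        out[i + 1] ^= seq & 0xFF
    return bytes(out)


def toy_block_cipher(block: bytes, key: bytes) -> bytes:
    # STAND-IN, NOT AES: deterministic and keyed, but not invertible.
    # Used only to show which ciphertext blocks would be equal under ECB.
    return hashlib.sha256(key + block).digest()[:BLOCK]


def toy_ecb_encrypt(data: bytes, key: bytes) -> bytes:
    # ECB: every block is processed independently with the same key.
    return b"".join(
        toy_block_cipher(data[i:i + BLOCK], key) for i in range(0, len(data), BLOCK)
    )


key = bytes(16)               # made-up key
payload = b"A" * BLOCK * 2    # two identical plaintext blocks

packet_1 = toy_ecb_encrypt(whiten(payload, 0x0001), key)
packet_2 = toy_ecb_encrypt(whiten(payload, 0x0002), key)

# Same mask on every block: repeated blocks inside one packet still match.
same_within_packet = packet_1[:BLOCK] == packet_1[BLOCK:]
# Different seq per packet: the same plaintext encrypts differently.
differs_across_packets = packet_1[:BLOCK] != packet_2[:BLOCK]
print(same_within_packet, differs_across_packets)
```

Under this model the whitening does vary the ciphertext from packet to packet, but because the mask is identical for every block, repeated plaintext blocks inside a single packet remain visible as repeated ciphertext blocks.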
## Why public OSS Omni-Link clients miss these
The non-trivial public Omni-Link II clients we checked are
[`jomnilinkII`](https://github.com/digitaldan/jomnilinkII) (Java) and
[`pyomnilink`](https://github.com/excalq/pyomnilink) (Python); we also read a
handful of writeups on personal blogs. None of them describes either quirk.

We can't be sure from the outside why, but two plausible explanations:
1. **Inherited working code from a pre-quirk firmware era.** If an early
version of the panel firmware used `ControllerKey` directly as the
session key and didn't have the XOR pre-whitening, an OSS client
written against that firmware would just keep working as long as the
panel maintained backward compatibility on the wire — even though new
firmware added the quirks for new clients. We don't have the firmware
history to confirm or refute this.
2. **Serial-only / unencrypted paths.** Both quirks live in the
`clsOmniLinkConnection.EncryptPacket` / `DecryptPacket` methods, which
are only invoked on packet types `OmniLinkMessage` (0x10) and
`OmniLink2Message` (0x20). The *unencrypted* twin packet types (0x11,
0x21) bypass them entirely. A client that only ever talks to the panel
over the unencrypted v1 serial path would never need them.

Either way, the practical outcome is that an existing OSS client is not a
useful reference for someone trying to write a v2-on-TCP encrypted client
from scratch. The decompiled PC Access C# is.
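
The packet-type split described in the second explanation can be sketched as a simple dispatch. This is an illustration, not the library's code: the enum member names for 0x11 and 0x21 are hypothetical (the text above only gives their values), `needs_crypto` is a made-up helper, and the handshake packet types are deliberately left out.

```python
from enum import IntEnum


class PacketType(IntEnum):
    OMNI_LINK_MESSAGE = 0x10               # encrypted, name from the text above
    OMNI_LINK_UNENCRYPTED_MESSAGE = 0x11   # hypothetical name for the 0x11 twin
    OMNI_LINK2_MESSAGE = 0x20              # encrypted, name from the text above
    OMNI_LINK2_UNENCRYPTED_MESSAGE = 0x21  # hypothetical name for the 0x21 twin


# Only these types go through the EncryptPacket / DecryptPacket path.
ENCRYPTED_TYPES = frozenset(
    {PacketType.OMNI_LINK_MESSAGE, PacketType.OMNI_LINK2_MESSAGE}
)


def needs_crypto(packet_type: int) -> bool:
    """True if this packet type gets seq whitening plus AES in this sketch."""
    return packet_type in ENCRYPTED_TYPES


for value in (0x10, 0x11, 0x20, 0x21):
    print(hex(value), needs_crypto(value))
```

A client that only ever emits 0x11 or 0x21 packets never reaches the branch where either quirk applies, which is consistent with such a client working without implementing them.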
## The mock panel as proof
The most direct way to prove our implementation of both quirks is correct is
to build a controller-side emulator that round-trips with the client.
`omni_pca.mock_panel.MockPanel` is exactly that: a TCP server that runs the
controller half of the handshake, derives the same `SessionKey`, applies
the same per-block XOR pre-whitening, and decodes / encodes real Omni-Link II
messages. The library's e2e test suite connects a real `OmniClient` to a
real `MockPanel` over a real TCP socket and exchanges real frames. Seventeen
of those tests cover the secure-session handshake, encrypted command
roundtrips, and the unsolicited push-event stream.

If either quirk were implemented incorrectly on either side, decryption
would produce garbage and the connection would drop. The fact that all
seventeen tests pass — including ones that subscribe to events and watch
them roundtrip cleanly through the encrypted channel — is bidirectional
validation that we have both quirks right.

That doesn't prove they're right against a *real* HAI panel. The user's
panel is currently offline (Ethernet module disabled at the panel firmware),
and the live-validation lap is on the backlog. But round-tripping with a
faithful emulator is meaningful evidence that the spec we extracted from
the C# is internally consistent — and that's the work that the public
clients didn't do.
## See also
- [Protocol reference](/reference/protocol/) — full byte-level handshake
including both quirks in their natural place in the flow.
- [Architecture overview](/explanation/architecture/) — how the mock panel
fits into the test stack.
- [The Journey](/journey/) — what it took to find the quirks in the first
place.