High Entropy Encoding as a New Standard for Layered Messages
A new interpretation of Information Theory
This is an early look at a paper in development. It builds on the many Substitution Lexicon (Sub-Lex) articles in this collection; readers may want to explore those first.
Abstract
Human languages are often treated as low-entropy systems: ordered, predictable, and compressible. But this order is the product of consensus, not a native property of language. In their most primitive forms, languages are high-entropy systems—disconnected symbols, ambiguous meanings, no fixed structure. Only through shared interpretation and repeated use does their entropy decrease, making them accessible to others. Classical information theory, while useful, fails to capture this dynamic; it models language as symbol streams with statistical regularity, ignoring the relational scaffolding that gives language its meaning.
This paper introduces a communication model that intentionally reverses this process, embedding high-entropy content inside low-entropy carriers. By exploiting the predictable structure of language and media, this model enables hidden layers of meaning to coexist with public narratives. Unlike encryption, which produces detectable artifacts, these messages remain indistinguishable from ordinary content. The approach offers a new paradigm for privacy, plausible deniability, and context-sensitive communication, a framework we refer to as High Entropy Encoding Systems (HEES).
1. Introduction
Encryption and linguistics are rarely discussed in the same sentence. Yet both operate within the domain of high entropy.
Human languages begin in disorder. To an uninitiated observer, an unfamiliar language is a dense, chaotic symbol system—seemingly without rules, patterns, or reference points. In this early state, language behaves like a high-entropy system: unstructured, ambiguous, and inaccessible. Over time, however, as relationships form between symbols and shared meaning, entropy is reduced. Language becomes increasingly legible through consensus, translation, and cultural context. This process of relational entropy collapse is how languages evolve from noise to structure.
Encryption, by contrast, starts with structured input and deliberately transforms it into high-entropy output. Encrypted messages are designed to resist interpretation without the key. They remain disorderly—by design—until a precise mathematical operation (a decryption function or hash match) is applied. However, unlike early human languages, encrypted messages are visibly high entropy. They produce detectable ciphertext: blocks of data that clearly signal obfuscation to any observer.
This distinction is critical.
Human language obscures meaning through ambiguity. Encryption obscures meaning through transformation.
⸻
1.1 Historical Entropy in Language
Human languages are not inherently low-entropy systems. They begin in a state of uncertainty—their symbols ambiguous, their syntax unknown. Meaning only emerges through shared relationships and repeated exposure. The evolution of written language—from early pictographs to phonetic alphabets—mirrors a gradual entropy reduction as cultural consensus crystallizes the rules of interpretation.
A striking example is the decipherment of Egyptian hieroglyphics. For centuries, the symbols were unreadable—a high-entropy system. The discovery of the Rosetta Stone offered a mapping to Greek, a lower-entropy language, enabling hieroglyphics to be reinterpreted through known relationships. This did not “decrypt” hieroglyphics in a computational sense—it reduced entropy by anchoring unfamiliar symbols to known linguistic structures.
⸻
1.2 Reversing the Process
High Entropy Encoding Systems (HEES) reverse this historical process. Rather than reducing entropy through shared understanding, HEES intentionally reintroduces entropy by embedding structured meaning into already-legible, low-entropy carriers—such as natural language, media, or interface design.
These carriers appear completely ordinary to the uninitiated, but hold layered meaning accessible only through a contextual or computational lens. In contrast to encryption, HEES does not produce ciphertext, and cannot be flagged through conventional detection systems.
HEES messages retain plausible surface meaning while simultaneously encoding a secondary, high-entropy payload. This makes them indistinguishable from ordinary content, yet precisely decodable by those with the correct relational model.
⸻
2. What Is Entropy?
2.1 Shannon’s Definition
In both physics and information theory, entropy describes uncertainty or disorder. Claude Shannon gave it a precise definition in 1948, measuring the unpredictability of a message source.
For a random variable X with possible outcomes x₁, x₂, …, xₙ, each with probability P(xᵢ), Shannon entropy is:
H(X) = -∑ P(xᵢ) · log₂ P(xᵢ)
This tells us how much information, on average, each symbol in a message carries. Entropy is highest when all outcomes are equally likely, and lowest when one outcome is overwhelmingly probable.
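As a worked example, the formula can be computed directly. This is a minimal sketch in Python, illustrating both extremes:

```python
from collections import Counter
from math import log2

def shannon_entropy(message: str) -> float:
    """Average information per symbol: H(X) = -sum(P(x) * log2(P(x)))."""
    counts = Counter(message)
    probs = [c / len(message) for c in counts.values()]
    return sum(-p * log2(p) for p in probs)

# Entropy is highest when all outcomes are equally likely...
print(shannon_entropy("abcd"))  # 2.0 bits/symbol (4 equally likely symbols)
# ...and lowest when one outcome is overwhelmingly probable.
print(shannon_entropy("aaaa"))  # 0.0 bits/symbol (fully predictable)
```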
2.2 How It Works in Communication
Shannon entropy helps us understand systems like radio, binary data, or compression algorithms. It tells us:
How efficiently a message can be encoded
How much redundancy is needed to correct errors
What the limits of a channel are under noise
But Shannon’s framework is blind to meaning. It cares about the symbols, not what they represent.
2.3 What About Language?
Natural language feels structured and low-entropy, but that only holds after the structure has been learned. To an outsider, an unfamiliar language looks like noise.
Even within shared languages, entropy varies:
A poem can compress meaning into minimal structure
A legal clause might be predictable in form but obscure in meaning
A joke or meme might depend entirely on shared cultural background
So language starts out high-entropy, just like encrypted text or static. It only becomes low-entropy when we’ve built relationships between the symbols and the ideas they represent.
We can express this entropy reduction process mathematically. Let Hₚ(L) be the perceived entropy of a language L from the perspective of an observer who has learned R relationships (e.g., symbol-to-meaning pairs, grammar rules, cultural mappings). We define:
Hₚ(L) = H₀ · e^(–αR)
Where:
H₀: the maximum entropy of the system with no known relationships
α: a decay constant reflecting how sensitive the language is to relational input
As R increases, entropy drops exponentially. Structured systems like alphabets may reduce quickly (high α), while ideographic or context-heavy systems may decay more slowly. This model gives us a way to quantify interpretability over time or exposure.
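The decay model above can be sketched numerically. The H₀ and α values below are purely illustrative assumptions, not measured constants:

```python
from math import exp

def perceived_entropy(h0: float, alpha: float, r: int) -> float:
    """H_p(L) = H0 * exp(-alpha * R): perceived entropy after R learned relationships."""
    return h0 * exp(-alpha * r)

# Hypothetical parameters: an alphabetic system (high alpha) becomes
# legible much faster than a context-heavy ideographic one (low alpha).
for r in (0, 10, 50):
    alphabetic = perceived_entropy(8.0, 0.10, r)
    ideographic = perceived_entropy(8.0, 0.02, r)
    print(f"R={r:2d}  alphabetic={alphabetic:.2f}  ideographic={ideographic:.2f}")
```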
2.4 Hieroglyphics and the Rosetta Stone
One of the clearest historical examples of a high-entropy system becoming interpretable is the decipherment of Egyptian hieroglyphics.
For centuries, scholars attempted to decode the visual language using pattern recognition, symbolism, and speculation. But without a relational reference, these attempts mostly failed. Hieroglyphics remained a high-entropy system.
The Rosetta Stone, discovered in 1799, provided that reference. Featuring parallel inscriptions in Greek, Demotic, and hieroglyphic scripts, it offered a shared linguistic anchor. Scholars already fluent in Greek could begin aligning unfamiliar glyphs with known words and grammatical patterns.
2.5 The Code Talkers
During World War II, the U.S. military employed Navajo speakers to transmit battlefield messages. These weren’t encrypted using machines. They were spoken in Navajo, often with additional internal codewords.
To enemy cryptographers, the messages were indecipherable. No one listening had access to the structure, grammar, or cultural context of the Navajo language. There was no shared relationship to the symbol system, so it behaved as a high-entropy code even though it was simply fluent, spoken language.
To other Navajo speakers, it was completely legible.
The security came not from transformation, but from cultural isolation.
2.6 Zero-Relationship Systems Remain High Entropy
A message without any shared relationship to the observer is indistinguishable from noise.
This is true for unfamiliar writing systems, undocumented spoken languages, invented symbol sets, or even poetic metaphors taken out of cultural context. A system can be beautifully ordered and internally consistent, but if no observer can map it to anything known, it stays high-entropy.
Structure alone doesn’t reduce entropy. Relationship does.
2.7 Zero Relationship by Design: Encryption as a Limiting Case
Encryption goes further. It is designed to enforce zero relationships on purpose.
There is no Rosetta Stone for AES-256. You either have the key, or you don’t. You can’t partially interpret an encrypted message. You can’t infer meaning from the structure. The ciphertext is deliberately indistinguishable from random data.
Encrypted messages are easy to detect precisely because they destroy observable relationships. They’re designed to look like noise: statistically flat, patternless, high-entropy blocks of data. That’s their strength. But it’s also a signal: something is being hidden.
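This statistical flatness is easy to demonstrate. In the sketch below, `os.urandom` stands in for ciphertext (a well-designed cipher’s output is computationally indistinguishable from random bytes), and a repeated English passage stands in for ordinary low-entropy content:

```python
import os
from collections import Counter
from math import log2

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (maximum 8.0)."""
    counts = Counter(data)
    probs = [c / len(data) for c in counts.values()]
    return sum(-p * log2(p) for p in probs)

prose = (b"Human language obscures meaning through ambiguity. "
         b"Encryption obscures meaning through transformation. ") * 40
noise = os.urandom(len(prose))  # stand-in for ciphertext

print(byte_entropy(prose))  # roughly 4 bits/byte: redundant, structured
print(byte_entropy(noise))  # close to 8 bits/byte: statistically flat
```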
Natural language can’t operate that way. As the Navajo Code Talkers showed, even unfamiliar language can temporarily function as a secure medium, until relationships are discovered. Once those relationships are known, the system’s entropy drops, and its usefulness for concealment disappears.
That makes zero-relationship natural language systems virtually impossible to maintain over time. And it exposes the limits of conventional encryption in contexts where plausible deniability, not just secrecy, is required.
3. Sub-Lex: A Zero-Relationship Encoding
Sub-Lex is the original and most developed implementation of the High Entropy Encoding Systems (HEES) framework. It demonstrates how positional encoding, drifted vector maps, and seeded character sets can be used to layer messages within natural or artificial narratives.
While this paper focuses on Sub-Lex’s technical implementation, the conceptual foundation of Sub-Lex—particularly its vector-based structure and post-symbolic design—is explored more broadly in other works, including the paper Beyond Natural Languages (Cannon, 2025) and accompanying technical articles. These works define HEES as a post-quantum framework for encoding messages in a way that preserves entropy, even in the presence of shared relational context.
In the sections that follow, we describe the evolution of Sub-Lex through its first two major versions, each of which balances message security, entropy retention, and plausible deniability through different architectural trade-offs.
3.1 Sub-Lex Version 1: Narrative Vector Encoding
The original implementation of HEES, Sub-Lex Version 1, embeds messages directly into human-readable narratives by leveraging a positional vector encoding system. Instead of traditional symbolic encryption, this system maps the position of characters within a host narrative to form a first-occurrence vector map.
A message string M = {m₁, m₂, …, mₙ} is decomposed into characters that must be matched in the narrative N = {n₁, n₂, …, nₖ}. The position vector V is then defined as:
Vᵢ = min{ j | nⱼ = mᵢ and j > Vᵢ₋₁ }
A drift seed key S introduces non-linearity by applying a deterministic function fₛ(i) to the resulting vector positions, producing a drifted vector:
V′ᵢ = Vᵢ + fₛ(i)
This “result table” must be shared via a secondary channel along with the seed key S, forming the minimum reconstruction set required to decode M.
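A minimal sketch of the first-occurrence mapping and drift follows. The paper leaves fₛ abstract, so the modular drift function below is purely illustrative:

```python
def encode_v1(message: str, narrative: str, seed: int) -> list[int]:
    """Build the first-occurrence vector V_i = min{ j | n_j = m_i, j > V_{i-1} },
    then apply an illustrative drift f_s(i) = (seed * (i + 1)) % 7."""
    positions, prev = [], -1
    for ch in message:
        prev = narrative.index(ch, prev + 1)  # raises ValueError if no match remains
        positions.append(prev)
    return [v + (seed * (i + 1)) % 7 for i, v in enumerate(positions)]

def decode_v1(drifted: list[int], narrative: str, seed: int) -> str:
    """Reverse the drift, then read characters back out of the narrative."""
    return "".join(narrative[v - (seed * (i + 1)) % 7] for i, v in enumerate(drifted))

narrative = "the quick brown fox jumps over the lazy dog"
vec = encode_v1("hero", narrative, seed=5)     # the drifted "result table"
print(vec, decode_v1(vec, narrative, seed=5))  # [6, 5, 12, 18] hero
```

Note that the drifted vector and the seed together form the minimum reconstruction set described above; neither alone recovers the message.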
Messages may consist of alphanumeric symbols, modified hex digits (e.g., {0–9, A–F}), or even full AES-256 ciphertext, enabling deep obfuscation without recognizable ciphertext patterns. As a result, Sub-Lex messages can be embedded in non-digital content (e.g., signage, graffiti, historical manuscripts) without arousing suspicion.
3.2 Sub-Lex Version 2: Drifted Encodings and Entangled Instructional Systems
The second generation of Sub-Lex abandons natural language narratives entirely in favor of a purely data-driven representation, using an internally randomized hexadecimal charset. A seed key S is used to:
Randomize the full hex charset Σ₁₆ = {0–F}, producing Σ₁₆′
Randomize the numeric subset Σ₁₀ = {0–9}, producing Σ₁₀′
Generate a deterministic drift function fₛ(i) as in Version 1
The encoded message becomes a sequence of indices into Σ₁₆′, making the output indistinguishable from noise. These values may be transmitted inside innocuous data fields or embedded into metadata.
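A minimal sketch of the seeded-charset scheme follows. Using Python’s `random.Random` as the seed-keyed generator, the drift construction, and the omission of the separate Σ₁₀′ subset are all simplifying assumptions:

```python
import random

def derive_system(seed: int):
    """Rebuild the v2 system from the seed key alone: a shuffled hex charset
    (sigma-16 prime) and a deterministic drift sequence f_s(i)."""
    rng = random.Random(seed)
    sigma16 = list("0123456789ABCDEF")
    rng.shuffle(sigma16)
    drift = [rng.randrange(16) for _ in range(64)]  # illustrative drift function
    return sigma16, drift

def encode_v2(hex_message: str, seed: int) -> list[int]:
    sigma16, drift = derive_system(seed)
    return [(sigma16.index(ch) + drift[i]) % 16 for i, ch in enumerate(hex_message)]

def decode_v2(indices: list[int], seed: int) -> str:
    sigma16, drift = derive_system(seed)
    return "".join(sigma16[(v - drift[i]) % 16] for i, v in enumerate(indices))

payload = encode_v2("DEADBEEF", seed=1234)  # indices into the shuffled charset
print(decode_v2(payload, seed=1234))        # "DEADBEEF"
```

Without the seed, the index sequence carries no recoverable relationship to the payload, which is the zero-relationship property described above.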
A refinement of this version integrates with the Matrix protocol, leveraging its double ratchet mechanism to cause messages to expire and become unrecoverable. Without access to the original seed S, the drift function, and both randomized charsets, message reconstruction is computationally infeasible.
This version introduces a significant philosophical shift: the message is treated not as content but as an instruction set for rebuilding the system that generated it. In effect, the encoded message is a non-symbolic entangled state, and its meaning cannot be localized to any single component.
Brute-force attacks yield a superposition of plausible results, similar to token prediction in large language models. HEES encodings resist deterministic cracking because they do not adhere to symbolic mappings.
3.3 Sub-Lex Version 3: Generative HEES Systems
[Note: Detailed analysis of Version 3 is available in the companion paper, Generative Encoding and Privacy-First Systems (Cannon, 2026).]
3.3.1 Creator Flow: New Mapping
In new mapping mode, the user generates a fresh narrative embedding from scratch. The assistant coordinates the following steps:
Message Input
The user enters a message to encode — typically up to 120 characters. For a standard 600-character NLS post, using ~20% of the mapped space is considered safe to maintain plausible entropy distribution.
Public Narrative Description
The user is prompted to describe the visible topic of the post — the public-facing content that the narrative will convey.
Mapping Optimization
The assistant selects the most optimal mapping of characters into the available narrative space, arranging the message into positions that minimize detectability while preserving narrative coherence.
Optional Scrambling (Word Puzzle Style)
The user may choose to rearrange surface-level characters interactively — scrambling or shifting them manually (like a word puzzle) to obscure any visible regularity. The assistant adapts the narrative accordingly.
Narrative Generation
The assistant now generates a full-length narrative (e.g., 600 characters), constrained by:
Character positions from the payload map
The public prompt
Entropy and grammaticality shaping functions
Mapped Character Highlighting
Once the draft is generated, the assistant highlights the embedded characters to allow the user to review and fix any hallucination-driven misalignments.
Drift Value Insertion
After user approval, the assistant randomly inserts two Sub-Lex-style marker characters into the narrative. These encode a 4-digit drift value used to offset and obfuscate pattern distribution. The drift also acts as a noise vector to defeat correlation analysis.
Key Generation
A 128- or 256-byte key is generated. This includes:
The drift value
The character position mapping
Any custom scramble transformations
A hash or checksum of the narrative + mapping
This key is sufficient to recreate the full message mapping and may be saved, exported, or reused.
3.3.2 Creator Flow: Existing Mapping
In existing mapping mode, the user begins with a previously generated key and applies it to a new narrative.
Key Input
The assistant reconstructs the character map from the key and displays the locked positions.
Narrative Generation (with Locked Map)
The user proceeds just as in new mode, except the mapped positions are fixed. The assistant must build a new narrative that satisfies the same embedding map.
Override Option
Users may override locked positions to correct flow or for creative flexibility — but doing so will generate a new key, preserving the immutability guarantees of the original message.
3.3.3 Reuse and Longevity
Generative HEES mappings can be:
Reused across multiple posts
Regenerated with small variations
Stored indefinitely if securely kept
While the entropy profile remains constant, the visible narrative can evolve. The drift value ensures each reuse appears unique at the surface level.
3.3.4 Optional Sub-Lex Layering
For short payloads (e.g., 16–32 characters), Sub-Lex vector encoding may be optionally layered inside the generated narrative instead of using full NLS mapping. This allows for hybrid systems:
Sub-Lex for compact vector-encoded secrets
NLS for narrative-shaped embeddings of larger messages or data fields
This dual-mode approach increases flexibility and preserves forward compatibility with future Sub-Lex interpreters.
4. Decoding HEES in Practice
In earlier articles, we explored decoding methods for Sub-Lex–encoded messages—specifically the vector-based table reconstruction that allows for secure but manual retrieval.
As Generative HEES systems evolved, so did our decoding strategies.
We discovered a method that not only supports narrative Sub-Lex and NLS-based generative encoding, but also leverages existing device ecosystems to deliver completely passive, undetectable decoding. It uses audio, particularly Bluetooth or device speech playback, to trigger HEES-aware listeners like Whisper.
This section presents a real-world case study demonstrating how HEES messages can be decoded entirely through the audio layer—with no app integration, no visible tools, and full platform deniability.
Case Study: Passive HEES Decoding via Whisper
Scenario
A layered HEES message is embedded into a social media post, readable like any normal narrative.
A trusted recipient opens the post on their phone.
The phone’s built-in assistant—Siri, Gemini, Google Assistant, or Alexa—is asked to read the post aloud using a simple voice command:
“Read the last message”
“What’s new from X?”
“Can you read that out loud?”
The audio plays through the device speaker or Bluetooth headphones, as it would for any screen reader or accessibility tool.
Whisper, running on a paired laptop, secured phone, or a tiny embedded device, listens to the readout.
Decoding Pipeline (Audio Layer)
Audio Capture
The Whisper model captures the raw speech from the device assistant or reader. This can happen in real-time over Bluetooth or from ambient speaker output.
Transcription
Whisper transcribes the narrative into clean plaintext with high accuracy (assuming normal TTS delivery).
This gives us the same natural language surface used for embedding.
Key Input
The user enters or preloads the HEES key—typically 256-bit—to guide the decoding process.
Map Reconstruction
The system:
Applies the drift function using any embedded Sub-Lex markers detected in the audio transcription.
Uses the key to reestablish the mapping.
Extracts the payload in order from the transcribed content.
Message Display
The final plaintext message is revealed to the user—decoded purely from what was heard, not what was downloaded, viewed, or decrypted.
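The map-reconstruction and extraction steps can be sketched downstream of Whisper. We assume the transcription step has already produced plaintext (actual Whisper invocation is omitted); the position map, drift handling, and transcript below are all hypothetical:

```python
def decode_from_transcript(transcript: str, positions: list[int], drift: int) -> str:
    """Steps 3-5 above: offset each stored position by the drift value,
    then extract the payload characters from the transcript in order."""
    return "".join(transcript[(p + drift) % len(transcript)] for p in positions)

# Stand-in for Whisper's transcription of the assistant's readout.
transcript = "meet at the old harbor cafe when the evening market closes"
positions = [51, 5, 31, 52]  # hypothetical map recovered from the key
print(decode_from_transcript(transcript, positions, drift=7))  # "move"
```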
Why This Works So Well
Key Features and Benefits
Platform-agnostic
Works seamlessly with voice assistants like Siri, Gemini, Alexa, Google Assistant, and more.
No app or API integration required
The HEES listener operates independently—no permissions, SDKs, or app installations needed.
No on-device processing
Decoding can occur remotely on a Whisper-enabled device, preserving security and battery life.
Plausible deniability
Users can claim they were “just listening” to audio—there’s no visible evidence of secret decoding.
Compatible with printed content
HEES messages on posters, signs, or flyers can be captured with OCR and read aloud for decoding.
Low-power deployment
Suitable for embedded or offline environments—audio triggers require minimal local computation.
Implications
This method turns any AI assistant or TTS service into a message relay. HEES messages can be broadcast via:
Smart speakers
Auto-read notifications
Accessibility screen readers
Bluetooth audio channels
Shared device playback (e.g., public phones, smart glasses)
All without detection, explicit decryption, or signal that private content is being transmitted.
A message can now pass through any platform’s voice assistant layer, carrying a HEES-encoded payload to any listening device, anywhere in the world.
Resistant to Cellular Exploits
This method is highly resistant to spyware such as Pegasus or Cellebrite because:
No decryption, decoding, or message reconstruction happens on the phone
The key is never typed, stored, or transmitted on the compromised device
The phone functions only as a speaker, not a participant in the decoding
Audio can be captured ambiently or via Bluetooth—never digitally traced
Even if the device is fully compromised, there is no forensic trace of HEES decoding activity.
Why This Matters
Additional Features and Benefits
Undetectable flow
Involves no local processing, storage, or behavioral trace, making it highly resistant to forensic analysis.
Pairs with Sub-Lex or NLS
Fully supports both vector-based (Sub-Lex) and generative (NLS-style) HEES systems.
Usable in high-risk regions
Remains operational even when devices are assumed to be compromised, enabling resilient communication under surveillance.
Conclusion: Toward a Privacy-First Future
This article marks the beginning of our 2026 privacy-first campaign, and builds on the foundation laid by our earlier explorations of the Sub-Lex protocol. While those articles introduced high-entropy positional encoding, this piece expands the field by formalizing Generative HEES systems, and presenting a unified model for invisible, context-aware, layered messaging—with practical decoding strategies like audio-based extraction using Whisper.
We encourage readers interested in Sub-Lex, HEES, and high-entropy communication to explore the full collection of prior articles. Together, they offer a roadmap for a world where meaning can be layered, concealed, and transmitted—without detection or compromise.
In the coming months, we plan to publish at least three formal papers on the concepts outlined here:
A technical deep dive on audio-based decoding and off-device security
A formal taxonomy of Generative HEES systems
A specification for Sub-Lex v2 integration in record-keeping and protocol-layer security
We see 2026 as a pivotal year for privacy technologies. With growing concerns over personal privacy, the need for systems that are invisible by design—not just encrypted—has never been greater.
We are especially focused on opportunities in:
Medical and legal record storage (e.g., patient-owned access layers, role-based reveal)
Truly private social posts (layered messages visible only to intended readers)
Smart glasses and wearable interfaces, where privacy must extend into real-world AR displays
To support these efforts, we are developing an AT Protocol client that demonstrates generative AI authoring with built-in HEES encoding—proving that modern communication tools can support freedom, context, and consent.
All HEES protocols are inherently invisible to scraping, indexing, and surveillance.
Sub-Lex and generative encodings resist detection at the structural level, not just the cryptographic one.
Sub-Lex Version 2, in particular, is designed for embedding into other protocols or securing immutable records, not just messaging. And as wearable computing accelerates, privacy should be treated as a baseline system requirement, not an afterthought.
We look forward to continuing this campaign—research, tools, and advocacy—throughout 2026 and beyond.





