You understand the concepts — content addressing, canonical serialization, field compaction, the ten grain types. Now you want to build something. Where do you start?
The OMS v1.0 specification defines three conformance levels, each building on the last. This post walks through all three, then provides a step-by-step guide for implementing Level 1 — the minimum viable OMS implementation — using the spec's test vectors to verify correctness.
The Three Conformance Levels
Every OMS implementation MUST declare which conformance level it supports. The levels are cumulative: Level 2 includes everything in Level 1, and Level 3 includes everything in Level 2.
Level 1: Minimal Reader
Level 1 is sufficient for reading, verifying, and storing grains. It requires six capabilities:
- Deserialize version byte + canonical MessagePack payload — Parse the 9-byte fixed header and decode the MessagePack map that follows it.
- Compute and verify SHA-256 content addresses — Hash the entire blob (header + payload) and compare against a claimed content address.
- Support field compaction (short keys to full names) — When you encounter a key "s" in the payload, you must know it means "subject". When you encounter "c", it means "confidence". The full mapping is in Section 6 of the spec.
- Support all 10 grain types — including Fact, Episode, Checkpoint, Workflow, ToolCall, Observation, and Goal. Your deserializer must handle the required fields for each.
- Ignore unknown fields — If you encounter a key that is not in the field compaction mapping, preserve it as-is. Do not error. This is how OMS achieves forward compatibility.
- Constant-time hash comparison — When verifying content addresses, use a constant-time comparison function, not string equality. This prevents timing attacks where an adversary measures response times to guess hash values byte by byte.
Level 1 is the right starting point for read-only tools: a grain viewer, a verification utility, a log ingestion pipeline that needs to validate integrity without writing new grains.
Level 2: Full Implementation
Level 2 adds writing capabilities and strict enforcement. All Level 1 requirements, plus:
- Serialize (full names to short keys) — The reverse of Level 1 compaction.
- Enforce canonical MessagePack rules — Sorted keys, smallest integer encoding, float64 only, NFC-normalized strings, null omission.
- Validate required fields per schema — A Fact must have type, subject, relation, object, confidence, source_type, and created_at. Reject grains with missing required fields.
- Pass all test vectors — The spec provides six test vectors with expected content addresses.
- Support multi-modal content references — Handle content_refs and embedding_refs arrays, including their nested field compaction.
- Implement Store protocol — get, put, delete, list, exists operations for grain storage.
- Enforce invalidation_policy on supersession and contradiction — When a grain carries an invalidation_policy, your store must check it before allowing supersession or contradiction.
- Implement supersede as a distinct atomic operation — Not a raw put followed by an index patch. A put MUST reject grains containing derived_from claims that imply supersession without going through supersede.
- Fail-closed on unknown invalidation modes — Unrecognized invalidation_policy.mode values are treated as "locked". Never default to permissive.
- Enforce the replaces non-supersession rule — relation_type: "replaces" in related_to is advisory. It MUST NOT trigger index mutations on the target grain.
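The fail-closed rule is worth a tiny sketch. Only "locked" is named in this post, so the permissive mode name below is hypothetical:

```python
# Hypothetical mode vocabulary: only "locked" is named in this post;
# "open" stands in for whatever permissive modes the spec defines.
RECOGNIZED_MODES = {"locked", "open"}

def effective_mode(policy: dict) -> str:
    # Fail closed: absent or unrecognized modes behave as "locked"
    mode = policy.get("mode", "locked")
    return mode if mode in RECOGNIZED_MODES else "locked"
```

The point is the last line: the fallback for anything you do not recognize is the most restrictive mode, never the most permissive one.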
Level 3: Production Store
Level 3 is a complete, production-ready memory store. All Level 2 requirements, plus:
- At least one persistent backend — Filesystem, S3, database, or similar. In-memory-only does not qualify.
- AES-256-GCM encrypted grain envelopes — Per-grain encryption with authenticated encryption.
- Per-user key derivation (HKDF-SHA256) — Derive individual encryption keys from a master key and user_id, enabling O(1) GDPR erasure via key destruction.
- Blind-index tokens for encrypted search — HMAC-based tokens that allow querying encrypted fields without decryption.
- Hexastore index — Six permutations of the subject-predicate-object triple (SPO, SOP, PSO, POS, OPS, OSP) for efficient knowledge graph queries.
- Full-text search (FTS5 or equivalent) — Text search across grain content.
- Hash-chained audit trail — Every write operation is logged in a hash chain for tamper-evident auditing.
- Crash recovery and reconciliation — The store must survive unexpected termination without data corruption.
- Policy engine with compliance presets — GDPR, HIPAA, SOX, and other regulatory frameworks.
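To make the hexastore item concrete, here is a sketch of generating the six index keys for one triple. The "order:part:part:part" key format is illustrative, not mandated by the spec:

```python
def hexastore_keys(subject: str, predicate: str, obj: str) -> list[str]:
    # Six permutations of the (subject, predicate, object) triple,
    # in the order the post lists them: SPO, SOP, PSO, POS, OPS, OSP.
    parts = {"s": subject, "p": predicate, "o": obj}
    orders = ["spo", "sop", "pso", "pos", "ops", "osp"]
    return [order + ":" + ":".join(parts[c] for c in order) for order in orders]
```

With all six orderings materialized, any query pattern with one or two bound positions becomes a prefix scan over exactly one of the six key families.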
Start at Level 1, prove correctness with test vectors, then graduate to Level 2. Level 3 is for teams building production memory infrastructure.
Step-by-Step Level 1 Implementation
Let us build a Level 1 Minimal Reader from scratch. The goal: given a blob of bytes (a .mg grain), parse it, expand compacted field names, validate the type, compute the SHA-256 content address, and verify it matches a claimed address.
Step 1: Parse the 9-Byte Header
Every OMS blob starts with a fixed 9-byte header:
Byte 0: Version (must be 0x01)
Byte 1: Flags (bit field)
Byte 2: Type (memory type enum: 0x01=Fact, 0x02=Episode, ..., 0x07=Goal)
Bytes 3-4: Namespace hash (first 2 bytes of SHA-256(namespace), big-endian uint16)
Bytes 5-8: Created-at (uint32 epoch seconds, big-endian)
Your parser should:
- Check that the blob is at least 10 bytes (9-byte header + minimum 1-byte payload).
- Verify byte 0 is 0x01. If not, reject with ERR_VERSION.
- Extract the flags byte. The bits tell you about signing, encryption, compression, content refs, embedding refs, CBOR encoding, and sensitivity level.
- Extract the type byte. Values 0x01 through 0x07 identify the standard grain types.
- Extract the namespace hash (bytes 3-4) and created-at timestamp (bytes 5-8).
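Under those rules, the header parse is a few lines of Python. A sketch (the ERR_TRUNCATED name for the length check is mine; the spec's error naming may differ):

```python
import struct

def parse_header(blob: bytes) -> dict:
    if len(blob) < 10:  # 9-byte header plus at least 1 payload byte
        raise ValueError("ERR_TRUNCATED")
    if blob[0] != 0x01:
        raise ValueError("ERR_VERSION")
    # Bytes 3-4: namespace hash (uint16), bytes 5-8: created-at (uint32),
    # both big-endian per the layout above.
    ns_hash, created_at = struct.unpack(">HI", blob[3:9])
    return {
        "version": blob[0],
        "flags": blob[1],
        "type": blob[2],
        "namespace_hash": ns_hash,
        "created_at": created_at,  # epoch seconds
    }
```

Run against the first bytes of Test Vector 1 below, this returns a created_at of 1768471200 and a namespace hash of 0xa4d2.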
Step 2: Decode the MessagePack Payload
Bytes 9 onward are the canonical MessagePack payload. Use a MessagePack library to decode this into a map (dictionary/object).
The payload MUST decode to a map. If it decodes to any other type, reject with ERR_NOT_MAP.
Step 3: Reverse Field Compaction
The decoded map uses short keys. Replace them with full names:
"t" → "type"
"s" → "subject"
"r" → "relation"
"o" → "object"
"c" → "confidence"
"st" → "source_type"
"ca" → "created_at"
"ns" → "namespace"
"adid" → "author_did"
"ctx" → "context"
...
The full mapping is in Section 6 of the spec. Unknown keys are preserved as-is — this is how OMS achieves forward compatibility with future spec versions.
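A sketch of the expansion step in Python, using only the partial mapping shown above (the full table lives in Section 6 of the spec):

```python
# Partial short-key mapping from the list above; see Section 6 for the rest.
SHORT_TO_FULL = {
    "t": "type", "s": "subject", "r": "relation", "o": "object",
    "c": "confidence", "st": "source_type", "ca": "created_at",
    "ns": "namespace", "adid": "author_did", "ctx": "context",
}

def expand_keys(payload: dict) -> dict:
    # Unknown short keys pass through unchanged (forward compatibility)
    return {SHORT_TO_FULL.get(k, k): v for k, v in payload.items()}
```

Note the `.get(k, k)` fallback: that single default argument is the entire "ignore unknown fields" requirement from Level 1.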
Step 4: Validate the Type Field
After decompaction, check that the type field exists. If absent, reject with ERR_NO_TYPE. The value must be one of: "belief", "event", "state", "workflow", "action", "observation", or "goal".
For unknown type values, a Level 1 reader deserializes the grain as an opaque map without schema validation — it does not reject the grain.
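A sketch of the check, using the seven type strings listed above (the spec's full type table may name more):

```python
# Type strings named in this post; the spec may define additional ones.
KNOWN_TYPES = {"belief", "event", "state", "workflow",
               "action", "observation", "goal"}

def validate_type(grain: dict) -> bool:
    """Return True for a known schema, False for an opaque unknown type."""
    if "type" not in grain:
        raise ValueError("ERR_NO_TYPE")
    return grain["type"] in KNOWN_TYPES
```

A False return is not an error at Level 1: the caller keeps the grain as an opaque map and skips schema validation.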
Step 5: Compute SHA-256 Over the Entire Blob
The content address is the SHA-256 hash of the complete blob bytes — header plus payload. Not just the payload. The header is part of the hashed content.
```python
import hashlib

content_address = hashlib.sha256(blob_bytes).hexdigest()
```

The result is a 64-character lowercase hexadecimal string.
Step 6: Use Constant-Time Comparison for Hash Verification
When verifying that a computed content address matches a claimed one, do not use ==. Use a constant-time comparison function:
```python
import hmac

is_valid = hmac.compare_digest(expected_hash, computed_hash)
```

```go
import "crypto/subtle"

isValid := subtle.ConstantTimeCompare([]byte(expected), []byte(computed)) == 1
```

```javascript
import crypto from "crypto";

const isValid = crypto.timingSafeEqual(
  Buffer.from(expected), Buffer.from(computed)
);
```

Verifying with Test Vector 1
The spec provides Vector 1 as a complete byte-level reference. Let us walk through it.
Input (a minimal Belief grain, using the v1.0 type name "fact"; v1.2 canonical name is "belief"):
{
  "type": "fact",
"subject": "user",
"relation": "prefers",
"object": "dark mode",
"confidence": 0.9,
"source_type": "user_explicit",
"created_at": 1768471200000,
"namespace": "shared",
"author_did": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK"
}

Expected content address:
3288d0d41cf49a1d428e404f0b6a6fe60388be9536937557f6139b813d53a520
The complete blob is 159 bytes. Here is the hex:
01 00 01 a4 d2 69 68 ba a0 89 a4 61 64 69 64 d9 38 64 69 64 3a 6b 65 79 3a
7a 36 4d 6b 68 61 58 67 42 5a 44 76 6f 74 44 6b 4c 35 32 35 37 66 61 69 7a
74 69 47 69 43 32 51 74 4b 4c 47 70 62 6e 6e 45 47 74 61 32 64 6f 4b a1 63
cb 3f ec cc cc cc cc cc cd a2 63 61 cf 00 00 01 9b c1 19 01 00 a2 6e 73 a6
73 68 61 72 65 64 a1 6f a9 64 61 72 6b 20 6d 6f 64 65 a1 72 a7 70 72 65 66
65 72 73 a1 73 a4 75 73 65 72 a2 73 74 ad 75 73 65 72 5f 65 78 70 6c 69 63
69 74 a1 74 a4 66 61 63 74
Header breakdown (first 9 bytes):
- 01 — Version 1
- 00 — Flags: all bits zero (public, MessagePack encoding, unsigned, no content refs, no embedding refs, no compression, no encryption)
- 01 — Type: Fact (0x01)
- a4 d2 — Namespace hash: first 2 bytes of SHA-256("shared"), as uint16 big-endian
- 69 68 ba a0 — Created-at: 1768471200 decimal = 0x6968baa0 (epoch seconds, big-endian), i.e. 2026-01-15T10:00:00Z
Payload breakdown (bytes 9 onward):
- 89 — MessagePack fixmap with 9 entries
- Keys in lexicographic order: "adid", "c", "ca", "ns", "o", "r", "s", "st", "t"
- Confidence 0.9 as float64: cb 3f ec cc cc cc cc cc cd (marker cb plus 8 bytes of the IEEE 754 value 3feccccccccccccd)
- Created_at 1768471200000 as uint64: cf 00 00 01 9b c1 19 01 00
Parse these 159 bytes, compute SHA-256, and verify 3288d0d41cf49a1d428e404f0b6a6fe60388be9536937557f6139b813d53a520.
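As a sanity check, here is the full vector reassembled in Python. The assertions follow the header breakdown above; the last line prints the computed address so you can compare it against the expected value:

```python
import hashlib
import struct

# Test Vector 1, reassembled from the hex dump above (159 bytes)
vector1_hex = (
    "010001a4d26968baa089a461646964d9386469643a6b65793a"
    "7a364d6b68615867425a44766f74446b4c353235376661697a"
    "74694769433251744b4c4770626e6e4547746132646f4ba163"
    "cb3feccccccccccccda26361cf0000019bc1190100a26e73a6"
    "736861726564a16fa96461726b206d6f6465a172a770726566"
    "657273a173a475736572a27374ad757365725f6578706c6963"
    "6974a174a466616374"
)
blob = bytes.fromhex(vector1_hex)

assert len(blob) == 159                                  # spec-stated length
assert blob[0] == 0x01                                   # version
assert blob[2] == 0x01                                   # type: Fact
assert blob[9] == 0x89                                   # fixmap, 9 entries
assert struct.unpack(">I", blob[5:9])[0] == 1768471200   # created-at seconds

computed = hashlib.sha256(blob).hexdigest()
print(computed)  # compare against the expected address above
```

If your own parser and hashing pipeline are correct, the printed address should match the expected value stated above.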
Round-Trip Testing
Section 22.6 of the spec defines the round-trip conformance test. This is the single most important test for a Level 2 implementation:
1. Serialize a grain to blob bytes.
2. Hash the blob to get a content address.
3. Compare the content address against the expected test vector value.
4. Deserialize the blob back into a grain object.
5. Serialize again. The result MUST match the original blob bytes exactly.
If step 5 produces different bytes, your implementation has a round-trip fidelity bug. Common causes: auto-inserting default values for absent fields, sorting arrays that should preserve insertion order, or failing to preserve unknown fields.
Implementation Libraries
The spec provides a recommended library for each major language, along with the mechanism each uses for sorted keys:
| Language | Library | Sorted Keys | Notes |
|---|---|---|---|
| Python | ormsgpack | OPT_SORT_KEYS | Rust-backed, fastest option |
| Python | msgpack | sort_keys=True | Pure Python fallback |
| Rust | rmp-serde | Via BTreeMap | Natural ordering from data structure |
| Go | msgpack/v5 | Manual sorting | You are responsible for sorting keys |
| JavaScript | @msgpack/msgpack | Pre-sort keys | Manual sorting required before encoding |
| Java | jackson-dataformat-msgpack | SORT_PROPERTIES_ALPHABETICALLY | Feature flag on ObjectMapper |
| C# | MessagePack-CSharp | Via SortedDictionary | Built-in support |
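For the rows that require manual pre-sorting (Go, JavaScript), the usual approach is to rebuild every map with sorted keys before handing it to the encoder. A Python sketch of that idea:

```python
def sort_keys_deep(value):
    # Rebuild maps with lexicographically sorted keys, recursing into
    # nested maps and arrays. Array element order is preserved.
    if isinstance(value, dict):
        return {k: sort_keys_deep(value[k]) for k in sorted(value)}
    if isinstance(value, list):
        return [sort_keys_deep(v) for v in value]
    return value
```

Python's code-point sort order matches UTF-8 byte order, so `sorted()` gives the lexicographic byte ordering canonical MessagePack needs.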
String Normalization
All strings — keys and values — must be NFC-normalized (Unicode Normalization Form Canonical Composition) before encoding. The spec provides library recommendations:
- Python: unicodedata.normalize("NFC", s)
- Go: the golang.org/x/text/unicode/norm package
- JavaScript: String.prototype.normalize("NFC")
- Java: java.text.Normalizer.normalize(s, Normalizer.Form.NFC)
NFC normalization ensures that characters like e + combining acute accent (\u0301) are collapsed to the precomposed form \u00e9. Without normalization, the two representations produce different bytes and different content addresses. Most ASCII text is already NFC, but non-ASCII characters in user names or descriptions require explicit normalization.
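A quick Python demonstration of the collapse:

```python
import unicodedata

decomposed = "e\u0301"   # "e" + combining acute accent (2 code points)
composed = unicodedata.normalize("NFC", decomposed)

assert composed == "\u00e9"                      # precomposed é, 1 code point
assert decomposed.encode() != composed.encode()  # different bytes, so a
                                                 # different content address
```

Both strings display identically, which is exactly why the bug hides until a real user types one of them.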
Common Pitfalls
Based on the canonical serialization rules in Section 4, here are the bugs that trip up every first implementation:
1. Forgetting to sort keys. Many languages preserve insertion order by default (Python dicts, JavaScript objects). MessagePack will faithfully encode keys in whatever order you provide them. If that order is not lexicographic, your content address will differ from every other implementation.
2. Using float32 instead of float64. Some MessagePack libraries default to float32 when the value fits. OMS requires float64 always. A confidence value of 0.9 encoded as float32 produces 5 bytes (ca 3f 66 66 66). As float64, it produces 9 bytes (cb 3f ec cc cc cc cc cc cd). Same number, different bytes, different content address.
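You can see the difference directly with struct, whose big-endian packing matches the byte sequences above:

```python
import struct

f32 = struct.pack(">f", 0.9)   # 4 payload bytes: 3f 66 66 66
f64 = struct.pack(">d", 0.9)   # 8 payload bytes: 3f ec cc cc cc cc cc cd

assert f32.hex() == "3f666666"
assert f64.hex() == "3feccccccccccccd"
```

Prepend the MessagePack markers (ca and cb respectively) and you get the 5-byte and 9-byte encodings from the pitfall above.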
3. Not NFC-normalizing strings. Most test strings are ASCII, so tests pass without normalization. The bug only surfaces when a user enters a combining character — and at that point, you have grains with incorrect content addresses in your store.
4. Including null values. The spec requires null/None/nil map entries to be omitted entirely. If your serializer writes "ctx": null into the MessagePack output, the bytes change, the hash changes, and compatibility breaks.
5. Auto-inserting defaults on round-trip. If your deserializer fills in confidence: 0.0 for an absent field, and your serializer writes it back, you have injected a field that was not in the original blob. The round-trip test will fail.
6. Using non-canonical integer encoding. The integer 42 must be encoded as a positive fixint (1 byte: 0x2a), not as a uint8 (2 bytes: 0xcc 0x2a). Both decode to 42, but they produce different bytes.
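A minimal canonical encoder makes the smallest-encoding rule concrete. This sketch covers only non-negative integers; negative values need the fixint/int8 through int64 families as well:

```python
def encode_uint_canonical(n: int) -> bytes:
    # Smallest MessagePack encoding for a non-negative integer
    if n < 0x80:
        return bytes([n])                      # positive fixint (1 byte)
    if n < 0x100:
        return b"\xcc" + n.to_bytes(1, "big")  # uint8
    if n < 0x10000:
        return b"\xcd" + n.to_bytes(2, "big")  # uint16
    if n < 0x100000000:
        return b"\xce" + n.to_bytes(4, "big")  # uint32
    return b"\xcf" + n.to_bytes(8, "big")      # uint64
```

This reproduces both examples in this post: 42 encodes to the single byte 0x2a, and Vector 1's created_at of 1768471200000 encodes to cf 00 00 01 9b c1 19 01 00.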
Planning Your Implementation Path
If you are starting from scratch, here is a practical roadmap:
Week 1: Level 1 reader. Parse header, decode MessagePack, reverse field compaction, validate type, compute SHA-256. Verify against Vector 1.
Week 2: Canonical serializer. Implement the 10-step serialization algorithm from Section 4.9. Verify round-trip fidelity: serialize, hash, deserialize, serialize again — bytes must match.
Week 3: Schema validation + Store protocol. Enforce required fields per memory type. Add get, put, delete, list, exists.
Week 4: Invalidation policy. Implement supersede, enforce invalidation_policy, handle fail-closed for unknown modes.
At the end of week 4, you have a Level 2 implementation. Level 3 — encryption, hexastore indexing, full-text search, audit trails — depends on your storage backend and compliance requirements.
Conclusion
Building an OMS implementation is not about implementing the entire spec at once. Start with Level 1 — reading and verifying grains — then layer writing, schema validation, and store operations as your needs grow. The test vectors are your guardrails. The library table tells you which tools to use. The common pitfalls tell you what to watch for. The rest is engineering.