
Field Compaction: Shrinking Grains Without Losing Meaning

How OMS maps human-readable field names to compact short keys for efficient binary storage — a complete guide to Section 6 of the Open Memory Specification, covering core fields, type-specific fields, compaction rules, and nested compaction boundaries.

A Belief grain in the Open Memory Specification has fields like subject, relation, object, confidence, source_type, created_at, author_did, and namespace. Those are clear, readable names — exactly what you want when designing a schema. But when you are serializing millions of grains into compact binary blobs, every byte counts.

The string "confidence" is 10 bytes in UTF-8. The string "c" is 1 byte. Multiply that saving across every field in every grain in a store with millions of entries, and field compaction becomes a significant optimization.

Section 6 of the OMS v1.2 specification defines a bijective mapping — a one-to-one, reversible correspondence — between human-readable field names and short keys. Serializers replace full names with short keys before encoding. Deserializers reverse the mapping after decoding. The grain's logical structure is preserved exactly; only the wire representation changes.

How Compaction Works

The concept is straightforward:

  1. Before serialization, every known field name is replaced with its short key from the field map.
  2. The grain is serialized (sorted, encoded as MessagePack) using the short keys.
  3. After deserialization, every short key is replaced with its full field name.

Because the mapping is bijective (every full name maps to exactly one short key, and vice versa), this transformation is perfectly reversible. No information is lost. The grain you get after deserialization is identical to the grain you started with before serialization.
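The round trip described above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: `FIELD_MAP` is a hand-picked subset of the core map, and `compact` and `expand` are hypothetical helper names. The MessagePack encoding step is left out so the example only needs the standard library.

```python
# Illustrative subset of the core field map (full name -> short key).
FIELD_MAP = {
    "type": "t",
    "subject": "s",
    "relation": "r",
    "object": "o",
    "confidence": "c",
    "created_at": "ca",
    "namespace": "ns",
}

# The inverse exists because the mapping is one-to-one.
SHORT_TO_FULL = {short: full for full, short in FIELD_MAP.items()}


def compact(grain: dict) -> dict:
    """Replace known full names with short keys (top level only)."""
    return {FIELD_MAP.get(key, key): value for key, value in grain.items()}


def expand(wire: dict) -> dict:
    """Replace known short keys with full names (top level only)."""
    return {SHORT_TO_FULL.get(key, key): value for key, value in wire.items()}


grain = {
    "type": "belief",
    "subject": "Alice",
    "relation": "works_at",
    "object": "ACME Corp",
    "confidence": 0.95,
}

wire = compact(grain)          # what would be handed to the encoder
assert expand(wire) == grain   # the round trip loses nothing
```

Using `dict.get(key, key)` means any key missing from the map is passed through untouched in both directions.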

Core Fields (Section 6.1)

The core field map applies to all ten grain types. Here is the complete table:

| Full Name | Short Key | Type | Description |
|---|---|---|---|
| type | t | string | Grain type: "belief", "event", "state", "action", etc. |
| subject | s | string | Entity being described (RDF subject) |
| relation | r | string | Semantic relationship (RDF predicate) |
| object | o | string | Value or target (RDF object) |
| confidence | c | float64 | Credibility score [0.0, 1.0] |
| source_type | st | string | Provenance origin (open enum) |
| created_at | ca | int64 | Creation timestamp (epoch ms) |
| temporal_type | tt | string | "state" or "observation" |
| valid_from | vf | int64 | Temporal validity start (epoch ms) |
| valid_to | vt | int64 | Temporal validity end (epoch ms) |
| system_valid_from | svf | int64 | When grain became active in system |
| system_valid_to | svt | int64 | When grain was superseded in system |
| context | ctx | map | Contextual metadata (string to string) |
| superseded_by | sb | string | Content address of superseding grain |
| importance | im | float64 | Importance weighting [0.0, 1.0] |
| author_did | adid | string | DID of creating agent |
| namespace | ns | string | Memory partition/category |
| user_id | user | string | Associated data subject (GDPR) |
| structural_tags | tags | array[string] | Classification tags |
| derived_from | df | array[string] | Parent content addresses |
| consolidation_level | cl | int | 0=raw, 1=frequency, 2=pattern, 3=sequence |
| success_count | sc | int | Feedback: successful uses |
| failure_count | fc | int | Feedback: failed uses |
| provenance_chain | pc | array[map] | Full derivation trail |
| origin_did | odid | string | Original source agent DID |
| origin_namespace | ons | string | Original source namespace |
| content_refs | cr | array[map] | References to external content |
| embedding_refs | er | array[map] | References to vector embeddings |
| related_to | rt | array[map] | Cross-links to related grains |
| _elided | _e | map | Selective disclosure: elided field hashes |
| _disclosure_of | _do | string | Content address of original grain (if disclosed) |
| invalidation_policy | ip | map | Protection policy governing supersession |
| supersession_justification | sj | string | Required when superseding a soft-locked grain |
| supersession_auth | sa | array | COSE signatures authorizing quorum supersession |

The table above lists 34 core field mappings. (The contradicted / ct field was removed in v1.2; use verification_status in the index layer instead.) Some short keys are single-letter mnemonics (t for type, s for subject, c for confidence), while others are multi-letter abbreviations (adid for author_did, svf for system_valid_from). The mapping is normative: implementations MUST NOT invent their own short keys.
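For readers who want the core map in code, here is one way it might be written down in Python, transcribed from the table above. The names `CORE_FIELD_MAP` and `invert` are illustrative choices, not identifiers from the specification. The `invert` helper also acts as a sanity check: it refuses to build the reverse map if two full names ever share a short key.

```python
# Core field map transcribed from the table above (full name -> short key).
CORE_FIELD_MAP = {
    "type": "t",
    "subject": "s",
    "relation": "r",
    "object": "o",
    "confidence": "c",
    "source_type": "st",
    "created_at": "ca",
    "temporal_type": "tt",
    "valid_from": "vf",
    "valid_to": "vt",
    "system_valid_from": "svf",
    "system_valid_to": "svt",
    "context": "ctx",
    "superseded_by": "sb",
    "importance": "im",
    "author_did": "adid",
    "namespace": "ns",
    "user_id": "user",
    "structural_tags": "tags",
    "derived_from": "df",
    "consolidation_level": "cl",
    "success_count": "sc",
    "failure_count": "fc",
    "provenance_chain": "pc",
    "origin_did": "odid",
    "origin_namespace": "ons",
    "content_refs": "cr",
    "embedding_refs": "er",
    "related_to": "rt",
    "_elided": "_e",
    "_disclosure_of": "_do",
    "invalidation_policy": "ip",
    "supersession_justification": "sj",
    "supersession_auth": "sa",
}


def invert(field_map: dict) -> dict:
    """Build short key -> full name, failing if two names share a short key."""
    inverse = {}
    for full, short in field_map.items():
        if short in inverse:
            raise ValueError(
                f"{short!r} maps to both {inverse[short]!r} and {full!r}"
            )
        inverse[short] = full
    return inverse


CORE_SHORT_TO_FULL = invert(CORE_FIELD_MAP)
```

Building the inverse once at import time keeps deserialization a plain dictionary lookup per key.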

Type-Specific Fields

Beyond the core fields, each memory type defines additional fields with their own compaction mappings.

Event (Section 6.2)

| Full Name | Short Key | Type |
|---|---|---|
| content | content | string |
| consolidated | consolidated | bool |

Note that Event fields retain their full names as short keys. The names are already concise enough that compaction provides no benefit.

State (Section 6.3)

| Full Name | Short Key | Type |
|---|---|---|
| plan | plan | array[string] |
| history | history | array[map] |

Like Event, State fields are already short and retain their names.

Workflow (Section 6.4)

| Full Name | Short Key | Type |
|---|---|---|
| steps | steps | array[string] |
| trigger | trigger | string |

Again, these field names are short enough that the mapping is an identity function.

Action (Section 6.5)

| Full Name | Short Key | Type | Notes |
|---|---|---|---|
| tool_name | tn | string | |
| input | inp | map | v1.2 (replaces arguments/args, removed) |
| content | cnt | any | v1.2 (replaces result/res, removed) |
| is_error | iserr | bool | v1.2 (replaces success/ok, removed; polarity inverted) |
| action_phase | aphase | string | v1.2 new: "definition", "call", or "result" |
| tool_call_id | tcid | string | v1.2 new |
| error_type | etype | string | v1.2 new |
| error | err | string | |
| duration_ms | dur | int | |
| parent_task_id | ptid | string | |

Action fields see meaningful compaction. tool_name (9 bytes) becomes tn (2 bytes). input (5 bytes) becomes inp (3 bytes). Note: the old arguments/args, result/res, and success/ok short keys were removed in v1.2 — implementations emitting those keys are non-conformant.
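The article does not describe a migration procedure for pre-v1.2 Action grains, so the following is only a hypothetical sketch of what renaming the removed fields could look like at the application layer. The function name `upgrade_action_fields` is invented for illustration. The one detail worth noticing is the inverted polarity: a legacy `success` of `True` corresponds to an `is_error` of `False`.

```python
def upgrade_action_fields(legacy: dict) -> dict:
    """Rename removed pre-v1.2 Action fields to their v1.2 replacements.

    Hypothetical helper: works on full (uncompacted) field names and
    leaves every other key untouched.
    """
    upgraded = dict(legacy)  # shallow copy; the input is not mutated
    if "arguments" in upgraded:
        upgraded["input"] = upgraded.pop("arguments")
    if "result" in upgraded:
        upgraded["content"] = upgraded.pop("result")
    if "success" in upgraded:
        # Polarity is inverted: success=True means is_error=False.
        upgraded["is_error"] = not upgraded.pop("success")
    return upgraded


old_action = {
    "tool_name": "search",
    "arguments": {"query": "field compaction"},
    "result": "3 hits",
    "success": True,
}
new_action = upgrade_action_fields(old_action)
assert new_action["is_error"] is False
assert "success" not in new_action
```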

Observation (Section 6.6)

| Full Name | Short Key | Type |
|---|---|---|
| observer_id | oid | string |
| observer_type | otype | string |
| frame_id | fid | string |
| sync_group | sg | string |

Observation grains from high-frequency sensor data (LiDAR, cameras, IMUs) and cognitive agents benefit from compaction because they are produced in large volumes. The v1.0 short keys sid and stype (for sensor_id and sensor_type) were removed in v1.2 — use oid and otype exclusively.

Goal (Section 6.7)

Goal has the most type-specific fields of any memory type, reflecting its rich lifecycle semantics:

| Full Name | Short Key | Type |
|---|---|---|
| description | desc | string |
| goal_state | gs | string |
| criteria | crit | array[string] |
| criteria_structured | crs | array[map] |
| priority | pri | int |
| parent_goals | pgs | array[string] |
| state_reason | sr | string |
| satisfaction_evidence | se | array[string] |
| progress | prog | float64 |
| delegate_to | dto | string |
| delegate_from | dfo | string |
| expiry_policy | ep | string |
| recurrence | rec | string |
| evidence_required | evreq | int |
| rollback_on_failure | rof | array[string] |
| allowed_transitions | atr | array[string] |

Sixteen Goal-specific mappings. Fields like satisfaction_evidence (21 bytes) compacting to se (2 bytes) and rollback_on_failure (19 bytes) compacting to rof (3 bytes) provide substantial savings on richly annotated goals.

Before and After: A Compaction Example

To see the impact, consider a Belief grain before and after field compaction.

Before compaction (human-readable):

{
  "type": "belief",
  "subject": "Alice",
  "relation": "works_at",
  "object": "ACME Corp",
  "confidence": 0.95,
  "source_type": "user_explicit",
  "created_at": 1768471200000,
  "author_did": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
  "namespace": "hr",
  "importance": 0.8,
  "structural_tags": ["employment", "current"]
}

After compaction (short keys):

{
  "t": "belief",
  "s": "Alice",
  "r": "works_at",
  "o": "ACME Corp",
  "c": 0.95,
  "st": "user_explicit",
  "ca": 1768471200000,
  "adid": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
  "ns": "hr",
  "im": 0.8,
  "tags": ["employment", "current"]
}

Counting just the key bytes (not the values, which are unchanged):

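| Full Key | Bytes | Short Key | Bytes | Saved |
|---|---|---|---|---|
| type | 4 | t | 1 | 3 |
| subject | 7 | s | 1 | 6 |
| relation | 8 | r | 1 | 7 |
| object | 6 | o | 1 | 5 |
| confidence | 10 | c | 1 | 9 |
| source_type | 11 | st | 2 | 9 |
| created_at | 10 | ca | 2 | 8 |
| author_did | 10 | adid | 4 | 6 |
| namespace | 9 | ns | 2 | 7 |
| importance | 10 | im | 2 | 8 |
| structural_tags | 15 | tags | 4 | 11 |
| Total | 100 | | 21 | 79 |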

That is a 79-byte reduction in key overhead alone for a single grain with 11 fields, or 79% of the key bytes in this example. In a store with millions of grains, each averaging 10-15 fields, the cumulative savings are significant, often reducing total key bytes by 70-80%.
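The totals are easy to reproduce. The short Python check below counts UTF-8 bytes for the eleven key pairs used in the example; the variable names are illustrative only.

```python
# The eleven (full name -> short key) pairs from the Belief example.
pairs = {
    "type": "t",
    "subject": "s",
    "relation": "r",
    "object": "o",
    "confidence": "c",
    "source_type": "st",
    "created_at": "ca",
    "author_did": "adid",
    "namespace": "ns",
    "importance": "im",
    "structural_tags": "tags",
}

full_bytes = sum(len(name.encode("utf-8")) for name in pairs)
short_bytes = sum(len(key.encode("utf-8")) for key in pairs.values())
saved = full_bytes - short_bytes

print(full_bytes, short_bytes, saved)   # 100 21 79
print(f"{saved / full_bytes:.0%}")      # 79%
```

These counts cover the key text only. MessagePack also writes a one-byte header in front of every string shorter than 32 bytes, and that header is present with or without compaction, so the savings measured over the full encoded keys are somewhat lower than 79%.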

Compaction Rules (Section 6.8)

The spec defines four normative rules for how compaction must be applied:

  1. Serializers MUST replace full field names with short keys before encoding. This is not optional. A compliant serializer always compacts. If it emits "confidence" instead of "c", the grain will have a different content address from one that correctly compacts, breaking interoperability.

  2. Deserializers MUST replace short keys with full field names after decoding. The application layer always works with human-readable names. Compaction is invisible to consumers of the deserialized grain.

  3. Unknown keys MUST be preserved as-is in both directions. If a serializer encounters a field name that is not in the field map, it writes it unchanged. If a deserializer encounters a short key that is not in the field map, it passes it through unchanged. This enables forward compatibility — a future version of OMS could add new fields, and older implementations will preserve them without error.

  4. The field compaction mapping is normative and MUST NOT be modified by implementations. You cannot add custom short keys. You cannot change existing mappings. The mapping is part of the specification, and changing it would break interoperability.
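Rules 1 and 3 can be demonstrated with a small Python sketch. Several things here are assumptions made for the sake of a self-contained example: the two-entry `FIELD_MAP`, the field name `x_vendor_note`, and the `address` function, which uses sorted JSON plus SHA-256 as a stand-in for the canonical MessagePack encoding and content-address scheme that the spec actually defines.

```python
import hashlib
import json

# Two-entry subset of the core map, enough to show the rules.
FIELD_MAP = {"subject": "s", "confidence": "c"}
SHORT_TO_FULL = {short: full for full, short in FIELD_MAP.items()}


def compact(grain: dict) -> dict:
    """Known names become short keys; unknown names pass through (rule 3)."""
    return {FIELD_MAP.get(key, key): value for key, value in grain.items()}


def expand(wire: dict) -> dict:
    """Known short keys become full names; unknown keys pass through (rule 3)."""
    return {SHORT_TO_FULL.get(key, key): value for key, value in wire.items()}


def address(mapping: dict) -> str:
    """Stand-in content address: SHA-256 over sorted, compact JSON.

    The real pipeline encodes canonical MessagePack; JSON is used here
    only to keep the example free of third-party dependencies.
    """
    encoded = json.dumps(mapping, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


# "x_vendor_note" is a made-up field that is not in the map.
grain = {"subject": "Alice", "confidence": 0.95, "x_vendor_note": "kept"}
wire = compact(grain)

# Rule 3: the unknown key survives both directions unchanged.
assert wire == {"s": "Alice", "c": 0.95, "x_vendor_note": "kept"}
assert expand(wire) == grain

# Rule 1: hashing the uncompacted form yields a different address.
assert address(wire) != address(grain)
```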

Nested Compaction Boundaries

Field compaction applies at the top level of the grain map. But three specific fields also compact the maps nested inside their arrays:

  • content_refs (compacted key: cr) — each entry in this array has its keys compacted using the CONTENT_REF_FIELD_MAP: uri to u, modality to m, mime_type to mt, size_bytes to sz, checksum to ck, metadata to md.

  • embedding_refs (compacted key: er) — each entry uses the EMBEDDING_REF_FIELD_MAP: vector_id to vi, model to mo, dimensions to dm, modality_source to ms, distance_metric to di.

  • related_to (compacted key: rt) — each entry uses the RELATED_TO_FIELD_MAP: hash to h, relation_type to rl, weight to w.

Other fields that hold nested maps are NOT compacted recursively. Specifically:

  • provenance_chain (compacted key: pc) — inner maps retain keys like source_hash, method, weight.
  • context (compacted key: ctx) — inner key-value pairs retain their original keys.
  • history (compacted key: history) — inner maps retain their original keys.

This boundary is defined in Section 4.7 (Nested Compaction) of the canonical serialization rules. The distinction matters for content addressing: compacting a provenance_chain entry's inner keys would produce different bytes from not compacting them, so implementations must agree on exactly which fields get nested compaction.

Here is what a content reference looks like before and after nested compaction:

Before nested compaction:

{
  "content_refs": [
    {
      "uri": "cas://sha256:abc123...",
      "modality": "image",
      "mime_type": "image/jpeg",
      "size_bytes": 1048576,
      "checksum": "sha256:abc123..."
    }
  ]
}

After top-level and nested compaction:

{
  "cr": [
    {
      "u": "cas://sha256:abc123...",
      "m": "image",
      "mt": "image/jpeg",
      "sz": 1048576,
      "ck": "sha256:abc123..."
    }
  ]
}

The top-level key content_refs became cr, and inside the array entry, uri became u, modality became m, and so on.
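A sketch of how the boundary might be enforced in Python follows. The nested map contents are taken from the lists above; the names `TOP_LEVEL_MAP`, `NESTED_MAPS`, and `compact_with_nesting` are hypothetical, and the top-level map is trimmed to the four fields needed here. Only fields listed in `NESTED_MAPS` have their array entries rewritten, so a `provenance_chain` entry keeps its inner keys even though its top-level key is shortened.

```python
# Trimmed top-level map: only the fields used in this example.
TOP_LEVEL_MAP = {
    "content_refs": "cr",
    "embedding_refs": "er",
    "related_to": "rt",
    "provenance_chain": "pc",
}

# Per-field maps for the three fields whose array entries are compacted.
NESTED_MAPS = {
    "content_refs": {
        "uri": "u", "modality": "m", "mime_type": "mt",
        "size_bytes": "sz", "checksum": "ck", "metadata": "md",
    },
    "embedding_refs": {
        "vector_id": "vi", "model": "mo", "dimensions": "dm",
        "modality_source": "ms", "distance_metric": "di",
    },
    "related_to": {"hash": "h", "relation_type": "rl", "weight": "w"},
}


def compact_with_nesting(grain: dict) -> dict:
    """Compact top-level keys, plus entry keys for the three nested fields."""
    out = {}
    for key, value in grain.items():
        nested = NESTED_MAPS.get(key)
        if nested is not None and isinstance(value, list):
            # Assumes every array entry is a map, as in the spec tables.
            value = [
                {nested.get(k, k): v for k, v in entry.items()}
                for entry in value
            ]
        out[TOP_LEVEL_MAP.get(key, key)] = value
    return out


grain = {
    "related_to": [{"hash": "abc", "relation_type": "supports", "weight": 0.7}],
    "provenance_chain": [{"source_hash": "def", "method": "merge", "weight": 1.0}],
}
wire = compact_with_nesting(grain)
assert wire["rt"] == [{"h": "abc", "rl": "supports", "w": 0.7}]
# Same inner key name "weight", but no nested compaction for pc.
assert wire["pc"] == [{"source_hash": "def", "method": "merge", "weight": 1.0}]
```

Note how `weight` becomes `w` inside `related_to` but stays `weight` inside `provenance_chain`: the nested maps are scoped to their parent field rather than applied globally.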

Compaction in the Serialization Pipeline

Field compaction is Step 2 of the 10-step canonical serialization algorithm (Section 4.9). Nested compaction is Step 3. Both happen before key sorting (Step 7).

This ordering matters. After compaction, the keys that get sorted are the short forms (c, ca, cr, ns, o, r, s, st, t), not the full names. The lexicographic order of short keys differs from the order of full names:

Short key order:  c, ca, cr, ns, o, r, s, st, t
Full name order:  confidence, content_refs, created_at, namespace, object, relation, source_type, subject, type

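If an implementation sorted first and compacted second, the keys would come out as c, cr, ca, ns, o, r, st, s, t: cr would precede ca and st would precede s. The keys would be in the wrong order and the content address would differ. The spec's step ordering prevents this bug.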
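The difference between the two orderings can be checked directly. The Python snippet below assumes plain lexicographic string sorting, as the text describes, and uses the nine fields from the comparison above.

```python
# The nine fields from the comparison above (full name -> short key).
FIELD_MAP = {
    "confidence": "c",
    "content_refs": "cr",
    "created_at": "ca",
    "namespace": "ns",
    "object": "o",
    "relation": "r",
    "source_type": "st",
    "subject": "s",
    "type": "t",
}

# Correct pipeline: compact first, then sort the short keys.
compact_then_sort = sorted(FIELD_MAP.values())

# Buggy pipeline: sort the full names, then compact in that order.
sort_then_compact = [FIELD_MAP[name] for name in sorted(FIELD_MAP)]

print(compact_then_sort)  # ['c', 'ca', 'cr', 'ns', 'o', 'r', 's', 'st', 't']
print(sort_then_compact)  # ['c', 'cr', 'ca', 'ns', 'o', 'r', 'st', 's', 't']

assert compact_then_sort != sort_then_compact
```

Two pairs swap places: `ca`/`cr` and `s`/`st`. Since the encoder writes keys in sorted order, either swap is enough to change the serialized bytes.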

Why Not Just Use Short Keys Everywhere?

A natural question: if short keys are more efficient, why not use them as the canonical field names and skip the mapping?

The answer is readability and debuggability. When an engineer is inspecting a grain in a debugging tool, "subject": "Alice" is immediately clear. "s": "Alice" requires consulting the field map. When writing application code that creates grains, grain.confidence = 0.95 is self-documenting. grain.c = 0.95 is cryptic.

The compaction layer lets both worlds coexist. Humans work with full names. The wire format uses short keys. The mapping is mechanical and handled by the serialization library, invisible to application developers.

Conclusion

Field compaction is one of those specification features that is unglamorous but essential at scale. By replacing human-readable field names with minimal short keys, OMS reduces the per-grain overhead of key encoding by 70-80% without sacrificing readability at the application layer.

The rules are strict — serializers MUST compact, deserializers MUST expand, unknown keys MUST be preserved, and the mapping MUST NOT be modified. These constraints ensure that every compliant implementation produces identical bytes for the same grain, maintaining the deterministic serialization guarantee that underpins content addressing.

Combined with the canonical serialization rules (Section 4) and the nested compaction boundaries (Sections 4.7, 7.1, 7.2, 14.2), field compaction completes the picture of how OMS transforms a human-friendly data structure into a compact, deterministic, content-addressable binary blob.