Grain Type Deep Dive: State

Imagine an AI agent that has been researching a complex topic for twenty minutes. It has visited a dozen sources, compiled intermediate findings, formed tentative conclusions, and planned its next three steps. Then the process crashes. Without a save point, all that accumulated state is gone. The agent starts over from scratch.

This is the problem States solve. Defined in Section 8.3 of the Open Memory Specification, States are agent state snapshots — complete captures of where an agent is in a task, what it has found, what it plans to do next, and what it has already done. They are, quite literally, save points for AI agents.

This post covers the State grain type in depth: its required and optional fields, the context map that holds arbitrary agent state, the plan and history fields, how States enable fault tolerance, and the immutability model that tracks state evolution through supersession chains.

What is a State?

The spec defines a State as:

Agent state snapshot for save/restore.

A short definition that captures the entire purpose. A State grain freezes the agent's current state into an immutable grain so that the agent (or a different agent) can later restore that state and continue from where things left off.

This is fundamentally different from the other OMS grain types:

Beliefs record what is known (declarative memory)
Events record what happened (interaction records)
Workflows record how to do things (procedural memory)
States record where you are right now (state snapshots)

If Beliefs are your notes, Events are your recordings, and Workflows are your playbooks, then States are your bookmarks — they mark a specific point in a process so you can return to it later.

Required Fields

The State type has just three required fields (Section 8.3):

Field	Type	Description
`type`	string	Must be `"state"`
`context`	map	Agent state snapshot (arbitrary key-value pairs)
`created_at`	int64	Creation timestamp in epoch milliseconds

The context field is the heart of a State grain. It is a map — meaning it holds arbitrary key-value pairs that represent whatever state the agent needs to preserve. The spec does not prescribe what goes in the context map. It could be the current task description, accumulated findings, configuration state, environment variables, or any other data the agent needs to resume.

Here is a minimal State grain:

{
  "type": "state",
  "context": {
    "current_task": "Research quantum computing advances in 2025",
    "sources_checked": "12",
    "key_findings": "3 papers identified on error correction"
  },
  "created_at": 1768471200000
}
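
As a rough illustration, the three required fields could be checked before a grain is written. This is a minimal sketch in Python, not part of the specification; the function name `validate_state` and the wording of the error messages are assumptions made for this example.

```python
def validate_state(grain: dict) -> list[str]:
    """Return a list of problems; an empty list means the required fields look valid."""
    problems = []
    if grain.get("type") != "state":
        problems.append('type must be "state"')
    if not isinstance(grain.get("context"), dict):
        problems.append("context must be a map")
    created_at = grain.get("created_at")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(created_at, int) or isinstance(created_at, bool):
        problems.append("created_at must be an integer (epoch milliseconds)")
    return problems


minimal = {
    "type": "state",
    "context": {"current_task": "Research quantum computing advances in 2025"},
    "created_at": 1768471200000,
}
print(validate_state(minimal))  # an empty list when all three fields are present
```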

The header byte for State is 0x03 (Section 3.1.1, Type enum), so this grain's 9-byte fixed header would begin with 01 00 03 — version 1, no flags set, State type.

The Context Map

The context map is deliberately unstructured. OMS does not define a schema for what goes inside it — that is left to the agent implementation. This design choice reflects the reality that different agents, different tasks, and different domains all require different state representations.

However, because the context map is part of a canonical serialization (Section 4), its contents must follow OMS rules:

Map keys are sorted lexicographically by UTF-8 byte representation (Section 4.1). This applies recursively to any nested maps within the context.
All strings are NFC-normalized (Section 4.4).
Null values are omitted (Section 4.5).
Integer encoding uses the smallest representation (Section 4.2).
Float values must be float64 (Section 4.3) and must not be NaN or Infinity.
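
The rules above can be sketched as a recursive clean-up pass over the context map. This is an illustrative Python sketch, not the reference algorithm: the name `canonicalize` is hypothetical, it assumes all map keys are strings, and it leaves smallest-integer encoding to whatever MessagePack encoder is used afterwards.

```python
import math
import unicodedata


def canonicalize(value):
    """Recursively apply key sorting, NFC normalization, null omission, and float checks."""
    if isinstance(value, dict):
        cleaned = {}
        for key, item in value.items():
            if item is None:
                continue  # null values are omitted
            cleaned[unicodedata.normalize("NFC", key)] = canonicalize(item)
        # Sort keys by their UTF-8 byte representation, at every nesting level.
        return dict(sorted(cleaned.items(), key=lambda kv: kv[0].encode("utf-8")))
    if isinstance(value, list):
        return [canonicalize(item) for item in value]  # arrays keep insertion order
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, float) and not math.isfinite(value):
        raise ValueError("NaN and Infinity are not allowed")
    return value
```

Because Python dictionaries preserve insertion order, the returned map iterates in sorted key order, which is what an encoder would need to emit canonical bytes.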

Here is a more realistic context map for a research agent:

{
  "type": "state",
  "context": {
    "accumulated_findings": "Found 3 papers on topological qubits, 2 on surface codes",
    "current_phase": "literature_review",
    "current_task": "Survey quantum error correction methods published in 2025",
    "decisions_made": "Narrowed scope to superconducting architectures",
    "environment": "arxiv_api_v2",
    "query_count": "47",
    "sources_remaining": "IEEE Xplore, Nature Physics"
  },
  "created_at": 1768471200000,
  "namespace": "research",
  "user_id": "researcher-042"
}

Note that all values in the context map are strings in this example. While OMS maps support various MessagePack types as values, using strings for context values maximizes portability across different agent implementations.

Optional Fields

Beyond the required fields, States support four optional fields:

Field	Type	Description
`plan`	array[string]	Planned actions — what the agent intends to do next
`history`	array[map]	Action history — what the agent has already done
`user_id`	string	Associated data subject (for GDPR scoping)
`structural_tags`	array[string]	Classification tags

The `plan` Field

The plan field is an array of strings representing the agent's planned next actions, in order. Per Section 4.6, array elements preserve insertion order — so the first element is the next planned action, the second is the one after that, and so on.

{
  "plan": [
    "Search IEEE Xplore for surface code implementations",
    "Cross-reference findings with IBM Quantum roadmap",
    "Draft summary of error correction approaches",
    "Compare overhead estimates across methods"
  ]
}

This gives any restoring agent an explicit roadmap: pick up the plan from the first uncompleted step and continue executing.

The `history` Field

The history field is an array of maps recording what the agent has already done. Each map entry can contain whatever fields the agent finds useful — there is no prescribed schema for history entries.

{
  "history": [
    {
      "action": "search_arxiv",
      "query": "quantum error correction 2025",
      "results_count": "23",
      "timestamp": "1768470000000"
    },
    {
      "action": "filter_results",
      "criteria": "superconducting architectures only",
      "remaining": "8",
      "timestamp": "1768470300000"
    },
    {
      "action": "read_paper",
      "paper_id": "arxiv:2501.12345",
      "summary": "Novel surface code with 10x lower overhead",
      "timestamp": "1768470600000"
    }
  ]
}
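
One way to picture how plan and history relate: when the agent completes the first planned action, the next snapshot has that action moved from the plan into the history. The sketch below is an assumption about how an agent might do this, not something the spec prescribes; `advance` and the `result` key are hypothetical names, and the timestamp is stored as a string only to match the examples above.

```python
def advance(state: dict, result: str, now_ms: int) -> dict:
    """Return a NEW State in which the first planned action has moved into history."""
    plan = state.get("plan", [])
    if not plan:
        raise ValueError("no planned actions left")
    done, *remaining = plan
    entry = {"action": done, "result": result, "timestamp": str(now_ms)}
    return {
        **state,  # the original State is left untouched
        "plan": remaining,
        "history": [*state.get("history", []), entry],
        "created_at": now_ms,
    }
```

The function returns a fresh dictionary rather than mutating its input, which mirrors the fact that each saved State is a separate grain.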

Fault Tolerance: The Primary Use Case

The most immediate use case for States is fault tolerance. AI agents operate in environments where failures are not just possible but expected:

Cloud functions hit timeout limits
API rate limits interrupt multi-step processes
Network partitions disconnect agents from their tools
Memory pressure causes container restarts
Hardware failures take down compute nodes

Without States, any failure means starting over. With States, the agent saves its state periodically, so a failure loses only the work done since the last saved State.

The State-Recovery Pattern

Here is how the pattern works in practice:

1. Periodic Save

During a long-running task, the agent creates State grains at regular intervals:

{
  "type": "state",
  "context": {
    "current_step": "3",
    "total_steps": "10",
    "intermediate_results": "Steps 1-3 complete, 47 records processed"
  },
  "plan": [
    "Process records 48-100",
    "Generate summary report",
    "Send notification"
  ],
  "history": [
    {"action": "process_batch", "range": "1-20", "status": "complete"},
    {"action": "process_batch", "range": "21-40", "status": "complete"},
    {"action": "process_batch", "range": "41-47", "status": "complete"}
  ],
  "created_at": 1768471200000,
  "structural_tags": ["data-pipeline", "batch-processing"]
}
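
A periodic save like the one above might be produced by a small helper that assembles a complete snapshot each time. This is a sketch under assumptions: `make_state` is a hypothetical name, and how the resulting grain is serialized and stored is left out.

```python
import time
from typing import Optional


def make_state(context: dict, plan: list, history: list,
               tags: Optional[list] = None, now_ms: Optional[int] = None) -> dict:
    """Build a complete State snapshot; nothing is carried over from earlier States."""
    grain = {
        "type": "state",
        "context": dict(context),
        "plan": list(plan),
        "history": [dict(entry) for entry in history],
        # Epoch milliseconds, as required for created_at.
        "created_at": now_ms if now_ms is not None else int(time.time() * 1000),
    }
    if tags:
        grain["structural_tags"] = list(tags)
    return grain
```

Copying the inputs means later changes to the agent's working data do not alter a snapshot that has already been built.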

2. Failure Occurs

The agent crashes after processing record 62. Without a State grain, 62 records of progress would be lost.

3. Recovery

A new agent instance (or the restarted original) loads the latest State grain, reads the context, plan, and history, and resumes after step 3, at record 48. It only needs to reprocess records 48-62 (15 records), not all 62.
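
To make the arithmetic concrete, here is a hedged sketch of how a recovering agent could work out where to resume from the history entries in the example State. The helper `next_record` is hypothetical, and it relies on the `action`, `range`, and `status` keys used in this post's example, which the spec does not prescribe.

```python
def next_record(history: list) -> int:
    """Return the first record number not covered by a completed batch."""
    last_done = 0
    for entry in history:
        if entry.get("action") == "process_batch" and entry.get("status") == "complete":
            _, end = entry["range"].split("-")
            last_done = max(last_done, int(end))
    return last_done + 1


saved_history = [
    {"action": "process_batch", "range": "1-20", "status": "complete"},
    {"action": "process_batch", "range": "21-40", "status": "complete"},
    {"action": "process_batch", "range": "41-47", "status": "complete"},
]
resume_at = next_record(saved_history)   # record 48
redo = range(resume_at, 62 + 1)          # records 48 through 62
```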

The content address of the State grain (SHA-256 of its complete blob) serves as a stable reference point. The recovering agent can retrieve the exact State grain by its content address and verify its integrity — any tampering would change the hash.
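
The integrity check described here amounts to re-hashing the blob and comparing the result with the address it was stored under. The sketch below uses Python's standard `hashlib`; the function names are hypothetical, and rendering the address as a hex string is an assumption for readability.

```python
import hashlib


def content_address(blob: bytes) -> str:
    """SHA-256 of the complete blob, rendered as hex (the hex form is an assumption)."""
    return hashlib.sha256(blob).hexdigest()


def verify_blob(blob: bytes, expected_address: str) -> bool:
    """True only if the blob still hashes to the address it was stored under."""
    return content_address(blob) == expected_address
```

Changing even one byte of the blob produces a different digest, so a mismatch signals corruption or tampering.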

Use Cases Beyond Fault Recovery

Long-Running Research Tasks

Research agents that need to survey large bodies of literature, analyze datasets, or compile reports benefit enormously from State grains. A research task might span hours or days, and the agent needs to save intermediate findings, track which sources have been consulted, and maintain its analysis state.

The context map holds the research state: which papers have been read, what key findings have emerged, what hypotheses are being tracked. The plan field lists remaining research steps. The history field records completed searches and analyses with their results.

Multi-Step Customer Onboarding

Customer onboarding workflows involve multiple stages: identity verification, preference collection, system configuration, training, and follow-up. If the process is interrupted — the customer drops off, the agent times out, or the system restarts — the State grain allows the next interaction to resume exactly where the previous one stopped.

{
  "type": "state",
  "context": {
    "customer_id": "cust-7891",
    "onboarding_stage": "preference_collection",
    "identity_verified": "true",
    "preferences_collected": "theme, language, notification_settings",
    "preferences_remaining": "data_retention, privacy_level"
  },
  "plan": [
    "Complete preference collection",
    "Configure system defaults",
    "Send welcome documentation",
    "Schedule 7-day follow-up"
  ],
  "created_at": 1768471200000,
  "user_id": "cust-7891",
  "structural_tags": ["onboarding", "stage-2"]
}

Multi-Agent Handoffs

When one agent needs to transfer a task to another — because of specialization, load balancing, or escalation — the State grain serves as the complete state transfer mechanism. Agent A creates a State grain capturing everything it knows and has done. Agent B loads that State grain and picks up seamlessly.

This is more reliable than ad-hoc state transfer because the State grain is:

Immutable: Agent B gets exactly the state Agent A saved, verified by content address
Complete: The context, plan, and history fields capture the full picture
Auditable: The provenance chain tracks who created the State grain and why

Autonomous System Mission Recovery

For autonomous vehicles, drones, or robots, State grains enable mission recovery after reboots or hardware faults. The context map captures mission state: current objective, position, environmental conditions, decisions made. The plan captures remaining waypoints or mission steps. After recovery, the system can evaluate whether to continue the mission from the State grain or abort based on changed conditions.

The Immutability Model

States follow the same immutability model as all OMS grains. This has an important consequence: each State is a new grain, not an update to the old one. When an agent saves its state three times during a task, it creates three distinct State grains, each with its own content address.

The supersession chain tracks the evolution of state over time:

State 1 (t=0):  context={step: "1"}, plan=["step 2", "step 3", "step 4"]
    |
    v  (superseded_by = hash of State 2)
State 2 (t=5):  context={step: "2"}, plan=["step 3", "step 4"]
    |
    v  (superseded_by = hash of State 3)
State 3 (t=10): context={step: "3"}, plan=["step 4"]

Each State is a complete snapshot — not a diff. The superseded_by field (set by the index layer, per Section 15.3) links each State to the one that replaced it. This creates a navigable history: you can find the latest State by following the chain to the end, or reconstruct the agent's position at any point by loading the State from that time.
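
Following the chain to its end can be sketched as a simple loop over an index that maps each content address to the address that superseded it. The shape of that index is an assumption for this example (a plain dictionary); a real index layer may expose it differently.

```python
def latest_in_chain(start: str, superseded_by: dict) -> str:
    """Follow superseded_by links from start until reaching a State nothing has replaced."""
    seen = set()
    current = start
    while current in superseded_by:
        if current in seen:
            # A well-formed chain never loops; guard against a corrupted index.
            raise ValueError("cycle in supersession chain")
        seen.add(current)
        current = superseded_by[current]
    return current
```

With an index of `{"h1": "h2", "h2": "h3"}`, starting from `"h1"` yields `"h3"`, the State at the end of the chain.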

Why Complete Snapshots, Not Diffs?

A State-as-diff approach would be more space-efficient but far less robust. If any State in the chain is lost or corrupted, all subsequent diffs become unusable. With complete snapshots:

Any single State is self-contained — you can restore from it without any other grain
Content addressing verifies integrity of each snapshot independently
Recovery does not require replaying a chain of deltas
Multi-agent handoffs only need a single grain transfer

The trade-off is storage space. Each State contains the full context map, full plan, and full history. For agents with large state, this can be significant. Device profiles (Section 18) provide guidance: Extended profile allows blobs up to 1 MB, Standard up to 32 KB, and Lightweight up to 512 bytes.
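
An agent could check a serialized State against these limits before saving. The following sketch assumes 1 KB means 1024 bytes and that the limits are inclusive; both are assumptions, as are the names `PROFILE_LIMITS`, `fits_profile`, and `smallest_profile`.

```python
# Blob size limits in bytes, taken from the figures quoted above.
# Assumes binary units (1 KB = 1024 bytes), which the text does not state.
PROFILE_LIMITS = {
    "lightweight": 512,
    "standard": 32 * 1024,
    "extended": 1024 * 1024,
}


def fits_profile(blob: bytes, profile: str) -> bool:
    """True if the blob is within the size limit of the named profile."""
    return len(blob) <= PROFILE_LIMITS[profile]


def smallest_profile(blob: bytes):
    """Return the most constrained profile that can hold the blob, or None."""
    for name in ("lightweight", "standard", "extended"):
        if fits_profile(blob, name):
            return name
    return None
```

A State that outgrows its target profile is a signal to trim the history or move bulky findings out of the context map.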

Serialization Details

When a State grain is serialized, the canonical algorithm (Section 4.9) applies:

1. Validate required fields: type, context (must be a map), created_at.
2. Compact field names: type becomes t, created_at becomes ca, context becomes ctx. The State-specific fields plan and history keep their full names (Section 6.3).
3. Sort map keys lexicographically, including the keys inside the context map (recursive sorting per Section 4.1).
4. Handle nested maps in history: Per Section 4.7, history entries are NOT compacted but their keys ARE sorted lexicographically.
5. NFC-normalize all strings.
6. Omit null values.
7. Encode as MessagePack (or CBOR if flag bit 5 is set).
8. Prepend the 9-byte fixed header with type byte 0x03.
9. Hash with SHA-256.

The key serialization subtlety for States is the interaction between compacted and non-compacted nested structures. The top-level field names are compacted (context becomes ctx), but the contents of context, plan, and history are not compacted — they use whatever keys the agent chose. However, all map keys at every nesting level are sorted lexicographically.
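
The top-level-only compaction can be illustrated with a short sketch. It covers just the three short names mentioned in this post; the spec may define short names for other fields, which are not reproduced here, and `compact_top_level` is a hypothetical helper. Recursive sorting of nested maps is left out to keep the focus on compaction.

```python
# Only the mappings named in this post; other fields pass through unchanged.
COMPACT_KEYS = {"type": "t", "created_at": "ca", "context": "ctx"}


def compact_top_level(grain: dict) -> dict:
    """Rename top-level fields to their short forms, then sort the top-level keys."""
    compacted = {COMPACT_KEYS.get(key, key): value for key, value in grain.items()}
    # Nested keys (inside context, history) are deliberately not renamed.
    return dict(sorted(compacted.items(), key=lambda kv: kv[0].encode("utf-8")))


grain = {"type": "state", "context": {"type": "inner"}, "created_at": 1, "plan": ["a"]}
compacted = compact_top_level(grain)
```

Note that the `type` key inside the context map keeps its full name, while the top-level `type` becomes `t`.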

States and Compliance

States can contain arbitrary data in the context map, which raises compliance considerations:

GDPR: If the context contains personal data, the user_id field enables scoping for right-to-erasure requests. Per the per-user encryption pattern (Section 20.3), State grains can be encrypted with user-specific keys, enabling crypto-erasure (destroying the key renders all grains unrecoverable).

Audit trails: Because each State is immutable and content-addressed, the full history of agent state evolution is tamper-evident. Combined with COSE Sign1 signing (Section 9), States can provide cryptographic proof of what state an agent was in at any given time.

Sensitivity classification: The header flags (bits 6-7) allow States to be classified: 00=public, 01=internal, 10=PII, 11=PHI. A State containing patient data in a healthcare context would use the PHI classification.

State Design Patterns

Layered States

For complex agents, consider a layered State strategy:

Frequent lightweight States: Save minimal context (current step, key metrics) every few minutes. Tags: ["checkpoint-light"].
Periodic full States: Save complete state including history every 15-30 minutes. Tags: ["checkpoint-full"].
Milestone States: Save at significant task boundaries (phase completion, major decisions). Tags: ["checkpoint-milestone"].

The structural_tags field enables filtering by State type during recovery. The recovering agent can first look for a milestone State, then fall back to a full State, then to a lightweight State.
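
The fallback order described above might look like the following sketch. The tag names come from the list in this section; the function name `pick_recovery_state` and the choice to take the newest State within a tier are assumptions.

```python
TIER_ORDER = ["checkpoint-milestone", "checkpoint-full", "checkpoint-light"]


def pick_recovery_state(states: list):
    """Prefer milestone States, then full, then light; newest first within a tier."""
    for tag in TIER_ORDER:
        tier = [s for s in states if tag in s.get("structural_tags", [])]
        if tier:
            return max(tier, key=lambda s: s["created_at"])
    return None  # no tagged State is available
```

A strict tier preference can select an older milestone over a more recent lightweight State, so an implementation may want to limit how far back a milestone is allowed to be.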

Conditional State Saves

Not every moment deserves a State save. Good State triggers include:

Before taking an irreversible action (API call, file write, message send)
After completing a significant sub-task
When accumulated state crosses a size threshold
At regular time intervals during long tasks
Before a known timeout boundary

The plan field is particularly valuable here — if the next planned action is risky or expensive, a State save before it ensures recovery to a known-good position.
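
These triggers can be combined into a single predicate that the agent evaluates before each step. The sketch below is illustrative only: the function name, parameter names, and default thresholds (16 KB, 5 minutes, 30 seconds) are all hypothetical values chosen for the example, not recommendations from the spec.

```python
def should_save(*, next_action_irreversible=False, subtask_completed=False,
                state_bytes=0, seconds_since_last_save=0.0,
                seconds_until_timeout=None,
                size_threshold=16_384, save_interval=300.0, timeout_margin=30.0):
    """Return True if any of the save triggers listed above currently applies."""
    if next_action_irreversible or subtask_completed:
        return True
    if state_bytes >= size_threshold:
        return True
    if seconds_since_last_save >= save_interval:
        return True
    # Only relevant when the runtime exposes a known deadline.
    if seconds_until_timeout is not None and seconds_until_timeout <= timeout_margin:
        return True
    return False
```

Keeping the decision in one place makes it easier to tune how often snapshots are written without touching the task logic.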

Summary

States are the save-point mechanism that makes AI agents resilient. They capture complete agent state — the context of what the agent knows, the plan of what it intends to do, and the history of what it has already done — in an immutable, content-addressed grain.

The immutability model means each State is a complete snapshot, not a differential update. The supersession chain links snapshots into a navigable history of state evolution. And the content addressing guarantees that any State can be retrieved and verified independently.

Whether your agents are doing long-running research, multi-step customer onboarding, multi-agent handoffs, or autonomous mission execution, States provide the foundation for fault tolerance. When something goes wrong (and in distributed systems, something always goes wrong), recovery starts from the latest State, not from scratch.