The .mg File Format: One File, Full Memory

Individual memory grains are the atomic units of agent knowledge. Each one is a self-contained binary blob — a fact, an episode, an observation — identified by its SHA-256 content address. But atoms are not what users see. Users see files. They copy files, email files, archive files, and back up files.

The .mg file format is what makes that possible. Defined in Section 11 of the Open Memory Specification, the .mg container packages any number of grains into a single portable file with a fixed-size header, a random-access index, optional compression, and a SHA-256 integrity checksum. It is the physical manifestation of Design Principle 10: "One file, full memory."

The mental model

Section 11.1 of the spec offers a clean analogy for understanding where .mg files fit:

.sqlite = database file (many rows)
.git    = repository (many objects)
.mg     = memory file (many grains)

A .sqlite file is the portable unit of a relational database — you can copy it, open it on any platform, and query it without a running server. A .git directory is the portable unit of version history — clone it and you have every object, every commit, every branch. A .mg file is the portable unit of agent memory — every grain, indexed and checksummed, in a single file that any OMS implementation can read.

Individual grains live in blob storage by content hash. The .mg file is what users see, copy, share, and archive. It is the interchange format at the file level, just as the individual grain blob is the interchange format at the knowledge level.

File layout

The .mg file has four regions, laid out sequentially:

.mg File Structure:

+----------+------------------+
| Header   | Magic: "MG\x01"  |  3 bytes
|          | Flags: uint8     |  1 byte
|          | Grain count: u32 |  4 bytes
|          | Field map ver: u8|  1 byte
|          | Compression: u8  |  1 byte
|          | Reserved: 6 bytes|  6 bytes
+----------+------------------+  = 16 bytes
| Index    | Grain offsets    |  4 bytes x grain_count (u32 each)
|          | (enables random access)
+----------+------------------+
| Grains   | grain 0 bytes    |  variable
|          | grain 1 bytes    |  variable
|          | ...              |
|          | grain N-1 bytes  |  variable
+----------+------------------+
| Footer   | SHA-256 checksum |  32 bytes (over header + index + grains)
+----------+------------------+

The header is always exactly 16 bytes, so the index starts at byte 16 and runs for 4 * grain_count bytes. The grains region follows and is variable in size. The footer is always the final 32 bytes of the file. Every region boundary therefore follows from the grain count in the header plus the file length, with no scanning, no variable-length prefixes, and no ambiguity.
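The arithmetic can be written down directly. The following is a minimal sketch, not code from the spec: the function name `region_bounds` and the dictionary keys are illustrative, and the footer position is derived from the file length because the grains region has no length field of its own.

```python
HEADER_SIZE = 16  # fixed by the format
FOOTER_SIZE = 32  # one SHA-256 digest
OFFSET_SIZE = 4   # one u32 index entry per grain


def region_bounds(grain_count: int, file_size: int) -> dict[str, tuple[int, int]]:
    """Return (start, end) byte ranges for each region; end is exclusive."""
    index_start = HEADER_SIZE
    grains_start = index_start + OFFSET_SIZE * grain_count
    footer_start = file_size - FOOTER_SIZE
    if footer_start < grains_start:
        raise ValueError("file is too small for the declared grain count")
    return {
        "header": (0, HEADER_SIZE),
        "index": (index_start, grains_start),
        "grains": (grains_start, footer_start),
        "footer": (footer_start, file_size),
    }
```

For a 160-byte file holding 3 grains, the index spans bytes 16 to 28, the grains span 28 to 128, and the footer spans 128 to 160.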

Header fields: 16 bytes that describe everything

The 16-byte header packs five fields plus reserved space for future use.

Magic bytes: `0x4D 0x47 0x01`

The first three bytes are the ASCII characters "MG" followed by version byte 0x01. This serves dual purposes: file type identification (any tool can check the first three bytes to confirm this is an .mg file) and version detection (the third byte distinguishes v1 from future versions). The magic bytes are the same concept as the %PDF header in PDF files or the GIF89a header in GIF images — a universally recognized signature.

Flags: one byte, four defined bits

The flags byte is a bitfield with four defined bits:

Bit	Meaning
0	`sorted` — grains are sorted by `created_at` (ascending)
1	`deduplicated` — no duplicate content addresses
2	`compressed` — grain region is zstd-compressed (single block)
3	`field_map_included` — file includes custom FIELD_MAP for application-defined fields
4-7	Reserved

Bit 0 (sorted) tells consumers they can binary-search by timestamp without scanning all grains. Bit 1 (deduplicated) guarantees that no content address appears twice in the file, which is useful for merge operations that would otherwise need to check for duplicates. Bit 2 (compressed) indicates that the grain region is a single compressed block; the codec itself is named by the compression byte described below. Bit 3 (field_map_included) signals that the file carries a custom field mapping for application-defined short keys beyond the standard compaction table.
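One way to work with the flags byte is an `IntFlag` enumeration. This is a sketch under two assumptions that the text does not state outright: that bit 0 is the least significant bit, and that a reader may simply mask off the reserved bits 4-7. The names `MgFlags` and `decode_flags` are illustrative.

```python
from enum import IntFlag


class MgFlags(IntFlag):
    SORTED = 1 << 0              # grains ordered by created_at, ascending
    DEDUPLICATED = 1 << 1        # no duplicate content addresses
    COMPRESSED = 1 << 2          # grain region is one compressed block
    FIELD_MAP_INCLUDED = 1 << 3  # custom FIELD_MAP present


DEFINED_BITS = 0x0F  # bits 4-7 are reserved


def decode_flags(flag_byte: int) -> MgFlags:
    """Decode the flags byte, dropping reserved bits (an assumption)."""
    return MgFlags(flag_byte & DEFINED_BITS)


flags = decode_flags(0b0000_0101)
can_binary_search = MgFlags.SORTED in flags     # True for this byte
must_decompress = MgFlags.COMPRESSED in flags   # True for this byte
```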

Grain count: `u32` (4 bytes)

A 32-bit unsigned integer giving the number of grains in the file. This sets the size of the index region (grain_count * 4 bytes) and tells consumers how many grains to expect. The maximum is 4,294,967,295 grains per file — far beyond any practical collection.

Field map version: `u8` (1 byte)

Identifies which version of the field compaction mapping (Section 6 of the spec) was used to encode the grains. This allows future spec revisions to add new short keys without breaking existing files — a reader encountering an unknown field map version can fall back to the standard mapping or reject the file cleanly.

Compression codec: `u8` (1 byte)

Specifies which compression algorithm was used on the grain region:

Value	Codec
`0x00`	None (uncompressed)
`0x01`	zstd (default, level 3)
`0x02`	lz4 (low-latency)
`0x03`-`0xFF`	Reserved

Zstandard at level 3 is the default — it offers strong compression ratios with reasonable speed, making it suitable for archives and file transfers. LZ4 is the alternative for scenarios where decompression speed matters more than size — real-time streaming, low-latency agent startup, embedded systems. Uncompressed is available for debugging, testing, or when the grains are already small enough that compression overhead is not worth it.

Reserved: 6 bytes

Six bytes reserved for future header extensions. Implementations MUST write these as zeros and MUST ignore non-zero values in these bytes when reading.
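Taken together, the five fields and the reserved bytes map onto a single `struct` format string. The sketch below is illustrative rather than normative: it assumes the grain count is big-endian (mirroring the `'big'` byte order in the spec's index example), and the names `MgHeader`, `parse_header`, and `CODECS` are invented for this example.

```python
import struct
from typing import NamedTuple

# magic (3) + flags (1) + grain count (4) + field map ver (1)
# + compression (1) + reserved (6) = 16 bytes; ">" assumes big-endian.
HEADER_FORMAT = ">3sBIBB6s"
CODECS = {0x00: "none", 0x01: "zstd", 0x02: "lz4"}


class MgHeader(NamedTuple):
    version: int
    flags: int
    grain_count: int
    field_map_version: int
    codec: str


def parse_header(data: bytes) -> MgHeader:
    if len(data) < struct.calcsize(HEADER_FORMAT):
        raise ValueError("need at least 16 bytes for the header")
    magic, flags, count, fmv, codec, _reserved = struct.unpack(
        HEADER_FORMAT, data[:16]
    )
    if magic[:2] != b"MG":
        raise ValueError("not an .mg file")
    if magic[2] != 0x01:
        raise ValueError(f"unsupported format version {magic[2]}")
    if codec not in CODECS:
        raise ValueError(f"reserved compression codec 0x{codec:02x}")
    # Reserved bytes are deliberately ignored on read, as the text requires.
    return MgHeader(magic[2], flags, count, fmv, CODECS[codec])
```

Rejecting unknown versions and reserved codecs up front is a design choice of this sketch; it keeps a reader from misinterpreting a grain region it cannot decode.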

Random access via the offset index

The index region immediately follows the header. It contains one u32 offset for each grain, pointing to where that grain's bytes start. The spec's example below slices the whole file buffer with these offsets, so they are treated here as byte positions counted from the start of the file. This makes random access straightforward: you do not need to scan through all preceding grains to find the one you want.

Section 11.4 provides a Python example showing how to read an arbitrary grain:

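# Read grain #42 from an uncompressed .mg file.
# `data` holds the entire file as bytes; offsets are used as absolute file positions.
header_size = 16
offset_start = header_size + (42 * 4)  # index slot for grain 42
offset = int.from_bytes(data[offset_start:offset_start+4], 'big')
# The next index entry marks where grain 42 ends. The last grain in a file has
# no next entry; its end is the start of the 32-byte footer instead.
next_offset = int.from_bytes(data[offset_start+4:offset_start+8], 'big')
grain_bytes = data[offset:next_offset]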

The logic is simple: compute where grain 42's offset lives in the index (16 + 42 * 4 = 184), read four bytes for the start offset, read the next four bytes for the end offset (which is the start of grain 43), and slice the grain region between those two positions. This is O(1) in the number of grains — it does not matter whether the file contains 10 grains or 10 million.
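The snippet above relies on a following index entry, which the last grain does not have. A generalized reader might look like the sketch below. It is an assumption-laden illustration, not spec code: it assumes an uncompressed file, offsets that are absolute file positions (as the example's slicing implies), a big-endian grain count at bytes 4-8, and a hypothetical function name `read_grain`.

```python
HEADER_SIZE = 16
FOOTER_SIZE = 32


def read_grain(data: bytes, i: int) -> bytes:
    """Return the bytes of grain i from an uncompressed .mg file buffer."""
    grain_count = int.from_bytes(data[4:8], "big")
    if not 0 <= i < grain_count:
        raise IndexError(f"grain {i} out of range (file has {grain_count})")
    slot = HEADER_SIZE + 4 * i
    start = int.from_bytes(data[slot:slot + 4], "big")
    if i + 1 < grain_count:
        # The next index entry doubles as this grain's end offset.
        end = int.from_bytes(data[slot + 4:slot + 8], "big")
    else:
        # The last grain runs up to the start of the footer.
        end = len(data) - FOOTER_SIZE
    return data[start:end]
```

Each call performs two index reads and one slice, so the cost stays constant regardless of how many grains the file holds.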

Compression changes this picture. When the compressed flag is set, the grain region is stored as a single compressed block, so reaching any one grain means decompressing the whole region first. This trade-off deserves attention. If you need random access to individual grains in a large collection and cannot afford to decompress the entire grain region, use an uncompressed .mg file. If you are archiving, transferring, or backing up and will process grains sequentially anyway, compressed files give you better size efficiency. The choice is encoded in the header, so consumers know immediately which path they are on.

The footer checksum

The footer is a SHA-256 checksum computed over the concatenation of header, index, and grains:

SHA-256(header [16 bytes] || index [grain_count * 4 bytes] || grains [variable])

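This single hash verifies the integrity of the entire file. If any byte in the header, index, or grain region has been modified, whether by corruption or truncation, the checksum will not match. Because the hash is unkeyed, it guards against accidental damage rather than a determined attacker: someone who deliberately edits the file can also recompute the footer, so tamper evidence depends on signatures or on a copy of the digest kept somewhere the attacker cannot reach.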

The footer checksum complements the per-grain content addresses. Each grain carries its own SHA-256 identity (the content address computed over its individual blob bytes), which verifies that grain in isolation. The footer checksum verifies the collection: the correct grains are present, in the correct order, with the correct index pointing to the correct offsets. Together, these two layers provide both per-grain and per-file integrity.
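Checking the footer takes only a few lines with Python's standard `hashlib`. The sketch below assumes the hash covers the bytes exactly as stored in the file (so a compressed grain region is hashed in its compressed form); the function name `verify_footer` is illustrative.

```python
import hashlib

HEADER_SIZE = 16
FOOTER_SIZE = 32


def verify_footer(data: bytes) -> bool:
    """Recompute SHA-256 over header + index + grains and compare to the footer."""
    if len(data) < HEADER_SIZE + FOOTER_SIZE:
        return False  # too short to hold even an empty file
    body, footer = data[:-FOOTER_SIZE], data[-FOOTER_SIZE:]
    return hashlib.sha256(body).digest() == footer
```

A reader would typically run this check before trusting the grain count or any index entry, since a damaged header makes every later offset unreliable.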

Wire framing for streaming

Not all grain transfer happens through files. Section 11.6 defines a wire framing format for streaming scenarios — WebSocket connections, Server-Sent Events, Kafka topics, raw TCP streams — where grains arrive one at a time rather than in a pre-built file.

The wire format uses length-prefixed framing:

+------+------------------+
| u32  | grain 0 bytes    |  length-prefixed frame
+------+------------------+
| u32  | grain 1 bytes    |  length-prefixed frame
+------+------------------+
| 0x00000000             |  zero-length sentinel = end of stream
+------+------------------+

Each frame is a u32 length prefix followed by that many bytes of grain data. A zero-length frame (0x00000000) signals the end of the stream. This is the simplest possible framing protocol — no headers, no metadata, no negotiation. The receiver reads four bytes, allocates that many bytes for the grain, reads the grain, and repeats until it sees the sentinel.
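The read loop described above can be sketched as a pair of functions. Two assumptions are made here that the text does not confirm: the length prefix is big-endian (matching the byte order in the spec's index example), and an empty grain is rejected on write because it would be indistinguishable from the sentinel. The names `write_frames` and `read_frames` are illustrative.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator


def write_frames(stream: BinaryIO, grains: Iterable[bytes]) -> None:
    for grain in grains:
        if not grain:
            raise ValueError("empty grain would look like the end-of-stream sentinel")
        stream.write(struct.pack(">I", len(grain)))
        stream.write(grain)
    stream.write(struct.pack(">I", 0))  # zero-length sentinel


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    # Suitable for buffered streams; a raw socket would need a loop here.
    buf = stream.read(n)
    if len(buf) != n:
        raise EOFError("stream ended in the middle of a frame")
    return buf


def read_frames(stream: BinaryIO) -> Iterator[bytes]:
    while True:
        (length,) = struct.unpack(">I", _read_exact(stream, 4))
        if length == 0:
            return  # sentinel reached
        yield _read_exact(stream, length)


buf = io.BytesIO()
write_frames(buf, [b"a", b"bcd"])
buf.seek(0)
received = list(read_frames(buf))  # [b"a", b"bcd"]
```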

The spec is explicit that wire framing is NOT saved to disk. It is a transport-layer concern. When grains arrive over a stream and need to be persisted, the receiver builds a .mg file (with header, index, and footer) from the received grains. The wire format and the file format serve different purposes and should not be conflated.
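Building the file from a list of received grains is mostly bookkeeping. The sketch below writes an uncompressed file with no flags set. It assumes big-endian integers and absolute file offsets in the index, and the `field_map_version` default of 0 is a placeholder rather than a value taken from the spec; `build_mg` is a hypothetical name.

```python
import hashlib
import struct


def build_mg(grains: list[bytes], field_map_version: int = 0) -> bytes:
    """Assemble header + index + grains + footer for an uncompressed file."""
    header = struct.pack(
        ">3sBIBB6s",
        b"MG\x01",          # magic plus version byte
        0,                  # flags: nothing claimed about order or duplicates
        len(grains),        # grain count
        field_map_version,  # placeholder; use the value your encoder wrote
        0x00,               # compression codec: none
        bytes(6),           # reserved bytes must be written as zeros
    )
    offset = len(header) + 4 * len(grains)  # first byte after the index
    index = bytearray()
    for grain in grains:
        index += struct.pack(">I", offset)
        offset += len(grain)
    body = header + bytes(index) + b"".join(grains)
    return body + hashlib.sha256(body).digest()
```

Because offsets are u32 values, `struct.pack` raises an error once the grain region would push an offset past 4 GiB, which is a natural point to split the export into several files.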

Design Principle 10 in action

The .mg file format is the direct implementation of Design Principle 10 from Section 1.2 of the spec: "One file, full memory — a .mg container file is the portable unit for full knowledge export."

This principle has deep practical consequences. It means that an agent's entire knowledge base can be exported as a single file. That file can be copied to a USB drive, uploaded to cloud storage, emailed to a colleague, or archived for regulatory compliance. The recipient needs only an OMS-compliant reader to access every grain inside — no database, no server, no API credentials.

Consider the alternatives:

A SQL database dump requires a running database server to import and query.
A collection of JSON files requires convention on naming, structure, and relationships.
A proprietary export format requires the originating vendor's tools to read.

The .mg file requires nothing but a parser that understands the format described in Section 11. The 16-byte header is self-describing. The index enables random access. The footer verifies integrity. The grains are self-contained, content-addressed binary blobs. Everything a consumer needs to read, verify, and use the memory is in the file itself.

Practical use cases

Memory backup and export

The most straightforward use case: an agent's knowledge needs to be backed up. The agent's store writes all grains into a .mg file — sorted by created_at (flag bit 0), deduplicated by content address (flag bit 1), compressed with zstd (flag bit 2). The result is a compact, verifiable archive that can be stored offline, replicated to a backup region, or handed to the user as a portable export of their agent's knowledge.
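The deduplication step of such a backup can be sketched with nothing more than `hashlib`, since a grain's content address is the SHA-256 of its blob bytes. The function name `dedupe_grains` is illustrative, and sorting by created_at is left out because it would require decoding each grain.

```python
import hashlib


def dedupe_grains(grains: list[bytes]) -> list[bytes]:
    """Drop grains whose content address has already been seen, keeping order."""
    seen: set[bytes] = set()
    unique: list[bytes] = []
    for grain in grains:
        address = hashlib.sha256(grain).digest()  # content address of this blob
        if address not in seen:
            seen.add(address)
            unique.append(grain)
    return unique
```

A writer that runs this step before assembling the file can then set flag bit 1 honestly.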

Cross-platform agent migration

An organization switches from one AI platform to another. Their agents have accumulated months of knowledge: facts, episodes, tool call records, workflow definitions, goals. Without a portable format, this knowledge is locked in the old platform's proprietary storage.

With .mg files, migration is a file operation. Export the grains from the old platform as .mg files. Import them into the new platform. Every grain's content address can be verified — the receiving system knows it got exactly what was exported, with no corruption or alteration. The grains' content is platform-independent: the same canonical MessagePack bytes, the same SHA-256 hashes, the same semantic structure, regardless of which system produced or consumes them.

Compliance archives

Regulated industries must retain records for years, sometimes decades: financial services under SOX, healthcare under HIPAA, government agencies under FOIA. The .mg file provides a self-contained, integrity-verified archive format. The footer checksum shows that the file's bytes are unchanged since the checksum was written; because anyone who edits the file can also recompute that unkeyed hash, an archive that must resist deliberate tampering should also record the digest outside the file or rely on signatures. Individual grain content addresses prove each knowledge record is intact. COSE signatures on individual grains (if present) prove who created each record. Structural tags like reg:sox or phi:medication enable compliance-aware processing years after the grains were created.

Offline agent deployment

An autonomous agent deployed in a disconnected environment — a research station, a submarine, a disaster response site — cannot call back to a cloud knowledge store. It needs to carry its knowledge with it. A .mg file loaded at deployment time provides the agent's full memory: all accumulated facts, relevant episodes, tool call history, active goals. The agent operates on this local knowledge base, producing new grains as it works. When connectivity is restored, the new grains are exported as a .mg file and merged back into the central store.

From atoms to archives

The .mg file format completes the OMS storage story. Individual grains are the atoms — self-contained, content-addressed, cryptographically verifiable. The .mg container is the molecule — a structured collection of atoms with an index for fast access and a checksum for integrity. Together, they provide a complete pipeline from individual knowledge records to portable, shareable, archivable files.

Design Principle 10 is not just a statement about file formats. It is a statement about user agency. When an agent's memory lives in a portable file that the user controls — not locked in a vendor's database, not trapped behind a proprietary API, not scattered across incompatible storage systems — the user has genuine ownership of their agent's knowledge. They can back it up, migrate it, audit it, archive it, or hand it to a regulator. The .mg file makes that possible.