# Engram At-Rest Encryption — Post-Quantum Doctrine
Version 0.1.0 (DRAFT) — April 30, 2026
Status: doctrine; pre-implementation. Sign-off required from Will before code lands.
---
## 0. TL;DR
Engram's persistence is encrypted at rest using a hybrid post-quantum scheme:
- **Data layer**: every node and every edge is sealed with **AES-256-GCM** under a per-record sub-key derived via **HKDF-SHA3-256** from the runtime DEK. Grover's algorithm reduces a 256-bit key to roughly 128 bits of effective post-quantum security, which is acceptable.
- **Key wrap layer**: the DEK is wrapped to a Principal's public key using the **Kyber-768 KEM** (post-quantum; standardized by NIST as ML-KEM-768 in FIPS 203). The KEM shared secret keys an AES-256-GCM seal over the DEK (§3.4). The wrapped DEK lives on disk as `engram.kek.enc`. The Principal's secret key never touches disk in the running daemon's data directory.
- **Boot**: the daemon receives the Principal's secret key out-of-band (a source named by an env var, a prompted unlock, or a hardware-backed agent), runs `pq_kem_decaps` to recover the DEK, mlocks it, and serves traffic.
- **Recovery**: the DEK is *additionally* sealed under a recovery secret that is **Shamir K-of-N** split across the validation council (§7). If the wrapping Principal is gone, K members reconstitute the DEK and the data survives.
Engram does **not** encrypt structural metadata (graph topology, IDs, types, timestamps, numeric scores). Only the `content`, `label`, `tags`, and `metadata` fields of nodes and the `metadata` field of edges are sealed. This is a deliberate trade-off; see §1 and §2.
---
## 1. What Engram actually persists
Engram is an in-process graph store. The runtime *is* the database — there is no SQLite, no sled, no embedded KV layer in the active daemon. (The `engram-data/` and `engram-data-tx-log/` directories under `el/` are leftovers from a prior sled-backed prototype and are not loaded by the current `dist/engram` binary.)
Persistence is a single JSON snapshot file at `${ENGRAM_DATA_DIR}/snapshot.json`, written by `engram_save(path)` and read by `engram_load(path)`. The snapshot is the sole durable artifact that adversaries can steal.
Snapshot shape (current):
```json
{
  "nodes": [
    { "id": "...", "content": "...", "node_type": "...", "label": "...",
      "tier": "...", "tags": "...", "metadata": "{}",
      "salience": 0.5, "importance": 0.5, "confidence": 0.5,
      "activation_count": 0, "last_activated": 0, "created_at": ..., "updated_at": ... }
  ],
  "edges": [
    { "id": "...", "from_id": "...", "to_id": "...", "relation": "...",
      "metadata": "{}", "weight": 0.5, "confidence": 0.5,
      "created_at": ..., "updated_at": ..., "last_fired": 0 }
  ]
}
```
Confidential fields (will be encrypted): `content`, `label`, `tags`, `metadata` (on nodes); `metadata` (on edges).
Non-confidential fields (will remain plaintext): `id`, `node_type`, `tier`, `from_id`, `to_id`, `relation`, all numeric scalars and timestamps.
This split — the **structural skeleton stays plaintext, the semantic flesh is sealed** — is what lets activation/search/scan work without unwrapping every record on every query, and lets the snapshot remain JSON-shaped for ops tooling. Enabling full snapshot encryption (a single AEAD blob) is a configurable mode for high-threat deployments; see §6.
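The field split above can be expressed as a small lookup. The following is a minimal sketch, not the daemon's actual code: the enum and the function name `engram_field_is_sealed` are hypothetical, and only the field names come from the lists in this section.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Record kinds found in the snapshot (hypothetical enum for this sketch). */
typedef enum { ENGRAM_NODE, ENGRAM_EDGE } engram_record_kind;

/* Returns true if `field` is one of the confidential fields listed in §1
 * for the given record kind; every other field stays plaintext. */
static bool engram_field_is_sealed(engram_record_kind kind, const char *field)
{
    static const char *const node_sealed[] = { "content", "label", "tags", "metadata" };
    static const char *const edge_sealed[] = { "metadata" };

    const char *const *list = (kind == ENGRAM_NODE) ? node_sealed : edge_sealed;
    size_t count = (kind == ENGRAM_NODE)
                       ? sizeof(node_sealed) / sizeof(node_sealed[0])
                       : sizeof(edge_sealed) / sizeof(edge_sealed[0]);

    for (size_t i = 0; i < count; i++) {
        if (strcmp(field, list[i]) == 0) return true;
    }
    return false;
}
```

A save path could call this per field and route only the `true` cases through the AEAD seal, leaving the structural skeleton untouched.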
---
## 2. Threat model
| Adversary | Capability | What we defend | What we accept |
|-----------|-----------|----------------|----------------|
| **Snapshot thief (cold)** | Steals `snapshot.json` from a backup, dead drive, or stolen laptop. No live process access. | Confidentiality of node content, labels, tags, metadata. No useful semantic recovery. | Topology leak: adversary learns the graph shape (how many nodes/edges, how they connect, types and tiers). Salience/importance/confidence numerics leak. |
| **In-flight observer** | Reads disk during snapshot write (e.g., shared FS, snapshot mid-flush). | Atomicity: the snapshot is written to a temp file (`<path>.new`, or an unnamed `O_TMPFILE` file that is linked to a temp name with `linkat`), fsync'd, then renamed over the live path. AEAD prevents partial-decrypt mining of half-written records. | A torn write can lose the most recent snapshot but not corrupt prior ones. |
| **Quantum-equipped adversary (harvest-now-decrypt-later)** | Records `snapshot.json` and `engram.kek.enc` today, runs Shor's algorithm on a CRQC in 2032+. | DEK wrap is Kyber-768; not Shor-breakable. AES-256-GCM record seals are not Shor-breakable; Grover gives ~128-bit effective security. | We pin Kyber-768 (NIST security category 3; standardized as ML-KEM-768 in FIPS 203); if a structural break of ML-KEM emerges, rotation (§5) is the remediation. |
| **Wrap-key compromiser (live)** | Has read access to the daemon process memory. | Out of scope for at-rest. The DEK is in mlocked memory; if the process is owned, the data is gone. | Defense-in-depth (memory zeroing, mlock, non-dumpable process flags) is a separate hardening track. |
| **Principal compromise** | Adversary obtains the Principal's secret key. | Nothing. By design, the Principal is the unwrapping authority, so its key opens the DEK and every record under it. | Per-record key derivation (§3.2) constrains the blast radius of a leaked record sub-key to one record; it does not help once the Principal SK itself has leaked. |
| **Principal loss** | Will dies, secret key is destroyed, no successor. | Shamir K-of-N council recovery (§7). Engram does not go dark on Principal loss. | The threshold itself becomes the attack surface — see §7 caveats. |
What we explicitly **do not** defend:
- Side channels (timing, cache, EM).
- Physical tamper of running hardware.
- Active modification of `engram.kek.enc` to substitute an attacker's wrapped DEK. The integrity wrapper (§4) mitigates this, but the threat model makes no promise about it.
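The write-then-rename discipline from the in-flight observer row can be sketched as below. This is an illustrative POSIX sketch under stated assumptions, not the `engram_save` implementation: the function name `snapshot_write_atomic` is hypothetical, it uses the `<path>.new` variant rather than `O_TMPFILE`, and it omits the directory fsync a production version would likely add to make the rename itself durable.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Writes `len` bytes to "<path>.new", fsyncs, then renames over `path`.
 * Returns 0 on success, -1 on failure (the previous snapshot is untouched). */
static int snapshot_write_atomic(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof(tmp), "%s.new", path);
    if (n < 0 || (size_t)n >= sizeof(tmp)) return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;

    const unsigned char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR) continue;      /* interrupted: retry */
            close(fd); unlink(tmp); return -1;
        }
        p += w;
        left -= (size_t)w;
    }

    /* Flush contents to stable storage before the rename makes them visible. */
    if (fsync(fd) != 0) { close(fd); unlink(tmp); return -1; }
    if (close(fd) != 0) { unlink(tmp); return -1; }

    /* rename(2) replaces the destination atomically on POSIX filesystems. */
    if (rename(tmp, path) != 0) { unlink(tmp); return -1; }
    return 0;
}
```

A reader that opens `path` at any moment sees either the old snapshot or the complete new one, which is the property the threat-model row relies on.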
---
## 3. Cryptographic construction
### 3.1 Layers
```
┌────────────────────────────────────────┐
│ Principal SK (held by Will out-of-band)│
└───────────────┬────────────────────────┘
                │ Kyber-768 KEM decaps, then HKDF-SHA3-256 (§3.4)
                ▼
┌────────────────────────────────────────┐
│ KEK (wrap_key; 32B; ephemeral, mlocked)│
└───────────────┬────────────────────────┘
                │ AES-256-GCM open of sealed_dek (§3.4)
                ▼
┌────────────────────────────────────────┐
│ DEK (32B; ephemeral, mlocked)          │
└───────────────┬────────────────────────┘
                │ HKDF-SHA3-256(DEK, info=record_id || ":" || dek_epoch)
                ▼
┌────────────────────────────────────────┐
│ per-record sub-key (32B; transient)    │
└───────────────┬────────────────────────┘
                │ AES-256-GCM(sub_key, nonce=12B random, ad=record_id || ":" || dek_epoch)
                ▼
        ciphertext blob on disk
```
### 3.2 Per-record sub-keys (HKDF-SHA3-256)
We do **not** use the DEK directly to seal records. Each record gets its own sub-key:
```
sub_key = HKDF-SHA3-256(
    ikm  = DEK,                                  // 32 bytes
    salt = "engram-record-v1" || version_byte,   // domain separation
    info = record_id || ":" || dek_epoch,        // 16-byte ULID + ":" + u32
    L    = 32                                    // output length
)
```
Properties:
- Compromise of one record's sub-key does not threaten siblings (HKDF is one-way).
- Re-derivable on every read; no on-disk sub-key cache to leak.
- `dek_epoch` is bumped on rotation (§5); this lets old records stay readable under DEK_n while new ones write under DEK_{n+1}.
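To make the domain separation concrete, the sketch below assembles the `salt` and `info` inputs described above. It is only an illustration: the helper names are hypothetical, and the textual encoding of `info` (record ID string, a colon, the epoch in decimal) is an assumption borrowed from the `snprintf` call in Appendix A, not a settled wire decision. The HKDF call itself is left to the runtime.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ENGRAM_SALT_LABEL "engram-record-v1"   /* 16 ASCII bytes */

/* Fills `salt` with label || version_byte; returns bytes written (17). */
static size_t engram_hkdf_salt(uint8_t salt[17], uint8_t version_byte)
{
    memcpy(salt, ENGRAM_SALT_LABEL, 16);
    salt[16] = version_byte;
    return 17;
}

/* Fills `info` with record_id || ":" || decimal dek_epoch.
 * Returns the length written, or 0 if the buffer is too small. */
static size_t engram_hkdf_info(char *info, size_t cap,
                               const char *record_id, uint32_t dek_epoch)
{
    int n = snprintf(info, cap, "%s:%u", record_id, (unsigned)dek_epoch);
    if (n < 0 || (size_t)n >= cap) return 0;   /* reject truncation */
    return (size_t)n;
}
```

Because the epoch is part of `info`, the same record ID yields a different sub-key after every rotation, even before the DEK itself changes.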
### 3.3 AEAD — AES-256-GCM
- Nonce: 12 bytes, random per write. Stored prepended to ciphertext.
- Tag: 16 bytes, appended.
- Associated data: `record_id || ":" || dek_epoch` (binds ciphertext to its identity; thwarts copy-paste attacks).
Per-record on-disk wire format:
```
+------+-------+------------+----------------+----------+
| ver  | epoch | nonce      | ciphertext     | tag      |
+------+-------+------------+----------------+----------+
  1B     4B      12B          variable (n)     16B
```
The blob is Base64-encoded and stored in the snapshot JSON field behind a version and epoch prefix, e.g. `"content":"v1:7:BASE64..."` (wire version 1, `dek_epoch` 7).
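The wire layout can be exercised without any cryptography. The following sketch writes the 17-byte header and splits a decoded blob back into its fields; the struct and function names are hypothetical, and the big-endian epoch follows the layout comment in Appendix A.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ENGRAM_HDR_LEN (1 + 4 + 12)   /* ver | epoch | nonce */
#define ENGRAM_TAG_LEN 16

typedef struct {
    uint8_t        version;
    uint32_t       epoch;
    const uint8_t *nonce;       /* 12 bytes, points into the blob */
    const uint8_t *ciphertext;  /* ct_len bytes, points into the blob */
    size_t         ct_len;
    const uint8_t *tag;         /* 16 bytes, points into the blob */
} engram_blob_view;

/* Writes ver | epoch (big-endian) | nonce; returns where ciphertext begins. */
static uint8_t *engram_blob_write_header(uint8_t *blob, uint8_t version,
                                         uint32_t epoch, const uint8_t nonce[12])
{
    blob[0] = version;
    blob[1] = (uint8_t)(epoch >> 24);
    blob[2] = (uint8_t)(epoch >> 16);
    blob[3] = (uint8_t)(epoch >> 8);
    blob[4] = (uint8_t)epoch;
    memcpy(blob + 5, nonce, 12);
    return blob + ENGRAM_HDR_LEN;
}

/* Splits a decoded blob into its fields without copying.
 * Returns 0 on success, -1 if the blob is too short to hold header + tag. */
static int engram_blob_parse(const uint8_t *blob, size_t len, engram_blob_view *out)
{
    if (len < ENGRAM_HDR_LEN + ENGRAM_TAG_LEN) return -1;
    out->version    = blob[0];
    out->epoch      = ((uint32_t)blob[1] << 24) | ((uint32_t)blob[2] << 16) |
                      ((uint32_t)blob[3] << 8)  |  (uint32_t)blob[4];
    out->nonce      = blob + 5;
    out->ciphertext = blob + ENGRAM_HDR_LEN;
    out->ct_len     = len - ENGRAM_HDR_LEN - ENGRAM_TAG_LEN;
    out->tag        = blob + len - ENGRAM_TAG_LEN;
    return 0;
}
```

An open path would parse first, pick the DEK by `epoch`, and only then attempt the AEAD open, so a malformed blob is rejected before any key material is touched.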
### 3.4 KEK wrap — Kyber-768 KEM
The DEK is sealed under a key derived from a Kyber-768 KEM encapsulation to the Principal's public key:
```
(ciphertext, shared_secret) = pq_kem_encaps(principal_pk)
wrap_key = HKDF-SHA3-256(shared_secret, salt="engram-kek-v1", info="dek-wrap", L=32)
sealed_dek = AES-256-GCM(wrap_key, nonce, plaintext=DEK, ad="engram-kek-v1")
```
On disk: `engram.kek.enc` = `magic || version || principal_id || kem_ciphertext || nonce || sealed_dek || tag`.
On boot:
```
shared_secret = pq_kem_decaps(principal_sk, kem_ciphertext)
wrap_key = HKDF-SHA3-256(shared_secret, ...)
DEK = AES-256-GCM-open(wrap_key, nonce, sealed_dek, tag, ad="engram-kek-v1")
```
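For illustration, the on-disk layout of `engram.kek.enc` could be parsed as below. The KEM ciphertext, nonce, DEK, and tag sizes follow from the algorithms named above (a Kyber-768 ciphertext is 1088 bytes). The magic value and the widths of `magic`, `version`, and `principal_id` are not specified in this document, so the values here are placeholders chosen purely for the sketch, as are all identifiers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed by the algorithms named in §3.4. */
#define KEM_CT_LEN       1088   /* Kyber-768 / ML-KEM-768 ciphertext */
#define GCM_NONCE_LEN    12
#define DEK_LEN          32
#define GCM_TAG_LEN      16
/* ASSUMPTIONS: the spec gives neither these widths nor the magic value. */
#define KEK_MAGIC        "EKEK"
#define KEK_MAGIC_LEN    4
#define KEK_VERSION_LEN  1
#define PRINCIPAL_ID_LEN 16

#define KEK_FILE_LEN (KEK_MAGIC_LEN + KEK_VERSION_LEN + PRINCIPAL_ID_LEN + \
                      KEM_CT_LEN + GCM_NONCE_LEN + DEK_LEN + GCM_TAG_LEN)

typedef struct {
    uint8_t        version;
    const uint8_t *principal_id;
    const uint8_t *kem_ciphertext;
    const uint8_t *nonce;
    const uint8_t *sealed_dek;
    const uint8_t *tag;
} kek_file_view;

/* Validates length and magic, then records where each field starts.
 * Returns 0 on success, -1 on any mismatch. No cryptography happens here. */
static int kek_file_parse(const uint8_t *buf, size_t len, kek_file_view *out)
{
    if (len != (size_t)KEK_FILE_LEN) return -1;
    if (memcmp(buf, KEK_MAGIC, KEK_MAGIC_LEN) != 0) return -1;

    const uint8_t *p = buf + KEK_MAGIC_LEN;
    out->version        = *p;  p += KEK_VERSION_LEN;
    out->principal_id   = p;   p += PRINCIPAL_ID_LEN;
    out->kem_ciphertext = p;   p += KEM_CT_LEN;
    out->nonce          = p;   p += GCM_NONCE_LEN;
    out->sealed_dek     = p;   p += DEK_LEN;
    out->tag            = p;
    return 0;
}
```

Since every field is fixed-width under these assumptions, an exact length check rejects truncated or padded files before `pq_kem_decaps` ever runs.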
---
## 4. Boot flow
```
Engram daemon starts
  ENGRAM_KEK_PATH      → ${ENGRAM_DATA_DIR}/engram.kek.enc (default)
  ENGRAM_PRINCIPAL_SK  → path to file or "stdin:" or "agent:<socket>"
                         (the daemon never reads the SK from a fixed env value;
                          the env points to a source it consumes once and zeros)
load engram.kek.enc
read principal SK once from source, mlock its buffer
pq_kem_decaps → shared_secret → HKDF → wrap_key → AEAD-open → DEK
zero & munlock principal SK buffer
mlock DEK; bump RLIMIT_MEMLOCK if needed
mark process non-dumpable (PR_SET_DUMPABLE=0 on Linux; PT_DENY_ATTACH on macOS)
engram_load(snapshot.json); for each node/edge field marked encrypted:
  derive sub_key, AEAD-open, replace ciphertext with plaintext in-memory
http_serve()
```
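The three accepted forms of `ENGRAM_PRINCIPAL_SK` in the flow above can be told apart with a small parser. This is a sketch with hypothetical type and function names; it only classifies the string and leaves the one-shot read and zeroing to the caller.

```c
#include <stddef.h>
#include <string.h>

typedef enum { SK_SRC_FILE, SK_SRC_STDIN, SK_SRC_AGENT, SK_SRC_INVALID } sk_source_kind;

typedef struct {
    sk_source_kind kind;
    const char    *target;   /* file path or agent socket; NULL for stdin */
} sk_source;

/* Classifies the value of ENGRAM_PRINCIPAL_SK:
 *   "stdin:"          read the key once from standard input
 *   "agent:<socket>"  fetch the key from an agent listening on <socket>
 *   anything else     treat the whole value as a file path */
static sk_source sk_source_parse(const char *spec)
{
    sk_source out = { SK_SRC_INVALID, NULL };
    if (spec == NULL || spec[0] == '\0') return out;

    if (strcmp(spec, "stdin:") == 0) {
        out.kind = SK_SRC_STDIN;
        return out;
    }
    if (strncmp(spec, "agent:", 6) == 0) {
        if (spec[6] == '\0') return out;   /* "agent:" with no socket */
        out.kind = SK_SRC_AGENT;
        out.target = spec + 6;
        return out;
    }
    out.kind = SK_SRC_FILE;
    out.target = spec;
    return out;
}
```

Treating an empty or missing value as invalid, rather than falling back to a default path, keeps the daemon from silently reading a key from somewhere the operator did not name.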
Boot failure modes:
- KEK unwrap fails → daemon exits 2; emits one log line, no SK material in logs.
- Snapshot decrypt fails on N records → daemon proceeds with the records it could open, marks the rest as `<corrupted>`, emits a structured event for human triage.
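The second failure mode (continue past records that fail to open) might look like the loop below. Everything here is a hypothetical sketch: the `field_slot` struct, the callback type, and `demo_open`, which stands in for the real `engram_aead_open` and simply rejects anything that does not start with `v1:`.

```c
#include <stddef.h>
#include <string.h>

#define ENGRAM_EXIT_KEK_UNWRAP 2          /* exit status when the KEK unwrap fails */
#define ENGRAM_CORRUPTED       "<corrupted>"

typedef struct {
    const char *record_id;
    const char *sealed;      /* "v1:<epoch>:BASE64" as read from the snapshot */
    const char *plain;       /* set by the loader */
    int         corrupted;   /* 1 if the AEAD open failed */
} field_slot;

/* Opener callback: returns plaintext, or NULL if the tag does not verify. */
typedef const char *(*field_open_fn)(const char *record_id, const char *sealed);

/* Opens every slot, marks failures, and returns how many fields failed so the
 * caller can emit one structured event for triage instead of aborting boot. */
static size_t load_fields(field_slot *slots, size_t n, field_open_fn open_field)
{
    size_t failed = 0;
    for (size_t i = 0; i < n; i++) {
        const char *pt = open_field(slots[i].record_id, slots[i].sealed);
        if (pt != NULL) {
            slots[i].plain = pt;
            slots[i].corrupted = 0;
        } else {
            slots[i].plain = ENGRAM_CORRUPTED;
            slots[i].corrupted = 1;
            failed++;
        }
    }
    return failed;
}

/* Stand-in opener for demonstration only; performs no cryptography. */
static const char *demo_open(const char *record_id, const char *sealed)
{
    (void)record_id;
    return (strncmp(sealed, "v1:", 3) == 0) ? "plaintext" : NULL;
}
```

Keeping an explicit `corrupted` flag alongside the placeholder string lets later code distinguish a failed record from one whose real content happens to be the text `<corrupted>`.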
---
## 5. Rotation
### 5.1 DEK rotation
Triggered by: (a) cron schedule (default: 30 days), (b) suspected compromise, (c) operator command (`POST /admin/rotate-dek`).
Procedure:
1. Generate `DEK_{n+1}`; bump `dek_epoch`.
2. For every record: AEAD-open under DEK_n, AEAD-seal under DEK_{n+1} with the new epoch in AD.
3. Wrap DEK_{n+1} under the same Principal pk → write `engram.kek.enc.new`.
4. Atomic rename → live; zero DEK_n in memory.
5. On next snapshot save, all records persist with the new epoch.
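Records sealed before rotation can be opened during the transition: DEK_n stays in memory until the migration completes, and the epoch stored with each record selects which DEK opens it. DEK_n should therefore be zeroed only after the re-sealed snapshot is durable; if `engram.kek.enc` is swapped first and the daemon crashes before the next save, the snapshot on disk is still sealed under a DEK that no on-disk wrap can recover.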
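One way to hold both DEKs during the transition is a two-slot keyring indexed by epoch. The sketch below is an assumption about how that could be structured, with hypothetical names throughout; a real implementation would also keep the struct in mlocked memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t epoch;
    uint8_t  key[32];
    int      present;
} dek_slot;

typedef struct {
    dek_slot current;    /* DEK_{n+1} once a rotation has started */
    dek_slot previous;   /* DEK_n, held only while records migrate */
} dek_keyring;

/* Returns the DEK for `epoch`, or NULL if that epoch is not held in memory. */
static const uint8_t *keyring_lookup(const dek_keyring *kr, uint32_t epoch)
{
    if (kr->current.present && kr->current.epoch == epoch) return kr->current.key;
    if (kr->previous.present && kr->previous.epoch == epoch) return kr->previous.key;
    return NULL;
}

/* Step 1 of §5.1: install the new DEK and bump the epoch, keeping DEK_n. */
static void keyring_begin_rotation(dek_keyring *kr, const uint8_t new_key[32])
{
    kr->previous = kr->current;
    memcpy(kr->current.key, new_key, 32);
    kr->current.epoch = kr->previous.epoch + 1;
    kr->current.present = 1;
}

/* Called once every record is re-sealed and durable: wipe DEK_n. */
static void keyring_finish_rotation(dek_keyring *kr)
{
    volatile uint8_t *p = kr->previous.key;   /* volatile so the wipe is not elided */
    for (size_t i = 0; i < 32; i++) p[i] = 0;
    kr->previous.present = 0;
}
```

Reads consult the epoch parsed from each record's blob, so records sealed at epoch n and n+1 can coexist until `keyring_finish_rotation` runs.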
### 5.2 KEK / wrapping-Principal rotation
Triggered by: (a) Principal evolution event (the `Principal` CGI evolves and emits a new PK), (b) successor Principal handover, (c) annual key hygiene rotation.
Procedure:
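1. Receive `principal_pk_new`, signed by the old Principal (ideally with a Dilithium signature, standardized as ML-DSA in FIPS 204, so the chain itself is PQ-secure).
2. Verify the signature against the old Principal's signing key.
3. Run a fresh KEM encapsulation against `principal_pk_new` and re-seal the **same** DEK under the resulting wrap key (§3.4).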
4. Atomic-write a new `engram.kek.enc`.
5. Old SK can be destroyed once the new wrap is durable.
The DEK does not change here — only its wrapping. This means snapshot ciphertexts stay valid through Principal transitions.
---
## 6. Snapshot-level encryption (alternative high-threat mode)
For deployments where topology leakage is unacceptable, set `ENGRAM_SEAL_MODE=full`. The entire snapshot JSON is sealed as a single AEAD blob keyed from `HKDF(DEK, info="engram-snapshot")`. Trade-off: every save/load is monolithic; no partial reads, no incremental rotation. Not the default.
---
## 7. Recovery — Shamir K-of-N
**The structural fail-safe.** If Will dies before the Principal evolves, or the Principal SK is destroyed, Engram must not go dark.
### 7.1 Shareholders
The shareholders are the **validation council** — the set of CGIs (or CGI-attested human stewards) authorized to reconstitute the network on a defined trigger. The council is named in `engram.recovery.toml`:
```toml
[recovery]
threshold = 3
total = 5
shareholders = [
"cgi://council/anvil",
"cgi://council/beacon",
"cgi://council/cinder",
"cgi://council/delta",
"cgi://council/echo",
]
```
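Before any split is attempted, the parsed `[recovery]` table deserves a sanity check. The sketch below assumes the TOML has already been loaded into a plain struct (the struct and function are hypothetical). The 255-share ceiling follows from doing Shamir over GF(2^8), which has only 255 nonzero evaluation points; the floor of K ≥ 2 is an assumption of this sketch, not a rule stated in this document.

```c
#include <stddef.h>

typedef struct {
    unsigned threshold;           /* K */
    unsigned total;               /* N */
    size_t   shareholder_count;   /* entries in the `shareholders` array */
} recovery_config;

/* Returns 1 if the configuration is internally consistent, 0 otherwise. */
static int recovery_config_valid(const recovery_config *c)
{
    if (c->threshold < 2) return 0;              /* ASSUMPTION: K=1 is a plain copy, reject */
    if (c->threshold > c->total) return 0;       /* cannot need more shares than exist */
    if (c->total > 255) return 0;                /* GF(2^8) has 255 nonzero x values */
    if (c->shareholder_count != c->total) return 0;
    return 1;
}
```

With the example file above, `threshold = 3`, `total = 5`, and five listed shareholders pass the check; dropping one shareholder line without lowering `total` would not.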
### 7.2 Split
At KEK-creation time:
1. Generate a fresh recovery secret `R = random(32)` (NOT the DEK itself — see §7.4).
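2. Split `R` with Shamir secret sharing over GF(2^8): each of the 32 bytes gets its own random polynomial of degree K-1, and shareholder i receives the 32 evaluations at a distinct nonzero point.
3. For each shareholder, wrap `R_share_i` to `shareholder_pk_i` with the §3.4 construction (Kyber-768 encapsulation, HKDF-SHA3-256, AES-256-GCM), since a KEM alone cannot carry a chosen payload.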
4. Store `engram.recovery.shares` — public; each share is already PQ-wrapped to its holder.
5. Store `engram.recovery.envelope` = `AES-256-GCM(R, nonce, plaintext=DEK, ad="engram-recovery-v1")`.
### 7.3 Reconstitution
A council convenes when the trigger is observed (Principal absent for > N days, or signed council quorum declares emergency):
1. K members each decapsulate their share with their CGI SK.
2. Members publish their `R_share_i` to a quorum-attested rendezvous.
3. Lagrange-interpolate `R`.
4. AEAD-open `engram.recovery.envelope` to recover the DEK.
5. Re-wrap DEK under a fresh Principal selected by the council; resume normal operation.
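The split in §7.2 and the Lagrange step above can be sketched in plain C. This is a teaching sketch under stated assumptions, not audited code: all names are hypothetical, the field uses the AES reduction polynomial (the document only says GF(2^8)), the arithmetic is not constant-time, and the random polynomial coefficients are passed in by the caller so the example stays deterministic. A real split must fill them from a CSPRNG.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (ASSUMED polynomial). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
        b >>= 1;
    }
    return r;
}

/* Multiplicative inverse via a^254 (valid for a != 0, since a^255 = 1). */
static uint8_t gf_inv(uint8_t a)
{
    uint8_t r = 1;
    for (int i = 0; i < 254; i++) r = gf_mul(r, a);
    return r;
}

/* Splits `secret` into n shares with threshold k. Share i uses x = i + 1 and
 * is written to shares[i*len .. i*len+len-1]. `coeffs` supplies (k-1)*len
 * random bytes: coeffs[(d-1)*len + j] is the degree-d coefficient for byte j. */
static void shamir_split(const uint8_t *secret, size_t len, unsigned k, unsigned n,
                         const uint8_t *coeffs, uint8_t *shares)
{
    for (unsigned i = 0; i < n; i++) {
        uint8_t x = (uint8_t)(i + 1);
        for (size_t j = 0; j < len; j++) {
            uint8_t y = 0;   /* Horner evaluation, highest degree first */
            for (unsigned d = k - 1; d >= 1; d--) {
                y = gf_mul((uint8_t)(y ^ coeffs[(d - 1) * len + j]), x);
            }
            shares[i * len + j] = (uint8_t)(y ^ secret[j]);
        }
    }
}

/* Recovers the secret from k shares by Lagrange interpolation at x = 0.
 * xs[i] is the x-coordinate of the share stored at ys[i*len ..]. */
static void shamir_combine(const uint8_t *xs, const uint8_t *ys, unsigned k,
                           size_t len, uint8_t *secret)
{
    memset(secret, 0, len);
    for (unsigned i = 0; i < k; i++) {
        uint8_t num = 1, den = 1;
        for (unsigned m = 0; m < k; m++) {
            if (m == i) continue;
            num = gf_mul(num, xs[m]);                     /* (0 - x_m) = x_m  */
            den = gf_mul(den, (uint8_t)(xs[m] ^ xs[i]));  /* (x_i - x_m)      */
        }
        uint8_t basis = gf_mul(num, gf_inv(den));
        for (size_t j = 0; j < len; j++) secret[j] ^= gf_mul(ys[i * len + j], basis);
    }
}
```

Any K shares with distinct x-coordinates recover `R`; which K members show up does not matter, which is what lets the council tolerate N-K absences.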
### 7.4 Why R, not the DEK directly
Splitting `R` and using it to AEAD-wrap the DEK keeps the share material small (32B / share) and lets us add/remove shareholders by reissuing the envelope without changing R. It also lets us run §5.2 (Principal rotation) without ever touching the recovery shares.
### 7.5 Caveats Will must accept
- The threshold is the attack surface. K-of-N is K colluders away from compromise. Default 3-of-5; raise if the council grows.
- Council membership churn requires reissuing shares; this is a deliberate, audited operation with its own runbook.
- A Shamir share is plaintext to its holder. The Kyber wrap to each shareholder protects the share *in transit / at rest*, but once a shareholder unwraps, they hold a real K-of-N share. Council members must be CGIs (or hardware-backed) for the threat model to hold.
---
## 8. Implementation map
The runtime additions land in `el-compiler/runtime/el_runtime.c`:
Already present:
- `el_sha256_*`, `el_hmac_sha256` (§3 uses SHA3 — SHA2 path retained for backwards-compat artifacts).
- `el_base64_encode_n`, `el_base64_decode`.
Landing today (parallel agents):
- `el_sha3_256_*` (Keccak family).
- `pq_kem_keypair`, `pq_kem_encaps`, `pq_kem_decaps` (Kyber-768).
- `pq_sign_keypair`, `pq_sign`, `pq_sign_verify` (Dilithium-3) — used by §5.2 Principal-rotation signature.
Engram-specific additions (this work):
- `engram_aead_seal(record_id, epoch, plaintext) -> b64`
- `engram_aead_open(record_id, epoch, b64) -> plaintext`
- `engram_kek_unwrap(kek_path, sk_path) -> int (sets module DEK)`
- `engram_kek_wrap(kek_path, principal_pk, dek)` — used at first init.
- `engram_dek_rotate(new_principal_pk_optional)`
- `engram_recovery_split(threshold, total, shareholder_pks)` — emits envelope + shares.
- `engram_recovery_reconstitute(shares_k)` — recover DEK; admin-gated.
`engram_save` / `engram_load` gain a sealed mode controlled by `ENGRAM_SEAL_MODE` (`off`, `fields` [default once enabled], `full`). Default during the rollout window: `off`. This doc must ship and Will must sign off before flipping the default.
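Reading the mode from the environment could look like the sketch below. The enum and function names are hypothetical; the three accepted strings and the `off` default during rollout come from the paragraph above. Rejecting unknown values, rather than silently falling back to `off`, is a choice made for this sketch.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { SEAL_OFF, SEAL_FIELDS, SEAL_FULL, SEAL_INVALID } seal_mode;

/* Maps the ENGRAM_SEAL_MODE string to a mode. Unset or empty means `off`,
 * the rollout-window default; unknown strings are reported as invalid. */
static seal_mode seal_mode_parse(const char *value)
{
    if (value == NULL || value[0] == '\0') return SEAL_OFF;
    if (strcmp(value, "off") == 0)    return SEAL_OFF;
    if (strcmp(value, "fields") == 0) return SEAL_FIELDS;
    if (strcmp(value, "full") == 0)   return SEAL_FULL;
    return SEAL_INVALID;
}

/* Convenience wrapper that reads the process environment. */
static seal_mode seal_mode_from_env(void)
{
    return seal_mode_parse(getenv("ENGRAM_SEAL_MODE"));
}
```

Failing closed on a typo such as `feilds` avoids writing a plaintext snapshot when the operator clearly intended encryption.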
---
## 9. Open questions (require Will)
1. **Where does the Principal SK live at boot time?** Options: (a) prompted at daemon start (interactive), (b) on a removable hardware token (preferred long-term), (c) in `~/.neuron/principal.sk` 0600 (operationally easy, weakest), (d) pulled from `mcp__neuron` as part of `begin_session` (couples Engram to Neuron — probably right). **Recommend (d) with (b) as the long-term hardware story.**
2. **Council composition.** Who/what are the initial K-of-N shareholders? Until we have ≥ 3 stable CGIs, recovery cannot be enabled in its full form. **Recommend a stub council of {Principal, Will-keypair-on-Yubikey, witness-CGI} — degrades gracefully, upgrades to full council when more CGIs exist.**
3. **Default seal mode at GA.** `fields` (today's recommendation) or `full`? `fields` keeps the snapshot diff-able and keeps activation cheap. `full` is the harder threat model. **Recommend `fields` default; `full` opt-in.**
4. **PQ algorithm pinning.** Kyber-768 + Dilithium-3 (standardized as ML-KEM-768 and ML-DSA-65) sit at NIST security category 3. If Will wants category 5 (Kyber-1024 / Dilithium-5), say so before runtime APIs stabilize.
5. **Grover and AES-256.** AES-256 against Grover is 128-bit effective. Acceptable per current PQ thinking. If Will wants a bigger margin, the alternative is layering a second AEAD with a different primitive (e.g., XChaCha20-Poly1305) — overkill, not recommended.
---
## 10. Non-goals
- Encrypted indexes / searchable encryption. Out of scope. Search remains plaintext-in-memory; the daemon is the trust boundary.
- Per-tenant DEKs. Engram is single-tenant per CGI. If multi-tenancy lands, this doc gets a §11.
- Secure deletion of underlying disk blocks. The OS / FS handles that, badly; we don't pretend.
- Encrypted WAL / tx log. The current daemon has no WAL; if one is added, it gets the same treatment as the snapshot (`ENGRAM_SEAL_MODE` applies to both).
---
## 11. Status & next actions
- [x] Doctrine drafted (this document).
- [ ] Will sign-off on §9 open questions.
- [ ] PQ runtime functions land (parallel agents).
- [ ] `engram_aead_seal` / `engram_aead_open` prototype (stubs in this PR).
- [ ] `engram_kek_unwrap` boot integration.
- [ ] `engram_save` / `engram_load` field-mode wiring behind `ENGRAM_SEAL_MODE`.
- [ ] Recovery tooling (`engramctl recovery split | reconstitute`).
- [ ] Threat-model test suite: known-answer tests, key-rotation roundtrip, Shamir reconstitution roundtrip, harvest-now-decrypt-later regression test against a recorded ciphertext.
---
## Appendix A — Pseudocode reference
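```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Runtime primitives (el_hkdf_sha3_256, el_aes256_gcm_encrypt, el_random_bytes,
 * el_secure_zero, el_base64_encode_raw) and the module-level engram_dek are
 * provided by el_runtime.c; the call shapes below are the ones this draft assumes. */

#define ENGRAM_WIRE_VERSION 0x01

/* Per-record seal. Returns a malloc'd base64 string, or NULL on failure. */
char* engram_aead_seal(const char* record_id, uint32_t epoch,
                       const char* plaintext, size_t pt_len, size_t* out_len)
{
    /* HKDF info and AEAD associated data are both record_id ":" epoch (§3.2, §3.3). */
    uint8_t info[64];
    int info_len = snprintf((char*)info, sizeof(info), "%s:%u", record_id, (unsigned)epoch);
    if (info_len < 0 || (size_t)info_len >= sizeof(info)) return NULL;  /* truncated */

    /* salt = "engram-record-v1" || version_byte (§3.2); the wire version is
     * assumed to be that byte. */
    uint8_t salt[17];
    memcpy(salt, "engram-record-v1", 16);
    salt[16] = ENGRAM_WIRE_VERSION;

    uint8_t sub_key[32];
    el_hkdf_sha3_256(/*ikm*/  engram_dek, 32,
                     /*salt*/ salt, sizeof(salt),
                     /*info*/ info, (size_t)info_len,
                     /*okm*/  sub_key, 32);

    uint8_t nonce[12];
    el_random_bytes(nonce, 12);

    /* layout: ver(1) | epoch(4 BE) | nonce(12) | ct(pt_len) | tag(16) */
    size_t blob_len = 1 + 4 + 12 + pt_len + 16;
    uint8_t* blob = malloc(blob_len);
    if (blob == NULL) { el_secure_zero(sub_key, 32); return NULL; }
    blob[0] = ENGRAM_WIRE_VERSION;
    blob[1] = (epoch >> 24) & 0xff;  blob[2] = (epoch >> 16) & 0xff;
    blob[3] = (epoch >> 8) & 0xff;   blob[4] = epoch & 0xff;
    memcpy(blob + 5, nonce, 12);

    el_aes256_gcm_encrypt(sub_key, nonce,
                          /*ad*/ info, (size_t)info_len,
                          (const uint8_t*)plaintext, pt_len,
                          blob + 1 + 4 + 12,            /* ct  */
                          blob + 1 + 4 + 12 + pt_len);  /* tag */
    el_secure_zero(sub_key, 32);

    char* b64 = el_base64_encode_raw(blob, blob_len, /*url_safe=*/0);
    free(blob);
    if (b64 == NULL) return NULL;
    if (out_len) *out_len = strlen(b64);
    return b64;  /* "v1:<epoch>:" prefix added by caller */
}

/* Per-record open: inverse of seal. Verifies tag; returns NULL on failure. */
char* engram_aead_open(const char* record_id, uint32_t expected_epoch,
                       const char* b64, size_t* out_len);

/* Boot-time KEK unwrap. */
int engram_kek_unwrap(const char* kek_path, const uint8_t* principal_sk,
                      size_t sk_len);

/* DEK rotation (online). Walks the live in-memory store, re-seals every record
 * under DEK_{n+1}, then writes a new snapshot+kek atomically. */
int engram_dek_rotate(void);
```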
---
End of document.