From db6e51c3466c448363127c3a9906998ebccd9be6 Mon Sep 17 00:00:00 2001 From: Tim Lingo <1timlingo@gmail.com> Date: Sat, 27 Jun 2026 11:44:46 -0500 Subject: [PATCH] proposal: prevent engram corruption at the source (boot-seeding upsert + UTF-8 + read-back audit) Soul-core proposal for Will's review (untested El). Root cause: boot-time writes re-insert instead of update (dup spam), no UTF-8 validation on write. Full spec in docs repo CORRUPTION-PREVENTION-SPEC.md. Co-Authored-By: Claude Opus 4.8 (1M context) --- PREVENTION-FIXES-FOR-REVIEW.md | 52 ++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) create mode 100644 PREVENTION-FIXES-FOR-REVIEW.md diff --git a/PREVENTION-FIXES-FOR-REVIEW.md b/PREVENTION-FIXES-FOR-REVIEW.md new file mode 100644 index 0000000..903ca62 --- /dev/null +++ b/PREVENTION-FIXES-FOR-REVIEW.md @@ -0,0 +1,52 @@ +# Prevention fixes for review (engram corruption at the source) — for Will + +**Context:** a scan of Tim's engram found 21% of nodes corrupt. Root cause is **boot-time writes that +re-insert instead of update**, plus no UTF-8 validation. New users start clean but would rot the same way. +Full spec: `docs/research-archive/p0-prototypes/CORRUPTION-PREVENTION-SPEC.md`. These are **soul core**, so +this is a proposal for your review/build/test — nothing applied. Prepared by Neuron-in-the-CLI (untested El; +needs your engram-internals knowledge to finalize). + +## The one question that unblocks everything +**Does `engram_node_full(content, type, label, …)` upsert by label, or always insert a new node?** +- `conv_history_persist` (chat.el:786) reuses label `"conv:history"` and is described as "upsert by label", + and does NOT show up as heavy duplicates. +- `mem_boot_count_inc` (memory.el:127) reuses label `"soul:boot_count"` but **accumulated ~120 copies**. +- Both call `engram_node_full` with a fixed label + changing content. If it upserts by label, boot_count + shouldn't accumulate; since it does, either it inserts, or conv:history avoids dups another way. +- **Your answer decides the fix:** (a) if there's an upsert/update-by-label primitive, the fixes are + one-line swaps; (b) if not, we add a `find-by-label → update-or-insert` helper and use it everywhere. + +`engram_node_full` / `engram_get_node_by_label` / any update primitive live in the engram repo — couldn't +inspect their semantics from the soul repo. + +## FIX 1 — Idempotent boot seeding (biggest cause, ~75% of corruption) +- **`mem_boot_count_inc` (memory.el:127-140)** — the code comment admits it: *"Each boot creates a new + 'soul:boot_count:N' node. Old ones accumulate as history."* → change to **update the single + `soul:boot_count` node** (find-by-label → set content to new count), not create a new one. +- **Identity/safety belief seeding** (the `safety:*-boundary`, `safety:anti-hallucination` beliefs that hit + ~81 copies each) — wherever these are seeded on boot, make them **upsert by label** so re-seeding updates + the one node instead of adding a copy. (Reuse the `chat.el` "upsert by label" approach.) +- **Test:** boot the clean profile (:7798) 5×; each `safety:*-boundary` belief and `soul:boot_count` exists + exactly **once**; counter shows the latest value. + +## FIX 2 — UTF-8 validation/sanitization on every engram write +- No UTF-8 validation found on the write path; invalid bytes got persisted (garbled nodes). +- **Fix:** validate/normalize to valid UTF-8 before `engram_node_full` persists (reject or sanitize). +- **Test:** write a node with invalid bytes → stored clean (or rejected); snapshot parses with zero + replacement characters. + +## FIX 3 — Confirm read-back-verify covers ALL write paths +- Already present: `api_persisted` ("read-back-after-write guard", neuron-api.el:90) + safety.el:410. Good. +- **Review the deliberate exception** at neuron-api.el:198 ("NOT read-back-verify here … can return a STALE + hit for a just-written node") and close it safely so every write path verifies. +- **Test:** save → read back → matches; force a failed write → returns `api_not_persisted`, not false success. + +## FIX 4 — Cap/prune time-series events (housekeeping, NOT corruption) +- The ~120 `session-start` InternalStateEvent nodes (soul.el:294) are **legitimate per-boot history** — do + **not** dedup them. But keep them bounded (keep last N / summarize older) so the engram doesn't grow forever. +- **Test:** after many boots, event count stays bounded; older history still summarized. + +## Sequence +Confirm the upsert question → implement Fix 1 (biggest win) → Fix 2 → Fix 3 → Fix 4 → build + test on the +clean profile (:7798) before prod. Legacy cleanup of existing corrupt data is a **separate, secondary** +safety net (and its dedup must be time-series-aware + merge edges, not blind-delete).