chat.el recorded the soul's utterance via engram_node(content, "episodic", ...), putting a TIER into the node_type slot (nodes showed node_type="episodic"). Now uses engram_node_full(..., "Conversation", "soul:utterance", ..., "Episodic", tags).

The core wrapper fix is in the el repo (PR #52).

HANDOFF-engram-write-corruption.md has the full root-cause analysis, coercion mechanism, caller audit, validation, deploy runbook (elc build + restart), and the data-prune proposal (~107 corrupt nodes, all unrecoverable genesis/binary detritus → prune; backup taken).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Handoff: Engram EL write-path field corruption + silent writes
For: Will (backend / EL soul)
From: Tim (via Claude Code)
Date: 2026-06-08
Status: Root cause confirmed; source fixes applied locally (NOT built/deployed); data analyzed; prune proposed (NOT applied).
TL;DR
The EL wrapper engram_node_full had a stale signature that didn't match the C primitive. Because el_val_t is an untyped machine word, the compiler coerced caller args to the wrong declared types and forwarded them by position into a C function whose positions mean different things → tier got ints, importance/confidence got strings, label got a float, etc. One caller (chat.el) also put a tier into the node_type slot.
Source fixes are done. You need to: review, build with elc, restart the soul, verify, and apply the prune (daemon stopped). Details below.
1. Root cause (confirmed)
C contract (el/lang/el-compiler/runtime/el_seed.h:204):
__engram_node_full(content, node_type, label, salience, importance, confidence, tier, tags)
Old wrapper (el/lang/runtime/engram.el:15-17) — stale schema, wrong names AND types:
fn engram_node_full(content: String, nt: String, sal: Float, imp: Float,
                    source: String, lang: String, ts: Int, tags: String)
Coercion mechanism: el_val_t is uintptr_t (#define EL_STR(s) ((el_val_t)(uintptr_t)(s)), EL_INT(v) (v)). The EL compiler binds each caller arg to the wrapper's declared param type (String→Float / String→Int coercion at the boundary), then the wrapper forwards positionally. Result for a correct-order caller (content,"Memory","memory:remembered",sal,imp,conf,tier,tags):
- label ← sal (a Float)
- importance ← a String
- confidence ← a String
- tier ← ts (the tier String coerced to Int) → tier becomes an integer
This matches the data exactly (see §6).
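The type-erasure half of this mechanism can be sketched in C. This is a minimal illustration under stated assumptions, not the runtime's code: `el_val_t` and the two macros follow the definitions quoted above (with an explicit cast added to `EL_INT` for the sketch), while `demo_node`, `demo_primitive`, `demo_stale_wrapper`, and `demo_same_word` are hypothetical names, and the three-slot primitive is a cut-down stand-in for `__engram_node_full`. The EL compiler's String→Float / String→Int coercion itself is not modeled here.

```c
#include <stdint.h>

/* One untyped machine word, as quoted in the handoff. */
typedef uintptr_t el_val_t;
#define EL_STR(s) ((el_val_t)(uintptr_t)(s))
#define EL_INT(v) ((el_val_t)(v))   /* explicit cast added for this sketch */

/* Hypothetical cut-down node: three of the primitive's slots, in contract order. */
typedef struct {
    el_val_t label;
    el_val_t confidence;
    el_val_t tier;
} demo_node;

/* Stand-in for the C primitive: it stores whatever word arrives in each
 * position and has no way to check what kind of value it was. */
static demo_node demo_primitive(el_val_t label, el_val_t confidence, el_val_t tier) {
    demo_node n = { label, confidence, tier };
    return n;
}

/* Stand-in for the stale wrapper: its parameter names disagree with the
 * primitive's, but forwarding is positional, so only position matters. */
static demo_node demo_stale_wrapper(el_val_t sal, el_val_t lang, el_val_t ts) {
    return demo_primitive(sal, lang, ts);
}

/* A string pointer and the same address packed as an integer are the same
 * word: nothing in el_val_t records which one the caller meant. */
static int demo_same_word(const char *s) {
    return EL_STR(s) == EL_INT((uintptr_t)s);
}
```

Whatever the wrapper calls its third parameter, the word it receives there ends up in the primitive's `tier` slot, and the primitive cannot tell a string pointer from an integer. That is why a signature mismatch produces malformed nodes silently rather than a type error.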
2. Fix applied — wrapper (el/lang/runtime/engram.el)
Corrected to match the C contract 1:1 (no coercion, no reorder):
fn engram_node_full(content: String, node_type: String, label: String,
                    salience: Float, importance: Float, confidence: Float,
                    tier: String, tags: String) -> String {
    // validation (see §4), then:
    return __engram_node_full(content, node_type, label, salience, importance, confidence, tier, tags)
}
3. Fix applied — caller audit
Audited every caller (chat.el, awareness.el, soul.el, memory.el, routes.el, neuron-api.el).
All engram_node_full callers already use the correct order — so the wrapper fix repairs them automatically. One real caller bug fixed:
neuron/chat.el:512 was:
engram_node(clean_response, "episodic", el_from_float(0.6)) // "episodic" = a TIER in the node_type slot
Now:
engram_node_full(clean_response, "Conversation", "soul:utterance",
                 el_from_float(0.6), el_from_float(0.6), el_from_float(0.8),
                 "Episodic", utterance_tags)
4. Fix applied — validation (defense in depth, engram.el)
Added engram_valid_node_type / engram_valid_tier allowlists. Both engram_node and engram_node_full now reject invalid values with __println + return "" (fail loud, never silently write a malformed node).
- node_type allowlist: Memory, Knowledge, Belief, Project, Tag, BacklogItem, Artifact, Conversation, ExecutionContext, InternalStateEvent, Self, Entity, Process, ConfigEntry, Concept, Imprint (union of the spec list + types actually present in the store — trim if some are illegitimate).
- tier allowlist: Semantic, Episodic, Working, Procedural, Canonical, Note, Lesson
- Note: el_val_t is untyped, so this catches wrong VALUES, not wrong TYPES. Type safety comes from the corrected signatures.
All edits above are in the working tree on Tim's machine but NOT compiled/deployed and NOT compile-verified (no elc on that box).
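For readers more at home in the C runtime than in EL, the allowlist checks in this section can be sketched as follows. This is an illustrative C rendering under assumptions, not the code in engram.el: the two lists are copied from the bullets above, and every `demo_`/`DEMO_` name is hypothetical. Matching is assumed to be exact and case-sensitive, which is what would reject the old chat.el value "episodic" in the node_type slot.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Allowlists copied from the handoff; trim if some types are illegitimate. */
static const char *const DEMO_NODE_TYPES[] = {
    "Memory", "Knowledge", "Belief", "Project", "Tag", "BacklogItem",
    "Artifact", "Conversation", "ExecutionContext", "InternalStateEvent",
    "Self", "Entity", "Process", "ConfigEntry", "Concept", "Imprint"
};
static const char *const DEMO_TIERS[] = {
    "Semantic", "Episodic", "Working", "Procedural", "Canonical", "Note", "Lesson"
};

#define DEMO_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Exact, case-sensitive membership test; NULL is never valid. */
static int demo_in_list(const char *value, const char *const *list, size_t n) {
    if (value == NULL) return 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(value, list[i]) == 0) return 1;
    }
    return 0;
}

static int demo_valid_node_type(const char *node_type) {
    return demo_in_list(node_type, DEMO_NODE_TYPES, DEMO_COUNT(DEMO_NODE_TYPES));
}

static int demo_valid_tier(const char *tier) {
    return demo_in_list(tier, DEMO_TIERS, DEMO_COUNT(DEMO_TIERS));
}

/* Returns 1 if the write may proceed, 0 if it must be rejected.
 * Rejections are logged so a bad write fails loudly instead of silently. */
static int demo_check_write(const char *node_type, const char *tier) {
    if (!demo_valid_node_type(node_type)) {
        fprintf(stderr, "engram: rejected node_type '%s'\n",
                node_type ? node_type : "(null)");
        return 0;
    }
    if (!demo_valid_tier(tier)) {
        fprintf(stderr, "engram: rejected tier '%s'\n", tier ? tier : "(null)");
        return 0;
    }
    return 1;
}
```

With these lists, `demo_check_write("Conversation", "Episodic")` passes, while a tier in the node_type slot (`"episodic"`), an empty string, or a stringified integer such as `"1"` is rejected.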
5. DEPLOY RUNBOOK (your build env)
- Pull the edited files: el/lang/runtime/engram.el, neuron/chat.el.
- Build: elc (entry neuron/soul.el, import chain) → neuron/dist/*.c, then link as in el/lang/install.sh ($(CC) $(CFLAGS) -o dist/neuron-fresh dist/*.c .../el_runtime.c -lcurl -lpthread). Confirm engram.el recompiles into the import chain.
- Restart the soul. Note: on Tim's box it's run by /tmp/soul-keepalive.sh (an auto-restart loop) → stop that loop before killing neuron-fresh, or it'll respawn the old binary.
- Verify (prove end-to-end): write a node via the live API (POST /api/memories or the remember path) with an obvious throwaway label, then read it back and confirm node_type + tier are correct AND that it persisted (node_count increments; survives a snapshot save). There is no delete endpoint, so clean up via the snapshot.
6. Data analysis + prune proposal (NOT applied)
- Snapshot: ~/.neuron/engram/snapshot.json. Backup made: ~/.neuron/engram/snapshot.backup-20260608.json.
- ~107 corrupt nodes (node_type/tier not in the valid sets). node_type junk values: '', '1', '2', 'ntn-genesis', 'claude-opus-4-8', binary. tier junk: same + '/Users/timlingo'.
- 0 are field-repairable. They're all genesis-bootstrap / binary detritus where every field (id/label/tier/tags) is corrupted together: 69× "You are ntn-genesis, a CGI.", 62× "ntn-genesis", ~70 binary garbage, plus a proxy URL + an API path that leaked into labels. No signal to reconstruct → prune, don't fabricate.
- Proposal: ~/.neuron/engram/snapshot.pruned.json, with 3,631 clean nodes (107 junk removed), edges intact (no dangling). Byte-verified: no clean node contains binary content, so re-encoding is lossless.
- NOT applied because the live daemon is actively rewriting snapshot.json (two reads returned different counts). Applying requires stopping the soul + keepalive, swapping in the pruned snapshot, then restarting. Do this in your controlled env with the backup retained.
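Before swapping in the pruned snapshot, the "edges intact (no dangling)" claim is worth re-checking against whatever node set is current at that moment. A minimal sketch of that check in C follows; it assumes nodes are identified by string ids and edges carry a source and target id, which is a guess at the snapshot's shape rather than its documented schema. `demo_edge`, `demo_has_id`, and `demo_count_dangling` are hypothetical names, and loading the JSON is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical edge record: ids of the two endpoint nodes. */
typedef struct {
    const char *from;
    const char *to;
} demo_edge;

/* Linear scan is enough for a one-off check over a few thousand nodes. */
static int demo_has_id(const char *id, const char *const *ids, size_t n_ids) {
    if (id == NULL) return 0;
    for (size_t i = 0; i < n_ids; i++) {
        if (strcmp(id, ids[i]) == 0) return 1;
    }
    return 0;
}

/* Counts edges with at least one endpoint missing from the kept node set.
 * A result of 0 means the prune left no dangling edges. */
static size_t demo_count_dangling(const demo_edge *edges, size_t n_edges,
                                  const char *const *kept_ids, size_t n_ids) {
    size_t dangling = 0;
    for (size_t i = 0; i < n_edges; i++) {
        if (!demo_has_id(edges[i].from, kept_ids, n_ids) ||
            !demo_has_id(edges[i].to, kept_ids, n_ids)) {
            dangling++;
        }
    }
    return dangling;
}
```

Because the daemon was rewriting snapshot.json during analysis, running a check like this after the soul and keepalive are stopped gives a count that cannot drift between reads.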
7. Security heads-up (please action)
- ANTHROPIC_API_KEY is stored in plaintext in /tmp/soul-keepalive.sh; rotate it and move it to a secret store.
- Internal infra leaked into node fields (http://localhost:7771, /api/graph/edges?limit=5000). This is a symptom of the same write bug; the prune removes those nodes.
8. Backlog of related gaps (separate from this fix)
- Soul chat loop reports no tools (NONE) / NO_SHELL: it narrates curl/sqlite3 without executing. The capture REST path works, but the chat agent can't call it.
- No PUT/DELETE on knowledge nodes (method not allowed); needed for UI edit/delete.
- No source-conversation edge on captured nodes; this blocks "see source chat" in the UI.
- Writes have been frozen since ~2026-04-29 (newest knowledge node) — nothing is being added in the current running state.