Confirms two distinct write failures (capture=wrapper bug; backlog=axon :7771 unbuilt Rust), soul runs in file-snapshot mode (not engram :8742 live), engram :8742 CRUD works but minimal, + a verification plan to run after the soul rebuild. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Handoff: Engram EL write-path field corruption + silent writes
For: Will (backend / EL soul)
From: Tim (via Claude Code)
Date: 2026-06-08
Status: Root cause confirmed; source fixes applied locally (NOT built/deployed); data analyzed; prune proposed (NOT applied).
TL;DR
The EL wrapper engram_node_full had a stale signature that didn't match the C primitive. Because el_val_t is an untyped machine word, the compiler coerced caller args to the wrong declared types and forwarded them by position into a C function whose positions mean different things → tier got ints, importance/confidence got strings, label got a float, etc. One caller (chat.el) also put a tier into the node_type slot.
Source fixes are done. You need to: review, build with elc, restart the soul, verify, and apply the prune (daemon stopped). Details below.
1. Root cause (confirmed)
C contract (el/lang/el-compiler/runtime/el_seed.h:204):
__engram_node_full(content, node_type, label, salience, importance, confidence, tier, tags)
Old wrapper (el/lang/runtime/engram.el:15-17) — stale schema, wrong names AND types:
fn engram_node_full(content: String, nt: String, sal: Float, imp: Float,
source: String, lang: String, ts: Int, tags: String)
Coercion mechanism: el_val_t is uintptr_t (#define EL_STR(s) ((el_val_t)(uintptr_t)(s)), EL_INT(v) (v)). The EL compiler binds each caller arg to the wrapper's declared param type (String→Float / String→Int coercion at the boundary), then the wrapper forwards positionally. Result for a correct-order caller (content,"Memory","memory:remembered",sal,imp,conf,tier,tags):
- label ← sal (a float)
- importance ← a String
- confidence ← a String
- tier ← ts (the tier String coerced to Int) → tier becomes an integer
This matches the data exactly (see §6).
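The slot mix-up can be reproduced outside EL with a small Python simulation. This is only a sketch: `coerce` is a hypothetical stand-in for the boundary coercion (the real EL runtime reinterprets `el_val_t` machine words, so the actual corrupted values differ), and `old_wrapper_call` is an illustrative name. Only the parameter lists come from the two signatures above.

```python
# Parameter order of the C primitive __engram_node_full (from el_seed.h).
C_SLOTS = ["content", "node_type", "label", "salience",
           "importance", "confidence", "tier", "tags"]

# Old (stale) wrapper: parameter names and declared types, in order.
OLD_WRAPPER = [("content", str), ("nt", str), ("sal", float), ("imp", float),
               ("source", str), ("lang", str), ("ts", int), ("tags", str)]


def coerce(value, declared):
    """Hypothetical stand-in for the coercion at the wrapper boundary."""
    if isinstance(value, declared):
        return value
    try:
        return declared(value)
    except (TypeError, ValueError):
        return declared()  # a non-numeric String degrades to 0 / 0.0 here


def old_wrapper_call(*args):
    """Bind each arg to the stale declared type, then forward by position."""
    bound = [coerce(arg, typ) for arg, (_, typ) in zip(args, OLD_WRAPPER)]
    return dict(zip(C_SLOTS, bound))


# A caller that uses the correct C order, as the audited callers do.
received = old_wrapper_call("hello", "Memory", "memory:remembered",
                            0.6, 0.7, 0.8, "Episodic", "t1,t2")
for slot in C_SLOTS:
    print(f"{slot:<10} <- {type(received[slot]).__name__}: {received[slot]!r}")
```

Under these assumptions the label slot receives a float, importance and confidence receive strings, and tier receives an int, which is the same per-slot type pattern listed above.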
2. Fix applied — wrapper (el/lang/runtime/engram.el)
Corrected to match the C contract 1:1 (no coercion, no reorder):
fn engram_node_full(content: String, node_type: String, label: String,
                    salience: Float, importance: Float, confidence: Float,
                    tier: String, tags: String) -> String {
    // validation (see §4), then:
    return __engram_node_full(content, node_type, label, salience, importance, confidence, tier, tags)
}
3. Fix applied — caller audit
Audited every caller (chat.el, awareness.el, soul.el, memory.el, routes.el, neuron-api.el).
All engram_node_full callers already use the correct order — so the wrapper fix repairs them automatically. One real caller bug fixed:
neuron/chat.el:512 was:
engram_node(clean_response, "episodic", el_from_float(0.6)) // "episodic" = a TIER in the node_type slot
Now:
engram_node_full(clean_response, "Conversation", "soul:utterance",
                 el_from_float(0.6), el_from_float(0.6), el_from_float(0.8),
                 "Episodic", utterance_tags)
4. Fix applied — validation (defense in depth, engram.el)
Added engram_valid_node_type / engram_valid_tier allowlists. Both engram_node and engram_node_full now reject invalid values with __println + return "" (fail loud, never silently write a malformed node).
- node_type allowlist: Memory, Knowledge, Belief, Project, Tag, BacklogItem, Artifact, Conversation, ExecutionContext, InternalStateEvent, Self, Entity, Process, ConfigEntry, Concept, Imprint (union of the spec list + types actually present in the store — trim if some are illegitimate).
- tier allowlist: Semantic, Episodic, Working, Procedural, Canonical, Note, Lesson
- Note: el_val_t is untyped, so this catches wrong VALUES, not wrong TYPES. Type safety comes from the corrected signatures.
All edits above are in the working tree on Tim's machine but NOT compiled/deployed and NOT compile-verified (no elc on that box).
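The reject-and-return-empty behaviour described in this section can be sketched in Python as follows. The allowlist values are copied from the lists above; `checked_write` and its `write` callback are hypothetical names used for illustration and are not the EL implementation.

```python
VALID_NODE_TYPES = frozenset({
    "Memory", "Knowledge", "Belief", "Project", "Tag", "BacklogItem",
    "Artifact", "Conversation", "ExecutionContext", "InternalStateEvent",
    "Self", "Entity", "Process", "ConfigEntry", "Concept", "Imprint",
})
VALID_TIERS = frozenset({
    "Semantic", "Episodic", "Working", "Procedural",
    "Canonical", "Note", "Lesson",
})


def engram_valid_node_type(node_type):
    # The isinstance check comes first so unhashable junk cannot raise.
    return isinstance(node_type, str) and node_type in VALID_NODE_TYPES


def engram_valid_tier(tier):
    return isinstance(tier, str) and tier in VALID_TIERS


def checked_write(content, node_type, tier, write):
    """Call write(...) only for allowlisted values; otherwise log and return ""."""
    if not engram_valid_node_type(node_type):
        print(f"engram: rejected invalid node_type {node_type!r}")
        return ""
    if not engram_valid_tier(tier):
        print(f"engram: rejected invalid tier {tier!r}")
        return ""
    return write(content, node_type, tier)
```

With this shape, the old chat.el call that passed "episodic" as a node_type would be rejected loudly rather than written, since the match is case-sensitive and "episodic" is not a node type.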
5. DEPLOY RUNBOOK (your build env)
- Pull the edited files: el/lang/runtime/engram.el, neuron/chat.el.
- Build: elc (entry neuron/soul.el, import chain) → neuron/dist/*.c, then link as in el/lang/install.sh ($(CC) $(CFLAGS) -o dist/neuron-fresh dist/*.c .../el_runtime.c -lcurl -lpthread). Confirm engram.el recompiles into the import chain.
- Restart the soul. Note: on Tim's box it's run by /tmp/soul-keepalive.sh (an auto-restart loop) → stop that loop before killing neuron-fresh, or it'll respawn the old binary.
- Verify (prove end-to-end): write a node via the live API (POST /api/memories or the remember path) with an obvious throwaway label, then read it back and confirm node_type + tier are correct AND that it persisted (node_count increments; survives a snapshot save). There is no delete endpoint, so clean up via the snapshot.
6. Data analysis + prune proposal (NOT applied)
- Snapshot: ~/.neuron/engram/snapshot.json. Backup made: ~/.neuron/engram/snapshot.backup-20260608.json.
- ~107 corrupt nodes (node_type/tier not in the valid sets). node_type junk values: '', '1', '2', 'ntn-genesis', 'claude-opus-4-8', binary. tier junk: same + '/Users/timlingo'.
- 0 are field-repairable. They're all genesis-bootstrap / binary detritus where every field (id/label/tier/tags) is corrupted together: 69× "You are ntn-genesis, a CGI.", 62× "ntn-genesis", ~70 binary garbage, plus a proxy URL + an API path that leaked into labels. No signal to reconstruct → prune, don't fabricate.
- Proposal: ~/.neuron/engram/snapshot.pruned.json, with 3,631 clean nodes (107 junk removed), edges intact (no dangling). Byte-verified: no clean node contains binary content, so re-encoding is lossless.
- NOT applied because the live daemon is actively rewriting snapshot.json (two reads returned different counts). Applying requires stopping the soul + keepalive, swapping in the pruned snapshot, then restarting. Do this in your controlled env with the backup retained.
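For review purposes, the prune rule (drop any node whose node_type or tier is outside the valid sets, then confirm no edge dangles) can be sketched in Python. The snapshot schema here is an assumption, not taken from the real file: a top-level object with a "nodes" list (each node having "id", "node_type", "tier") and an "edges" list (each edge having "source" and "target"). Adjust the key names to the actual snapshot layout before relying on it.

```python
VALID_NODE_TYPES = frozenset({
    "Memory", "Knowledge", "Belief", "Project", "Tag", "BacklogItem",
    "Artifact", "Conversation", "ExecutionContext", "InternalStateEvent",
    "Self", "Entity", "Process", "ConfigEntry", "Concept", "Imprint",
})
VALID_TIERS = frozenset({
    "Semantic", "Episodic", "Working", "Procedural",
    "Canonical", "Note", "Lesson",
})


def is_clean(node):
    """A node is clean when both node_type and tier are allowlisted strings."""
    node_type, tier = node.get("node_type"), node.get("tier")
    return (isinstance(node_type, str) and node_type in VALID_NODE_TYPES
            and isinstance(tier, str) and tier in VALID_TIERS)


def prune_snapshot(snapshot):
    """Return (pruned_snapshot, removed_ids, dropped_edge_count).

    Assumed schema (hypothetical): {"nodes": [...], "edges": [...]} with
    edges keyed by "source" and "target" node ids.
    """
    nodes = snapshot.get("nodes", [])
    kept = [n for n in nodes if is_clean(n)]
    removed_ids = [n.get("id") for n in nodes if not is_clean(n)]
    kept_ids = {n.get("id") for n in kept}

    edges = snapshot.get("edges", [])
    kept_edges = [e for e in edges
                  if e.get("source") in kept_ids and e.get("target") in kept_ids]

    pruned = dict(snapshot, nodes=kept, edges=kept_edges)
    return pruned, removed_ids, len(edges) - len(kept_edges)
```

A dropped-edge count of zero would correspond to the "edges intact (no dangling)" claim above. The sketch works on an in-memory dict, so it should be run against a copy loaded with `json.load` while the daemon is stopped, never against the live file.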
7. Security heads-up (please action)
- ANTHROPIC_API_KEY is stored in plaintext in /tmp/soul-keepalive.sh; rotate it and move to a secret store.
- Internal infra leaked into node fields (http://localhost:7771, /api/graph/edges?limit=5000), a symptom of the same write bug; the prune removes those nodes.
8. Backlog of related gaps (separate from this fix)
- Soul chat loop reports no tools (NONE) / NO_SHELL: it narrates curl/sqlite3 without executing. The capture REST path works, but the chat agent can't call it.
- No PUT/DELETE on knowledge nodes (method not allowed), which is needed for UI edit/delete.
- No source-conversation edge on captured nodes, which blocks "see source chat" in the UI.
- Writes have been frozen since ~2026-04-29 (newest knowledge node); nothing is being added in the current running state.
ADDENDUM — Phase 0 live runtime findings (2026-06-08, verified against the running system)
Validated the write path end-to-end against neuron-fresh :7770 + engram :8742. Confirms the diagnosis and corrects two common assumptions.
Ports: engram :8742 ✓ listening (healthy: {"status":"ok","engine":"engram-runtime-native"}), neuron-fresh :7770 ✓, :7771 NOT listening.
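The port status above can be re-checked after any restart with a short TCP probe. This is a minimal Python sketch using only the standard library; the service-name keys in `PORTS` are labels chosen for this example, and a successful connect only shows that something is listening, not that the service is healthy.

```python
import socket

# Ports as reported above; the names are labels for this sketch only.
PORTS = {"neuron-fresh": 7770, "axon": 7771, "engram": 8742}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(host="localhost", ports=PORTS):
    """Map each service label to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in ports.items()}
```

Calling `probe()` on the box in the state described here would be expected to report neuron-fresh and engram as listening and axon as not listening.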
Two distinct write failures (not one):
- /api/neuron/knowledge/capture + memory remember: handled in-process by the soul (neuron-api.el handle_api_capture_knowledge / remember → engram_node_full(...)). Live test: POST …/knowledge/capture returned {"id":"2ccfc147…","ok":true} but that id is absent from /api/graph/nodes and snapshot.json → the node corrupted/vanished. This is exactly the engram_node_full wrapper bug this PR fixes. It is NOT a :7771 issue. → fixed by el PR #52 + soul rebuild.
- /api/backlog, /api/memories, /api/knowledge, /api/artifacts, /api/projects, /api/imprints: routes.el proxies these to axon via axon_get / axon_post (base SOUL_AXON or default http://localhost:7771). axon = protocols/axon, an unbuilt Rust crate, not running → "Failed to connect to localhost port 7771." → needs axon stood up (separate Rust workstream) OR routes repointed.
Architecture clarifications (so nobody chases the wrong port again):
- The soul runs in file-snapshot mode (no ENGRAM_URL in /tmp/soul-keepalive.sh) → it uses ~/.neuron/engram/snapshot.json, not engram :8742 live. So writing to :8742 does NOT make data visible to the soul the app talks to.
- engram :8742 is its own EL service (engram/src/server.el) with a working CRUD API: POST/GET/DELETE /api/nodes, /api/edges, /api/save, /api/load, /api/activate, /api/search. Verified create+delete ({"ok":true}). But its route_create_node only reads content/node_type/salience (no label/tier/tags/metadata), so it can't set metadata.tier_source: canonical.
- Minor EL bug in engram/src/server.el route_create_node: if str_eq(node_type,""){ let node_type = "Memory" } shadows (new local) instead of reassigning → the default never applies; same for salience. Worth fixing while in there.
Verification plan (run after the soul rebuild lands):
- POST /api/neuron/knowledge/capture {content,title,tier:canonical} → capture the returned id.
- GET /api/neuron/knowledge/search?q=<term> → confirm the node comes back with correct node_type / metadata.tier_source.
- Confirm it survives a snapshot save (present in snapshot.json). Only then is the write "real."
- Backlog: once axon :7771 is up, repeat for POST /api/backlog.
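The capture steps of this plan could be scripted roughly as below. This is a sketch under stated assumptions: the soul is reachable at http://localhost:7770, the capture response carries an "id" field as in the live test above, and the search response shape is unknown, so the script only checks that the returned id appears somewhere in the search response and in the snapshot text. `verify_capture`, `http_json`, and `read_snapshot_text` are hypothetical helper names. Checking node_type and metadata.tier_source is left to manual inspection of the search output.

```python
import json
import pathlib
import urllib.parse
import urllib.request

BASE = "http://localhost:7770"  # assumed address of neuron-fresh
SNAPSHOT = pathlib.Path("~/.neuron/engram/snapshot.json")


def http_json(method, url, body=None):
    """Send a JSON request and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def read_snapshot_text():
    """Read the snapshot as text; undecodable bytes are replaced, not fatal."""
    return SNAPSHOT.expanduser().read_text(encoding="utf-8", errors="replace")


def verify_capture(content, title, term, request=http_json,
                   read_snapshot=read_snapshot_text, base=BASE):
    """Capture a node, then report whether its id shows up on read-back."""
    created = request("POST", f"{base}/api/neuron/knowledge/capture",
                      {"content": content, "title": title, "tier": "canonical"})
    node_id = created.get("id", "")

    query = urllib.parse.urlencode({"q": term})
    found = request("GET", f"{base}/api/neuron/knowledge/search?{query}")

    return {
        "id": node_id,
        # Substring checks avoid assuming a response or snapshot schema.
        "in_search": bool(node_id) and node_id in json.dumps(found),
        "in_snapshot": bool(node_id) and node_id in read_snapshot(),
    }
```

The `request` and `read_snapshot` parameters are injectable so the flow can be dry-run against fakes before touching the live soul. The snapshot check is only meaningful after a snapshot save has happened following the write.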
Net: "make writes persist" needs (a) this wrapper fix built into the soul (capture) and (b) axon :7771 running (backlog/artifacts/etc.). Neither was doable on Tim's box (no elc; axon is unbuilt Rust — out of scope per the no-Rust guardrail). No live writes/restarts were performed; engram probe node was created and deleted to verify the API.