Fix chat.el node_type-slot bug + add engram write-corruption handoff
Neuron Soul CI / build (pull_request) Successful in 3m15s

chat.el recorded the soul's utterance via engram_node(content, "episodic", ...),
putting a TIER into the node_type slot (nodes showed node_type="episodic"). Now uses
engram_node_full(..., "Conversation", "soul:utterance", ..., "Episodic", tags).

The core wrapper fix is in the el repo (PR #52). HANDOFF-engram-write-corruption.md
has the full root-cause analysis, coercion mechanism, caller audit, validation,
deploy runbook (elc build + restart), and the data-prune proposal (~107 corrupt
nodes, all unrecoverable genesis/binary detritus → prune; backup taken).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tim Lingo
2026-06-08 16:14:20 -05:00
parent cc07648ae1
commit 799ca3758b
2 changed files with 110 additions and 1 deletions
+101
@@ -0,0 +1,101 @@
# Handoff: Engram EL write-path field corruption + silent writes
**For:** Will (backend / EL soul)
**From:** Tim (via Claude Code)
**Date:** 2026-06-08
**Status:** Root cause confirmed; source fixes applied locally (NOT built/deployed); data analyzed; prune proposed (NOT applied).
---
## TL;DR
The EL wrapper `engram_node_full` had a **stale signature** that didn't match the C primitive. Because `el_val_t` is an untyped machine word, the compiler coerced caller args to the wrong declared types and forwarded them **by position** into a C function whose positions mean different things → `tier` got ints, `importance/confidence` got strings, `label` got a float, etc. One caller (`chat.el`) also put a *tier* into the `node_type` slot.
Source fixes are done. **You need to:** review, build with `elc`, restart the soul, verify, and apply the prune (daemon stopped). Details below.
---
## 1. Root cause (confirmed)
**C contract** (`el/lang/el-compiler/runtime/el_seed.h:204`):
```
__engram_node_full(content, node_type, label, salience, importance, confidence, tier, tags)
```
**Old wrapper** (`el/lang/runtime/engram.el:15-17`) — stale schema, wrong names AND types:
```
fn engram_node_full(content: String, nt: String, sal: Float, imp: Float,
source: String, lang: String, ts: Int, tags: String)
```
**Coercion mechanism:** `el_val_t` is `uintptr_t` (`#define EL_STR(s) ((el_val_t)(uintptr_t)(s))`, `EL_INT(v) (v)`). The EL compiler binds each caller arg to the wrapper's *declared* param type (String→Float, Float→String, and String→Int coercion at the boundary), then the wrapper forwards its params **positionally** into the C slots. Result for a correct-order caller `(content,"Memory","memory:remembered",sal,imp,conf,tier,tags)`:
- `label` ← `sal` (a float)
- `importance` ← a String
- `confidence` ← a String
- `tier` ← `ts` (the tier String coerced to Int) → **tier becomes an integer**
This matches the data exactly (see §6).
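The slot mix-up can be traced mechanically. The following is a minimal Python sketch, not EL or C: it lines up the stale wrapper's declared parameters against the C primitive's positions, using only the two signatures quoted above. The helper name `trace_slots` and the row layout are illustrative assumptions.

```python
# Parameter names and declared types of the stale EL wrapper, in order
# (copied from the old signature quoted above).
OLD_WRAPPER_PARAMS = ["content", "nt", "sal", "imp", "source", "lang", "ts", "tags"]
OLD_WRAPPER_TYPES = ["String", "String", "Float", "Float",
                     "String", "String", "Int", "String"]
# Positional meaning of each slot in the C primitive.
C_PRIMITIVE_PARAMS = ["content", "node_type", "label", "salience",
                      "importance", "confidence", "tier", "tags"]


def trace_slots(caller_args):
    """For each C slot, report which wrapper param fed it and whether the
    caller's value had to be coerced to the wrapper's declared type.

    caller_args: list of (meaning, type) pairs in the order the caller passes them.
    (trace_slots is a hypothetical helper for illustration only.)
    """
    rows = {}
    for c_slot, w_name, w_type, (meaning, arg_type) in zip(
        C_PRIMITIVE_PARAMS, OLD_WRAPPER_PARAMS, OLD_WRAPPER_TYPES, caller_args
    ):
        rows[c_slot] = {
            "via": w_name,
            "arrives_as": w_type,
            "caller_sent": meaning,
            "coerced": arg_type != w_type,
        }
    return rows


# A caller that passes arguments in the correct C order.
correct_order_call = [
    ("content", "String"), ("node_type", "String"), ("label", "String"),
    ("salience", "Float"), ("importance", "Float"), ("confidence", "Float"),
    ("tier", "String"), ("tags", "String"),
]

slots = trace_slots(correct_order_call)
for name, row in slots.items():
    flag = "COERCED" if row["coerced"] else "ok"
    print(f"{name:<11} <- {row['via']:<7} arrives as {row['arrives_as']:<6} [{flag}]")
```

Under these assumptions the coerced slots are `label`, `importance`, `confidence`, and `tier`, which agrees with the list above.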
---
## 2. Fix applied — wrapper (`el/lang/runtime/engram.el`)
Corrected to match the C contract 1:1 (no coercion, no reorder):
```
fn engram_node_full(content: String, node_type: String, label: String,
salience: Float, importance: Float, confidence: Float,
tier: String, tags: String) -> String {
// validation (see §4), then:
return __engram_node_full(content, node_type, label, salience, importance, confidence, tier, tags)
}
```
## 3. Fix applied — caller audit
Audited every caller (`chat.el`, `awareness.el`, `soul.el`, `memory.el`, `routes.el`, `neuron-api.el`).
**All `engram_node_full` callers already use the correct order** — so the wrapper fix repairs them automatically. **One real caller bug** fixed:
`neuron/chat.el:512` was:
```
engram_node(clean_response, "episodic", el_from_float(0.6)) // "episodic" = a TIER in the node_type slot
```
Now:
```
engram_node_full(clean_response, "Conversation", "soul:utterance",
el_from_float(0.6), el_from_float(0.6), el_from_float(0.8),
"Episodic", utterance_tags)
```
## 4. Fix applied — validation (defense in depth, `engram.el`)
Added `engram_valid_node_type` / `engram_valid_tier` allowlists. Both `engram_node` and `engram_node_full` now **reject invalid values with `__println` + return `""`** (fail loud, never silently write a malformed node).
- node_type allowlist: Memory, Knowledge, Belief, Project, Tag, BacklogItem, Artifact, Conversation, ExecutionContext, InternalStateEvent, Self, Entity, Process, ConfigEntry, Concept, Imprint *(union of the spec list + types actually present in the store — trim if some are illegitimate).*
- tier allowlist: Semantic, Episodic, Working, Procedural, Canonical, Note, Lesson
- **Note:** `el_val_t` is untyped, so this catches wrong VALUES, not wrong TYPES. Type safety comes from the corrected signatures.
> All edits above are in the working tree on Tim's machine but **NOT compiled/deployed** and **NOT compile-verified** (no `elc` on that box).
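For reviewers who want the §4 logic at a glance, here is a minimal Python sketch of the allowlist check. It is not the EL code in `engram.el`. The function names `engram_valid_node_type` and `engram_valid_tier` come from this handoff; `checked_write`, the log text, and case-sensitive matching are assumptions made for illustration.

```python
# Allowlists as listed in section 4 of this handoff.
VALID_NODE_TYPES = {
    "Memory", "Knowledge", "Belief", "Project", "Tag", "BacklogItem",
    "Artifact", "Conversation", "ExecutionContext", "InternalStateEvent",
    "Self", "Entity", "Process", "ConfigEntry", "Concept", "Imprint",
}
VALID_TIERS = {
    "Semantic", "Episodic", "Working", "Procedural", "Canonical", "Note", "Lesson",
}


def engram_valid_node_type(value):
    # Assumption: matching is exact and case-sensitive.
    return isinstance(value, str) and value in VALID_NODE_TYPES


def engram_valid_tier(value):
    return isinstance(value, str) and value in VALID_TIERS


def checked_write(node_type, tier, write):
    """Call write() only if both values pass; otherwise log and return "".

    checked_write is a hypothetical stand-in for the guarded wrappers;
    write is any zero-argument callable that returns the new node id.
    """
    if not engram_valid_node_type(node_type):
        print(f"engram: rejected node_type {node_type!r}")
        return ""
    if not engram_valid_tier(tier):
        print(f"engram: rejected tier {tier!r}")
        return ""
    return write()


# The old chat.el call shape: a tier value in the node_type slot is rejected.
rejected = checked_write("episodic", "Episodic", lambda: "node-1")
# The corrected call shape passes.
accepted = checked_write("Conversation", "Episodic", lambda: "node-1")
```

As the note in §4 says, a check like this only catches wrong values; it cannot detect an argument that arrived with the wrong type but happens to look valid.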
---
## 5. DEPLOY RUNBOOK (your build env)
1. Pull the edited files: `el/lang/runtime/engram.el`, `neuron/chat.el`.
2. Build: `elc` (entry `neuron/soul.el`, import chain) → `neuron/dist/*.c`, then link as in `el/lang/install.sh` (`$(CC) $(CFLAGS) -o dist/neuron-fresh dist/*.c .../el_runtime.c -lcurl -lpthread`). Confirm `engram.el` recompiles into the import chain.
3. Restart the soul. **Note:** on Tim's box it's run by `/tmp/soul-keepalive.sh` (an auto-restart loop) → stop that loop before killing `neuron-fresh`, or it'll respawn the old binary.
4. **Verify (prove end-to-end):** write a node via the live API (POST `/api/memories` or the remember path) with an obvious throwaway label, then read it back and confirm `node_type` + `tier` are correct AND that it persisted (node_count increments; survives a snapshot save). There is **no delete endpoint** — clean up via the snapshot.
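The conditions in step 4 can be written down as a small checklist function. This is a minimal Python sketch that works on already-fetched data and makes no HTTP calls. The node field names (`id`, `node_type`, `tier`) and the helper name `check_readback` are assumptions; adapt them to what the API actually returns.

```python
def check_readback(node, expected_node_type, expected_tier,
                   count_before, count_after, snapshot_ids):
    """Return a list of failed checks; an empty list means the write verified.

    node:          the node as read back after the test write (assumed dict shape)
    count_before:  node_count sampled before the write
    count_after:   node_count sampled after the write
    snapshot_ids:  set of node ids found in the snapshot after a save
    """
    failures = []
    if node.get("node_type") != expected_node_type:
        failures.append(
            f"node_type is {node.get('node_type')!r}, expected {expected_node_type!r}")
    if node.get("tier") != expected_tier:
        failures.append(f"tier is {node.get('tier')!r}, expected {expected_tier!r}")
    if count_after <= count_before:
        failures.append("node_count did not increase")
    if node.get("id") not in snapshot_ids:
        failures.append("node id missing from saved snapshot")
    return failures


# Illustrative values only, not real data from the store.
good = {"id": "n-1", "node_type": "Conversation", "tier": "Episodic"}
bad = {"id": "n-2", "node_type": "episodic", "tier": 1}

good_result = check_readback(good, "Conversation", "Episodic", 10, 11, {"n-1"})
bad_result = check_readback(bad, "Conversation", "Episodic", 10, 11, {"n-2"})
```

The count check uses "greater than before" rather than "exactly one more" because the daemon may write other nodes concurrently.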
---
## 6. Data analysis + prune proposal (NOT applied)
- Snapshot: `~/.neuron/engram/snapshot.json`. **Backup made:** `~/.neuron/engram/snapshot.backup-20260608.json`.
- **~107 corrupt nodes** (node_type/tier not in the valid sets). node_type junk values: `''`, `'1'`, `'2'`, `'ntn-genesis'`, `'claude-opus-4-8'`, binary. tier junk: same + `'/Users/timlingo'`.
- **0 are field-repairable.** They're all genesis-bootstrap / binary detritus where *every* field (id/label/tier/tags) is corrupted together — 69× "You are ntn-genesis, a CGI.", 62× "ntn-genesis", ~70 binary garbage, plus a proxy URL + an API path that leaked into labels. No signal to reconstruct → **prune, don't fabricate.**
- **Proposal:** `~/.neuron/engram/snapshot.pruned.json` — 3,631 clean nodes (107 junk removed), edges intact (no dangling). Byte-verified: no *clean* node contains binary content, so re-encoding is lossless.
- **NOT applied** because the live daemon is **actively rewriting `snapshot.json`** (two reads returned different counts). Applying requires stopping the soul + keepalive, swapping in the pruned snapshot, then restarting. Do this in your controlled env with the backup retained.
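The classification behind the prune proposal can be reproduced along these lines. This is a minimal Python sketch under stated assumptions: the snapshot is taken to be a dict with a `nodes` list (each with `id`, `node_type`, `tier`) and an `edges` list (each with `source`, `target`). The real `snapshot.json` layout may differ, and the sample data below is made up for illustration.

```python
# Allowlists as listed in section 4 of this handoff.
VALID_NODE_TYPES = {
    "Memory", "Knowledge", "Belief", "Project", "Tag", "BacklogItem",
    "Artifact", "Conversation", "ExecutionContext", "InternalStateEvent",
    "Self", "Entity", "Process", "ConfigEntry", "Concept", "Imprint",
}
VALID_TIERS = {
    "Semantic", "Episodic", "Working", "Procedural", "Canonical", "Note", "Lesson",
}


def is_clean(node):
    """A node is clean when both node_type and tier are allowlisted strings."""
    nt, tier = node.get("node_type"), node.get("tier")
    return (isinstance(nt, str) and nt in VALID_NODE_TYPES
            and isinstance(tier, str) and tier in VALID_TIERS)


def partition_nodes(nodes):
    """Split nodes into (clean, corrupt) without modifying any field."""
    clean = [n for n in nodes if is_clean(n)]
    corrupt = [n for n in nodes if not is_clean(n)]
    return clean, corrupt


def dangling_edges(edges, clean_nodes):
    """Edges that would point at a removed node after the prune."""
    kept = {n["id"] for n in clean_nodes}
    return [e for e in edges if e["source"] not in kept or e["target"] not in kept]


# Made-up sample in the assumed layout; junk values mirror those reported above.
snapshot = {
    "nodes": [
        {"id": "a", "node_type": "Memory", "tier": "Semantic"},
        {"id": "b", "node_type": "Conversation", "tier": "Episodic"},
        {"id": "c", "node_type": "1", "tier": "/Users/timlingo"},
        {"id": "d", "node_type": "ntn-genesis", "tier": 2},
    ],
    "edges": [{"source": "a", "target": "b"}],
}

clean, corrupt = partition_nodes(snapshot["nodes"])
dangling = dangling_edges(snapshot["edges"], clean)
pruned = {"nodes": clean, "edges": snapshot["edges"]}
print(len(clean), "clean,", len(corrupt), "corrupt,", len(dangling), "dangling edges")
```

Running a check like this against the snapshot after the daemon is stopped, rather than against a live file, avoids the moving-target problem described in the last bullet.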
---
## 7. Security heads-up (please action)
- `ANTHROPIC_API_KEY` is stored **in plaintext** in `/tmp/soul-keepalive.sh` — rotate it and move to a secret store.
- Internal infra leaked into node fields (`http://localhost:7771`, `/api/graph/edges?limit=5000`) — symptom of the same write bug; the prune removes those nodes.
## 8. Backlog of related gaps (separate from this fix)
- Soul chat loop reports **no tools** (`NONE`) / `NO_SHELL` — it narrates `curl`/`sqlite3` without executing. The capture REST path works, but the chat agent can't call it.
- **No `PUT`/`DELETE`** on knowledge nodes (`method not allowed`) — needed for UI edit/delete.
- No **source-conversation** edge on captured nodes — blocks "see source chat" in the UI.
- Writes have been **frozen since ~2026-04-29** (newest knowledge node) — nothing is being added in the current running state.
+9 -1
@@ -509,7 +509,15 @@ fn handle_dharma_room_turn(body: String) -> String {
     // Record what the soul said, not where it was or with whom. Experience
     // accumulates in the engram through the content of what was said.
     let snap_path: String = state_get("soul_snapshot_path")
-    let discard_id: String = engram_node(clean_response, "episodic", el_from_float(0.6))
+    // Record what the soul said as a Conversation node with an Episodic tier. (Was:
+    // engram_node(content, "episodic", ...) which wrongly put a TIER into the node_type
+    // slot; that's why nodes showed node_type="episodic". Use the full, correct contract.)
+    let utterance_tags: String = "[\"soul-utterance\",\"episodic\"]"
+    let discard_id: String = engram_node_full(
+        clean_response, "Conversation", "soul:utterance",
+        el_from_float(0.6), el_from_float(0.6), el_from_float(0.8),
+        "Episodic", utterance_tags
+    )
     if !str_eq(snap_path, "") {
         let discard_save: String = engram_save(snap_path)
     }