fix(soul): ratio guard against genesis seeding over a populated engram #21
Data-safety guard for the genesis boot path (2026-06-15 engram-clobber work).
Problem
Genesis boot seeds a fresh identity and saves it over snapshot.json whenever the
in-memory graph looks empty. A fixed node-count threshold (50) missed partial
loads, letting a sparse boot clobber the populated 47MB engram down to ~63 nodes.
This change
Replace the fixed threshold with a ratio guard: refuse to seed when the on-disk
snapshot is large (>200KB) but the loaded graph is sparse (< disk/16000 nodes).
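A minimal C sketch of the ratio guard as described above. The function name, the parameter names, and the reading of 200KB as 200,000 bytes are assumptions for illustration; the actual guard lives in El code that is not shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Thresholds from the description above; the macro names are illustrative. */
#define LARGE_SNAPSHOT_BYTES 200000LL /* assumes 200KB means 200,000 bytes */
#define BYTES_PER_NODE 16000LL

/* Returns true when seeding a fresh identity is allowed.
 * Seeding is refused only when the on-disk snapshot is large
 * AND the loaded graph is sparse relative to that size. */
bool safe_to_seed(long long disk_len, long long node_count)
{
    assert(disk_len >= 0 && node_count >= 0);
    bool disk_is_large = disk_len > LARGE_SNAPSHOT_BYTES;
    bool graph_is_sparse = node_count < disk_len / BYTES_PER_NODE;
    return !(disk_is_large && graph_is_sparse);
}
```

Taking 47MB as 47,000,000 bytes (an assumption), the sparse threshold is 2,937 nodes, so a boot that loaded only ~63 nodes is refused, while a missing or tiny snapshot still allows a first-time seed.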
Known limitation (read before merging)
This gates only the seed / pre-serve-save path. The root cause is deeper:
engram_save in el_runtime.c is non-atomic. fopen(p,"wb") truncates the file to 0
before writing 47MB, so a concurrent load can read an empty file and fall into
genesis; if the guard reads guard_disk in that same window, it also sees an empty
file and passes. The real fix is an ATOMIC engram_save (temp + fsync + rename) in
el_runtime.c, which I have diagnosed and proposed to Will but not yet implemented
(held pending his review).
Until that lands, the live founder engram is protected by: this guard + the
neuron-daemons.sh TRIPWIRE (restore-from-golden) + a read-only snapshot stopgap +
a locked golden lifeline.
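For reference, the temp + fsync + rename pattern named above can be sketched in POSIX C as follows. This is not the proposed engram_save change, which is still pending review: atomic_save and its signature are hypothetical, and the real function's serialization and error handling are not shown in this PR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not the real engram_save. Writes buf to a uniquely
 * named temp file next to `path`, flushes it to disk, then renames it over
 * `path`. Readers see either the old file or the complete new one, never a
 * truncated one. Returns 0 on success, -1 on failure (old file untouched). */
int atomic_save(const char *path, const void *buf, size_t len)
{
    assert(path != NULL);
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp) return -1;

    int fd = mkstemp(tmp); /* same directory, so rename stays on one filesystem */
    if (fd < 0) return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w <= 0) { close(fd); unlink(tmp); return -1; } /* any error aborts */
        p += w;
        left -= (size_t)w;
    }
    if (fsync(fd) != 0) { close(fd); unlink(tmp); return -1; }
    if (close(fd) != 0) { unlink(tmp); return -1; }
    if (rename(tmp, path) != 0) { unlink(tmp); return -1; }
    return 0;
}
```

The sketch leaves out details a production version would need to decide on: mkstemp creates the temp file with mode 0600, interrupted writes are treated as failures rather than retried, and the containing directory is not fsynced after the rename.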
🤖 Generated with Claude Code
Review: fix(soul) ratio guard against genesis seeding over a populated engram
The core fix is sound and directly addresses the 06-14 clobber. The ratio heuristic (disk_len / 16000 bytes-per-node) is a meaningful improvement over the flat < 50 threshold: it scales with graph size rather than failing silently for large-but-partially-loaded graphs. The PR description is honest about the known limitation (non-atomic engram_save race window), which is the right thing to surface now rather than paper over.
Blocker: guard bypassed in HTTP-engram mode
When ENGRAM_URL is set, the guard computes:
A genesis instance running over HTTP Engram would still enter the if is_genesis && safe_to_seed block, call init_soul_edges, and write the identity-only graph to the local snapshot.json. The comment in the HTTP branch says "HTTP Engram owns persistence — we do not save back," but that intent is not enforced. Fix: add using_http_engram || as the first clause of safe_to_seed, or skip the seed/save block explicitly when in HTTP mode.
Warning: integer division underestimates the threshold
guard_disk_len / 16000 truncates. A 250,000-byte file produces a threshold of 15 nodes (250,000 / 16,000 = 15.625, truncated to 15), so a 15-node load computes 15 < 15 = false and incorrectly passes. Use the inverse: engram_node_count() * 16000 < guard_disk_len, which expresses the same intent without truncation.
Warning: safety bell not applied on dharma-room paths
safety_augment_system() is correctly wired into handle_chat (line 164) and handle_chat_agentic (line 754), but three paths that also call the LLM skip it entirely: handle_dharma_room_turn, handle_dharma_room_turn_agentic, and handle_chat_as_soul. A user in distress whose message arrives via a dharma room turn receives no crisis resource injection and no soft-directive acknowledgment.
Warning: handle_api_define_process missing read-back check
Every other write handler in neuron-api.el calls api_persisted(id) after the engram_node_full write. handle_api_define_process does not: it returns {"id":"...","ok":true} regardless of whether the node actually persisted. One-liner fix: add if !api_persisted(id) { return api_not_persisted(id) } before the return.
Nit: fixed temp path in connectd_post
/tmp/neuron-connectors-req.json is shared across all calls to connectd_post. The El runtime is currently single-threaded, so this is safe, but it is a latent issue. The same pattern appears in call_mcp_bridge (/tmp/neuron-mcp-call.json). Worth including a time-based suffix now while it is cheap to fix.
Nit: double engram_save on genesis boot
The first if is_genesis && safe_to_seed block calls engram_save(snapshot) after init_soul_edges. The second block immediately after calls engram_save(snap) (same path). Both saves happen before http_serve, so all boot-time changes (identity context, boot counter, session-start event) were written before the first save anyway. The two saves therefore duplicate each other; drop the first one.
The race window (non-atomic save → concurrent load reads empty file → guard reads empty → passes) is correctly documented in the PR description as a known limitation pending an atomic save implementation in el_runtime.c. That is the right framing; this guard is an adequate stopgap.
The MEMORY_RECALL_BUG.md file is useful diagnostic context but probably belongs in an issue rather than the committed tree.
Overall: the primary fix is correct and the improvement over the flat threshold is real. The HTTP-engram mode bypass is the only blocker before merge.
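The integer-division warning in this review can be checked with a small C sketch that puts the two comparisons side by side. The function names are hypothetical; only the 16000 bytes-per-node constant and the 250,000-byte example come from the text above.

```c
#include <assert.h>
#include <stdbool.h>

#define BYTES_PER_NODE 16000LL

/* Division form, as in the PR: the quotient is truncated toward zero,
 * so the effective threshold is rounded down. */
bool sparse_by_division(long long node_count, long long disk_len)
{
    assert(node_count >= 0 && disk_len >= 0);
    return node_count < disk_len / BYTES_PER_NODE;
}

/* Multiplication form suggested in the review: compares in bytes,
 * so nothing is rounded away. */
bool sparse_by_multiplication(long long node_count, long long disk_len)
{
    assert(node_count >= 0 && disk_len >= 0);
    return node_count * BYTES_PER_NODE < disk_len;
}
```

For a 250,000-byte snapshot and 15 loaded nodes, the division form reports not sparse (15 < 15 is false) while the multiplication form reports sparse (240,000 < 250,000). The two forms agree whenever the file size is an exact multiple of 16,000.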
3a23661ea0 to 8b692e4666