fix(soul): ratio guard against genesis seeding over a populated engram #21

Merged
will.anderson merged 3 commits from feat/connectors-soul into main 2026-06-17 18:20:00 +00:00
Member

Data-safety guard for the genesis boot path (2026-06-15 engram-clobber work).

Problem

Genesis boot seeds a fresh identity and saves it over snapshot.json whenever the
in-memory graph looks empty. A fixed node-count threshold (50) missed partial
loads, letting a sparse boot clobber the populated 47MB engram down to ~63 nodes.

This change

Replace the fixed threshold with a ratio guard: refuse to seed when the on-disk
snapshot is large (>200KB) but the loaded graph is sparse (< disk/16000 nodes).
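
As a rough illustration, the ratio check described above can be sketched in C. This is not the actual El source; the function name `safe_to_seed_sketch` and the constant names are assumptions made for the example. Only the 200KB floor and the 16000 bytes-per-node divisor come from the description.

```c
#include <stdbool.h>

/* Values taken from the PR description; the names are hypothetical. */
#define GUARD_MIN_DISK_BYTES 200000LL /* "large" snapshot floor (>200KB) */
#define GUARD_BYTES_PER_NODE 16000LL  /* divisor used by the ratio guard */

/* Returns false when seeding would overwrite a snapshot that is large on
   disk but came back sparse in memory (a likely partial load). */
bool safe_to_seed_sketch(long long disk_len, long long node_count) {
    bool disk_is_large = disk_len > GUARD_MIN_DISK_BYTES;
    bool graph_is_sparse = node_count < disk_len / GUARD_BYTES_PER_NODE;
    return !(disk_is_large && graph_is_sparse);
}
```

Taking 47MB as roughly 47,000,000 bytes, `safe_to_seed_sketch(47000000, 63)` returns false, because 63 is far below 47,000,000 / 16,000 = 2937 nodes. A fresh install with an empty or missing snapshot, `safe_to_seed_sketch(0, 0)`, still returns true.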

Known limitation (read before merging)

This gates only the seed / pre-serve-save path. The root cause is deeper: engram_save
in el_runtime.c is non-atomic. fopen(p,"wb") truncates the file to 0 bytes before
writing 47MB, so a concurrent load can read an empty file and fall into genesis; if
the guard reads guard_disk in that same window, it also sees an empty file and
passes. The real fix is an ATOMIC engram_save (temp + fsync + rename) in
el_runtime.c, which I have diagnosed and proposed to Will but not yet implemented
(held pending his review).

Until that lands, the live founder engram is protected by: this guard + the
neuron-daemons.sh TRIPWIRE (restore-from-golden) + a read-only snapshot stopgap +
a locked golden lifeline.
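
For reference, the temp + fsync + rename sequence mentioned above could look roughly like the following. This is a minimal sketch under stated assumptions, not the proposed el_runtime.c change: the helper name `atomic_save_sketch`, the fixed `.tmp` suffix, and the buffer-based signature are all hypothetical, and it assumes a POSIX system where rename() replaces the destination atomically.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not the real engram_save.
   Writes buf to "<path>.tmp", forces it to disk, then renames it over path.
   Returns 0 on success, -1 on failure. */
int atomic_save_sketch(const char *path, const void *buf, size_t len) {
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp) return -1;

    FILE *f = fopen(tmp, "wb"); /* truncation only ever hits the temp file */
    if (!f) return -1;

    int ok = fwrite(buf, 1, len, f) == len /* write the whole payload */
          && fflush(f) == 0                /* drain the stdio buffer to the kernel */
          && fsync(fileno(f)) == 0;        /* ask the kernel to flush to storage */
    if (fclose(f) != 0) ok = 0;

    /* Readers see either the old file or the complete new one, never a
       zero-length file in between. */
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

Two caveats for this sketch: the fixed `.tmp` name would collide if two writers saved the same path at once, and it does not fsync the containing directory, which some filesystems need for the rename itself to survive a crash.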

🤖 Generated with Claude Code

Owner

Review: fix(soul) ratio guard against genesis seeding over a populated engram

The core fix is sound and directly addresses the 06-14 clobber. The ratio heuristic (disk_len / 16000 bytes-per-node) is a meaningful improvement over the flat < 50 threshold — it scales with graph size rather than failing silently for large-but-partially-loaded graphs. The PR description is honest about the known limitation (non-atomic engram_save race window), which is the right thing to surface now rather than paper over.

Blocker — guard bypassed in HTTP-engram mode

When ENGRAM_URL is set, the guard computes:

guard_disk = "" (skipped)
guard_disk_len = 0
safe_to_seed = !(0 > 200000 && ...) = true   ← always

A genesis instance running over HTTP Engram would still enter the if is_genesis && safe_to_seed block, call init_soul_edges, and write the identity-only graph to the local snapshot.json. The comment in the HTTP branch says "HTTP Engram owns persistence — we do not save back," but that intent is not enforced. Fix: add using_http_engram || as the first clause inside the negation in safe_to_seed, so the predicate is always false in HTTP mode, or skip the seed/save block explicitly when in HTTP mode.
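
The bypass and one possible shape of the fix can be sketched in C. Both function names are hypothetical, the real predicate lives in El rather than C, and `using_http_engram` is modelled here as a plain boolean parameter.

```c
#include <stdbool.h>

/* Ratio check alone, as described in this review: in HTTP mode guard_disk
   is skipped, so disk_len arrives as 0. */
bool ratio_guard_only(long long disk_len, long long node_count) {
    return !(disk_len > 200000 && node_count < disk_len / 16000);
}

/* Variant with the HTTP clause placed inside the negation, so HTTP mode
   can never be considered safe to seed. */
bool safe_to_seed_http_aware(bool using_http_engram,
                             long long disk_len, long long node_count) {
    return !(using_http_engram ||
             (disk_len > 200000 && node_count < disk_len / 16000));
}
```

With `disk_len == 0`, `ratio_guard_only(0, 0)` is true regardless of node count, which is the bypass. `safe_to_seed_http_aware(true, 0, 0)` is false, while the local-snapshot behaviour (`using_http_engram == false`) is unchanged.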

Warning — integer division underestimates the threshold

guard_disk_len / 16000 truncates. A 250,000-byte file produces a threshold of 15 nodes (250,000 / 16,000 = 15.625, truncated to 15), so a 15-node load computes 15 < 15 = false and incorrectly passes the guard. Use the multiplication form instead: engram_node_count() * 16000 < guard_disk_len. It expresses the same ratio without truncation.
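
The boundary case is easy to check side by side. The two helpers below are illustrative only (their names are made up for this sketch); they isolate the "graph is sparse" half of the guard in its division and multiplication forms.

```c
#include <stdbool.h>

/* Division form: the quotient is truncated toward zero before comparing. */
bool sparse_by_division(long long disk_len, long long node_count) {
    return node_count < disk_len / 16000;
}

/* Multiplication form: same ratio, no truncation. */
bool sparse_by_multiplication(long long disk_len, long long node_count) {
    return node_count * 16000 < disk_len;
}
```

For a 250,000-byte file and 15 nodes, the division form compares 15 < 15 and reports "not sparse", while the multiplication form compares 240,000 < 250,000 and reports "sparse". At 16 nodes both forms agree that the graph is not sparse.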

Warning — safety bell not applied on dharma-room paths

safety_augment_system() is correctly wired into handle_chat (line 164) and handle_chat_agentic (line 754), but three paths that also call the LLM skip it entirely: handle_dharma_room_turn, handle_dharma_room_turn_agentic, and handle_chat_as_soul. A user in distress whose message arrives via a dharma room turn receives no crisis resource injection and no soft-directive acknowledgment.

Warning — handle_api_define_process missing read-back check

Every other write handler in neuron-api.el calls api_persisted(id) after the engram_node_full write. handle_api_define_process does not — it returns {"id":"...","ok":true} regardless of whether the node actually persisted. One-liner fix: add if !api_persisted(id) { return api_not_persisted(id) } before the return.

Nit — fixed temp path in connectd_post

/tmp/neuron-connectors-req.json is shared across all calls to connectd_post. The El runtime is currently single-threaded so this is safe, but it is a latent issue. Same pattern in call_mcp_bridge (/tmp/neuron-mcp-call.json). Worth including a time-based suffix now while it is cheap to fix.
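
If the temp file is created from C rather than El, one alternative to a time-based suffix is POSIX mkstemp(), which picks a unique name and creates the file in one step. The sketch below swaps in that technique; it is not what the review proposes, and the helper name and template are assumptions. mkstemp() requires the template to end in XXXXXX, so the .json extension is dropped here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: creates a uniquely named request file under /tmp.
   On success, path_out holds the chosen path and the return value is an
   open file descriptor; on failure it returns -1. */
int open_unique_request_file(char *path_out, size_t cap) {
    static const char tmpl[] = "/tmp/neuron-connectors-req-XXXXXX";
    if (cap < sizeof tmpl) return -1;   /* need room for the terminator too */
    memcpy(path_out, tmpl, sizeof tmpl);
    return mkstemp(path_out);           /* replaces XXXXXX and opens the file */
}
```

The caller is responsible for closing the descriptor and unlinking the path once the request has been sent.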

Nit — double engram_save on genesis boot

The first if is_genesis && safe_to_seed block calls engram_save(snapshot) after init_soul_edges. The second block immediately after calls engram_save(snap) (same path). Both saves happen before http_serve, so all boot-time changes (identity context, boot counter, session-start event) were written before the first save anyway. The second save is redundant — drop the first one.

The race window (non-atomic save → concurrent load reads empty file → guard reads empty → passes) is correctly documented in the PR description as a known limitation pending an atomic save implementation in el_runtime.c. That is the right framing; this guard is an adequate stopgap.

The MEMORY_RECALL_BUG.md file is useful diagnostic context but probably belongs in an issue rather than the committed tree.

Overall: the primary fix is correct and the improvement over the flat threshold is real. The HTTP-engram mode bypass is the only blocker before merge.

will.anderson added 3 commits 2026-06-17 18:19:46 +00:00
- safety.el/.elh: new safety module
- neuron-api.el, routes.el, soul.el, chat.el: connectors API expansion
- regenerated dist/ C artifacts
- MEMORY_RECALL_BUG.md: investigation notes

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Genesis boot previously seeded a fresh identity and saved it over snapshot.json
whenever the in-memory graph looked empty. Replace the fixed node-count threshold
with a ratio guard: refuse to seed when the on-disk snapshot is large
(>200KB) but the loaded graph is sparse (< disk/16000 nodes).

KNOWN LIMITATION: this gates only the seed/pre-serve-save path. The deeper cause
is a non-atomic engram_save (fopen wb truncates to 0 before writing 47MB), which
creates a window where a concurrent load reads an empty file -> genesis -> and if
guard_disk is read in that same window the guard passes. The real fix is an
atomic engram_save (temp + fsync + rename) in el_runtime.c, tracked separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix/test: PR #21 review — guard, safety Bell, api write-back, temp paths
Neuron Soul CI / build (pull_request) Failing after 13m22s
8b692e4666
fix(soul): add HTTP-engram guard to safe_to_seed — when ENGRAM_URL is set
the HTTP Engram owns persistence; genesis must never save to local snapshot
regardless of node counts (was: guard_disk forced to empty string, making
the ratio check vacuously true and allowing init_soul_edges+engram_save).

fix(soul): use multiplication form for ratio guard — node_count * 16000 <
disk_len avoids floor-division truncation that underestimated boundary files
(250KB / 16000 = 15.6, floors to 15; a 15-node graph wrongly passed old guard).

fix(chat): add safety_augment_system to handle_chat_as_soul,
handle_dharma_room_turn, and handle_dharma_room_turn_agentic — all three
called the LLM without Hard Bell evaluation, leaving users in dharma rooms
without crisis resource routing.

fix(neuron-api): add api_persisted read-back to handle_api_define_process —
was the only write handler that returned ok:true without verifying the node
was actually written to engram.

fix(routes): unique temp file path in connectd_post — replaces fixed
/tmp/neuron-connectors-req.json with a timestamped path to prevent
collision if concurrency is added or two soul instances share a machine.

test: add tests/test_bell_safety.el — covers safety_detect_bell_level
(none/soft/hard), safety_classify_hard_bell (abuse/self_harm routing),
safety_normalize (smart-quote), safety_augment_system, and
handle_safety_contact_post (validation + read-back).

test: add tests/test_soul_guard.el — pure-function logic tests for the
safe_to_seed predicate: 200KB boundary, 47MB/63-node clobber scenario,
HTTP-engram mode, multiplication vs division truncation at 250KB.

test: add tests/test_api_define_process.el — verifies the define_process
write is read-back verified after the fix.
will.anderson force-pushed feat/connectors-soul from 3a23661ea0 to 8b692e4666 2026-06-17 18:19:46 +00:00
will.anderson merged commit 74ac457e1c into main 2026-06-17 18:20:00 +00:00
will.anderson deleted branch feat/connectors-soul 2026-06-17 18:20:08 +00:00

Reference: neuron-technologies/neuron#21