Atomic engram_save + anti-clobber floor (validated, darwin build recipe) #57

Closed
tim.lingo wants to merge 1 commit from fix/engram-save-atomic-darwin into chore/live-darwin-runtime

Preserves the validated engram-clobber fix and makes it reviewable. The base is the
live-darwin-runtime snapshot (the un-versioned source the live :7770 soul is built
from), so the diff contains ONLY the engram_save change and a darwin build script.

The fix (engram_save)

  • Atomic write: serialize to <path>.tmp, fflush+fsync, then rename() over the target
    (atomic on POSIX). No reader ever sees a truncated/0-byte snapshot; that empty-window
    race was the root of the genesis -> nodes=1 -> 63-node-clobber loop.
  • Sparse-write floor: refuse to overwrite a >200KB snapshot with one < 1/16 its size.
    A partial load can never clobber a healthy graph, whatever the upstream cause.
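
The atomic-write step can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the engram_save code from this PR: the helper name atomic_write_file, the fixed-size path buffer, and the error handling are all hypothetical. Only the sequence (write to <path>.tmp, fflush, fsync, rename) comes from the description above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not the PR's engram_save): writes `len` bytes of `data`
 * to `path` by way of "<path>.tmp" and rename().
 * Returns 0 on success, -1 on any failure. */
int atomic_write_file(const char *path, const char *data, size_t len)
{
    assert(path != NULL && data != NULL);

    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;                           /* path too long for the buffer */

    FILE *f = fopen(tmp, "wb");              /* truncates only the temp file */
    if (f == NULL)
        return -1;

    int ok = fwrite(data, 1, len, f) == len  /* whole payload accepted by stdio */
          && fflush(f) == 0                  /* stdio buffer -> kernel */
          && fsync(fileno(f)) == 0;          /* kernel -> storage, as far as fsync goes */
    if (fclose(f) != 0)
        ok = 0;

    /* rename() replaces the target in one step, so a concurrent reader sees
     * either the old snapshot or the new one, never an empty file. */
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);                         /* leave no .tmp behind on failure */
        return -1;
    }
    return 0;
}
```

A caller would serialize the graph into a buffer and pass it in, for example atomic_write_file("snapshot.json", buf, len). On failure the existing target is left as it was, because the target is only ever touched by rename().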

Validation (in isolation, nothing live touched)

  • Standalone clang harness: 11/11 (sparse refused + original intact, atomic rename,
    no .tmp leftover, sub-200KB engrams unaffected).
  • Rebuilt the darwin soul via the new scripts/build-soul-darwin.sh, booted on an
    isolated port against a golden copy: loaded 5113 nodes, round-tripped the full 47MB
    snapshot, no .tmp leftover, live ~/.neuron untouched.

scripts/build-soul-darwin.sh

Replicates elb on macOS/arm64 with clang (elb ships Linux-only via CI), so we can build
and test the darwin soul locally. Key points: it passes -Wno-implicit-function-declaration
(dist modules use C89 implicit cross-module declarations that Apple clang rejects) and
links *.o once.

NOT this PR

Landing this on main is a separate reconciliation (the live runtime diverges from main —
it lacks main's engram_wm_*/engram_load_merge/http_serve_async). Do not blind-merge.
Opened by Neuron on Tim's machine; validated, not yet deployed to live :7770.

tim.lingo added 1 commit 2026-06-16 23:23:57 +00:00
Kills the engram-clobber loop at its source. engram_save did a bare fopen("wb")
that truncates snapshot.json to 0 bytes before the 47MB write — a booting soul's
engram_load could read that empty window -> genesis -> nodes=1 -> a 63-node save
overwrote the populated file. Two guards:
 1. Atomic write: serialize to <path>.tmp, fflush+fsync, rename() over target
    (atomic on POSIX) — no reader ever sees a truncated/0-byte snapshot.
 2. Sparse-write floor: refuse to overwrite a >200KB snapshot with one < 1/16 its
    size — a partial load can never clobber a healthy graph, whatever the cause.
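
The sparse-write floor in guard 2 amounts to a size comparison made before the save proceeds. The sketch below is illustrative only: the function and macro names are hypothetical, "200KB" is assumed to mean 200 KiB, and the exact boundary behavior (strict comparisons, integer division) is a guess rather than something taken from the commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Thresholds from the commit text; the names are hypothetical. */
#define FLOOR_MIN_EXISTING_BYTES (200u * 1024u) /* "200KB", assumed to be 200 KiB */
#define FLOOR_SHRINK_DIVISOR     16u            /* refuse below 1/16 of the old size */

/* Size of the file at `path` in bytes, or 0 if it does not exist or
 * cannot be stat'ed. */
size_t existing_snapshot_size(const char *path)
{
    assert(path != NULL);
    struct stat st;
    if (stat(path, &st) != 0 || st.st_size < 0)
        return 0;
    return (size_t)st.st_size;
}

/* Returns true when writing `new_size` bytes over an existing snapshot of
 * `existing_size` bytes should be refused as suspiciously sparse. */
bool sparse_write_refused(size_t existing_size, size_t new_size)
{
    if (existing_size <= FLOOR_MIN_EXISTING_BYTES)
        return false;   /* small or missing snapshot: the floor does not apply */
    return new_size < existing_size / FLOOR_SHRINK_DIVISOR;
}
```

With these assumptions, a 64 KiB write over a 47 MiB snapshot is refused, while any write over a snapshot of 200 KiB or less goes through, which matches the "sub-200KB engrams unaffected" case.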

Validated in isolation: standalone clang harness 11/11; rebuilt the darwin soul
(scripts/build-soul-darwin.sh) and booted it on an isolated port against a golden
copy — loaded 5113 nodes and round-tripped the full 47MB snapshot, no .tmp leftover,
live ~/.neuron untouched. Adds scripts/build-soul-darwin.sh (local elb replacement).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tim.lingo closed this pull request 2026-06-17 17:33:39 +00:00

Reference: neuron-technologies/el#57