Reconcile live runtime data-integrity fixes onto main (UAF + atomic engram_save) #58
Reconciles the live runtime's data-integrity fixes onto main (HANDOFF #2). These fixes
existed only in the un-versioned el-sdk source the live macOS soul was hand-built from
(captured in the [DO NOT MERGE] chore/live-darwin-runtime snapshot). This ports them
FORWARD onto main — faithfully, minimally — so CI/elb builds a soul that has them,
WITHOUT dragging in the snapshot's deletions of main's newer engram_wm_*/
engram_load_merge/http_serve_async.
The diff is +35/-13 in one file and covers three things:
UAF — hallucinated/lost-saves root cause
engram_new_id + engram_node_full now use el_strdup_persist, not el_strdup. el_strdup
tracks into the per-request arena that el_request_end() frees when the creating HTTP
request completes — leaving stored nodes with dangling pointers (corrupted ids,
"saved but never listed"). Transplanted verbatim from the live runtime; el_strdup_persist
call sites go from 19 to 27, matching live exactly. engram_node/engram_node_layered were already
identical to live (no-op), so no main-only logic was touched.
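For reviewers unfamiliar with the arena behaviour, here is a minimal sketch of the ownership difference described above. It is not the el-sdk code: every demo_* name is hypothetical, and the arena is modelled as a simple linked list of tracked blocks that is freed in bulk at end of request.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-request arena: a singly linked list of tracked blocks. */
typedef struct demo_block { struct demo_block *next; } demo_block;
static demo_block *demo_arena_head = NULL;

/* Request-scoped copy: released in bulk by demo_request_end(). */
static char *demo_strdup_arena(const char *s) {
    size_t n = strlen(s) + 1;
    demo_block *b = malloc(sizeof(demo_block) + n);
    if (!b) return NULL;
    b->next = demo_arena_head;
    demo_arena_head = b;
    char *p = (char *)(b + 1);      /* string payload follows the header */
    memcpy(p, s, n);
    return p;
}

/* Persistent copy: not tracked by the arena, so it outlives the request. */
static char *demo_strdup_persist(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p) memcpy(p, s, n);
    return p;
}

/* Frees every arena-tracked block and returns how many were released. */
static int demo_request_end(void) {
    int freed = 0;
    while (demo_arena_head) {
        demo_block *next = demo_arena_head->next;
        free(demo_arena_head);
        demo_arena_head = next;
        freed++;
    }
    return freed;
}

typedef struct { char *id; } demo_node;

/* A stored node must own its id beyond the creating request, so the
   persistent copy is used here. Using demo_strdup_arena instead would
   leave n.id dangling once demo_request_end() runs. */
static demo_node demo_node_new(const char *id) {
    demo_node n = { demo_strdup_persist(id) };
    return n;
}
```

In this sketch a node created during a request keeps a valid id after demo_request_end(), while any string obtained from demo_strdup_arena must not be stored past that call.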
Atomic engram_save
Write .tmp, fflush+fsync, rename() over target (atomic on POSIX) so a booting
soul's engram_load never reads a truncated/0-byte snapshot — that empty-window race was
the genesis -> nodes=1 -> 63-node-clobber loop. Plus a sparse-write floor: refuse to
overwrite a >200KB snapshot with one < 1/16 its size (a partial load can never clobber
a healthy graph). Validated in isolation: standalone harness 11/11; rebuilt the darwin
soul and booted it on an isolated port — round-tripped 5113 nodes, no .tmp leftover,
no clobber, live untouched.
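The save sequence and the sparse-write floor described above can be sketched roughly as follows. This is an illustration under assumptions, not the engram_save implementation: the demo_* names are hypothetical, "200KB" is assumed to mean 200 * 1024 bytes, and the temp file is assumed to be the target path with ".tmp" appended.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() */
#endif
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Sparse-write floor: returns 0 (refuse) when the existing snapshot is
   larger than 200KB and the new one is under 1/16 of its size. */
static int demo_save_allowed(size_t existing_bytes, size_t new_bytes) {
    const size_t floor_threshold = 200u * 1024u;  /* assumed KB = 1024 */
    if (existing_bytes <= floor_threshold) return 1;
    if (new_bytes > SIZE_MAX / 16) return 1;      /* avoid overflow below */
    return new_bytes * 16 >= existing_bytes;
}

/* Writes data to "<path>.tmp", forces it to disk, then renames it over
   path. Returns 0 on success, -1 on failure. On failure the file already
   at path is left as it was. */
static int demo_atomic_save(const char *path, const void *data, size_t len) {
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp) return -1;

    FILE *f = fopen(tmp, "wb");
    if (!f) return -1;

    int ok = fwrite(data, 1, len, f) == len   /* whole payload written */
          && fflush(f) == 0                   /* stdio buffer to kernel */
          && fsync(fileno(f)) == 0;           /* kernel to stable storage */
    if (fclose(f) != 0) ok = 0;

    /* rename() replaces the target in one step on POSIX, so a reader
       sees either the old snapshot or the complete new one. */
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

A caller would check demo_save_allowed() against the size of the file currently on disk before calling demo_atomic_save(). A stricter variant might also fsync the containing directory after the rename, which this sketch leaves out.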
Truncation fix — already on main (_tl_fs_read_len binary-safe length), nothing to do.
Compiles clean to object. I can't full-link the macOS soul here (neuron.c amalgamation
needs GNU ld --allow-multiple-definition, which ld64 lacks) — this is for your CI/elb to
build and deploy. Once it's in an official build we can blue/green it onto live :7770 and
lift the read-only engram stopgap.
Opened by Neuron on Tim's machine. Supersedes the standalone el PR #57 (atomic-save only);
this one is the proper main-based reconciliation.