Fix engram_node_full field corruption + add validation #52

Merged
will.anderson merged 13 commits from fix/engram-node-full-field-corruption into dev 2026-06-10 22:01:42 +00:00
Member

Stale wrapper signature didn't match the C primitive; el_val_t is untyped so args were coerced + forwarded by position (tier got ints, fields got junk — ~100 corrupt nodes). Corrects the wrapper 1:1 and adds node_type/tier allowlist validation (fail-loud, no silent malformed writes). Full write-up + deploy runbook + data-prune proposal in the neuron repo: HANDOFF-engram-write-corruption.md.

Stale wrapper signature didn't match the C primitive; el_val_t is untyped so args were coerced + forwarded by position (tier got ints, fields got junk — ~100 corrupt nodes). Corrects the wrapper 1:1 and adds node_type/tier allowlist validation (fail-loud, no silent malformed writes). Full write-up + deploy runbook + data-prune proposal in the neuron repo: HANDOFF-engram-write-corruption.md.
Owner

Code Review — PR #52: Fix engram_node_full field corruption + add validation

Reviewed by: automated review (Claude Code)
Files changed: 1 (lang/runtime/engram.el)
Diff: +42 / -3


Summary

This PR fixes a real, production-impacting bug: the El wrapper for __engram_node_full had a stale 8-argument signature that didn't match the C primitive. Because el_val_t is an untyped machine word, the EL compiler accepted the mismatched call and forwarded arguments by position into the C function — placing floats in string slots, an integer in the tier slot, and so on. The result was ~100 corrupt graph nodes. The fix is correct: the new wrapper signature matches el_seed.h exactly (content, node_type, label, salience, importance, confidence, tier, tags), and the added validation guards are a genuinely useful defense-in-depth layer. This is a high-priority fix that should land quickly.


What was done well

lang/runtime/engram.el lines 9–27 — Validation helpers

The allowlist approach is exactly right for an untyped runtime. Because el_val_t can't encode type failures at compile time, the only reliable safeguard is a runtime value check. Listing every legal node_type and tier string explicitly (rather than, say, a prefix check) is correct — it makes the contract visible in code and forces a deliberate update when new types are added.

lang/runtime/engram.el lines 38–56 — Corrected engram_node_full signature

The old signature:

fn engram_node_full(content: String, nt: String, sal: Float, imp: Float,
                    source: String, lang: String, ts: Int, tags: String)

The new signature:

fn engram_node_full(content: String, node_type: String, label: String,
                    salience: Float, importance: Float, confidence: Float,
                    tier: String, tags: String)

This matches the infra.c stub signature exactly (content, node_type, label, salience, importance, confidence, tier, tags) which I verified in lang/tests/suite/infra.c:214-220. The 1:1 alignment eliminates the coercion-by-position bug completely.

The multi-line comment block at lines 38–47 is excellent. It documents exactly why the old signature was dangerous (untyped machine word, positional forwarding), what the old bug looked like field-by-field, and what the contract is. This level of explicitness is the right call for a footgun of this kind.

engram_node also gets validation (lines 33–37) — good catch that the simpler wrapper was equally unprotected.

Failure is loud, not silent. Returning "" and printing [engram] REJECTED ... is the right pattern here: callers already check if str_eq(node_id, "") { return err_json(...) } (see ui/vessels/el-graph/src/editor.el:33-34), so the empty-string sentinel propagates correctly up the stack without swallowing the error.

The only active call site (editor.el:32) is already compatible with the corrected signature. engram_node_full(content, eff_type, eff_label, sal, sal, int_to_float(1), "Working", "") maps exactly to the new parameter positions. The default eff_type = "Entity" is in the allowlist, and "Working" is in the tier allowlist. No callers are broken.


Issues / Needs Work

[non-blocking, medium] No tests added for the new validators — lang/runtime/engram.el:14-27

engram_valid_node_type and engram_valid_tier are pure functions operating on strings — they are trivially testable with the existing test.el framework. The test suite in lang/tests/ has test_core.el, test_string.el, etc. but no test_engram.el. This matters because:

  1. The next person who adds a new node type (e.g. "Wonder", "Imprint" — the latter already exists in the live Neuron system per the handoff doc) needs a failing test to remind them to update this file.
  2. The existing allowlist is already slightly stale: "Imprint" is listed in engram_valid_node_type but is not present in the tier list. If "Imprint" is also a valid tier value in practice, the tier list needs updating.

Suggested test additions (using the existing harness pattern from test.el):

fn test_valid_node_types(_: String) -> String {
    assert_true(engram_valid_node_type("Memory"), "Memory is valid")
    assert_true(engram_valid_node_type("Imprint"), "Imprint is valid")
    assert_false(engram_valid_node_type("Working"), "Working is NOT a node_type")
    assert_false(engram_valid_node_type(""), "empty string is invalid")
    return ""
}

fn test_valid_tiers(_: String) -> String {
    assert_true(engram_valid_tier("Working"), "Working is valid")
    assert_false(engram_valid_tier("Memory"), "Memory is NOT a tier")
    assert_false(engram_valid_tier(""), "empty string is invalid")
    return ""
}

[non-blocking, low] engram_valid_tier does not validate against empty string — lang/runtime/engram.el:21-24

If a caller passes tier = "" (e.g. from a JSON field that wasn't populated), engram_valid_tier("") returns false and the write is rejected — which is technically correct behavior. However, there's no explicit empty-string check with a distinct rejection message. The current behavior conflates "wrong tier value" and "missing tier" in the log output, which will make debugging harder. Consider:

if str_eq(tier, "") {
    __println("[engram] REJECTED node write — tier is empty (node_type=" + node_type + ", label=" + label + ")")
    return ""
}

[non-blocking, low] engram_node validator does not include label in its rejection log — lang/runtime/engram.el:33-37

The engram_node wrapper only logs node_type, but engram_node_full correctly also logs label. Minor inconsistency — not a bug since engram_node has no label argument, but worth verifying this message is still actionable for debugging.

[non-blocking, informational] Caller-side allowlist divergence risk

editor.el:31 hardcodes the default fallback "Entity" as a node type. This value is correctly in the engram_valid_node_type allowlist today. But since the allowlist and the default are in two separate files, they can drift. When new node types are added to the allowlist, or defaults change in callers, there's no automated cross-check. A compile-time constant or shared manifest would be a stronger solution long-term — but this is out of scope for a hotfix PR.


Build result

Cloned fix/engram-node-full-field-corruption successfully into /tmp/el-pr52/. No Makefile present at the root. emacs is not installed on this machine, so bytecode compilation was not attempted. The El compiler (lang/el-compiler/) is a self-hosted EL program and requires the EL runtime to run. Static review of the changed .el file found no syntax anomalies: function declarations, if guards, string concatenation, and return statements are all syntactically consistent with the patterns used throughout the rest of the runtime.

The C test infra stub in lang/tests/suite/infra.c:214-220 confirms the 8-arg signature is: (content, node_type, label, salience, importance, confidence, tier, tags) — exactly matching the new wrapper. The fix is structurally correct.


Verdict

APPROVE (with requested follow-up)

The core bug fix is correct, the comment documentation is excellent, and the existing call site is already aligned with the new signature. The missing test coverage is the only real gap — it's not blocking for a hotfix but should be addressed in a follow-up before this area is touched again. The full handoff doc referenced in the PR description (HANDOFF-engram-write-corruption.md) should also include a note that engram_valid_node_type and engram_valid_tier must be kept in sync with any new node type additions to the C runtime.

## Code Review — PR #52: Fix engram_node_full field corruption + add validation **Reviewed by:** automated review (Claude Code) **Files changed:** 1 (`lang/runtime/engram.el`) **Diff:** +42 / -3 --- ### Summary This PR fixes a real, production-impacting bug: the El wrapper for `__engram_node_full` had a stale 8-argument signature that didn't match the C primitive. Because `el_val_t` is an untyped machine word, the EL compiler accepted the mismatched call and forwarded arguments by position into the C function — placing floats in string slots, an integer in the tier slot, and so on. The result was ~100 corrupt graph nodes. The fix is correct: the new wrapper signature matches `el_seed.h` exactly (`content, node_type, label, salience, importance, confidence, tier, tags`), and the added validation guards are a genuinely useful defense-in-depth layer. This is a high-priority fix that should land quickly. --- ### What was done well **`lang/runtime/engram.el` lines 9–27 — Validation helpers** The allowlist approach is exactly right for an untyped runtime. Because `el_val_t` can't encode type failures at compile time, the only reliable safeguard is a runtime value check. Listing every legal `node_type` and `tier` string explicitly (rather than, say, a prefix check) is correct — it makes the contract visible in code and forces a deliberate update when new types are added. **`lang/runtime/engram.el` lines 38–56 — Corrected `engram_node_full` signature** The old signature: ``` fn engram_node_full(content: String, nt: String, sal: Float, imp: Float, source: String, lang: String, ts: Int, tags: String) ``` The new signature: ``` fn engram_node_full(content: String, node_type: String, label: String, salience: Float, importance: Float, confidence: Float, tier: String, tags: String) ``` This matches the infra.c stub signature exactly (`content, node_type, label, salience, importance, confidence, tier, tags`) which I verified in `lang/tests/suite/infra.c:214-220`. The 1:1 alignment eliminates the coercion-by-position bug completely. **The multi-line comment block at lines 38–47** is excellent. It documents exactly *why* the old signature was dangerous (untyped machine word, positional forwarding), what the old bug looked like field-by-field, and what the contract is. This level of explicitness is the right call for a footgun of this kind. **`engram_node` also gets validation (lines 33–37)** — good catch that the simpler wrapper was equally unprotected. **Failure is loud, not silent.** Returning `""` and printing `[engram] REJECTED ...` is the right pattern here: callers already check `if str_eq(node_id, "") { return err_json(...) }` (see `ui/vessels/el-graph/src/editor.el:33-34`), so the empty-string sentinel propagates correctly up the stack without swallowing the error. **The only active call site (`editor.el:32`) is already compatible** with the corrected signature. `engram_node_full(content, eff_type, eff_label, sal, sal, int_to_float(1), "Working", "")` maps exactly to the new parameter positions. The default `eff_type = "Entity"` is in the allowlist, and `"Working"` is in the tier allowlist. No callers are broken. --- ### Issues / Needs Work **[non-blocking, medium] No tests added for the new validators — `lang/runtime/engram.el:14-27`** `engram_valid_node_type` and `engram_valid_tier` are pure functions operating on strings — they are trivially testable with the existing `test.el` framework. The test suite in `lang/tests/` has `test_core.el`, `test_string.el`, etc. but no `test_engram.el`. This matters because: 1. The next person who adds a new node type (e.g. `"Wonder"`, `"Imprint"` — the latter already exists in the live Neuron system per the handoff doc) needs a failing test to remind them to update this file. 2. The existing allowlist is already slightly stale: `"Imprint"` is listed in `engram_valid_node_type` but is not present in the tier list. If `"Imprint"` is also a valid tier value in practice, the tier list needs updating. Suggested test additions (using the existing harness pattern from `test.el`): ``` fn test_valid_node_types(_: String) -> String { assert_true(engram_valid_node_type("Memory"), "Memory is valid") assert_true(engram_valid_node_type("Imprint"), "Imprint is valid") assert_false(engram_valid_node_type("Working"), "Working is NOT a node_type") assert_false(engram_valid_node_type(""), "empty string is invalid") return "" } fn test_valid_tiers(_: String) -> String { assert_true(engram_valid_tier("Working"), "Working is valid") assert_false(engram_valid_tier("Memory"), "Memory is NOT a tier") assert_false(engram_valid_tier(""), "empty string is invalid") return "" } ``` **[non-blocking, low] `engram_valid_tier` does not validate against empty string — `lang/runtime/engram.el:21-24`** If a caller passes `tier = ""` (e.g. from a JSON field that wasn't populated), `engram_valid_tier("")` returns `false` and the write is rejected — which is technically correct behavior. However, there's no explicit empty-string check with a distinct rejection message. The current behavior conflates "wrong tier value" and "missing tier" in the log output, which will make debugging harder. Consider: ``` if str_eq(tier, "") { __println("[engram] REJECTED node write — tier is empty (node_type=" + node_type + ", label=" + label + ")") return "" } ``` **[non-blocking, low] `engram_node` validator does not include `label` in its rejection log — `lang/runtime/engram.el:33-37`** The `engram_node` wrapper only logs `node_type`, but `engram_node_full` correctly also logs `label`. Minor inconsistency — not a bug since `engram_node` has no label argument, but worth verifying this message is still actionable for debugging. **[non-blocking, informational] Caller-side allowlist divergence risk** `editor.el:31` hardcodes the default fallback `"Entity"` as a node type. This value is correctly in the `engram_valid_node_type` allowlist today. But since the allowlist and the default are in two separate files, they can drift. When new node types are added to the allowlist, or defaults change in callers, there's no automated cross-check. A compile-time constant or shared manifest would be a stronger solution long-term — but this is out of scope for a hotfix PR. --- ### Build result Cloned `fix/engram-node-full-field-corruption` successfully into `/tmp/el-pr52/`. No `Makefile` present at the root. `emacs` is not installed on this machine, so bytecode compilation was not attempted. The El compiler (`lang/el-compiler/`) is a self-hosted EL program and requires the EL runtime to run. Static review of the changed `.el` file found no syntax anomalies: function declarations, `if` guards, string concatenation, and `return` statements are all syntactically consistent with the patterns used throughout the rest of the runtime. The C test infra stub in `lang/tests/suite/infra.c:214-220` confirms the 8-arg signature is: `(content, node_type, label, salience, importance, confidence, tier, tags)` — exactly matching the new wrapper. The fix is structurally correct. --- ### Verdict **APPROVE** (with requested follow-up) The core bug fix is correct, the comment documentation is excellent, and the existing call site is already aligned with the new signature. The missing test coverage is the only real gap — it's not blocking for a hotfix but should be addressed in a follow-up before this area is touched again. The full handoff doc referenced in the PR description (`HANDOFF-engram-write-corruption.md`) should also include a note that `engram_valid_node_type` and `engram_valid_tier` must be kept in sync with any new node type additions to the C runtime.
will.anderson approved these changes 2026-06-09 19:19:55 +00:00
Dismissed
will.anderson left a comment
Owner

Reviewed — code looks good, CI verified locally. Approved to merge.

Reviewed — code looks good, CI verified locally. Approved to merge.
will.anderson changed target branch from main to dev 2026-06-09 19:23:53 +00:00
will.anderson added 12 commits 2026-06-09 19:23:53 +00:00
Merge pull request 'promote: dev → stage' (#12) from dev into stage
El SDK CI - stage / build-and-test (push) Successful in 4m3s
El SDK Release / build-and-release (pull_request) Failing after 37s
e7e0f7d3e5
Merge pull request 'promote: dev → stage (runtime fix)' (#16) from dev into stage
El SDK CI - stage / build-and-test (push) Successful in 3m43s
El SDK Release / build-and-release (pull_request) Failing after 45s
f0c731d2db
Merge pull request 'promote: dev → stage (return type fix)' (#19) from dev into stage
El SDK CI - stage / build-and-test (push) Failing after 4m1s
El SDK Release / build-and-release (pull_request) Failing after 42s
17b1aa0736
Merge pull request 'promote: dev → stage (__-prefixed runtime fix)' (#22) from dev into stage
El SDK CI - stage / build-and-test (push) Successful in 3m48s
El SDK Release / build-and-release (pull_request) Failing after 1m4s
979a5677d5
Merge pull request 'promote: dev → stage (__http_do_map_to_file)' (#25) from dev into stage
El SDK CI - stage / build-and-test (push) Successful in 3m44s
El SDK Release / build-and-release (pull_request) Failing after 47s
3e29fc43ab
Merge pull request 'promote: dev → stage (elb build fix)' (#28) from dev into stage
El SDK CI - stage / build-and-test (push) Successful in 3m51s
El SDK Release / build-and-release (pull_request) Failing after 38s
fd208583fe
promote: dev → stage (elb build fix)
Merge pull request 'promote: dev → stage (CI rebuild fix + ci-base refresh)' (#32) from dev into stage
El SDK Release / build-and-release (pull_request) Failing after 35s
El SDK CI - stage / build-and-test (push) Successful in 3m47s
c0553459e1
promote: dev → stage (CI rebuild fix + ci-base refresh)
Merge pull request 'promote: dev → stage (elb gcc fix)' (#35) from dev into stage
El SDK Release / build-and-release (pull_request) Failing after 40s
El SDK CI - stage / build-and-test (push) Successful in 3m45s
9c7bde47dc
promote: dev → stage (elb gcc fix)
Merge pull request 'promote: dev → stage (elb linker fixes)' (#38) from dev into stage
El SDK Release / build-and-release (pull_request) Failing after 1m2s
El SDK CI - stage / build-and-test (push) Successful in 3m56s
8fa9c4ba20
promote: dev → stage (elb linker fixes)
Merge pull request 'promote: dev → stage (all elb linker fixes)' (#41) from dev into stage
El SDK Release / build-and-release (pull_request) Successful in 3m51s
El SDK CI - stage / build-and-test (push) Successful in 4m11s
d8e9fd12f4
promote: dev → stage (all elb linker fixes)
promote: stage → main (all elb linker fixes + ci-base rebuild)
Fix engram_node_full wrapper field corruption + add node_type/tier validation
El SDK Release / build-and-release (pull_request) Failing after 9s
dfe4e83ed1
The wrapper signature was stale and didn't match the C primitive
__engram_node_full(content, node_type, label, salience, importance, confidence, tier, tags).
Because el_val_t is an untyped machine word, the compiler coerced caller args to the
wrong declared param types and forwarded them BY POSITION — so tier received an int,
importance/confidence received strings, label received a float, etc. (~100 corrupt nodes).

- Correct the wrapper to match the C contract 1:1 (no coercion, no reorder).
- Add engram_valid_node_type / engram_valid_tier allowlists; engram_node and
  engram_node_full now reject invalid values with __println + return "" (fail loud,
  no silent malformed write).

See neuron repo: HANDOFF-engram-write-corruption.md for the full write-up + deploy runbook.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
will.anderson approved these changes 2026-06-09 19:24:22 +00:00
will.anderson left a comment
Owner

Re-approved after retargeting to dev (gitflow compliance). Ready to merge.

Re-approved after retargeting to dev (gitflow compliance). Ready to merge.
tim.lingo added 1 commit 2026-06-10 11:26:29 +00:00
fix(engram): allow SessionSummary node_type in validation allowlist
El SDK CI - dev / build-and-test (pull_request) Successful in 3m47s
c2afcbddf5
handle_api_consolidate writes a "SessionSummary" node, but engram_valid_node_type
omitted it — so once this validation ships, every consolidate() would be silently
REJECTED at the engram boundary. Add SessionSummary to the allowlist.

Found in Will's PR review of neuron #1 / el #52.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
will.anderson merged commit 15ea584671 into dev 2026-06-10 22:01:42 +00:00
Sign in to join this conversation.
No Reviewers
No labels
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: neuron-technologies/el#52