fix(chat): prevent double-escape corruption of messages/tools in agentic bridge #20

Merged
will.anderson merged 1 commits from fix/bridge-save-serialization into main 2026-06-17 18:08:18 +00:00
Owner

Summary

  • bridge_save was calling json_safe() on messages and tools_json before storing them as JSON string fields. Both values are already well-formed JSON arrays containing double quotes, so json_safe added a second escape layer. agentic_resume called json_get() which strips only one layer, leaving the messages array corrupted before passing it back to agentic_loop.
  • Fix: store messages as messages_raw and tools_json as tools_raw as inline raw JSON values (not string-escaped), and read them back with json_get_raw. Scalar strings (model, safe_sys, tools_log, tool_use_id) remain as string fields via json_safe/json_get since they do not contain nested JSON structure.
  • Backward compatibility: agentic_resume falls back to the old string-escaped fields if the raw fields are absent, so any session suspended before this fix can still be resumed.
  • Also fixes write_file returning a pre-escaped literal instead of calling json_safe consistently with every other tool result.

Test plan

  • Trigger an agentic session that suspends on an MCP tool (bridge save)
  • POST the tool result to resume -- verify the messages array is intact and the conversation continues correctly
  • Verify multi-turn agentic chaining (multiple tool suspensions in sequence) works end-to-end
  • Verify write_file tool result parses correctly in the tool loop
## Summary - `bridge_save` was calling `json_safe()` on `messages` and `tools_json` before storing them as JSON string fields. Both values are already well-formed JSON arrays containing double quotes, so `json_safe` added a second escape layer. `agentic_resume` called `json_get()` which strips only one layer, leaving the messages array corrupted before passing it back to `agentic_loop`. - Fix: store `messages` as `messages_raw` and `tools_json` as `tools_raw` as inline raw JSON values (not string-escaped), and read them back with `json_get_raw`. Scalar strings (`model`, `safe_sys`, `tools_log`, `tool_use_id`) remain as string fields via `json_safe`/`json_get` since they do not contain nested JSON structure. - Backward compatibility: `agentic_resume` falls back to the old string-escaped fields if the raw fields are absent, so any session suspended before this fix can still be resumed. - Also fixes `write_file` returning a pre-escaped literal instead of calling `json_safe` consistently with every other tool result. ## Test plan - [ ] Trigger an agentic session that suspends on an MCP tool (bridge save) - [ ] POST the tool result to resume -- verify the messages array is intact and the conversation continues correctly - [ ] Verify multi-turn agentic chaining (multiple tool suspensions in sequence) works end-to-end - [ ] Verify `write_file` tool result parses correctly in the tool loop
Author
Owner

Review: fix(chat): prevent double-escape corruption of messages/tools in agentic bridge

The core approach is correct — embedding messages and tools_json as raw JSON values in the bridge blob rather than string-escaping them eliminates the round-trip corruption that was silently destroying nested quotes on resume. The direction is right. Two blockers need addressing before merge.

Blockers

1. Double typed let declaration (chat.el, agentic_resume)

Lines 798-801 use let messages: String = ... twice in the same scope, and same for tools_json. This pattern does not appear elsewhere in the codebase — every other variable fallback uses untyped reassignment (let messages = ...). If El treats the second typed declaration as a new binding that shadows the first, the self-referential RHS (if str_eq(messages, "") { ... } else { messages }) reads from the first binding — which may work but is semantically unusual and depends on El's specific scoping semantics. If it does not permit typed re-declaration, the fallback path is dead code.

Use the established untyped reassignment pattern:

let messages: String = json_get_raw(blob, "messages_raw")
let messages = if str_eq(messages, "") { json_get(blob, "messages") } else { messages }
let tools_json: String = json_get_raw(blob, "tools_raw")
let tools_json = if str_eq(tools_json, "") { json_get(blob, "tools_json") } else { tools_json }

Also confirm: does json_get_raw return "" (not "null") when the key is absent? The guard assumes "".

2. No validation against empty messages/tools_json in bridge_save

The blob construction now does:

",\"messages_raw\":" + messages
",\"tools_raw\":" + tools_json

If either value is an empty string (bug in a caller, or a partially-initialised loop state), the result is invalid JSON. state_get + json_get_raw will then silently return "" on resume, and the agentic loop will start a fresh turn with no conversation context rather than surfacing an error.

Add a guard at the top of agentic_resume after the fallback resolution:

if str_eq(messages, "") || str_eq(tools_json, "") {
    return "{\"error\":\"corrupt bridge state\",\"reply\":\"\"}"
}

Warnings

session_list() fallback for empty session_index — the old route_sessions() (removed) hit engram directly. session_list() in sessions.el needs to handle an empty session_index state key on first boot or for users upgrading from pre-sessions-index builds. Worth verifying the fallback path exists there.

safe_sys double-escapehandle_chat_agentic passes safe_sys = json_safe(system) to bridge_save, which calls json_safe(safe_sys) again on it. On resume, json_get un-escapes once, restoring the singly-escaped form, which is then embedded directly in the API body. This is consistent but fragile — a comment confirming the invariant would prevent future breakage.

Minor

The write_file fix (json_safe("{\"ok\":true}")) is correct. A constant string through json_safe is harmless.

The DELETE and PATCH session routes added to routes.el look structurally correct and consistent with the existing GET/POST patterns.

Missing tests

  • Bridge save/resume round-trip with nested quotes in messages/tool inputs
  • Legacy blob fallback (messages/tools_json fields, no _raw variants)
  • Empty/corrupt state guard in agentic_resume
  • DELETE and PATCH /api/sessions/:id happy and sad paths
  • GET /api/sessions regression after removal of route_sessions()
## Review: fix(chat): prevent double-escape corruption of messages/tools in agentic bridge The core approach is correct — embedding `messages` and `tools_json` as raw JSON values in the bridge blob rather than string-escaping them eliminates the round-trip corruption that was silently destroying nested quotes on resume. The direction is right. Two blockers need addressing before merge. ### Blockers **1. Double typed `let` declaration (chat.el, agentic_resume)** Lines 798-801 use `let messages: String = ...` twice in the same scope, and same for `tools_json`. This pattern does not appear elsewhere in the codebase — every other variable fallback uses untyped reassignment (`let messages = ...`). If El treats the second typed declaration as a new binding that shadows the first, the self-referential RHS (`if str_eq(messages, "") { ... } else { messages }`) reads from the *first* binding — which may work but is semantically unusual and depends on El's specific scoping semantics. If it does *not* permit typed re-declaration, the fallback path is dead code. Use the established untyped reassignment pattern: ```el let messages: String = json_get_raw(blob, "messages_raw") let messages = if str_eq(messages, "") { json_get(blob, "messages") } else { messages } let tools_json: String = json_get_raw(blob, "tools_raw") let tools_json = if str_eq(tools_json, "") { json_get(blob, "tools_json") } else { tools_json } ``` Also confirm: does `json_get_raw` return `""` (not `"null"`) when the key is absent? The guard assumes `""`. **2. No validation against empty messages/tools_json in bridge_save** The blob construction now does: ``` ",\"messages_raw\":" + messages ",\"tools_raw\":" + tools_json ``` If either value is an empty string (bug in a caller, or a partially-initialised loop state), the result is invalid JSON. `state_get` + `json_get_raw` will then silently return `""` on resume, and the agentic loop will start a fresh turn with no conversation context rather than surfacing an error. Add a guard at the top of `agentic_resume` after the fallback resolution: ```el if str_eq(messages, "") || str_eq(tools_json, "") { return "{\"error\":\"corrupt bridge state\",\"reply\":\"\"}" } ``` ### Warnings **session_list() fallback for empty session_index** — the old `route_sessions()` (removed) hit engram directly. `session_list()` in sessions.el needs to handle an empty `session_index` state key on first boot or for users upgrading from pre-sessions-index builds. Worth verifying the fallback path exists there. **safe_sys double-escape** — `handle_chat_agentic` passes `safe_sys = json_safe(system)` to `bridge_save`, which calls `json_safe(safe_sys)` again on it. On resume, `json_get` un-escapes once, restoring the singly-escaped form, which is then embedded directly in the API body. This is consistent but fragile — a comment confirming the invariant would prevent future breakage. ### Minor The `write_file` fix (`json_safe("{\"ok\":true}")`) is correct. A constant string through `json_safe` is harmless. The DELETE and PATCH session routes added to routes.el look structurally correct and consistent with the existing GET/POST patterns. ### Missing tests - Bridge save/resume round-trip with nested quotes in messages/tool inputs - Legacy blob fallback (`messages`/`tools_json` fields, no `_raw` variants) - Empty/corrupt state guard in `agentic_resume` - DELETE and PATCH /api/sessions/:id happy and sad paths - GET /api/sessions regression after removal of `route_sessions()`
will.anderson added 1 commit 2026-06-17 18:07:57 +00:00
fix(chat): harden bridge_save/agentic_resume against empty and corrupt state
Neuron Soul CI / build (pull_request) Failing after 13m18s
8db3c8c7f7
BLOCKER 1: use untyped reassignment (let x = ...) for the fallback bindings
in agentic_resume instead of re-declaring typed let bindings (let x: Type = ...)
for the same variable in the same scope. The typed form risks shadowing semantics
that differ from the established pattern used everywhere else in the loop
(e.g. agentic_loop line 720).

BLOCKER 2: add empty-string guards in both bridge_save and agentic_resume.
bridge_save now returns false without writing state if messages or tools_json
is empty — preventing syntactically invalid JSON blobs. agentic_resume now
returns an error envelope after the fallback resolution if either field is
still empty, rather than passing empty strings into agentic_loop which would
silently start a fresh turn with no context.

Also add tests:
- test_bridge_serialization.el: covers bridge_save empty-guard, golden-path
  raw-JSON round-trip, agentic_resume unknown/corrupt/missing-fields paths,
  and legacy string-escaped fallback path
- test_sessions_routes.el: covers DELETE and PATCH /api/sessions/:id routes
  (valid args, unknown id, empty body) and GET /api/sessions regression after
  removal of the duplicate route_sessions() handler
will.anderson force-pushed fix/bridge-save-serialization from 459a9cbff2 to 8db3c8c7f7 2026-06-17 18:07:57 +00:00 Compare
will.anderson merged commit 2797909633 into main 2026-06-17 18:08:18 +00:00
will.anderson deleted branch fix/bridge-save-serialization 2026-06-17 18:08:24 +00:00
Sign in to join this conversation.
No Reviewers
No labels
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: neuron-technologies/neuron#20