- Cache bell node result in engram_compile state (engram_compile_bell_node)
so handle_chat affective_prefix reads the cached value instead of firing
a duplicate engram query for distress signals (Issue 2)
- Cache primary activation result in engram_compile state
(engram_compile_activation_json) using nodes0 from engram_compile_multi
- Replace redundant engram_activate_json(message, 2) in strengthen_chat_nodes
with state_get(engram_compile_activation_json) - eliminates a third
activation query per turn (Issue 7)
- engram_compile already has object-boundary truncation and cross-set
dedup via engram_nodes_merge/engram_dedup_nodes (Issues 1, 6, 9)
Merge propose/agent-workspace-root-read (Tim's PR #28):
- path_within_root now appends '/' to root before prefix check (closes proj_evil bypass)
- edit_file in dispatch_tool now checks agent_workspace_root() and resolves path
- handle_chat_agentic reads agent_workspace_root from request body, only persists if non-empty
- Safety screen preserved after workspace root read (conflict resolved)
Merge improve/safety-crisis-detection (PR #31): reads layered_cycle_safety_system_addendum
from state and appends to system prompt on each turn (cleared after use to prevent bleed).
Safety ts extraction falls back to updated_at. Affective prefix now wires into system build.
Conflict with PR #33 resolved: capability_rules and session_preload both preserved.
- engram_compile: BellEvent nodes do not carry created_at in the engram
node JSON; extract the unix timestamp from the embedded ' | ts:NNNNN'
pattern in the content string instead. Fall back to created_at/updated_at
if the marker is absent. Guard str_to_int against empty string so the 72h
recency check never silently treats every node as epoch-0 stale.
- auto_persist: append the current unix timestamp to the BellEvent label
('bell:soft:1749876543') to make it unique per turn. The previous label
('bell:soft') was the same for every soft bell, causing engram to treat
all subsequent writes as updates to the same node.
- Remove dead soft_bell block in layered_cycle that wrote soul_safety_system_augment
to state but was never read; safety augmentation now goes through the correct
layered_cycle_safety_system_addendum state key read by build_system_prompt
- build_system_prompt now reads layered_cycle_safety_system_addendum and appends
it to the system prompt, clearing the key after consumption
- Timestamp extraction for distress nodes falls back to updated_at when created_at
is empty, preventing the 72h recency check from always treating nodes as stale
- Issue #3: err_404/err_405 now emit HTTP 404/405 via __status__ envelope instead of HTTP 200
- Issue #4: add auth_check() function to handle_request; enforces NEURON_TOKEN on all routes except /health and /lineage
- Issue #5: missing required params now return HTTP 400 (__status__ envelope) in /api/chat (GET+POST), /imprint/contextual, /imprint/user, and handle_chat
- Issue #6: LLM unavailable in handle_chat now returns HTTP 503 instead of HTTP 200
- Issue #7: add 32 KB message size guard on POST /api/chat before engram_compile and LLM
- Issue #8: add TODO comment to route_health documenting the live-engram-query problem and the /health/deep split plan
- Issue #9: add comment to hist_trim documenting fragile str_index_of parser and silent data corruption risk
- Issue #10: add TODO comment in handle_request documenting missing per-IP rate limiting
- Issue #11: fix connectd_post temp file collision — add monotonic sequence counter so concurrent requests get unique paths
- Issue #12: fix call_mcp_bridge fixed temp file race — add monotonic sequence counter for unique paths under concurrent load
- Issues #1/#2: add TODO comment in handle_request documenting EL no-exception limitation and SIGSEGV handler gap
Issue #5: detect empty string from llm_extract_text() as an error in handle_chat,
handle_chat_as_soul, and handle_dharma_room_turn. The C runtime silently returns ""
when the LLM response content array is missing or all blocks fail to parse; without
this guard the empty string passes through to callers as a silent empty reply.
Issue #9: make agentic_loop max_tokens configurable via NEURON_LLM_MAX_TOKENS env
var (default 4096). The hardcoded value is marginal for long tool chains (8 iterations
x 4096 tokens); operators can now set 8192+ for complex multi-step tasks without
rebuilding. Non-agentic path (llm_call_system) still uses the C runtime hardcode —
that fix lives in el_runtime.c (see TODO block added in this commit).
Issue #10: increase connector_tools_json and tool_auto_approved curl --max-time from
2s to 5s to reduce false-empty tool lists when neuron-connectd is under transient
load. Graceful degradation to [] on bridge down is unchanged.
Issues #1/#2/#3/#4/#6/#8: documented as TODO comments in chat.el. These require
targeted C runtime changes in el_runtime.c (llm_provider_request retry loop,
EL_LLM_TIMEOUT_MS separation, HTTP 429 backoff, 5xx retry, EL_HTTP_MAX_RESPONSE_BYTES
cap). Architectural decisions recorded so they are traceable to root causes.
- sessions.el: add session_exists() for chat-path session validation (ISSUE #6/#7)
- sessions.el: add session_create_cleanup() for ghost-session rollback (ISSUE #1)
- sessions.el: set session_pending_first_msg flag in session_create; clear it in
session_hist_save so the first successful chat marks the session active (ISSUE #1)
- sessions.el: session_delete now clears mcp_bridge:<id> and always_allow_<id>
state keys so abandoned pending-tool sessions do not accumulate (ISSUE #5)
- sessions.el: add TODO comments for ISSUE #2 (no TTL/expiry), ISSUE #3
(non-atomic delete-then-create), ISSUE #4 (no concurrent-create guard),
and ISSUE #8 (reconnect/duplicate resume race) where fixes are too invasive
to land without new runtime primitives
- chat.el: validate session_id exists via session_exists() before entering
agentic_loop; unknown session_ids now return a 404-style error instead of
silently starting a fresh empty session (ISSUE #6/#7)
Every engram_node_full call that dropped its return value now binds it
and emits a println on empty string. engram_save calls in consolidate,
heartbeat, and dharma-room-turn are checked for failure. The two API
handlers (log_state_event, tune_config) that skipped api_persisted()
now match the read-back-after-write contract used everywhere else in
neuron-api.el.
Files changed:
- chat.el: conv_history_persist, handle_dharma_room_turn, auto_persist
- soul.el: emit_session_start_event, seed_persona_from_env HTTP check
- memory.el: mem_save, mem_boot_count_inc
- neuron-api.el: handle_api_log_state_event, handle_api_tune_config,
handle_api_consolidate (engram_save + session summary write)
- awareness.el: ise_post local-engram fallback path
TODO comments added for non-atomic patterns (issues #12, #13) and
the missing circuit breaker (#14) — these require new primitives.
Issues addressed:
- #1 ASYMMETRIC PERSIST/LOAD: conv_history_load() now tries engram_get_node_by_label()
first (symmetric with the label-based write), falling back to vector search only when
label lookup returns nothing. Immune to cold/corrupt vector index.
- #2 SILENT LOAD FAILURE: all failure paths in conv_history_load() and conv_history_persist()
now emit a println log line rather than silently returning "" or dropping writes.
- #3 NO RECOVERY PATH: documented as TODO with explanation of why a full recovery path
(retry, ID fallback, orphan cleanup) is too invasive for a targeted fix here.
- #4 OVERWRITE WITHOUT DELETE: documented with TODO to replace engram_node_full with
explicit delete-then-create once engram exposes a label-scoped delete API.
- #5/#10 BROKEN TRIM / OFF-BY-ONE: hist_trim() rewritten to use json_array_len /
json_array_get (structural JSON ops) instead of raw str_index_of scanning for
'{"role":' markers. Immune to marker strings appearing inside message content.
Minimum retained count guard added: never trims below 2 entries.
- #6 PARTIAL-WRITE GUARD: conv_history_persist() refuses to write a blob that doesn't
contain both '[' and ']'. conv_history_load() requires both before accepting content.
- #7 DUAL STORAGE: documented with a comment at the persist call site.
- #8 NO MAX SIZE GUARD: documented as TODO with rationale for why a byte-length cap
requires a more invasive change (entry truncation or summarisation).
- #9 AGENTIC HISTORY NOT PERSISTED: handle_chat_agentic() now calls conv_history_persist()
for the default global session (hist_key == "conv_history") after updating state,
matching the non-agentic path's durability. Named sessions remain in-process only.
- auto_persist: detect bell level (soft/hard) on every user message using
safety_detect_bell_level; write a dedicated BellEvent engram node with
calibrated salience alongside the Conversation node when a bell fires.
Tag the Conversation node with bell:soft/bell:hard and 'affective' for
direct discovery without scanning all chat nodes.
- auto_persist: track per-session bell count, dominant level, and last
signal in state (session_bell_count/level/signal keys) so downstream
functions can act on the emotional history without re-scanning engram.
- engram_compile: include the top-1 most recent BellEvent node within 72h
in every context build. Distress context from earlier turns (same or
recent session) automatically travels into all subsequent LLM calls.
- hist_trim_with_bell_guard: replace hist_trim at the handle_chat call site.
Before evicting the oldest turn from the 20-turn window, inspect the user
message for bell signals. If a bell was present, write a preservation
BellEvent to engram before dropping the turn so the full message survives
the rolling window.
- session_hist_save: after writing the history node, check session bell
counters. On the first save where bell_count > 0, write a
session:emotional-summary BellEvent node with distress signal, count,
and dominant level. A state flag prevents duplicate writes on subsequent
saves in the same session.
- engram_compile: rank search results by recency x relevance before including
in context. Pulls 20 candidates, scores each (salience * importance * recency
decay), keeps top 8. Eliminates stale/low-signal nodes that diluted context.
- handle_chat: on hist_len==0 (session start), proactively load user profile
and active-work context from engram and inject as brief bullets in the system
prompt. Gives the soul grounding before any conversation history exists.
- build_system_prompt: add [CAPABILITY GAPS] directive instructing the soul to
offer partial help and reasoning instead of flat "I don't have access to that"
refusals when a tool is missing.
- handle_chat_agentic: run safety_screen at entry, mirroring layered_cycle.
Hard bell exits immediately with the crisis response without entering the loop.
- agentic_loop: surface the 8-iteration cap explicitly in the error envelope
("agentic loop hit the 8-iteration cap...") rather than the opaque "no response".
Add iterations count to both the error and success envelopes for observability.
- soul.el: fix state key bug in layered_cycle (conversation_history -> conv_history)
- safety.el: add indirect crisis location patterns to soft_bell phrase list
- soul.el: wire safety_augment_system into layered_cycle for soft_bell turns
- chat.el: load cross-session affective context at session start when distress signals found within 72h
REPRODUCED: in the non-agentic path (Tools off / 'Just chat'), asking for
tool-work makes the model role-play tool use — it emits a fake ```json {...}```
'tool call' and says 'let me search/query/pull your sessions' while NOTHING
runs. Reads as a broken/lying app. (The agentic path is fine: verified it
calls search_memory and reports honestly.)
Root cause: build_system_prompt (handle_chat, the tool-less path) never told
the model it has no tools this turn, so it fabricated.
Fix: add a NO-TOOLS directive to the non-agentic system prompt — never emit
tool calls / JSON tool blocks / 'let me pull...' narration; answer from context
only; if a tool is truly needed, say so in one sentence and tell the user to
turn Tools on. Applied to chat.el (source) AND dist/soul.c (the curated TU the
CI compiles), so the CI-built binary carries it.
Verified the FABRICATION repro on the live local soul; could not verify the
patched binary locally (no matching el-runtime version on this machine — a
hand-link against origin/main runtime 404s on all routes). Builds correctly via
CI, which links soul.c against the pinned runtime.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the UI<->soul contract for #23 (scope file/command tools to an agent
workspace root). #23 made the tools read state_get("agent_workspace_root"), but
nothing set that key from the desktop UI, so the agent panel's Workspace Folder
was cosmetic and tools ran unscoped (default-allow). This reads the root the UI
now sends on each agentic request and state_sets it before tool dispatch, so
agent_workspace_root() picks it up for the turn.
Minimal + pattern-matching (same json_get/state_set shape used throughout chat.el).
Empty body field => unscoped (backward-compatible) and preserves the env fallback.
FOR WILL'S REVIEW — do not merge without sign-off:
- Ownership model: set state from body each turn (so clearing the folder un-scopes)
vs. only-when-nonempty. Flagged inline.
- Pairs with neuron-ui PR #32 (ChatRequest.agentWorkspaceRoot).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Confine the agentic file tools (read_file, write_file, list_files, grep)
to a configured workspace subtree via a lexical path check, and run
run_command with its cwd set to that root. Root comes from state key
"agent_workspace_root" or env NEURON_AGENT_ROOT. When no root is set,
behavior is unchanged (unscoped) for backward compatibility.
Defense-in-depth, NOT a hard boundary: the lexical guard does not resolve
symlinks and cannot stop an arbitrary shell command from cd-ing out of the
root. Real confinement needs runtime support (cwd-locked exec / sandbox-exec
/ chroot) in el_runtime.c.
Compile-checked with elc (darwin arm64); not link/run-gated locally
(darwin elb unavailable). Needs a soul build + smoke test before merge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
agentic_tools_literal() already contains a custom web_search tool.
agentic_tools_with_web() adds the Anthropic server-side web_search_20250305
tool (also named web_search). Combining them caused Anthropic to reject
every agentic request with 'Tool names must be unique.'
agentic_tools_all() now calls agentic_tools_literal() directly. Connector
tools splice in as before. The web_search-only variant (agentic_tools_with_web)
is unchanged for callers that specifically want native search without connectors.
fix(soul): add HTTP-engram guard to safe_to_seed — when ENGRAM_URL is set
the HTTP Engram owns persistence; genesis must never save to local snapshot
regardless of node counts (was: guard_disk forced to empty string, making
the ratio check vacuously true and allowing init_soul_edges+engram_save).
fix(soul): use multiplication form for ratio guard — node_count * 16000 <
disk_len avoids floor-division truncation that underestimated boundary files
(250KB / 16000 = 15.6, floors to 15; a 15-node graph wrongly passed old guard).
fix(chat): add safety_augment_system to handle_chat_as_soul,
handle_dharma_room_turn, and handle_dharma_room_turn_agentic — all three
called the LLM without Hard Bell evaluation, leaving users in dharma rooms
without crisis resource routing.
fix(neuron-api): add api_persisted read-back to handle_api_define_process —
was the only write handler that returned ok:true without verifying the node
was actually written to engram.
fix(routes): unique temp file path in connectd_post — replaces fixed
/tmp/neuron-connectors-req.json with a timestamped path to prevent
collision if concurrency is added or two soul instances share a machine.
test: add tests/test_bell_safety.el — covers safety_detect_bell_level
(none/soft/hard), safety_classify_hard_bell (abuse/self_harm routing),
safety_normalize (smart-quote), safety_augment_system, and
handle_safety_contact_post (validation + read-back).
test: add tests/test_soul_guard.el — pure-function logic tests for the
safe_to_seed predicate: 200KB boundary, 47MB/63-node clobber scenario,
HTTP-engram mode, multiplication vs division truncation at 250KB.
test: add tests/test_api_define_process.el — verifies the define_process
write is read-back verified after the fix.
BLOCKER 1: use untyped reassignment (let x = ...) for the fallback bindings
in agentic_resume instead of re-declaring typed let bindings (let x: Type = ...)
for the same variable in the same scope. The typed form risks shadowing semantics
that differ from the established pattern used everywhere else in the loop
(e.g. agentic_loop line 720).
BLOCKER 2: add empty-string guards in both bridge_save and agentic_resume.
bridge_save now returns false without writing state if messages or tools_json
is empty — preventing syntactically invalid JSON blobs. agentic_resume now
returns an error envelope after the fallback resolution if either field is
still empty, rather than passing empty strings into agentic_loop which would
silently start a fresh turn with no context.
Also add tests:
- test_bridge_serialization.el: covers bridge_save empty-guard, golden-path
raw-JSON round-trip, agentic_resume unknown/corrupt/missing-fields paths,
and legacy string-escaped fallback path
- test_sessions_routes.el: covers DELETE and PATCH /api/sessions/:id routes
(valid args, unknown id, empty body) and GET /api/sessions regression after
removal of the duplicate route_sessions() handler
When agentic_loop suspends for an MCP bridge tool it returns a
{"tool_pending":true,...} envelope with no "reply" key. Without an
explicit check, json_get(loop_result, "reply") returns "" and the
function emitted {"response":"","cgi_id":"..."} — a silent empty
response indistinguishable from a successful LLM turn with no content.
Two guards added after the existing error check:
1. tool_pending passthrough: if the loop suspended, return the pending
envelope directly so callers (dharma room orchestrators) can
distinguish suspension from failure and route to the approve flow.
2. Empty-reply guard: if final_text is empty after the pending check,
return an explicit {"error":"no response",...} envelope instead of
silently succeeding with an empty response field.
Also adds tests/test_agentic_tools.el:
- agentic_tools_all() includes all literal tool names and web_search
- connector_tools_json() returns valid JSON when bridge is down (graceful degradation)
- tool_pending envelope detection patterns (the is_pending logic)
- json_get(pending_envelope, "reply") returns "" confirming the empty-reply
guard is load-bearing (pure string/JSON, no LLM or network required)
BLOCKER 1 (sessions.el, modern path): Add guard that rejects allow
action when tool_name is missing from the body. Previously, omitting
tool_name caused dispatch_tool("", ...) to return "unknown tool: " and
silently inject a corrupted tool_result into the conversation.
BLOCKER 2 (sessions.el, modern path): Stop re-executing client-side
tools server-side. When the client provides body["content"], use it
directly as the tool result (matching the handle_tool_result contract).
Only fall back to dispatch_tool for builtin tools when no content is
present. Non-builtin tools with no client content now return a clear
error instead of a broken dispatch attempt.
WARNING 1 (chat.el, agentic_loop): Wire always_allow_<session_id> state
into the bridge-suspension decision. When a tool is in the session's
always-allow list, treat it as locally dispatchable (like a builtin)
and skip the bridge pause, so the approval UI is never shown again for
that tool in that session.
WARNING 2 (sessions.el, legacy path): Read a "tools_variant" field from
the legacy pending blob when present, and call the corresponding
agentic_tools_*() variant on resume. Falls back to agentic_tools_literal()
for blobs written before this field existed.
tests/test_sessions_approve.el: Add 10-case test suite covering:
- empty session_id / missing call_id / missing action guards
- no pending tool returns correct error
- missing tool_name on allow returns error (BLOCKER 1)
- deny action does not require tool_name
- legacy call_id mismatch returns mismatch error
- always action records tool_name in always_allow state
- allow with client content skips re-execution (BLOCKER 2)
handle_chat_agentic was calling agentic_tools_with_web(), which omits
MCP connector tools, so mcp__* calls were never available in agentic
mode even when neuron-connectd is running.
Switch both agentic entry points to agentic_tools_all(). For
handle_dharma_room_turn_agentic, also replace the inline 8-iteration
loop with a call to agentic_loop() so bridge suspension and the full
connector tool set work consistently. Session IDs are prefixed with
'dharma:' + room_id so suspensions stay room-scoped.
bridge_save was wrapping messages and tools_json with json_safe() before
storing them as string fields. Since both are already well-formed JSON arrays
containing double quotes, json_safe added a second escape layer. agentic_resume
then called json_get() which stripped only one layer, leaving the messages array
corrupted before it was passed back into agentic_loop.
Fix: store messages as messages_raw and tools_json as tools_raw as inline raw
JSON values (unquoted), and read them back with json_get_raw. Backward
compatibility: fall back to the old string-escaped fields if the raw fields are
absent, so sessions saved before this fix can still be resumed.
Also fixes write_file returning a pre-escaped literal instead of calling
json_safe consistently with every other tool result.
Short/ambiguous messages (< 50 chars) now use the last reply as the
engram activation seed instead of the bare message. Prevents strong
off-topic memory nodes from hijacking replies when the user is clearly
continuing an existing thread.
Also gives handle_chat_agentic session continuity: reads/writes history
keyed by session_id (falling back to global conv_history), seeds the
LLM messages array with prior turns, and saves replies back so the
next turn has context.
Applies connector-specific additions from feat/connectors-soul:
- chat.el: connector_tools_json(), agentic_tools_all(), call_mcp_bridge(),
tool_auto_approved() and mcp__ dispatch in dispatch_tool()
- routes.el: connectd_get/post, handle_connectors(), /api/connectors routing
in GET and POST sections
- MEMORY_RECALL_BUG.md: investigation notes on memory retrieval failure
The agentic loop rewrite in the source branch was not applied — it conflicts
with the tool-bridge pattern from PR #5 which is the chosen design for
client-side MCP tool execution. The connectors themselves are now fully
wired: connector tools surface as mcp__<server>__<tool> in the tools array
and dispatch to neuron-connectd via call_mcp_bridge().
When handle_chat_agentic hits a tool the soul cannot run in-process (an MCP
connector/plugin surfaced by the Kotlin desktop app), instead of returning
"unknown tool" it now suspends the agentic loop and returns a tool_pending
envelope so the CLIENT executes the tool and posts the result back. Built-in
tools (read_file/write_file/web_get/search_memory/run_command) and Anthropic's
native web_search are unchanged.
Client contract:
- Soul returns (HTTP 200) on an unknown tool:
{ "tool_pending": true, "session_id": "br-...", "call_id": "<tool_use_id>",
"tool_name": "...", "tool_input": { ... }, "model": "...",
"agentic": true, "tools_used": [...] }
- Client runs the MCP tool, then POSTs to
/api/sessions/{session_id}/tool_result
with body:
{ "call_id": "<the call_id from the envelope>",
"content": "<MCP tool output as a string>" }
- Soul resumes the loop and returns the same envelope shape: either a final
{ "reply": ..., "tools_used": [...] }
or another tool_pending if the continuation needs a further MCP tool
(fully chainable). Saved continuation is one-shot (cleared on resume).
elc-verified (--target=c, exit 0, no stderr) on chat.el, routes.el, and the
full soul.el import graph. Needs Will's build to ship.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
handle_chat_agentic now always attaches Anthropic's native web_search_20250305
tool instead of gating it behind a per-request web_search flag. Web search is a
built-in capability: the model invokes it only when a query needs fresh info
(max_uses:5 caps it), so there is no user-facing toggle. The body's web_search
field is now ignored (back-compat — old UI clients sending it cause no harm).
Pairs with neuron-ui removing the chat-input web search toggle.
Note: .el change only — no elc on the authoring machine; reviewer builds/verifies.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When a chat request carries web_search=true, handle_chat_agentic now attaches Anthropic's
NATIVE server-side web_search tool (web_search_20250305) to the request. The native tool is
executed by Anthropic (not by the soul), so it returns real results with citations and needs
no local runtime — it sidesteps the soul's lack of executable tools entirely.
- new agentic_tools_with_web(web_search) helper (appends the native tool to the standard set)
- handle_chat_agentic reads json_get_bool(body,"web_search") and uses it
Pairs with neuron-ui: ChatRequest.web_search + the chat-input Web search toggle.
Note: built/verified by reviewer — no elc on the authoring machine.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
chat.el recorded the soul's utterance via engram_node(content, "episodic", ...),
putting a TIER into the node_type slot (nodes showed node_type="episodic"). Now uses
engram_node_full(..., "Conversation", "soul:utterance", ..., "Episodic", tags).
The core wrapper fix is in the el repo (PR #52). HANDOFF-engram-write-corruption.md
has the full root-cause analysis, coercion mechanism, caller audit, validation,
deploy runbook (elc build + restart), and the data-prune proposal (~107 corrupt
nodes, all unrecoverable genesis/binary detritus → prune; backup taken).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>