Two design bugs in the state_set placement caused the dedup seen-ID set
to be incomplete even with callsites wired up:
1. state_set("engram_compile_seen_ids") was called immediately after
merging the main node pools, before scan_part (persona fallback) and
affective_part (bell node) were computed. Nodes appearing only in
those segments were never added to the seen set.
2. affective_part is a bare JSON object (bn0 from json_array_get), not
a JSON array. Passing it to engram_extract_ids would have gotten
json_array_len == 0 and silently skipped the affective node's ID.
Fix: move state_set to after ctx is assembled from all three segments.
Extract ids_from_merged and ids_from_scan via engram_extract_ids (both
are JSON arrays), and extract ids_from_affective via json_get(affective_part, "id")
directly since it is a bare object. Merge all three via add_to_seen
before publishing to state.
Thread a seen-node-ID exclusion set from engram_compile() through to
session_preload in handle_chat, preventing the same high-salience nodes
(identity, recent memories) from appearing 2-3x in the system prompt.
Changes:
- Add id_in_seen(), add_to_seen(), engram_extract_ids() helpers that
maintain a comma-delimited seen-ID accumulator (EL has no Set type)
- In engram_compile(): after merging all topic/entity/recall pools, extract
node IDs from merged_nodes and publish via state_set(engram_compile_seen_ids)
- In handle_chat(): read seen_ids from state after engram_compile() returns,
then check id_in_seen() before emitting each session_preload bullet
(profile x3, work x2, project x2, summary x1 — all 8 candidate nodes guarded)
Nodes already present in the compiled engram context are skipped in preload,
eliminating 3000-3500 token repetition on first-message turns.
Merge propose/agent-workspace-root-read (Tim's PR #28):
- path_within_root now appends '/' to root before prefix check (closes proj_evil bypass)
- edit_file in dispatch_tool now checks agent_workspace_root() and resolves path
- handle_chat_agentic reads agent_workspace_root from request body, only persists if non-empty
- Safety screen preserved after workspace root read (conflict resolved)
Merge improve/safety-crisis-detection (PR #31): reads layered_cycle_safety_system_addendum
from state and appends to system prompt on each turn (cleared after use to prevent bleed).
Safety ts extraction falls back to updated_at. Affective prefix now wires into system build.
Conflict with PR #33 resolved: capability_rules and session_preload both preserved.
- engram_compile: BellEvent nodes do not carry created_at in the engram
node JSON; extract the unix timestamp from the embedded ' | ts:NNNNN'
pattern in the content string instead. Fall back to created_at/updated_at
if the marker is absent. Guard str_to_int against empty string so the 72h
recency check never silently treats every node as epoch-0 stale.
- auto_persist: append the current unix timestamp to the BellEvent label
('bell:soft:1749876543') to make it unique per turn. The previous label
('bell:soft') was the same for every soft bell, causing engram to treat
all subsequent writes as updates to the same node.
- Remove dead soft_bell block in layered_cycle that wrote soul_safety_system_augment
to state but was never read; safety augmentation now goes through the correct
layered_cycle_safety_system_addendum state key read by build_system_prompt
- build_system_prompt now reads layered_cycle_safety_system_addendum and appends
it to the system prompt, clearing the key after consumption
- Timestamp extraction for distress nodes falls back to updated_at when created_at
is empty, preventing the 72h recency check from always treating nodes as stale
- Issue #3: err_404/err_405 now emit HTTP 404/405 via __status__ envelope instead of HTTP 200
- Issue #4: add auth_check() function to handle_request; enforces NEURON_TOKEN on all routes except /health and /lineage
- Issue #5: missing required params now return HTTP 400 (__status__ envelope) in /api/chat (GET+POST), /imprint/contextual, /imprint/user, and handle_chat
- Issue #6: LLM unavailable in handle_chat now returns HTTP 503 instead of HTTP 200
- Issue #7: add 32 KB message size guard on POST /api/chat before engram_compile and LLM
- Issue #8: add TODO comment to route_health documenting the live-engram-query problem and the /health/deep split plan
- Issue #9: add comment to hist_trim documenting fragile str_index_of parser and silent data corruption risk
- Issue #10: add TODO comment in handle_request documenting missing per-IP rate limiting
- Issue #11: fix connectd_post temp file collision — add monotonic sequence counter so concurrent requests get unique paths
- Issue #12: fix call_mcp_bridge fixed temp file race — add monotonic sequence counter for unique paths under concurrent load
- Issues #1/#2: add TODO comment in handle_request documenting EL no-exception limitation and SIGSEGV handler gap
Issue #5: detect empty string from llm_extract_text() as an error in handle_chat,
handle_chat_as_soul, and handle_dharma_room_turn. The C runtime silently returns ""
when the LLM response content array is missing or all blocks fail to parse; without
this guard the empty string passes through to callers as a silent empty reply.
Issue #9: make agentic_loop max_tokens configurable via NEURON_LLM_MAX_TOKENS env
var (default 4096). The hardcoded value is marginal for long tool chains (8 iterations
x 4096 tokens); operators can now set 8192+ for complex multi-step tasks without
rebuilding. Non-agentic path (llm_call_system) still uses the C runtime hardcode —
that fix lives in el_runtime.c (see TODO block added in this commit).
Issue #10: increase connector_tools_json and tool_auto_approved curl --max-time from
2s to 5s to reduce false-empty tool lists when neuron-connectd is under transient
load. Graceful degradation to [] on bridge down is unchanged.
Issues #1/#2/#3/#4/#6/#8: documented as TODO comments in chat.el. These require
targeted C runtime changes in el_runtime.c (llm_provider_request retry loop,
EL_LLM_TIMEOUT_MS separation, HTTP 429 backoff, 5xx retry, EL_HTTP_MAX_RESPONSE_BYTES
cap). Architectural decisions recorded so they are traceable to root causes.
- sessions.el: add session_exists() for chat-path session validation (ISSUE #6/#7)
- sessions.el: add session_create_cleanup() for ghost-session rollback (ISSUE #1)
- sessions.el: set session_pending_first_msg flag in session_create; clear it in
session_hist_save so the first successful chat marks the session active (ISSUE #1)
- sessions.el: session_delete now clears mcp_bridge:<id> and always_allow_<id>
state keys so abandoned pending-tool sessions do not accumulate (ISSUE #5)
- sessions.el: add TODO comments for ISSUE #2 (no TTL/expiry), ISSUE #3
(non-atomic delete-then-create), ISSUE #4 (no concurrent-create guard),
and ISSUE #8 (reconnect/duplicate resume race) where fixes are too invasive
to land without new runtime primitives
- chat.el: validate session_id exists via session_exists() before entering
agentic_loop; unknown session_ids now return a 404-style error instead of
silently starting a fresh empty session (ISSUE #6/#7)
Every engram_node_full call that dropped its return value now binds it
and emits a println on empty string. engram_save calls in consolidate,
heartbeat, and dharma-room-turn are checked for failure. The two API
handlers (log_state_event, tune_config) that skipped api_persisted()
now match the read-back-after-write contract used everywhere else in
neuron-api.el.
Files changed:
- chat.el: conv_history_persist, handle_dharma_room_turn, auto_persist
- soul.el: emit_session_start_event, seed_persona_from_env HTTP check
- memory.el: mem_save, mem_boot_count_inc
- neuron-api.el: handle_api_log_state_event, handle_api_tune_config,
handle_api_consolidate (engram_save + session summary write)
- awareness.el: ise_post local-engram fallback path
TODO comments added for non-atomic patterns (issues #12, #13) and
the missing circuit breaker (#14) — these require new primitives.
- Fix state key mismatch: soul.el layered_cycle now reads conv_history
(not conversation_history), unblocking the safety_score_distress_history
history-amplification path in safety_threat_score
- Add safety_augment_system call on the main handle_chat path so the
phrase-list bell detector fires on all chat turns, not just dharma rooms
- Add cross-session affective engram query in load_identity_context() at
boot; stores distress/crisis signals from prior sessions under
soul_affective_context with a 7-day soft recency filter
Issues addressed:
- #1 ASYMMETRIC PERSIST/LOAD: conv_history_load() now tries engram_get_node_by_label()
first (symmetric with the label-based write), falling back to vector search only when
label lookup returns nothing. Immune to cold/corrupt vector index.
- #2 SILENT LOAD FAILURE: all failure paths in conv_history_load() and conv_history_persist()
now emit a println log line rather than silently returning "" or dropping writes.
- #3 NO RECOVERY PATH: documented as TODO with explanation of why a full recovery path
(retry, ID fallback, orphan cleanup) is too invasive for a targeted fix here.
- #4 OVERWRITE WITHOUT DELETE: documented with TODO to replace engram_node_full with
explicit delete-then-create once engram exposes a label-scoped delete API.
- #5/#10 BROKEN TRIM / OFF-BY-ONE: hist_trim() rewritten to use json_array_len /
json_array_get (structural JSON ops) instead of raw str_index_of scanning for
'{"role":' markers. Immune to marker strings appearing inside message content.
Minimum retained count guard added: never trims below 2 entries.
- #6 PARTIAL-WRITE GUARD: conv_history_persist() refuses to write a blob that doesn't
contain both '[' and ']'. conv_history_load() requires both before accepting content.
- #7 DUAL STORAGE: documented with a comment at the persist call site.
- #8 NO MAX SIZE GUARD: documented as TODO with rationale for why a byte-length cap
requires a more invasive change (entry truncation or summarisation).
- #9 AGENTIC HISTORY NOT PERSISTED: handle_chat_agentic() now calls conv_history_persist()
for the default global session (hist_key == "conv_history") after updating state,
matching the non-agentic path's durability. Named sessions remain in-process only.
Add strip -s after gcc compilation to remove symbol table and relocation info.
Reduces binary size and prevents symbol-level reverse engineering of EL runtime internals.
- auto_persist: detect bell level (soft/hard) on every user message using
safety_detect_bell_level; write a dedicated BellEvent engram node with
calibrated salience alongside the Conversation node when a bell fires.
Tag the Conversation node with bell:soft/bell:hard and 'affective' for
direct discovery without scanning all chat nodes.
- auto_persist: track per-session bell count, dominant level, and last
signal in state (session_bell_count/level/signal keys) so downstream
functions can act on the emotional history without re-scanning engram.
- engram_compile: include the top-1 most recent BellEvent node within 72h
in every context build. Distress context from earlier turns (same or
recent session) automatically travels into all subsequent LLM calls.
- hist_trim_with_bell_guard: replace hist_trim at the handle_chat call site.
Before evicting the oldest turn from the 20-turn window, inspect the user
message for bell signals. If a bell was present, write a preservation
BellEvent to engram before dropping the turn so the full message survives
the rolling window.
- session_hist_save: after writing the history node, check session bell
counters. On the first save where bell_count > 0, write a
session:emotional-summary BellEvent node with distress signal, count,
and dominant level. A state flag prevents duplicate writes on subsequent
saves in the same session.
- engram_compile: rank search results by recency x relevance before including
in context. Pulls 20 candidates, scores each (salience * importance * recency
decay), keeps top 8. Eliminates stale/low-signal nodes that diluted context.
- handle_chat: on hist_len==0 (session start), proactively load user profile
and active-work context from engram and inject as brief bullets in the system
prompt. Gives the soul grounding before any conversation history exists.
- build_system_prompt: add [CAPABILITY GAPS] directive instructing the soul to
offer partial help and reasoning instead of flat "I don't have access to that"
refusals when a tool is missing.
- handle_chat_agentic: run safety_screen at entry, mirroring layered_cycle.
Hard bell exits immediately with the crisis response without entering the loop.
- agentic_loop: surface the 8-iteration cap explicitly in the error envelope
("agentic loop hit the 8-iteration cap...") rather than the opaque "no response".
Add iterations count to both the error and success envelopes for observability.
- Add per-IP in-memory rate limiter (60 req/min default, configurable via
soul_rate_limit state key; /health exempt; loopback callers skipped)
- Extend /health with uptime_secs (from soul_boot_ts) and live LLM probe
- Add missing_param 400 guard on POST /api/chat before passing to LLM
- Standardise error envelopes: add "code" field to err_404/err_405 and all
missing-param returns; route_synthesize now errors clearly instead of
returning the misleading {"mechanism":"did not engage"} on bad input
- Document streaming gap in /api/chat (SSE not implemented, note added)
- handle_request gains ip param; rate_limit_check wired at entry point
- soul.el: fix state key bug in layered_cycle (conversation_history -> conv_history)
- safety.el: add indirect crisis location patterns to soft_bell phrase list
- soul.el: wire safety_augment_system into layered_cycle for soft_bell turns
- chat.el: load cross-session affective context at session start when distress signals found within 72h
REPRODUCED: in the non-agentic path (Tools off / 'Just chat'), asking for
tool-work makes the model role-play tool use — it emits a fake ```json {...}```
'tool call' and says 'let me search/query/pull your sessions' while NOTHING
runs. Reads as a broken/lying app. (The agentic path is fine: verified it
calls search_memory and reports honestly.)
Root cause: build_system_prompt (handle_chat, the tool-less path) never told
the model it has no tools this turn, so it fabricated.
Fix: add a NO-TOOLS directive to the non-agentic system prompt — never emit
tool calls / JSON tool blocks / 'let me pull...' narration; answer from context
only; if a tool is truly needed, say so in one sentence and tell the user to
turn Tools on. Applied to chat.el (source) AND dist/soul.c (the curated TU the
CI compiles), so the CI-built binary carries it.
Verified the FABRICATION repro on the live local soul; could not verify the
patched binary locally (no matching el-runtime version on this machine — a
hand-link against origin/main runtime 404s on all routes). Builds correctly via
CI, which links soul.c against the pinned runtime.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the UI<->soul contract for #23 (scope file/command tools to an agent
workspace root). #23 made the tools read state_get("agent_workspace_root"), but
nothing set that key from the desktop UI, so the agent panel's Workspace Folder
was cosmetic and tools ran unscoped (default-allow). This reads the root the UI
now sends on each agentic request and state_sets it before tool dispatch, so
agent_workspace_root() picks it up for the turn.
Minimal + pattern-matching (same json_get/state_set shape used throughout chat.el).
Empty body field => unscoped (backward-compatible) and preserves the env fallback.
FOR WILL'S REVIEW — do not merge without sign-off:
- Ownership model: set state from body each turn (so clearing the folder un-scopes)
vs. only-when-nonempty. Flagged inline.
- Pairs with neuron-ui PR #32 (ChatRequest.agentWorkspaceRoot).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Knowledge nodes dominated the WM-autobiographical auto_term slot:
'Numeric tier strings...' (a Knowledge node) always scored highest
in WM and its first word 'Numeric' became the curiosity seed every
scan — activating more Numeric nodes, keeping that node in WM,
repeating indefinitely.
Fix: only derive auto_term from Memory, BacklogItem, or Entity nodes.
Knowledge nodes are reference material, not live context. Dynamic/
personal nodes carry the salience worth radiating from.
Also patches proactive_curiosity directly in dist/neuron.c (ELC
cannot compile soul.el within timeout — fallback build pattern).
The Dockerfile's --mount=type=secret path was corrupting the SA key JSON
due to control character handling differences. Pre-download soul + El SDK
in the CI workflow (using already-authenticated gcloud) and COPY them from
the build context. No credentials needed inside the Docker build.