1c8438ad20
Applies connector-specific additions from feat/connectors-soul:

- chat.el: connector_tools_json(), agentic_tools_all(), call_mcp_bridge(), tool_auto_approved() and mcp__ dispatch in dispatch_tool()
- routes.el: connectd_get/post, handle_connectors(), /api/connectors routing in GET and POST sections
- MEMORY_RECALL_BUG.md: investigation notes on memory retrieval failure

The agentic loop rewrite in the source branch was not applied: it conflicts with the tool-bridge pattern from PR #5, which is the chosen design for client-side MCP tool execution.

The connectors themselves are now fully wired: connector tools surface as mcp__<server>__<tool> in the tools array and dispatch to neuron-connectd via call_mcp_bridge().
185 lines
8.9 KiB
Markdown
# Memory Recall Bug: Handoff for Will

**Reported by:** Tim (via the Neuron UI chat)
**Diagnosed by:** Claude (Claude Code session), 2026-06-05
**Symptom:** The soul can't recall anything specific, e.g. "do you remember the jokes
from that night with Will, Tim, and April?" → it has no idea, and correctly self-reports
that either retrieval is failing or the memory was never captured.

---

## TL;DR

The memories are almost certainly **intact in the graph**. The problem is the
**retrieval layer**: `engram_search_json` and `engram_activate_json` return empty for
*every* query, so the chat falls back to two hardcoded pinned nodes and effectively
remembers nothing. It strongly looks like the **embedding / search index was never built or
isn't loaded at boot**.

Separately: the **soul daemon on :7770 was down** at the end of the investigation (it had
been up earlier in the session and died or stopped partway through). A restart is needed
before any of this can be re-tested.

---

## Evidence

All commands were run against the live services during the session.

### Search/activate return nothing, even for guaranteed-present terms
```
curl "http://127.0.0.1:8742/api/search?q=MUDCraft&limit=3" -H "X-API-Key: ntn-user-2026"    # → []
curl "http://127.0.0.1:8742/api/search?q=neuron&limit=3" -H "X-API-Key: ntn-user-2026"      # → []
curl "http://127.0.0.1:8742/api/search?q=Will&limit=3" -H "X-API-Key: ntn-user-2026"        # → []
curl "http://127.0.0.1:8742/api/activate?q=jokes&depth=3" -H "X-API-Key: ntn-user-2026"     # → {"results":[]}

# soul's in-process equivalents (port 7770), also empty:
curl "http://127.0.0.1:7770/api/neuron/recall?query=neuron"               # → (empty)
curl "http://127.0.0.1:7770/api/neuron/knowledge/search?q=MUDCraft"       # → (empty)
```

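
The checks above can be repeated as a script once the daemon is back up. The following is a minimal Python sketch, not part of the codebase: `is_empty_result` and `probe` are hypothetical helper names, and the only assumption about the services is the response shapes shown above (`[]`, `{"results":[]}`, or a blank body).

```python
import json
from typing import Optional
from urllib.request import Request, urlopen


def is_empty_result(body: str) -> bool:
    """Return True when a response body carries no hits.

    Covers the three shapes seen in the session: a blank body, a bare
    empty list, and an object whose "results" list is empty.
    """
    text = body.strip()
    if not text:
        return True
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False  # non-JSON but non-blank: something did come back
    if isinstance(data, dict) and "results" in data:
        data = data["results"]
    return isinstance(data, (list, dict)) and len(data) == 0


def probe(url: str, api_key: Optional[str] = None, timeout: float = 5.0) -> bool:
    """Fetch url and report whether the result is empty (hypothetical helper)."""
    headers = {"X-API-Key": api_key} if api_key else {}
    with urlopen(Request(url, headers=headers), timeout=timeout) as resp:
        return is_empty_result(resp.read().decode("utf-8", errors="replace"))
```

For example, `probe("http://127.0.0.1:8742/api/search?q=Will&limit=3", api_key="ntn-user-2026")` returning `True` would reproduce the symptom. Once the index is fixed, the same call should return `False` for terms known to be in the graph.
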
### But the raw data is present
```
curl "http://127.0.0.1:7770/api/graph/nodes?limit=2"
# → [{"id":"mem-30425134-...","content":"CGI ARCHITECTURE ? THREE LAYERS, MCP RETIRED ...
```
`/api/graph/nodes` is served by `engram_scan_nodes_json(9999, 0)` (routes.el:223-224) and
returns hundreds of rich nodes. So node storage is fine; only the **search/activation
index** is dead.

### The two standalone-engram counters
```
curl "http://127.0.0.1:8742/api/stats"    # → {"node_count":0,"edge_count":0,"layer_count":5}
```
Note: the standalone engram process on :8742 reports **0 nodes**, while the soul's
in-process engram (:7770) has the data. It's worth confirming which engram instance is the
source of truth and whether they've diverged. (The `:8742` process was also showing up as
`engram --help` in `ps`, which is suspicious: it may not be a real server instance.)

---

## Root cause (where it breaks in code)

`neuron/chat.el → engram_compile(intent)` (lines 15-53) builds the entire memory context
for every chat turn from exactly two sources:

```el
let activate_json: String = engram_activate_json(intent, 5) // returns []
let search_json: String = engram_search_json(intent, 15) // returns []
```

When **both are empty**, it falls back to two hardcoded nodes by literal ID
(chat.el:29-41):

```el
// "Fallback: when vector search returns nothing (no embeddings), fetch pinned
// high-salience nodes by their known IDs."
let family_node = engram_get_node_json("knw-35940684-abc4-42f0-b942-818f66b1f69a")
let origin_node = engram_get_node_json("knw-729fc901-8335-44c4-9f3a-b150b4aa0915")
```

So today the soul's *entire* recallable memory in a chat is those two nodes. That's why it
can't surface jokes, social moments, the dynamic with Tim/April, or anything else specific.

The comment ("when vector search returns nothing (no embeddings)") is the key hint: this
fallback was written *expecting* the embedding index to sometimes be absent, and right
now it's absent **all the time**.

Affected callers all funnel through the same two dead builtins:
- `handle_api_recall` (neuron-api.el:118) calls `engram_search_json`
- `handle_api_search_knowledge` (neuron-api.el:135) calls `engram_search_json` + `engram_activate_json`
- `engram_compile` (chat.el:15) calls both

Working callers use a *different* builtin (`engram_scan_nodes_json` /
`engram_scan_nodes_by_type_json`), which is why graph/list views work but recall doesn't.

---

## Fix options (Will's call)

### Option 1: Proper fix (rebuild/restore the embedding + activation index)
`engram_search_json` and `engram_activate_json` are native runtime builtins. They're
returning empty because (most likely) the vector/search index was never built or isn't
loaded at boot, even though node storage loads fine. Investigate the engram boot path:
does it build embeddings for loaded nodes? Is there an index file that's missing/stale?
Fixing this restores recall everywhere at once. **This is the real fix.**

### Option 2: Pragmatic EL-level fallback (no native changes)
Since `engram_scan_nodes_json()` works, `engram_compile` could do a keyword scan when the
vector path is empty: pull nodes, substring/token match the query against `content` +
`label`, rank by overlap, return the top N. This restores basic recall even with the vector
index down. It is about 20 lines of EL in `engram_compile`, but it requires a soul rebuild
and restart. Claude offered to write this patch for your review if you want it; say the word.

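
To make the proposed ranking concrete, here is a minimal sketch of the keyword scan in Python rather than EL. It is a model of the logic only, not the patch itself: `tokenize` and `keyword_fallback` are hypothetical names, the node dicts stand in for whatever `engram_scan_nodes_json()` returns, and the only fields assumed are the `content` and `label` named above. The default of 15 mirrors the limit already passed to `engram_search_json`.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_fallback(query: str, nodes: List[Dict], top_n: int = 15) -> List[Dict]:
    """Rank nodes by how many distinct query tokens appear in content + label."""
    query_tokens = tokenize(query)
    scored = []
    for node in nodes:
        haystack = f"{node.get('content', '')} {node.get('label', '')}"
        overlap = len(query_tokens & tokenize(haystack))
        if overlap > 0:
            scored.append((overlap, node))
    # Highest overlap first; the sort is stable, so ties keep scan order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [node for _, node in scored[:top_n]]


nodes = [
    {"id": "a", "content": "Deployment notes for the cluster", "label": "ops"},
    {"id": "b", "content": "Tim told jokes at dinner", "label": "social"},
    {"id": "c", "content": "April and Tim laughed at the jokes", "label": "social night"},
]
top = keyword_fallback("jokes with Tim and April", nodes, top_n=2)
print([node["id"] for node in top])  # prints ['c', 'b']
```
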
Tradeoff: keyword matching is much weaker than semantic recall (won't find "jokes" unless
the node text literally contains joke-ish words), but it's strictly better than the current
two-node fallback and needs no native/runtime work.

---

## Also needs attention

- **Soul daemon (:7770) was down** at the end of the session: restart it and confirm it stays up.
- **Confirm the engram instance topology**: the :8742 standalone shows 0 nodes while the
  soul's in-process engram has the data. Make sure chat is reading the populated one and
  they haven't diverged.
- **Social memory weighting** (Tim's deeper point): even once retrieval works, jokes /
  interpersonal moments may not be tagged or salience-weighted to surface as "important."
  Worth a look at how those get captured and scored, but that's secondary to getting
  retrieval working at all.

---

## Daemon lifecycle: needs a supervisor (NEW, 2026-06-06)

The soul daemon **crashed again** the next day. It had been up earlier, then died on its
own (not from any change). When it's down, the UI's Backlog / Artifacts / Knowledge /
Graph / Memories tabs all go **blank**, because they read from `:7770/api/graph/nodes`.
The chat also stops working. This is the second unexplained death in two days.

### How it's currently run (fragile)
- Binary: `neuron/dist/neuron-fresh` (compiled from the EL sources)
- Launched manually as a bare background process (`./neuron-fresh &`): **no supervisor,
  no auto-restart, no crash logging beyond stdout**. When it dies, it stays dead until a
  human notices the blank UI and restarts it.
- Boot log only shows `[http] listening on [::]:7770`; there's no captured stack/exit
  reason when it crashes, so we can't yet say *why* it's dying.

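
For local dev, the three missing pieces (supervision, auto-restart, recorded exit codes) fit in a small wrapper. The sketch below is one possible shape in Python, not an existing tool in the repo: `supervise`, its parameters, and the log format are all hypothetical. It assumes the daemon runs in the foreground, so that an exit is visible to the parent process.

```python
import subprocess
import sys
import time
from datetime import datetime
from typing import List, Optional, TextIO


def supervise(cmd: List[str], log: TextIO, max_restarts: Optional[int] = None,
              backoff_seconds: float = 2.0) -> List[int]:
    """Run cmd, restart it whenever it exits, and log every exit code.

    With max_restarts=None the loop never ends. Otherwise it returns the
    collected exit codes after the initial run plus max_restarts restarts.
    """
    exit_codes: List[int] = []
    while True:
        started = datetime.now().isoformat(timespec="seconds")
        log.write(f"{started} starting {' '.join(cmd)}\n")
        log.flush()
        # Child stdout/stderr go to the same log so crash output is kept.
        code = subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)
        exit_codes.append(code)
        stopped = datetime.now().isoformat(timespec="seconds")
        # On POSIX a negative code means the process was killed by that signal.
        log.write(f"{stopped} exited with code {code}\n")
        log.flush()
        if max_restarts is not None and len(exit_codes) > max_restarts:
            return exit_codes
        time.sleep(backoff_seconds)
```

A call such as `supervise(["./neuron-fresh"], open("/tmp/soul-supervisor.log", "a"))` (log path chosen for illustration) would keep the daemon running and leave a timestamped exit code for each crash. The log should be opened in append mode so the wrapper's lines and the child's output never overwrite each other.
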
### How I restarted it (for reference)
```sh
# snapshot lives at ~/.neuron/engram/snapshot.json (loaded on boot, ~9.7MB)
# ALWAYS back it up first: genesis boot re-saves it.
cp ~/.neuron/engram/snapshot.json ~/.neuron/engram/snapshot.backup-$(date +%Y%m%d-%H%M%S).json

cd neuron/dist
ANTHROPIC_API_KEY='<key>' NEURON_PORT=7770 ./neuron-fresh > /tmp/soul-restart.log 2>&1 &
# verify:
curl -s http://127.0.0.1:7770/health
# → {"status":"alive","cgi_id":"ntn-genesis","boot":2,"node_count":3660,"edge_count":14207,...}
```
After this, data came back: 3,660 nodes / 14,207 edges; Backlog 485, Memory 493, etc.

### Recommendations for Will
1. **Put it under a supervisor** so it auto-restarts on crash and logs exit codes:
   - macOS dev: a `launchd` LaunchAgent plist (KeepAlive=true), or `brew services`, or
     even a simple `while true; do ./neuron-fresh; done` wrapper with timestamped logs.
   - Prod/k8s already has `entrypoint.sh` + restart policy; the gap is the **local dev**
     run path.
2. **Capture crash diagnostics**: redirect stdout/stderr to a rotating logfile and, if the
   EL runtime can, dump a reason on exit. Right now we're blind to the cause.
3. **Find the root cause of the crashes**: two self-deaths in two days suggests a real bug
   (memory? an unhandled request? a panic in a native builtin?). The supervisor stops the
   *symptom* (blank UI) but not the underlying instability.
4. **Snapshot safety**: genesis boot calls `engram_save(snapshot)` (soul.el:240,248). A
   crash mid-save could corrupt the 9.7MB memory file. Consider write-to-temp + atomic
   rename, and/or periodic timestamped backups, so a bad save can't lose Neuron's memory.

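
The write-to-temp + atomic rename pattern from item 4 looks like this in outline. This is a Python illustration of the technique only: the real save goes through `engram_save`, whose internals were not inspected, and `atomic_write_json` is a hypothetical name. The key points are that the temp file sits in the same directory as the target (so the rename stays on one filesystem) and that a failed write leaves the previous snapshot untouched.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data) -> None:
    """Write data as JSON to path so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must share a directory with the target for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".snapshot-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the bytes to disk before renaming
        os.replace(tmp_path, path)  # swaps the new file in as a single step
    except BaseException:
        # The old snapshot is still in place; only the partial temp file is removed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```
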
---

## What was NOT touched
No backend EL code and no engram data were modified; the memory-recall diagnosis was
read-only. The only operational action taken was **restarting the already-existing
`neuron-fresh` daemon** (after backing up the snapshot) to bring the blank UI tabs back;
no source or data was changed by that. All UI work this session was in `neuron-ui` and is
unrelated to this bug.