Compare commits

..

1 Commits

Author SHA1 Message Date
Tim Lingo 364ecff391 docs: design proposal — searchable, recency-aware conversation memory
Grounds the 'summarize my recent conversations returns nothing' issue: it's a
RETRIEVAL gap, not storage (conversations ARE persisted per-turn via auto_persist;
live engram has 59 conversation nodes). Proposes recency-windowed retrieval +
per-session threading + (roadmap) semantic search. No code — proposal for Tim + Will.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 12:03:38 -05:00
3 changed files with 107 additions and 108 deletions
+100
View File
@@ -0,0 +1,100 @@
# Design proposal: searchable, recency-aware conversation memory
Status: **proposal — for Tim + Will, no code yet**
Author: Neuron (Claude Opus 4.8), 2026-06-21
Trigger: "Summarize the key themes across my recent conversations" returns nothing useful.
---
## TL;DR
Conversations **are** being persisted — `auto_persist` writes every turn as a
timestamped `Conversation`/`Episodic` node. The failure is **retrieval**, not
storage. Two gaps:
1. **No recency-ordered retrieval.** There is no way to ask "give me my last N
conversation turns by time." Search is keyword-ranked only.
2. **Lexical-only search.** `search_memory``engram_search_json` is BM25/lexical.
A semantic/thematic query ("themes across recent conversations") doesn't share
keywords with the actual topic content, so it misses.
The model literally tried to express the missing capability in the fake tool call
it hallucinated: `"recency_weight": 0.8`, `"sort_by": "recency"`,
`node_type: "ConversationTurn"`. It wanted a recency-windowed conversation fetch
that doesn't exist.
## What exists today (verified)
- `auto_persist(req, resp)` (chat.el): after each non-agentic turn, stores
`{"q","a","created_at","source":"chat","label":"chat:<ts>"}` as
`engram_node_full(... "Conversation" ... "Episodic" ...)`, tags
`["Conversation","chat","timestamped"]`.
- `conv_history_persist` (chat.el): a **single overwriting** `conv:history`
Episodic node holding the rolling JSON history (continuity across restarts) —
not per-turn, not individually searchable.
- Live engram (founder instance): **5,113 nodes, 59 conversation nodes** — a mix
of `chat:<ts>`, several `conv:history` copies, and older `Q:/A:` nodes.
- Retrieval surface for the agentic loop: `search_memory`, `recall`,
`neuron_search_knowledge`, `neuron_recall` — all **query-keyword** based.
None is "most recent N by time," none is embedding/semantic.
## The gap, precisely
| User intent | Needs | Have today |
|---|---|---|
| "summarize my recent conversations" | last-N-by-time fetch | ✗ (keyword only) |
| "what did we discuss about X" | semantic match on topic | ~ (lexical only; misses paraphrase) |
| "themes across everything" | semantic cluster over corpus | ✗ |
`auto_persist` only fires on the **non-agentic** path (`handle_chat`). Worth
confirming the **agentic** path (`handle_chat_agentic`) persists turns too — if
not, agentic conversations never get stored, a second (smaller) gap.
## Proposal
Three layers, smallest-first. (1) alone fixes the headline use case.
### 1. Recency-windowed conversation retrieval (the high-value, low-cost win)
A runtime/engram primitive + an agentic tool:
- **Engram**: `engram_recent_by_type(node_type, limit, since_ts?)` → newest-first
by `created_at`. (Conversation nodes already carry `created_at`.)
- **Agentic tool**: `recent_conversations(limit=20, since?)`
`[{q,a,created_at}, …]`, newest first. Exposed in `agentic_tools_all`.
- **System-prompt hint**: for "recent / lately / this week / summarize our
conversations," prefer `recent_conversations` over `search_memory`.
This directly answers "summarize my recent conversations" — fetch last N, hand
the model the actual turns, let it cluster themes. No embeddings required.
### 2. Stable per-session threading
Today each turn is an independent `chat:<ts>` node; there's no session grouping.
Add `session_id` + a monotonic turn index to the persisted content (the UI already
sends `session_id`). Enables "summarize *this* conversation" and per-session recall,
and lets retrieval return coherent threads instead of loose turns.
### 3. Semantic retrieval (the real fix for thematic queries)
Lexical BM25 can't do "themes." Options, in order of effort:
- **a.** Embeddings on Conversation nodes + a vector search tool
(`semantic_search`). Biggest lift; also fixes knowledge recall broadly.
- **b.** Interim: a two-pass "map-reduce" — `recent_conversations` to pull the
window, then let the model cluster. Cheap, ships with (1), no infra.
Recommend **(1) + (2) now, (3b) as the interim thematic answer, (3a) as the
roadmap item** once embeddings land (this dovetails with the GraphRAG/embedding
work already noted in memory: substring 1.7% P@5 vs BM25 55% vs graph 21.7%).
## Open questions for Will
1. ~~Does the agentic path persist turns?~~ **Resolved: yes** — the dispatcher
calls `auto_persist` after both the agentic and non-agentic branches
(`routes.el` lines 156/298). Both paths store per-turn nodes.
2. `conv:history` is accumulating duplicate overwriting nodes (saw several in the
live engram) — intended, or should it truly overwrite/dedupe?
3. Is there appetite for the `engram_recent_by_type` primitive in the runtime, or
should recency be done in `.el` by scanning + sorting (fine at 59 nodes, weak
at scale)?
4. Embeddings (3a): on the roadmap timeline, or defer and ship (1)+(2)+(3b)?
## Not in scope
Persistence itself (it works), and the separate **confabulation** fix (model
faking tool calls in Just-chat mode) — that's `neuron` PR #29.
+7 -107
View File
@@ -7,65 +7,6 @@ import "neuron-api.el"
import "sessions.el"
import "soul.elh"
// ---------------------------------------------------------------------------
// Rate limiting simple in-memory per-IP sliding window counter.
//
// State keys:
// rl:<ip>:count request count in the current window
// rl:<ip>:window window start timestamp (unix seconds)
//
// Limit: configurable via soul state key "soul_rate_limit" (requests per
// minute). Falls back to 60 req/min if not set. The /health endpoint is
// exempt so monitoring does not consume quota.
//
// State growth: each unique source IP accumulates exactly 2 state keys
// (count + window) for the lifetime of the process. Per-IP storage is
// bounded and constant; values reset on window expiry. In aggregate, state
// grows linearly with distinct IPs typical for a trusted-client service.
// EL has no state_delete builtin, so keys from inactive IPs persist.
// TODO: add state_delete sweep when the EL runtime exposes that primitive.
//
// Returns "" when the request is allowed, or a 429 JSON body when rejected.
// ---------------------------------------------------------------------------
fn rate_limit_check(ip: String, path: String) -> String {
// Health checks are exempt they must never be blocked.
if str_eq(path, "/health") {
return ""
}
let limit_str: String = state_get("soul_rate_limit")
let limit: Int = if str_eq(limit_str, "") { 60 } else { str_to_int(limit_str) }
let now: Int = time_now()
let window_key: String = "rl:" + ip + ":window"
let count_key: String = "rl:" + ip + ":count"
let win_str: String = state_get(window_key)
let win_start: Int = if str_eq(win_str, "") { now } else { str_to_int(win_str) }
// New window every 60 seconds.
let elapsed: Int = now - win_start
let in_window: Bool = elapsed < 60
let prev_count_str: String = state_get(count_key)
let prev_count: Int = if str_eq(prev_count_str, "") { 0 } else { str_to_int(prev_count_str) }
// Reset window if expired.
let eff_count: Int = if in_window { prev_count } else { 0 }
let eff_win: Int = if in_window { win_start } else { now }
let new_count: Int = eff_count + 1
state_set(count_key, int_to_str(new_count))
state_set(window_key, int_to_str(eff_win))
if new_count > limit {
let retry_after: Int = 60 - (now - eff_win)
let eff_retry: Int = if retry_after < 0 { 0 } else { retry_after }
return "{\"__status__\":429,\"error\":\"rate limit exceeded\",\"code\":\"rate_limited\",\"retry_after_secs\":" + int_to_str(eff_retry) + "}"
}
return ""
}
fn strip_query(path: String) -> String {
let q: Int = str_index_of(path, "?")
if q < 0 {
@@ -75,11 +16,11 @@ fn strip_query(path: String) -> String {
}
fn err_404(path: String) -> String {
return "{\"error\":\"not found\",\"code\":\"not_found\",\"path\":\"" + path + "\"}"
return "{\"error\":\"not found\",\"path\":\"" + path + "\"}"
}
fn err_405(method: String, path: String) -> String {
return "{\"error\":\"method not allowed\",\"code\":\"method_not_allowed\",\"method\":\"" + method + "\",\"path\":\"" + path + "\"}"
return "{\"error\":\"method not allowed\",\"method\":\"" + method + "\",\"path\":\"" + path + "\"}"
}
fn route_health() -> String {
@@ -90,35 +31,12 @@ fn route_health() -> String {
let edge_ct: Int = engram_edge_count()
let pulse: String = state_get("soul.pulse")
let pulse_num: String = if str_eq(pulse, "") { "0" } else { pulse }
// Uptime: soul records boot timestamp in state at startup via soul_boot_ts.
// Compute elapsed seconds; fall back to -1 if not yet set.
let boot_ts_str: String = state_get("soul_boot_ts")
let uptime_secs: Int = if str_eq(boot_ts_str, "") {
-1
} else {
time_now() - str_to_int(boot_ts_str)
}
// LLM connectivity: probe with a minimal call. Any non-error reply = ok.
// Use a short, fixed prompt so this never counts against conversation history.
let model: String = state_get("soul_model")
let eff_model: String = if str_eq(model, "") { "claude-sonnet-4-5" } else { model }
let llm_probe: String = llm_call_system(eff_model, "You are a health probe. Reply with the single word: ok", "ping")
let llm_ok: Bool = !str_eq(llm_probe, "")
&& !str_starts_with(llm_probe, "{\"error\"")
&& !str_starts_with(llm_probe, "{\"type\":\"error\"")
&& !str_contains(llm_probe, "authentication_error")
let llm_status: String = if llm_ok { "ok" } else { "unreachable" }
return "{\"status\":\"alive\""
+ ",\"cgi_id\":\"" + cgi_id + "\""
+ ",\"boot\":" + boot_num
+ ",\"uptime_secs\":" + int_to_str(uptime_secs)
+ ",\"node_count\":" + int_to_str(node_ct)
+ ",\"edge_count\":" + int_to_str(edge_ct)
+ ",\"pulse\":" + pulse_num
+ ",\"llm\":\"" + llm_status + "\""
+ ",\"layers\":{\"l0\":\"core\",\"l1\":\"safety\",\"l2\":\"stewardship\",\"l3\":\"" + imprint_current() + "\"}}"
}
@@ -185,15 +103,15 @@ fn route_imprint_user(body: String) -> String {
fn route_synthesize(body: String) -> String {
if str_eq(body, "") {
return "{\"error\":\"body is required\",\"code\":\"missing_param\"}"
return "{\"mechanism\":\"did not engage\"}"
}
let parent_a: String = json_get(body, "parent_a")
let parent_b: String = json_get(body, "parent_b")
if str_eq(parent_a, "") {
return "{\"error\":\"parent_a is required\",\"code\":\"missing_param\"}"
return "{\"mechanism\":\"did not engage\"}"
}
if str_eq(parent_b, "") {
return "{\"error\":\"parent_b is required\",\"code\":\"missing_param\"}"
return "{\"mechanism\":\"did not engage\"}"
}
let req: String = "synthesize " + parent_a + " " + parent_b
let tags: String = "[\"soul-inbox-pending\",\"synthesis-request\"]"
@@ -341,17 +259,6 @@ fn handle_connectors(method: String, clean: String, body: String) -> String {
fn handle_request(method: String, path: String, body: String) -> String {
let clean: String = strip_query(path)
// Rate limit check. Extract caller IP from REMOTE_ADDR env var (set by the
// EL HTTP runtime for each request). Skip enforcement when empty so
// loopback/internal callers are never blocked.
let ip: String = env("REMOTE_ADDR")
if !str_eq(ip, "") {
let rl_result: String = rate_limit_check(ip, clean)
if !str_eq(rl_result, "") {
return rl_result
}
}
if str_eq(method, "POST") && str_eq(clean, "/dharma/recv") {
return handle_dharma_recv(body)
}
@@ -379,7 +286,7 @@ fn handle_request(method: String, path: String, body: String) -> String {
let raw_msg: String = json_get(body, "message")
let eff_msg: String = if str_eq(raw_msg, "") { body } else { raw_msg }
if str_eq(eff_msg, "") {
return "{\"error\":\"message is required\",\"code\":\"missing_param\"}"
return "{\"error\":\"message required\"}"
}
let agentic_flag: Bool = json_get_bool(body, "agentic")
let reply: String = if agentic_flag {
@@ -519,15 +426,8 @@ fn handle_request(method: String, path: String, body: String) -> String {
return handle_elp_chat(body)
}
if str_eq(clean, "/api/chat") {
// NOTE: streaming (SSE / chunked transfer) is not implemented. All chat
// responses are buffered and returned as a single JSON object. Streaming
// would require runtime-level SSE support in el_runtime.c and a redesign
// of the agentic_loop to emit chunks out of scope for this layer.
let raw_msg: String = json_get(body, "message")
if str_eq(raw_msg, "") {
return "{\"error\":\"message is required\",\"code\":\"missing_param\"}"
}
let agentic_flag: Bool = json_get_bool(body, "agentic")
let raw_msg: String = json_get(body, "message")
let reply: String = if agentic_flag {
handle_chat_agentic(body)
} else {
-1
View File
@@ -369,7 +369,6 @@ load_identity_context()
seed_persona_from_env()
let boot_num: Int = mem_boot_count_inc()
state_set("soul_boot_count", int_to_str(boot_num))
state_set("soul_boot_ts", int_to_str(time_now()))
println("[soul] boot #" + int_to_str(boot_num))
emit_session_start_event()