Compare commits
1 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | 364ecff391 | | |
@@ -0,0 +1,100 @@
# Design proposal: searchable, recency-aware conversation memory

Status: **proposal for Tim + Will, no code yet**
Author: Neuron (Claude Opus 4.8), 2026-06-21
Trigger: "Summarize the key themes across my recent conversations" returns nothing useful.

---

## TL;DR

Conversations **are** being persisted: `auto_persist` writes every turn as a
timestamped `Conversation`/`Episodic` node. The failure is **retrieval**, not
storage. Two gaps:

1. **No recency-ordered retrieval.** There is no way to ask "give me my last N
   conversation turns by time." Search is keyword-ranked only.
2. **Lexical-only search.** `search_memory` → `engram_search_json` is BM25/lexical.
   A semantic/thematic query ("themes across recent conversations") doesn't share
   keywords with the actual topic content, so it misses.

The tool call the model hallucinated spells out the missing capability:
`"recency_weight": 0.8`, `"sort_by": "recency"`,
`node_type: "ConversationTurn"`. It wanted a recency-windowed conversation fetch
that doesn't exist.
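
To make the second gap concrete, here is a toy term-overlap check in Python. It is not `engram_search_json` and not real BM25; the tokenizer, the stopword list, and the sample turn are all invented for illustration. It only shows why a purely lexical scorer has nothing to match when the query describes the conversations instead of quoting them.

```python
import re

# Hypothetical stopword list; a real lexical index would bring its own.
STOPWORDS = {"the", "a", "my", "across", "to", "of", "and", "in", "we", "our", "for"}


def terms(text: str) -> set:
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def lexical_overlap(query: str, document: str) -> set:
    """Terms shared by query and document; an empty set means no lexical signal."""
    return terms(query) & terms(document)


# Invented sample data, not taken from the live engram.
query = "summarize the key themes across my recent conversations"
stored_turn = "Q: how should we shard the billing database? A: partition by tenant id"

print(lexical_overlap(query, stored_turn))            # thematic query: nothing shared
print(lexical_overlap("billing shard", stored_turn))  # keyword query: shared terms
```

The thematic query and the stored turn share no terms, so any ranking built on term overlap scores the turn at zero, however relevant it is to "recent conversations."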

## What exists today (verified)

- `auto_persist(req, resp)` (chat.el): called by the dispatcher after each turn,
  agentic or not; stores
  `{"q","a","created_at","source":"chat","label":"chat:<ts>"}` as
  `engram_node_full(... "Conversation" ... "Episodic" ...)`, tags
  `["Conversation","chat","timestamped"]`.
- `conv_history_persist` (chat.el): a `conv:history` Episodic node, meant to be a
  **single overwriting** node, holding the rolling JSON history (continuity across
  restarts). It is not per-turn and not individually searchable.
- Live engram (founder instance): **5,113 nodes, 59 conversation nodes**, a mix
  of `chat:<ts>`, several `conv:history` copies, and older `Q:/A:` nodes.
- Retrieval surface for the agentic loop: `search_memory`, `recall`,
  `neuron_search_knowledge`, `neuron_recall`, all **query-keyword** based.
  None is "most recent N by time," none is embedding/semantic.
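
For readers who have not opened chat.el, the per-turn record described in the first bullet can be pictured roughly as follows. This is a Python sketch, not the EL implementation: the content fields and tags come from the list above, while the wrapper keys `node_type` and `memory_kind`, the function name, and the assumption that `created_at` is unix seconds are hypothetical.

```python
import json


def build_turn_record(question: str, answer: str, ts: int) -> dict:
    """Assemble a per-turn record with the fields the proposal lists."""
    content = {
        "q": question,
        "a": answer,
        "created_at": ts,          # assumed to be unix seconds
        "source": "chat",
        "label": f"chat:{ts}",
    }
    return {
        "node_type": "Conversation",   # hypothetical key name
        "memory_kind": "Episodic",     # hypothetical key name
        "tags": ["Conversation", "chat", "timestamped"],
        "content": json.dumps(content),
    }


# Invented sample values.
record = build_turn_record("What is BM25?", "A lexical ranking function.", 1782000000)
print(record["content"])
```

The point to notice is that `created_at` already sits on every turn, which is what makes a time-ordered fetch cheap to add.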

## The gap, precisely

| User intent | Needs | Have today |
|---|---|---|
| "summarize my recent conversations" | last-N-by-time fetch | ✗ (keyword only) |
| "what did we discuss about X" | semantic match on topic | ~ (lexical only; misses paraphrase) |
| "themes across everything" | semantic cluster over corpus | ✗ |

Persistence coverage is not a further gap. `auto_persist` was first suspected of
firing only on the **non-agentic** path (`handle_chat`), but the dispatcher calls
it after the **agentic** path (`handle_chat_agentic`) as well (`routes.el` lines
156/298), so agentic conversations are stored too.

## Proposal

Three layers, smallest-first. (1) alone fixes the headline use case.

### 1. Recency-windowed conversation retrieval (the high-value, low-cost win)

A runtime/engram primitive + an agentic tool:

- **Engram**: `engram_recent_by_type(node_type, limit, since_ts?)` → newest-first
  by `created_at`. (Conversation nodes already carry `created_at`.)
- **Agentic tool**: `recent_conversations(limit=20, since?)` →
  `[{q,a,created_at}, …]`, newest first. Exposed in `agentic_tools_all`.
- **System-prompt hint**: for "recent / lately / this week / summarize our
  conversations," prefer `recent_conversations` over `search_memory`.

This directly answers "summarize my recent conversations": fetch the last N, hand
the model the actual turns, let it cluster themes. No embeddings required.
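
The intended behaviour of the primitive and the tool can be sketched in a few lines of Python. This is an illustration of the semantics only, not the engram runtime: nodes are modelled as plain dicts in a list, `created_at` is assumed to be a comparable integer timestamp, and `since_ts` is assumed to be inclusive.

```python
from typing import Optional


def recent_by_type(nodes: list, node_type: str, limit: int,
                   since_ts: Optional[int] = None) -> list:
    """Newest-first nodes of one type, optionally restricted to created_at >= since_ts."""
    matches = [
        n for n in nodes
        if n["node_type"] == node_type
        and (since_ts is None or n["created_at"] >= since_ts)
    ]
    matches.sort(key=lambda n: n["created_at"], reverse=True)
    return matches[:limit]


def recent_conversations(nodes: list, limit: int = 20,
                         since: Optional[int] = None) -> list:
    """Tool-shaped wrapper: returns only the {q, a, created_at} fields."""
    return [
        {"q": n["q"], "a": n["a"], "created_at": n["created_at"]}
        for n in recent_by_type(nodes, "Conversation", limit, since)
    ]


# Invented sample store.
store = [
    {"node_type": "Conversation", "q": "q1", "a": "a1", "created_at": 100},
    {"node_type": "Knowledge", "created_at": 300},
    {"node_type": "Conversation", "q": "q2", "a": "a2", "created_at": 200},
    {"node_type": "Conversation", "q": "q3", "a": "a3", "created_at": 150},
]
for turn in recent_conversations(store, limit=2):
    print(turn["created_at"], turn["q"])
```

No query string is involved at any point, which is exactly what the keyword-based tools cannot offer today.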

### 2. Stable per-session threading

Today each turn is an independent `chat:<ts>` node; there's no session grouping.
Add `session_id` + a monotonic turn index to the persisted content (the UI already
sends `session_id`). This enables "summarize *this* conversation" and per-session
recall, and lets retrieval return coherent threads instead of loose turns.
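
A minimal Python sketch of the threading idea, under stated assumptions: the field name `turn_index`, the helper names, and the in-memory counter dict are hypothetical, and a real implementation would need the counter to survive restarts.

```python
from collections import defaultdict


def stamp_turn(counters: dict, session_id: str, record: dict) -> dict:
    """Return a copy of record tagged with session_id and that session's next turn index."""
    index = counters.get(session_id, 0)
    counters[session_id] = index + 1
    return {**record, "session_id": session_id, "turn_index": index}


def group_into_threads(turns: list) -> dict:
    """Group loose turns by session_id and order each thread by turn_index."""
    threads = defaultdict(list)
    for turn in turns:
        threads[turn["session_id"]].append(turn)
    return {
        sid: sorted(thread, key=lambda t: t["turn_index"])
        for sid, thread in threads.items()
    }


# Invented sample turns from two interleaved sessions.
counters = {}
turns = [
    stamp_turn(counters, "s1", {"q": "hi"}),
    stamp_turn(counters, "s2", {"q": "other topic"}),
    stamp_turn(counters, "s1", {"q": "follow-up"}),
]
threads = group_into_threads(list(reversed(turns)))  # storage order does not matter
print([t["q"] for t in threads["s1"]])
```

With both fields stored, "summarize this conversation" becomes a filter on `session_id` followed by a sort, with no reliance on timestamps being unique.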

### 3. Semantic retrieval (the real fix for thematic queries)

Lexical BM25 can't do "themes." Options, largest effort first:

- **a.** Embeddings on Conversation nodes + a vector search tool
  (`semantic_search`). Biggest lift; also fixes knowledge recall broadly.
- **b.** Interim: a two-pass "map-reduce" that uses `recent_conversations` to pull
  the window, then lets the model cluster. Cheap, ships with (1), no infra.

Recommend **(1) + (2) now, (3b) as the interim thematic answer, (3a) as the
roadmap item** once embeddings land (this dovetails with the GraphRAG/embedding
work already noted in memory: substring 1.7% P@5 vs BM25 55% vs graph 21.7%).
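
One way to picture the interim option (3b) is the Python sketch below. It assumes the turns have already been fetched by something like `recent_conversations`; the `llm` parameter is a stand-in callable, and the prompts, the character budget, and both function names are invented for illustration.

```python
from typing import Callable


def batch_turns(turns: list, max_chars: int = 4000) -> list:
    """Pack rendered turns into batches that stay within a rough character budget."""
    batches, current, size = [], [], 0
    for turn in turns:
        text = f"Q: {turn['q']}\nA: {turn['a']}"
        if current and size + len(text) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches


def summarize_themes(turns: list, llm: Callable[[str], str],
                     max_chars: int = 4000) -> str:
    """Map: list themes per batch. Reduce: merge the partial lists in one more call."""
    partials = [
        llm("List the main themes in these conversation turns:\n\n" + "\n\n".join(batch))
        for batch in batch_turns(turns, max_chars)
    ]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]
    return llm("Merge these partial theme lists into one deduplicated list:\n\n"
               + "\n\n".join(partials))
```

When the window fits in one batch the reduce step is skipped, so the common case of a few dozen turns costs a single model call.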

## Open questions for Will

1. ~~Does the agentic path persist turns?~~ **Resolved: yes.** The dispatcher
   calls `auto_persist` after both the agentic and non-agentic branches
   (`routes.el` lines 156/298). Both paths store per-turn nodes.
2. `conv:history` is meant to be a single overwriting node but is accumulating
   duplicates (saw several in the live engram). Is that intended, or should it
   truly overwrite/dedupe?
3. Is there appetite for the `engram_recent_by_type` primitive in the runtime, or
   should recency be done in `.el` by scanning + sorting (fine at 59 nodes, weak
   at scale)?
4. Embeddings (3a): on the roadmap timeline, or defer and ship (1)+(2)+(3b)?
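
On question 3, the scan-based alternative can at least avoid sorting the whole store. The Python sketch below is an illustration only; whether `.el` has an equivalent of a bounded heap is not something this proposal has checked, and the function name is invented.

```python
import heapq


def newest_n_by_scan(nodes: list, n: int) -> list:
    """One pass over every node, keeping only the n newest; result is newest-first."""
    return heapq.nlargest(n, nodes, key=lambda node: node["created_at"])


# Invented sample data.
sample_nodes = [{"created_at": ts} for ts in (5, 1, 9, 3)]
print([node["created_at"] for node in newest_n_by_scan(sample_nodes, 2)])
```

Even so, each call still reads every node, so the cost grows with the size of the store. A runtime primitive backed by a time-ordered index would not have that property, which is the substance of the "weak at scale" concern.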

## Not in scope

Persistence itself (it works), and the separate **confabulation** fix (model
faking tool calls in Just-chat mode), which is `neuron` PR #29.

@@ -7,65 +7,6 @@ import "neuron-api.el"
 import "sessions.el"
 import "soul.elh"
 
-// ---------------------------------------------------------------------------
-// Rate limiting: simple in-memory per-IP sliding window counter.
-//
-// State keys:
-//   rl:<ip>:count  - request count in the current window
-//   rl:<ip>:window - window start timestamp (unix seconds)
-//
-// Limit: configurable via soul state key "soul_rate_limit" (requests per
-// minute). Falls back to 60 req/min if not set. The /health endpoint is
-// exempt so monitoring does not consume quota.
-//
-// State growth: each unique source IP accumulates exactly 2 state keys
-// (count + window) for the lifetime of the process. Per-IP storage is
-// bounded and constant; values reset on window expiry. In aggregate, state
-// grows linearly with distinct IPs (typical for a trusted-client service).
-// EL has no state_delete builtin, so keys from inactive IPs persist.
-// TODO: add state_delete sweep when the EL runtime exposes that primitive.
-//
-// Returns "" when the request is allowed, or a 429 JSON body when rejected.
-// ---------------------------------------------------------------------------
-fn rate_limit_check(ip: String, path: String) -> String {
-    // Health checks are exempt; they must never be blocked.
-    if str_eq(path, "/health") {
-        return ""
-    }
-
-    let limit_str: String = state_get("soul_rate_limit")
-    let limit: Int = if str_eq(limit_str, "") { 60 } else { str_to_int(limit_str) }
-
-    let now: Int = time_now()
-    let window_key: String = "rl:" + ip + ":window"
-    let count_key: String = "rl:" + ip + ":count"
-
-    let win_str: String = state_get(window_key)
-    let win_start: Int = if str_eq(win_str, "") { now } else { str_to_int(win_str) }
-
-    // New window every 60 seconds.
-    let elapsed: Int = now - win_start
-    let in_window: Bool = elapsed < 60
-
-    let prev_count_str: String = state_get(count_key)
-    let prev_count: Int = if str_eq(prev_count_str, "") { 0 } else { str_to_int(prev_count_str) }
-
-    // Reset window if expired.
-    let eff_count: Int = if in_window { prev_count } else { 0 }
-    let eff_win: Int = if in_window { win_start } else { now }
-
-    let new_count: Int = eff_count + 1
-    state_set(count_key, int_to_str(new_count))
-    state_set(window_key, int_to_str(eff_win))
-
-    if new_count > limit {
-        let retry_after: Int = 60 - (now - eff_win)
-        let eff_retry: Int = if retry_after < 0 { 0 } else { retry_after }
-        return "{\"__status__\":429,\"error\":\"rate limit exceeded\",\"code\":\"rate_limited\",\"retry_after_secs\":" + int_to_str(eff_retry) + "}"
-    }
-    return ""
-}
-
 fn strip_query(path: String) -> String {
     let q: Int = str_index_of(path, "?")
     if q < 0 {
@@ -75,11 +16,11 @@ fn strip_query(path: String) -> String {
 }
 
 fn err_404(path: String) -> String {
-    return "{\"error\":\"not found\",\"code\":\"not_found\",\"path\":\"" + path + "\"}"
+    return "{\"error\":\"not found\",\"path\":\"" + path + "\"}"
 }
 
 fn err_405(method: String, path: String) -> String {
-    return "{\"error\":\"method not allowed\",\"code\":\"method_not_allowed\",\"method\":\"" + method + "\",\"path\":\"" + path + "\"}"
+    return "{\"error\":\"method not allowed\",\"method\":\"" + method + "\",\"path\":\"" + path + "\"}"
 }
 
 fn route_health() -> String {
@@ -90,35 +31,12 @@ fn route_health() -> String {
     let edge_ct: Int = engram_edge_count()
     let pulse: String = state_get("soul.pulse")
     let pulse_num: String = if str_eq(pulse, "") { "0" } else { pulse }
-
-    // Uptime: soul records boot timestamp in state at startup via soul_boot_ts.
-    // Compute elapsed seconds; fall back to -1 if not yet set.
-    let boot_ts_str: String = state_get("soul_boot_ts")
-    let uptime_secs: Int = if str_eq(boot_ts_str, "") {
-        -1
-    } else {
-        time_now() - str_to_int(boot_ts_str)
-    }
-
-    // LLM connectivity: probe with a minimal call. Any non-error reply = ok.
-    // Use a short, fixed prompt so this never counts against conversation history.
-    let model: String = state_get("soul_model")
-    let eff_model: String = if str_eq(model, "") { "claude-sonnet-4-5" } else { model }
-    let llm_probe: String = llm_call_system(eff_model, "You are a health probe. Reply with the single word: ok", "ping")
-    let llm_ok: Bool = !str_eq(llm_probe, "")
-        && !str_starts_with(llm_probe, "{\"error\"")
-        && !str_starts_with(llm_probe, "{\"type\":\"error\"")
-        && !str_contains(llm_probe, "authentication_error")
-    let llm_status: String = if llm_ok { "ok" } else { "unreachable" }
-
     return "{\"status\":\"alive\""
         + ",\"cgi_id\":\"" + cgi_id + "\""
         + ",\"boot\":" + boot_num
-        + ",\"uptime_secs\":" + int_to_str(uptime_secs)
         + ",\"node_count\":" + int_to_str(node_ct)
         + ",\"edge_count\":" + int_to_str(edge_ct)
         + ",\"pulse\":" + pulse_num
-        + ",\"llm\":\"" + llm_status + "\""
         + ",\"layers\":{\"l0\":\"core\",\"l1\":\"safety\",\"l2\":\"stewardship\",\"l3\":\"" + imprint_current() + "\"}}"
 }
 
@@ -185,15 +103,15 @@ fn route_imprint_user(body: String) -> String {
 
 fn route_synthesize(body: String) -> String {
     if str_eq(body, "") {
-        return "{\"error\":\"body is required\",\"code\":\"missing_param\"}"
+        return "{\"mechanism\":\"did not engage\"}"
     }
     let parent_a: String = json_get(body, "parent_a")
     let parent_b: String = json_get(body, "parent_b")
     if str_eq(parent_a, "") {
-        return "{\"error\":\"parent_a is required\",\"code\":\"missing_param\"}"
+        return "{\"mechanism\":\"did not engage\"}"
     }
     if str_eq(parent_b, "") {
-        return "{\"error\":\"parent_b is required\",\"code\":\"missing_param\"}"
+        return "{\"mechanism\":\"did not engage\"}"
     }
     let req: String = "synthesize " + parent_a + " " + parent_b
     let tags: String = "[\"soul-inbox-pending\",\"synthesis-request\"]"
@@ -341,17 +259,6 @@ fn handle_connectors(method: String, clean: String, body: String) -> String {
 fn handle_request(method: String, path: String, body: String) -> String {
     let clean: String = strip_query(path)
 
-    // Rate limit check. Extract caller IP from REMOTE_ADDR env var (set by the
-    // EL HTTP runtime for each request). Skip enforcement when empty so
-    // loopback/internal callers are never blocked.
-    let ip: String = env("REMOTE_ADDR")
-    if !str_eq(ip, "") {
-        let rl_result: String = rate_limit_check(ip, clean)
-        if !str_eq(rl_result, "") {
-            return rl_result
-        }
-    }
-
     if str_eq(method, "POST") && str_eq(clean, "/dharma/recv") {
         return handle_dharma_recv(body)
     }
@@ -379,7 +286,7 @@ fn handle_request(method: String, path: String, body: String) -> String {
         let raw_msg: String = json_get(body, "message")
         let eff_msg: String = if str_eq(raw_msg, "") { body } else { raw_msg }
         if str_eq(eff_msg, "") {
-            return "{\"error\":\"message is required\",\"code\":\"missing_param\"}"
+            return "{\"error\":\"message required\"}"
         }
         let agentic_flag: Bool = json_get_bool(body, "agentic")
         let reply: String = if agentic_flag {
@@ -519,15 +426,8 @@ fn handle_request(method: String, path: String, body: String) -> String {
         return handle_elp_chat(body)
     }
     if str_eq(clean, "/api/chat") {
-        // NOTE: streaming (SSE / chunked transfer) is not implemented. All chat
-        // responses are buffered and returned as a single JSON object. Streaming
-        // would require runtime-level SSE support in el_runtime.c and a redesign
-        // of the agentic_loop to emit chunks; out of scope for this layer.
-        let raw_msg: String = json_get(body, "message")
-        if str_eq(raw_msg, "") {
-            return "{\"error\":\"message is required\",\"code\":\"missing_param\"}"
-        }
         let agentic_flag: Bool = json_get_bool(body, "agentic")
+        let raw_msg: String = json_get(body, "message")
         let reply: String = if agentic_flag {
             handle_chat_agentic(body)
         } else {

@@ -369,7 +369,6 @@ load_identity_context()
 seed_persona_from_env()
 let boot_num: Int = mem_boot_count_inc()
 state_set("soul_boot_count", int_to_str(boot_num))
-state_set("soul_boot_ts", int_to_str(time_now()))
 println("[soul] boot #" + int_to_str(boot_num))
 emit_session_start_event()
 