feat(council): anti-confabulation voting layer for memory writes

2026-06-29 12:38:24 -05:00
parent 51bea5507b
commit 70b60f78de
3 changed files with 581 additions and 0 deletions
+123
@@ -0,0 +1,123 @@
# Neuron Council Service
Anti-confabulation layer for the Neuron soul. Before a claim enters long-term memory, the council convenes: three independent LLMs vote on whether the claim is plausible, uncertain, or a confabulation. The aggregate vote produces a confidence score and tags that downstream storage can act on.
## Running the service
```bash
# Foreground
python3 council_service.py --port 7771
# Background (managed by LaunchAgent on macOS)
launchctl load ~/Library/LaunchAgents/ai.neuron.council.plist
launchctl unload ~/Library/LaunchAgents/ai.neuron.council.plist
```
Logs: `~/.neuron/logs/council.log`
## API
### `POST /api/neuron/council/verify`
Request body (`context` is optional and defaults to an empty string):

```json
{ "claim": "...", "context": "..." }
```

Response:

```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "claim": "...",
  "confidence": 0.85,
  "council_votes": ["plausible", "plausible", "plausible"],
  "summary": "3/3 council members agree this is plausible.",
  "tags": ["verified"],
  "latency_ms": 1420
}
```
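Any HTTP client can call this endpoint. Below is a minimal standard-library sketch. `build_verify_request`, `verify_claim` and the 60-second client timeout are illustrative assumptions and not part of the service. The service gives each member up to 45 seconds and runs the members in parallel, so 60 seconds leaves some headroom.

```python
import json
import urllib.request

# Assumed: the service is bound to localhost on its default port 7771.
COUNCIL_URL = "http://localhost:7771/api/neuron/council/verify"


def build_verify_request(claim: str, context: str = "", url: str = COUNCIL_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the verify endpoint."""
    body = json.dumps({"claim": claim, "context": context}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify_claim(claim: str, context: str = "", timeout: float = 60.0) -> dict:
    """Send the request and return the decoded verdict (confidence, tags, ...)."""
    req = build_verify_request(claim, context)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `verify_claim("The Eiffel Tower is in Paris", "geography")["tags"]` returns the tag list that drives the storage policy below.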
### `GET /healthz`
Returns `{"status": "ok", "service": "council"}` when the service is up.
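The startup hook warms all three models before Uvicorn starts accepting connections. As a result, `/healthz` can stay unreachable for up to about a minute after launch. Scripts that depend on the service can poll it first. This is a minimal standard-library sketch, and `wait_until_healthy` and its defaults are hypothetical.

```python
import json
import time
import urllib.request


def wait_until_healthy(url: str = "http://localhost:7771/healthz",
                       deadline_s: float = 90.0, interval_s: float = 1.0) -> bool:
    """Poll /healthz until it reports {"status": "ok"} or the deadline passes."""
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if json.load(resp).get("status") == "ok":
                    return True
        except (OSError, ValueError):
            # Not listening yet (e.g. warm-up still running) or a malformed body.
            pass
        time.sleep(interval_s)
    return False
```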
## Confidence thresholds and tag meanings
| Votes plausible | Confidence | Tags |
|---|---|---|
| 3/3 | 0.85 | `verified` |
| 2/3 | 0.65 | `council-split` |
| 1/3 or 0/3 | 0.30 | `unverified`, `council-flagged` |
| Ollama down | 0.50 | `council-unavailable` |

Recommended storage policy (each tag corresponds to exactly one confidence value, so key off the tag rather than a confidence range):
- `verified` (0.85) → store normally
- `council-split` (0.65) → store, keeping the `council-split` tag for later review
- `council-flagged` (0.30) → store in a quarantine bucket or reject entirely
- `council-unavailable` (0.50) → store normally (fail-open) and re-verify once Ollama is back
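The policy above can be sketched as a small dispatch on tags. `StorageAction` and `storage_action` are hypothetical names. The quarantine branch could reject outright instead, and unknown or missing tags fall through to a normal store, which matches the fail-open default.

```python
from enum import Enum


class StorageAction(Enum):
    STORE = "store"
    STORE_FOR_REVIEW = "store-for-review"
    QUARANTINE = "quarantine"


def storage_action(verdict: dict) -> StorageAction:
    """Map a council verdict (the /verify response) to a storage action."""
    tags = set(verdict.get("tags", []))
    if "council-flagged" in tags:
        return StorageAction.QUARANTINE
    if "council-split" in tags:
        return StorageAction.STORE_FOR_REVIEW
    # "verified", "council-unavailable" (fail-open) and anything unknown store normally.
    return StorageAction.STORE
```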
## How to call from soul (.el)
The soul is implemented in Neuron's Emacs Lisp-like `.el` language. Add a pre-storage hook in the memory capture path:
```elisp
;; In memory.el or safety.el: pre-storage council check
(defun council-verify (claim context)
  "Call the council service. Returns a plist with :confidence and :tags."
  (let* ((url "http://localhost:7771/api/neuron/council/verify")
         (body (json-encode `((claim . ,claim) (context . ,context))))
         (resp (neuron-http-post url body))
         (data (json-decode resp)))
    data))

;; In the capture handler: wire it in before (engram-write ...)
(defun capture-memory-with-council (claim context &rest store-args)
  (let* ((verdict (council-verify claim context))
         (confidence (plist-get verdict :confidence))
         (tags (plist-get verdict :tags)))
    ;; council-flagged claims score exactly 0.30, so a strict > drops them,
    ;; while council-unavailable (0.50) still stores (fail-open).
    ;; Use (>= confidence 0.30) instead to keep flagged claims as well.
    (when (> confidence 0.30)
      (apply #'engram-write
             (append store-args
                     (list :council-confidence confidence
                           :council-tags tags))))))
```
The exact hook point depends on where `engram-write` (or equivalent) is called in `memory.el`. Search for the write call and wrap it with `capture-memory-with-council`.
## Future soul.c patch point
If the soul is ever rewritten in C or another compiled language, the integration point is:
```c
// Before inserting a memory node into the engram database.
// COUNCIL_REJECT_THRESHOLD must be above 0.30 (council-flagged) and at most
// 0.50 (council-unavailable), so flagged claims are rejected while an
// unreachable council still fails open.
CouncilResult result = council_verify(claim, context);
if (result.confidence < COUNCIL_REJECT_THRESHOLD) {
    log_warn("Council flagged claim as confabulation (conf=%.2f): %s",
             result.confidence, claim);
    return MEMORY_REJECTED;
}
memory_node.council_confidence = result.confidence;
memory_node.council_tags = result.tags;
engram_insert(memory_node);
```
## Council members
The council is currently three models:
- `neuron:latest` — the primary Neuron model
- `dolphin3:8b` — uncensored general-purpose model for independent perspective
- `neuron-ft:latest` — fine-tuned Neuron variant
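Each member votes independently and in parallel, with a 45-second per-request timeout (`MODEL_TIMEOUT` in the service, sized so a model can load from cold). If a member times out, or replies with anything other than one of the three verdicts, its vote counts as "uncertain". If Ollama is entirely unreachable, the service returns `council-unavailable` immediately (fail-open: confidence 0.5, no rejection).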
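The voting pattern can be sketched independently of Ollama. In this sketch, each member is any async callable that returns a verdict string, a hypothetical stand-in for one Ollama call. `asyncio.wait_for` plays the role of the per-request httpx timeout that the service actually uses.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical member type: takes the prompt, eventually returns a verdict.
Member = Callable[[str], Awaitable[str]]
VALID_VERDICTS = {"plausible", "uncertain", "confabulation"}


async def cast_vote(member: Member, prompt: str, timeout: float) -> str:
    """Ask one member; a timeout or an out-of-vocabulary answer counts as 'uncertain'."""
    try:
        verdict = await asyncio.wait_for(member(prompt), timeout=timeout)
    except asyncio.TimeoutError:
        return "uncertain"
    return verdict if verdict in VALID_VERDICTS else "uncertain"


async def convene(members: list[Member], prompt: str, timeout: float) -> list[str]:
    """Run all members concurrently, so total latency tracks the slowest member."""
    return list(await asyncio.gather(*(cast_vote(m, prompt, timeout) for m in members)))
```

Because the members run concurrently, one slow member delays the verdict by at most the timeout. The member's vote is then recorded as "uncertain" rather than blocking the others.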
## Example curl
```bash
# Should get high confidence (true fact)
curl -s http://localhost:7771/api/neuron/council/verify -X POST \
-H 'Content-Type: application/json' \
-d '{"claim": "Neuron is a personal AI memory system built by Will Anderson", "context": "product description"}'
# Should get low confidence (false claim)
curl -s http://localhost:7771/api/neuron/council/verify -X POST \
-H 'Content-Type: application/json' \
-d '{"claim": "The Eiffel Tower is located in Berlin and was built in 1950", "context": "geography"}'
```
+234
@@ -0,0 +1,234 @@
#!/usr/bin/env python3
"""
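Neuron CCR Phase 1: System Prompt Compressor Service.

Receives a verbose soul system prompt and returns a semantically equivalent
but token-dense compressed version. The target is a 60-80% reduction in
system prompt tokens with no loss of behavioral information; both are goals
of the compression prompt rather than guarantees, and token counts are
word-based estimates (see estimate_tokens).

Architecture reference: foundation/forge/docs/token-compression-architecture.md
Model: qwen3:1.7b (primary), neuron:latest (fallback)

Usage:
    python3 compressor_service.py [--port 7772] [--host 127.0.0.1]

API:
    POST /api/neuron/compress
        {"system_prompt": "...", "context_type": "identity|rules|memory|mixed"}
    Response:
        {"id": "...", "compressed": "...", "original_tokens": N,
         "compressed_tokens": N, "reduction_pct": X, "model": "...",
         "context_type": "...", "latency_ms": N}

    On failure the original prompt is passed through unchanged, with model
    set to "unavailable" (Ollama unreachable) or "both-failed".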
"""
import argparse
import time
import uuid
from typing import Optional
import httpx
import uvicorn
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
OLLAMA_BASE = "http://localhost:11434/api/generate"
# qwen3:1.7b is the architecture-specified compressor (Phase 1).
# neuron:latest is the fallback: already running, domain-appropriate.
PRIMARY_MODEL = "qwen3:1.7b"
FALLBACK_MODEL = "neuron:latest"
MODEL_TIMEOUT = 60.0 # seconds; compression of a long prompt can take time
# Compression prompt — preserves all facts/rules/constraints, strips verbosity.
# /no_think suppresses qwen3's chain-of-thought tokens, keeping output clean.
COMPRESSOR_PROMPT_TEMPLATE = """\
/no_think
You are a semantic compression engine. Compress the following system prompt while preserving ALL specific facts, rules, constraints, and named entities. Do not lose any information that would change behavior. Output ONLY the compressed text, nothing else.
Original prompt:
{system_prompt}
Compressed (preserve all facts and rules):"""
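
# Hypothetical helper, not yet wired into compress_with_model. Depending on
# the Ollama version, qwen3 may still emit an empty "<think>...</think>" block
# even with /no_think. Passing the raw model response through this helper
# before returning it would keep such blocks out of the compressed prompt.
def strip_think_tags(text: str) -> str:
    """Remove every complete <think>...</think> block, then trim whitespace."""
    open_tag, close_tag = "<think>", "</think>"
    while True:
        start = text.find(open_tag)
        if start == -1:
            break
        end = text.find(close_tag, start)
        if end == -1:
            break  # unclosed tag: leave the text as-is rather than guess
        text = text[:start] + text[end + len(close_tag):]
    return text.strip()
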
# ---------------------------------------------------------------------------
# App
# ---------------------------------------------------------------------------
app = FastAPI(
    title="Neuron Compressor Service",
    description="CCR Phase 1: system prompt compression for the Neuron soul",
    version="1.0.0",
)
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)
# ---------------------------------------------------------------------------
# Models
# ---------------------------------------------------------------------------
class CompressRequest(BaseModel):
    system_prompt: str
    # identity | rules | memory | mixed. Currently only echoed back in the
    # response; it does not change the compression prompt.
    context_type: Optional[str] = "mixed"


class CompressResponse(BaseModel):
    id: str
    compressed: str
    original_tokens: int
    compressed_tokens: int
    reduction_pct: float
    model: str
    context_type: str
    latency_ms: int
# ---------------------------------------------------------------------------
# Token estimation (rough: word_count × 1.3, matching architecture doc)
# ---------------------------------------------------------------------------
def estimate_tokens(text: str) -> int:
    """Rough token count estimate: words × 1.3. No tokenizer dependency."""
    words = len(text.split())
    return max(1, int(words * 1.3))
# ---------------------------------------------------------------------------
# Core compression
# ---------------------------------------------------------------------------
async def ollama_available(client: httpx.AsyncClient) -> bool:
    """Quick connectivity check to Ollama."""
    try:
        await client.get("http://localhost:11434/", timeout=2.0)
        return True
    except httpx.TransportError:
        # Connection refused, timeout, or any other transport-level failure.
        return False


async def compress_with_model(
    client: httpx.AsyncClient, model: str, prompt_text: str
) -> str:
    """
    Call a single Ollama model to compress the given text.
    Returns the compressed string, or "" on failure.
    """
    payload = {
        "model": model,
        "prompt": prompt_text,
        "stream": False,
        # Keep temperature low for near-deterministic compression
        "options": {
            "temperature": 0.1,
            "top_p": 0.9,
        },
    }
    try:
        resp = await client.post(OLLAMA_BASE, json=payload, timeout=MODEL_TIMEOUT)
        resp.raise_for_status()
        data = resp.json()
        return (data.get("response") or "").strip()
    except (httpx.HTTPError, ValueError):
        # HTTPError covers timeouts, connection errors and non-2xx statuses;
        # ValueError covers a malformed JSON body. Either way the caller
        # moves on to the fallback model.
        return ""
async def run_compression(system_prompt: str, context_type: str) -> CompressResponse:
    start = time.monotonic()
    request_id = str(uuid.uuid4())
    original_tokens = estimate_tokens(system_prompt)
    prompt_text = COMPRESSOR_PROMPT_TEMPLATE.format(system_prompt=system_prompt)

    async with httpx.AsyncClient() as client:
        # Connectivity gate
        if not await ollama_available(client):
            latency_ms = int((time.monotonic() - start) * 1000)
            return CompressResponse(
                id=request_id,
                compressed=system_prompt,  # passthrough on failure
                original_tokens=original_tokens,
                compressed_tokens=original_tokens,
                reduction_pct=0.0,
                model="unavailable",
                context_type=context_type,
                latency_ms=latency_ms,
            )

        # Try primary model (qwen3:1.7b), fall back to neuron:latest
        compressed = await compress_with_model(client, PRIMARY_MODEL, prompt_text)
        model_used = PRIMARY_MODEL
        if not compressed:
            compressed = await compress_with_model(client, FALLBACK_MODEL, prompt_text)
            model_used = FALLBACK_MODEL

    if not compressed:
        # Both models failed: passthrough
        latency_ms = int((time.monotonic() - start) * 1000)
        return CompressResponse(
            id=request_id,
            compressed=system_prompt,
            original_tokens=original_tokens,
            compressed_tokens=original_tokens,
            reduction_pct=0.0,
            model="both-failed",
            context_type=context_type,
            latency_ms=latency_ms,
        )

    compressed_tokens = estimate_tokens(compressed)
    reduction_pct = round(
        (1.0 - compressed_tokens / max(1, original_tokens)) * 100.0, 1
    )
    latency_ms = int((time.monotonic() - start) * 1000)
    return CompressResponse(
        id=request_id,
        compressed=compressed,
        original_tokens=original_tokens,
        compressed_tokens=compressed_tokens,
        reduction_pct=reduction_pct,
        model=model_used,
        context_type=context_type,
        latency_ms=latency_ms,
    )
# ---------------------------------------------------------------------------
# Routes
# ---------------------------------------------------------------------------
@app.post("/api/neuron/compress", response_model=CompressResponse)
async def compress(req: CompressRequest):
    return await run_compression(req.system_prompt, req.context_type or "mixed")


@app.get("/healthz")
async def health():
    return {"status": "ok", "service": "compressor", "version": "1.0.0"}
# ---------------------------------------------------------------------------
# Entrypoint
# ---------------------------------------------------------------------------
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Neuron Compressor Service (CCR Phase 1)")
    parser.add_argument("--port", type=int, default=7772, help="Port to listen on")
    parser.add_argument("--host", default="127.0.0.1", help="Host to bind to")
    args = parser.parse_args()
    print(f"[compressor] Starting on {args.host}:{args.port}")
    print(f"[compressor] Primary model: {PRIMARY_MODEL}")
    print(f"[compressor] Fallback model: {FALLBACK_MODEL}")
    uvicorn.run(app, host=args.host, port=args.port, log_level="info")
+224
@@ -0,0 +1,224 @@
#!/usr/bin/env python3
"""
Neuron Council Service — LLM anti-confabulation layer.
Fires 3 parallel Ollama calls and aggregates votes to produce a
confidence score + tags for any claim before it enters memory.
Usage:
    python3 council_service.py [--port 7771] [--host 127.0.0.1]
"""
import argparse
import asyncio
import time
import uuid
from typing import Optional
import httpx
import uvicorn
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
OLLAMA_BASE = "http://localhost:11434/api/generate"
COUNCIL_MODELS = ["neuron:latest", "dolphin3:8b", "neuron-ft:latest"]
MODEL_TIMEOUT = 45.0 # seconds per model (models may need to load from cold)
SYSTEM_PROMPT_TEMPLATE = """\
You are a fact-checker. You will be given a claim.
Your job: assess if it is accurate, internally consistent, and grounded in reality.
Respond with EXACTLY ONE WORD:
- "plausible" if the claim seems accurate and well-grounded
- "uncertain" if you cannot determine accuracy or the claim is ambiguous
- "confabulation" if the claim appears to contain invented facts or clear errors
Claim: {claim}
Context: {context}
Your verdict (one word only):"""
VALID_VERDICTS = {"plausible", "uncertain", "confabulation"}
# ---------------------------------------------------------------------------
# App
# ---------------------------------------------------------------------------
app = FastAPI(
    title="Neuron Council Service",
    description="LLM-council anti-confabulation layer for Neuron soul",
    version="1.0.0",
)
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)
# ---------------------------------------------------------------------------
# Models
# ---------------------------------------------------------------------------
class VerifyRequest(BaseModel):
    claim: str
    context: Optional[str] = ""


class VerifyResponse(BaseModel):
    id: str
    claim: str
    confidence: float
    council_votes: list[str]
    summary: str
    tags: list[str]
    latency_ms: int
# ---------------------------------------------------------------------------
# Core logic
# ---------------------------------------------------------------------------
async def query_model(client: httpx.AsyncClient, model: str, prompt: str) -> str:
    """
    Query a single Ollama model. Returns "plausible", "uncertain", or "confabulation".
    Returns "uncertain" on timeout, on an HTTP error status (e.g. the model is
    not installed), on a malformed body, or when the reply is not one of the
    three verdicts. Other transport errors (e.g. httpx.ConnectError) propagate.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
    }
    try:
        resp = await client.post(OLLAMA_BASE, json=payload, timeout=MODEL_TIMEOUT)
        resp.raise_for_status()
        data = resp.json()
    except (httpx.TimeoutException, httpx.HTTPStatusError, ValueError):
        return "uncertain"

    text = (data.get("response") or "").strip().lower()
    if not text:
        return "uncertain"
    # Models often quote the word or add punctuation ("Plausible."), so strip
    # those before normalising to one of the three valid verdicts.
    raw = text.split()[0].strip(".,;:!?\"'`*")
    return raw if raw in VALID_VERDICTS else "uncertain"
async def run_council(claim: str, context: str) -> VerifyResponse:
    start = time.monotonic()
    prompt = SYSTEM_PROMPT_TEMPLATE.format(claim=claim, context=context)

    # Quick connectivity check: one small GET request to Ollama's root URL
    try:
        async with httpx.AsyncClient() as probe:
            await probe.get("http://localhost:11434/", timeout=2.0)
    except httpx.TransportError:
        latency_ms = int((time.monotonic() - start) * 1000)
        return VerifyResponse(
            id=str(uuid.uuid4()),
            claim=claim,
            confidence=0.5,
            council_votes=[],
            summary="Ollama is unavailable; council could not convene.",
            tags=["council-unavailable"],
            latency_ms=latency_ms,
        )

    # Fire all 3 model calls in parallel. With return_exceptions=True, a member
    # that fails unexpectedly (e.g. a dropped connection) is recorded as
    # "uncertain" instead of turning the whole request into a 500.
    async with httpx.AsyncClient() as client:
        tasks = [query_model(client, m, prompt) for m in COUNCIL_MODELS]
        results = await asyncio.gather(*tasks, return_exceptions=True)
    votes: list[str] = [r if isinstance(r, str) else "uncertain" for r in results]

    plausible_count = votes.count("plausible")
    latency_ms = int((time.monotonic() - start) * 1000)

    # Voting rules
    if plausible_count == 3:
        confidence = 0.85
        tags = ["verified"]
        summary = "3/3 council members agree this is plausible."
    elif plausible_count == 2:
        confidence = 0.65
        tags = ["council-split"]
        summary = "2/3 council members agree this is plausible."
    elif plausible_count == 1:
        confidence = 0.30
        tags = ["unverified", "council-flagged"]
        summary = "1/3 council members found this plausible."
    else:
        confidence = 0.30
        tags = ["unverified", "council-flagged"]
        summary = "0/3 council members found this plausible."

    return VerifyResponse(
        id=str(uuid.uuid4()),
        claim=claim,
        confidence=confidence,
        council_votes=votes,
        summary=summary,
        tags=tags,
        latency_ms=latency_ms,
    )
# ---------------------------------------------------------------------------
# Routes
# ---------------------------------------------------------------------------
@app.post("/api/neuron/council/verify", response_model=VerifyResponse)
async def verify(req: VerifyRequest):
    return await run_council(req.claim, req.context or "")


@app.get("/healthz")
async def health():
    return {"status": "ok", "service": "council"}
# ---------------------------------------------------------------------------
# Startup warm-up: pre-load all council models so first real call is fast
# ---------------------------------------------------------------------------
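@app.on_event("startup")
async def warmup_models():
    """
    Send a trivial prompt to each council model at startup.
    This forces Ollama to load the models into GPU memory so the first
    real council call does not pay the cold-load latency penalty.
    Ollama unloads idle models after its keep_alive period (5 minutes by
    default), so calls after a quiet spell may still load from cold.
    The server does not accept requests until this handler finishes.
    """
    print("[council] Warming up council models...")
    warmup_prompt = "Reply with one word: ready"
    async with httpx.AsyncClient() as client:
        tasks = [
            client.post(
                OLLAMA_BASE,
                json={"model": m, "prompt": warmup_prompt, "stream": False},
                timeout=60.0,
            )
            for m in COUNCIL_MODELS
        ]
        results = await asyncio.gather(*tasks, return_exceptions=True)
    ready = 0
    for model, result in zip(COUNCIL_MODELS, results):
        if isinstance(result, Exception):
            print(f"[council] warm-up failed for {model}: {result!r}")
        elif not result.is_success:
            # e.g. HTTP 404 when the model has not been pulled
            print(f"[council] warm-up failed for {model}: HTTP {result.status_code}")
        else:
            ready += 1
            print(f"[council] {model} warm and ready")
    print(f"[council] Warm-up finished: {ready}/{len(COUNCIL_MODELS)} models ready.")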
# ---------------------------------------------------------------------------
# Entrypoint
# ---------------------------------------------------------------------------
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Neuron Council Service")
    parser.add_argument("--port", type=int, default=7771, help="Port to listen on")
    parser.add_argument("--host", default="127.0.0.1", help="Host to bind to")
    args = parser.parse_args()
    print(f"[council] Starting on {args.host}:{args.port}")
    uvicorn.run(app, host=args.host, port=args.port, log_level="info")