feat(council): anti-confabulation voting layer for memory writes

2026-06-29 12:38:24 -05:00
parent 51bea5507b
commit 70b60f78de
3 changed files with 581 additions and 0 deletions
@@ -0,0 +1,123 @@
+# Neuron Council Service
+
+Anti-confabulation layer for the Neuron soul. Before a claim enters long-term memory, the council convenes: three independent LLMs vote on whether the claim is plausible, uncertain, or a confabulation. The aggregate vote produces a confidence score and tags that downstream storage can act on.
+
+## Running the service
+
+```bash
+# Foreground
+python3 council_service.py --port 7771
+
+# Background (managed by LaunchAgent on macOS)
+launchctl load ~/Library/LaunchAgents/ai.neuron.council.plist
+launchctl unload ~/Library/LaunchAgents/ai.neuron.council.plist
+```
+
+Logs: `~/.neuron/logs/council.log`
+
+## API
+
+### `POST /api/neuron/council/verify`
+
+Request:
+
+```json
+{ "claim": "...", "context": "..." }
+```
+
+Response:
+
+```json
+{
+  "id": "550e8400-e29b-41d4-a716-446655440000",
+  "claim": "...",
+  "confidence": 0.85,
+  "council_votes": ["plausible", "plausible", "plausible"],
+  "summary": "3/3 council members agree this is plausible.",
+  "tags": ["verified"],
+  "latency_ms": 1420
+}
+```
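
A minimal Python client for this endpoint, using only the standard library. It is a sketch that assumes the service listens on its default `localhost:7771`; the helper names `build_verify_request` and `verify_claim` are hypothetical and not part of the service.

```python
import json
import urllib.request

# Assumed default address, matching the --port 7771 example above.
COUNCIL_URL = "http://localhost:7771/api/neuron/council/verify"


def build_verify_request(claim: str, context: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST request for the council verify endpoint."""
    body = json.dumps({"claim": claim, "context": context}).encode("utf-8")
    return urllib.request.Request(
        COUNCIL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify_claim(claim: str, context: str = "", timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON verdict."""
    req = build_verify_request(claim, context)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keep the client timeout above the service's per-model timeout (`MODEL_TIMEOUT` in `council_service.py`), because the service waits for all three parallel votes before replying.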
+
+### `GET /healthz`
+
+Returns `{"status": "ok", "service": "council"}` when the service is up.
+
+## Confidence thresholds and tag meanings
+
+| Votes plausible | Confidence | Tags |
+|---|---|---|
+| 3/3 | 0.85 | `verified` |
+| 2/3 | 0.65 | `council-split` |
+| 1/3 or 0/3 | 0.30 | `unverified`, `council-flagged` |
+| Ollama down | 0.50 | `council-unavailable` |
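
The table can be read as a small pure function. The sketch below restates it in Python; `aggregate_votes` is a hypothetical name, and passing `None` for the Ollama-down row is an assumption of this sketch (the service itself returns early before any votes exist).

```python
from typing import Optional


def aggregate_votes(votes: Optional[list[str]]) -> tuple[float, list[str]]:
    """Map council votes to (confidence, tags) following the threshold table.

    votes=None stands in for the "Ollama down" row (sketch assumption).
    """
    if votes is None:
        return 0.50, ["council-unavailable"]
    plausible = votes.count("plausible")
    if plausible == 3:
        return 0.85, ["verified"]
    if plausible == 2:
        return 0.65, ["council-split"]
    # 1/3 or 0/3: "uncertain" and "confabulation" votes weigh the same here
    return 0.30, ["unverified", "council-flagged"]
```

Only the count of `plausible` votes matters: three `uncertain` votes score the same as three `confabulation` votes.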
+
+Recommended storage policy (key on tags; each tag carries a fixed confidence):
+- `verified` (0.85) → store normally
+- `council-split` (0.65) → store, keeping the `council-split` tag for later review
+- `council-flagged` (0.30) → store in a quarantine bucket or reject entirely
+- `council-unavailable` (0.50) → store normally (fail-open); the council never checked the claim, so schedule a re-verification once Ollama is back
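
Keying the policy on tags rather than on raw confidence avoids comparing floats at the 0.30 and 0.65 boundaries. A sketch in Python; `storage_action` and its return values are hypothetical names, not part of the service, so map them onto whatever the storage layer actually does.

```python
def storage_action(verdict: dict) -> str:
    """Decide what to do with a council verdict, keyed on its tags.

    Returns "store", "store-for-review", or "quarantine" (hypothetical names).
    """
    tags = set(verdict.get("tags", []))
    if "council-unavailable" in tags:
        return "store"  # fail-open: the claim was never checked
    if "council-flagged" in tags:
        return "quarantine"  # or reject outright
    if "council-split" in tags:
        return "store-for-review"
    return "store"  # "verified"
```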
+
+## How to call from soul (.el)
+
+The soul is implemented in Neuron's Emacs Lisp-like `.el` language. Add a pre-storage hook in the memory capture path:
+
+```elisp
+;; In memory.el or safety.el — pre-storage council check
+(defun council-verify (claim context)
+  "Call the council service. Returns a plist with :confidence and :tags."
+  (let* ((url "http://localhost:7771/api/neuron/council/verify")
+         (body (json-encode `((claim . ,claim) (context . ,context))))
+         (resp (neuron-http-post url body))
+         (data (json-decode resp)))
+    data))
+
+;; In the capture handler — wire it in before (engram-write ...)
+(defun capture-memory-with-council (claim context &rest store-args)
+  (let* ((verdict (council-verify claim context))
+         (confidence (plist-get verdict :confidence))
+         (tags (plist-get verdict :tags)))
+    (when (> confidence 0.30)  ; skip council-flagged claims (confidence 0.30)
+      (apply #'engram-write
+             (append store-args
+                     (list :council-confidence confidence
+                           :council-tags tags))))))
+```
+
+The exact hook point depends on where `engram-write` (or equivalent) is called in `memory.el`. Search for the write call and wrap it with `capture-memory-with-council`.
+
+## Future soul.c patch point
+
+If the soul is ever rewritten in C or another compiled language, the integration point is:
+
+```c
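+// Before inserting a memory node into the engram database.
+// COUNCIL_REJECT_THRESHOLD should sit between 0.30 and 0.50 (e.g. 0.4) so that
+// council-flagged claims are rejected while council-unavailable ones fail open.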
+CouncilResult result = council_verify(claim, context);
+if (result.confidence < COUNCIL_REJECT_THRESHOLD) {
+    log_warn("Council flagged claim as confabulation (conf=%.2f): %s",
+             result.confidence, claim);
+    return MEMORY_REJECTED;
+}
+memory_node.council_confidence = result.confidence;
+memory_node.council_tags = result.tags;
+engram_insert(memory_node);
+```
+
+## Council members
+
+The council is currently three models:
+- `neuron:latest` — the primary Neuron model
+- `dolphin3:8b` — uncensored general-purpose model for independent perspective
+- `neuron-ft:latest` — fine-tuned Neuron variant
+
+Each member votes independently with a 45-second timeout (`MODEL_TIMEOUT` in `council_service.py`; generous because a model may need to load from cold). If a member times out, its vote counts as "uncertain". If Ollama is entirely unreachable, the service returns `council-unavailable` without convening the council (fail-open: confidence 0.5, no rejection).
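
The fan-out can be sketched with `asyncio`. This version enforces the per-member limit with `asyncio.wait_for`, whereas `council_service.py` relies on the HTTP client's request timeout; the members here are stand-in async callables rather than Ollama calls, and `ask`/`convene` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable

# A council member is any async callable that maps a prompt to a verdict string.
Member = Callable[[str], Awaitable[str]]
VALID = {"plausible", "uncertain", "confabulation"}


async def ask(member: Member, prompt: str, timeout: float) -> str:
    """Run one member; a timeout or an unrecognised reply counts as "uncertain"."""
    try:
        verdict = await asyncio.wait_for(member(prompt), timeout)
    except asyncio.TimeoutError:
        return "uncertain"
    return verdict if verdict in VALID else "uncertain"


async def convene(members: list[Member], prompt: str, timeout: float) -> list[str]:
    """Ask every member concurrently; votes come back in member order."""
    return await asyncio.gather(*(ask(m, prompt, timeout) for m in members))
```

Because the calls run concurrently, the council's latency is bounded by its slowest member (capped at the timeout) rather than the sum of all three.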
+
+## Example curl
+
+```bash
+# Should get high confidence (true claim, though members without knowledge of Neuron may vote "uncertain")
+curl -s http://localhost:7771/api/neuron/council/verify -X POST \
+  -H 'Content-Type: application/json' \
+  -d '{"claim": "Neuron is a personal AI memory system built by Will Anderson", "context": "product description"}'
+
+# Should get low confidence (false claim)
+curl -s http://localhost:7771/api/neuron/council/verify -X POST \
+  -H 'Content-Type: application/json' \
+  -d '{"claim": "The Eiffel Tower is located in Berlin and was built in 1950", "context": "geography"}'
+```
@@ -0,0 +1,234 @@
+#!/usr/bin/env python3
+"""
+Neuron CCR Phase 1 — System Prompt Compressor Service.
+
+Receives a verbose soul system prompt and returns a semantically equivalent
+but token-dense compressed version. Targets a 60-80% reduction in system
+prompt tokens while preserving all behavior-relevant information.
+
+Architecture reference: foundation/forge/docs/token-compression-architecture.md
+Model: qwen3:1.7b (primary), neuron:latest (fallback)
+
+Usage:
+    python3 compressor_service.py [--port 7772]
+
+API:
+    POST /api/neuron/compress
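+    {"system_prompt": "...", "context_type": "identity|rules|memory|mixed"}
+    (context_type is optional, defaults to "mixed", and is currently only echoed back)
+
+    Response:
+    {"id": "...", "compressed": "...", "original_tokens": N,
+     "compressed_tokens": N, "reduction_pct": X, "model": "...",
+     "context_type": "...", "latency_ms": N}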
+"""
+
+import argparse
+import time
+import uuid
+from typing import Optional
+
+import httpx
+import uvicorn
+from fastapi import FastAPI
+from fastapi.middleware.cors import CORSMiddleware
+from pydantic import BaseModel
+
+# ---------------------------------------------------------------------------
+# Config
+# ---------------------------------------------------------------------------
+
+OLLAMA_BASE = "http://localhost:11434/api/generate"
+
+# qwen3:1.7b is the architecture-specified compressor (Phase 1).
+# neuron:latest is the fallback: already running, domain-appropriate.
+PRIMARY_MODEL = "qwen3:1.7b"
+FALLBACK_MODEL = "neuron:latest"
+MODEL_TIMEOUT = 60.0  # seconds; compression of a long prompt can take time
+
+# Compression prompt — preserves all facts/rules/constraints, strips verbosity.
+# /no_think suppresses qwen3's chain-of-thought tokens, keeping output clean.
+COMPRESSOR_PROMPT_TEMPLATE = """\
+/no_think
+You are a semantic compression engine. Compress the following system prompt while preserving ALL specific facts, rules, constraints, and named entities. Do not lose any information that would change behavior. Output ONLY the compressed text, nothing else.
+
+Original prompt:
+{system_prompt}
+
+Compressed (preserve all facts and rules):"""
+
+# ---------------------------------------------------------------------------
+# App
+# ---------------------------------------------------------------------------
+
+app = FastAPI(
+    title="Neuron Compressor Service",
+    description="CCR Phase 1 — system prompt compression for the Neuron soul",
+    version="1.0.0",
+)
+
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+
+# ---------------------------------------------------------------------------
+# Models
+# ---------------------------------------------------------------------------
+
+class CompressRequest(BaseModel):
+    system_prompt: str
+    context_type: Optional[str] = "mixed"  # identity | rules | memory | mixed
+
+
+class CompressResponse(BaseModel):
+    id: str
+    compressed: str
+    original_tokens: int
+    compressed_tokens: int
+    reduction_pct: float
+    model: str
+    context_type: str
+    latency_ms: int
+
+
+# ---------------------------------------------------------------------------
+# Token estimation (rough: word_count × 1.3, matching architecture doc)
+# ---------------------------------------------------------------------------
+
+def estimate_tokens(text: str) -> int:
+    """Rough token count estimate: words × 1.3. No tokenizer dependency."""
+    words = len(text.split())
+    return max(1, int(words * 1.3))
+
+
+# ---------------------------------------------------------------------------
+# Core compression
+# ---------------------------------------------------------------------------
+
+async def ollama_available(client: httpx.AsyncClient) -> bool:
+    """Quick connectivity check to Ollama."""
+    try:
+        await client.get("http://localhost:11434/", timeout=2.0)
+        return True
+    except (httpx.ConnectError, httpx.TimeoutException):
+        return False
+
+
+async def compress_with_model(
+    client: httpx.AsyncClient, model: str, prompt_text: str
+) -> str:
+    """
+    Call a single Ollama model to compress the given text.
+    Returns the compressed string, or "" on failure.
+    """
+    payload = {
+        "model": model,
+        "prompt": prompt_text,
+        "stream": False,
+        # Keep temperature low for deterministic compression
+        "options": {
+            "temperature": 0.1,
+            "top_p": 0.9,
+        },
+    }
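+    try:
+        resp = await client.post(OLLAMA_BASE, json=payload, timeout=MODEL_TIMEOUT)
+        resp.raise_for_status()
+        text = resp.json().get("response", "").strip()
+    except Exception:  # timeout, HTTP error status, bad JSON: treat as failure
+        return ""
+    # qwen3 may still emit an empty <think></think> block even with /no_think;
+    # drop it so it is not counted as compressed output.
+    if text.startswith("<think>") and "</think>" in text:
+        text = text.split("</think>", 1)[1].strip()
+    return text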
+
+
+async def run_compression(system_prompt: str, context_type: str) -> CompressResponse:
+    start = time.monotonic()
+    request_id = str(uuid.uuid4())
+
+    original_tokens = estimate_tokens(system_prompt)
+    prompt_text = COMPRESSOR_PROMPT_TEMPLATE.format(system_prompt=system_prompt)
+
+    async with httpx.AsyncClient() as client:
+        # Connectivity gate
+        if not await ollama_available(client):
+            latency_ms = int((time.monotonic() - start) * 1000)
+            return CompressResponse(
+                id=request_id,
+                compressed=system_prompt,  # passthrough on failure
+                original_tokens=original_tokens,
+                compressed_tokens=original_tokens,
+                reduction_pct=0.0,
+                model="unavailable",
+                context_type=context_type,
+                latency_ms=latency_ms,
+            )
+
+        # Try primary model (qwen3:1.7b), fall back to neuron:latest
+        compressed = await compress_with_model(client, PRIMARY_MODEL, prompt_text)
+        model_used = PRIMARY_MODEL
+
+        if not compressed:
+            compressed = await compress_with_model(client, FALLBACK_MODEL, prompt_text)
+            model_used = FALLBACK_MODEL
+
+        if not compressed:
+            # Both models failed — passthrough
+            latency_ms = int((time.monotonic() - start) * 1000)
+            return CompressResponse(
+                id=request_id,
+                compressed=system_prompt,
+                original_tokens=original_tokens,
+                compressed_tokens=original_tokens,
+                reduction_pct=0.0,
+                model="both-failed",
+                context_type=context_type,
+                latency_ms=latency_ms,
+            )
+
+    compressed_tokens = estimate_tokens(compressed)
+    reduction_pct = round(
+        (1.0 - compressed_tokens / max(1, original_tokens)) * 100.0, 1
+    )
+    latency_ms = int((time.monotonic() - start) * 1000)
+
+    return CompressResponse(
+        id=request_id,
+        compressed=compressed,
+        original_tokens=original_tokens,
+        compressed_tokens=compressed_tokens,
+        reduction_pct=reduction_pct,
+        model=model_used,
+        context_type=context_type,
+        latency_ms=latency_ms,
+    )
+
+
+# ---------------------------------------------------------------------------
+# Routes
+# ---------------------------------------------------------------------------
+
+@app.post("/api/neuron/compress", response_model=CompressResponse)
+async def compress(req: CompressRequest):
+    return await run_compression(req.system_prompt, req.context_type or "mixed")
+
+
+@app.get("/healthz")
+async def health():
+    return {"status": "ok", "service": "compressor", "version": "1.0.0"}
+
+
+# ---------------------------------------------------------------------------
+# Entrypoint
+# ---------------------------------------------------------------------------
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Neuron Compressor Service (CCR Phase 1)")
+    parser.add_argument("--port", type=int, default=7772, help="Port to listen on")
+    parser.add_argument("--host", default="127.0.0.1", help="Host to bind to")
+    args = parser.parse_args()
+
+    print(f"[compressor] Starting on {args.host}:{args.port}")
+    print(f"[compressor] Primary model: {PRIMARY_MODEL}")
+    print(f"[compressor] Fallback model: {FALLBACK_MODEL}")
+    uvicorn.run(app, host=args.host, port=args.port, log_level="info")
@@ -0,0 +1,224 @@
+#!/usr/bin/env python3
+"""
+Neuron Council Service — LLM anti-confabulation layer.
+
+Fires 3 parallel Ollama calls and aggregates votes to produce a
+confidence score + tags for any claim before it enters memory.
+
+Usage:
+    python3 council_service.py [--port 7771]
+"""
+
+import argparse
+import asyncio
+import time
+import uuid
+from typing import Optional
+
+import httpx
+import uvicorn
+from fastapi import FastAPI
+from fastapi.middleware.cors import CORSMiddleware
+from pydantic import BaseModel
+
+# ---------------------------------------------------------------------------
+# Config
+# ---------------------------------------------------------------------------
+
+OLLAMA_BASE = "http://localhost:11434/api/generate"
+COUNCIL_MODELS = ["neuron:latest", "dolphin3:8b", "neuron-ft:latest"]
+MODEL_TIMEOUT = 45.0  # seconds per model (models may need to load from cold)
+
+SYSTEM_PROMPT_TEMPLATE = """\
+You are a fact-checker. You will be given a claim.
+Your job: assess if it is accurate, internally consistent, and grounded in reality.
+Respond with EXACTLY ONE WORD:
+- "plausible" if the claim seems accurate and well-grounded
+- "uncertain" if you cannot determine accuracy or the claim is ambiguous
+- "confabulation" if the claim appears to contain invented facts or clear errors
+
+Claim: {claim}
+Context: {context}
+
+Your verdict (one word only):"""
+
+VALID_VERDICTS = {"plausible", "uncertain", "confabulation"}
+
+# ---------------------------------------------------------------------------
+# App
+# ---------------------------------------------------------------------------
+
+app = FastAPI(
+    title="Neuron Council Service",
+    description="LLM-council anti-confabulation layer for Neuron soul",
+    version="1.0.0",
+)
+
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+
+# ---------------------------------------------------------------------------
+# Models
+# ---------------------------------------------------------------------------
+
+class VerifyRequest(BaseModel):
+    claim: str
+    context: Optional[str] = ""
+
+
+class VerifyResponse(BaseModel):
+    id: str
+    claim: str
+    confidence: float
+    council_votes: list[str]
+    summary: str
+    tags: list[str]
+    latency_ms: int
+
+
+# ---------------------------------------------------------------------------
+# Core logic
+# ---------------------------------------------------------------------------
+
+async def query_model(client: httpx.AsyncClient, model: str, prompt: str) -> str:
+    """
+    Query a single Ollama model. Returns "plausible", "uncertain", or "confabulation".
+    Returns "uncertain" on timeout, on an HTTP error status (e.g. a missing model),
+    or when the reply is not a recognised verdict. Lets httpx.ConnectError propagate
+    so the caller can report the council as unavailable.
+    """
+    payload = {
+        "model": model,
+        "prompt": prompt,
+        "stream": False,
+    }
+    try:
+        resp = await client.post(OLLAMA_BASE, json=payload, timeout=MODEL_TIMEOUT)
+        resp.raise_for_status()
+        text = resp.json().get("response", "").strip().lower()
+    except (httpx.TimeoutException, httpx.HTTPStatusError):
+        return "uncertain"
+    if not text:
+        return "uncertain"
+    # Take the first word and strip quotes/punctuation, so replies such as
+    # "Plausible." or "**confabulation**" still count.
+    raw = text.split()[0].strip("\"'*.,:;!")
+    # Normalise to one of the three valid verdicts
+    return raw if raw in VALID_VERDICTS else "uncertain"
+
+
+async def run_council(claim: str, context: str) -> VerifyResponse:
+    start = time.monotonic()
+    prompt = SYSTEM_PROMPT_TEMPLATE.format(claim=claim, context=context)
+
+    # Quick connectivity check: one small GET request to Ollama's root endpoint
+    try:
+        async with httpx.AsyncClient() as probe:
+            await probe.get("http://localhost:11434/", timeout=2.0)
+    except (httpx.ConnectError, httpx.TimeoutException):
+        latency_ms = int((time.monotonic() - start) * 1000)
+        return VerifyResponse(
+            id=str(uuid.uuid4()),
+            claim=claim,
+            confidence=0.5,
+            council_votes=[],
+            summary="Ollama is unavailable; council could not convene.",
+            tags=["council-unavailable"],
+            latency_ms=latency_ms,
+        )
+
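+    # Fire all 3 model calls in parallel. If Ollama drops between the probe
+    # and these calls, report the council as unavailable instead of failing.
+    try:
+        async with httpx.AsyncClient() as client:
+            tasks = [query_model(client, m, prompt) for m in COUNCIL_MODELS]
+            votes: list[str] = await asyncio.gather(*tasks)
+    except httpx.ConnectError:
+        latency_ms = int((time.monotonic() - start) * 1000)
+        return VerifyResponse(
+            id=str(uuid.uuid4()),
+            claim=claim,
+            confidence=0.5,
+            council_votes=[],
+            summary="Ollama became unreachable; council could not convene.",
+            tags=["council-unavailable"],
+            latency_ms=latency_ms,
+        )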
+
+    plausible_count = votes.count("plausible")
+    latency_ms = int((time.monotonic() - start) * 1000)
+
+    # Voting rules
+    if plausible_count == 3:
+        confidence = 0.85
+        tags = ["verified"]
+        summary = "3/3 council members agree this is plausible."
+    elif plausible_count == 2:
+        confidence = 0.65
+        tags = ["council-split"]
+        summary = "2/3 council members agree this is plausible."
+    else:  # 0 or 1 plausible votes
+        confidence = 0.30
+        tags = ["unverified", "council-flagged"]
+        summary = f"{plausible_count}/3 council members found this plausible."
+
+    return VerifyResponse(
+        id=str(uuid.uuid4()),
+        claim=claim,
+        confidence=confidence,
+        council_votes=votes,
+        summary=summary,
+        tags=tags,
+        latency_ms=latency_ms,
+    )
+
+
+# ---------------------------------------------------------------------------
+# Routes
+# ---------------------------------------------------------------------------
+
+@app.post("/api/neuron/council/verify", response_model=VerifyResponse)
+async def verify(req: VerifyRequest):
+    return await run_council(req.claim, req.context or "")
+
+
+@app.get("/healthz")
+async def health():
+    return {"status": "ok", "service": "council"}
+
+
+# ---------------------------------------------------------------------------
+# Startup warm-up: pre-load all council models so first real call is fast
+# ---------------------------------------------------------------------------
+
+@app.on_event("startup")
+async def warmup_models():
+    """
+    Send a trivial prompt to each council model at startup.
+    This forces Ollama to load the models into GPU memory so the first
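+    real council call does not pay the cold-load latency penalty. Ollama
+    unloads idle models after its keep_alive period (5 minutes by default),
+    so this only helps calls made soon after startup.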
+    """
+    print("[council] Warming up council models...")
+    warmup_prompt = "Reply with one word: ready"
+    async with httpx.AsyncClient() as client:
+        tasks = [
+            client.post(
+                OLLAMA_BASE,
+                json={"model": m, "prompt": warmup_prompt, "stream": False},
+                timeout=60.0,
+            )
+            for m in COUNCIL_MODELS
+        ]
+        results = await asyncio.gather(*tasks, return_exceptions=True)
+        for model, result in zip(COUNCIL_MODELS, results):
+            if isinstance(result, Exception):
+                print(f"[council] warm-up failed for {model}: {result}")
+            else:
+                print(f"[council] {model} warm and ready")
+    print("[council] Warm-up finished.")
+
+
+# ---------------------------------------------------------------------------
+# Entrypoint
+# ---------------------------------------------------------------------------
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Neuron Council Service")
+    parser.add_argument("--port", type=int, default=7771, help="Port to listen on")
+    parser.add_argument("--host", default="127.0.0.1", help="Host to bind to")
+    args = parser.parse_args()
+
+    print(f"[council] Starting on {args.host}:{args.port}")
+    uvicorn.run(app, host=args.host, port=args.port, log_level="info")