fix: vision in agentic chat path (image content block) #56

Merged
will.anderson merged 1 commit from fix/chat-vision-attachments into main 2026-06-28 17:12:55 +00:00

Vision in the agentic chat path (image content block)

Why: In the desktop app, attaching a screenshot and asking "what's in this?" returned a confabulated description of a different image: the cardinal rule, broken in-product. Root cause: the app sent attachments to the soul as text path markers ([Attached file: <path>]) with no pixels, and the only vision endpoint (/api/see) is a one-shot call with no memory or tools, which the app never called anyway.

What this changes (soul side): handle_chat_agentic now reads image (base64) and image_media_type from the request body. When image is present, the current user turn's content is built as an Anthropic content-block array [{type:text}, {type:image, source:{base64,…}}] instead of a plain string, so the model sees raw pixels with memory, history, and tools (parity with the CLI).
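
For reviewers who want the shape spelled out, here is a minimal Python sketch of the branching described above. It is not the chat.el implementation: the function name build_user_content and its parameters are hypothetical, and only the block layout (a text block followed by a base64 image block) follows the Anthropic Messages format.

```python
from __future__ import annotations

from typing import Any, Union


def build_user_content(
    text: str, image_b64: str = "", image_media_type: str = ""
) -> Union[str, list[dict[str, Any]]]:
    """Hypothetical helper mirroring the described behavior of handle_chat_agentic.

    With no image, the content stays a plain string, so text-only turns are unchanged.
    With an image, the content becomes a two-block array: text first, then the image.
    """
    if not image_b64:
        # Additive path: identical to the pre-change behavior for text turns.
        return text
    return [
        {"type": "text", "text": text},
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": image_media_type,  # e.g. "image/png"
                "data": image_b64,  # base64 payload, no data: URI prefix
            },
        },
    ]
```

How the real handler treats a non-empty image with an empty image_media_type is not stated in this PR; the sketch passes the value through as-is.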

Safe / additive: when image is empty, output is byte-identical to before — zero regression risk for text turns. elc compiles clean; a full elb codegen of the tree regenerated dist/chat.c with the change cleanly.

⚠️ Build note for Will (the reason this is a PR, not a direct build): the binary links from the committed inlined dist/soul.c, which my local toolchain can't regenerate (local elc emits only the ~40 KB non-inlined unit; elb has no amalgamate flag; the matched Artifact-Registry toolchain is gated behind the CI service account I don't have). CI itself restores the committed soul.c rather than regenerating it. So this PR changes chat.el only — dist/soul.c needs to be regenerated with your toolchain for the change to reach the binary.

Pairs with: neuron-ui PR fix/chat-vision-attachments (reads bytes → base64 + MIME, sends image/image_media_type, anti-confab guard).
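
The client half can be sketched the same way. This is an illustration in Python, not the neuron-ui code: encode_attachment and the SUPPORTED extension map are hypothetical, and the only names taken from this PR are the image and image_media_type request fields. Returning None for an unsupported or unreadable file stands in for the anti-confab guard, so the caller can report the problem instead of sending a path marker.

```python
from __future__ import annotations

import base64
from pathlib import Path
from typing import Optional

# Assumed extension-to-MIME map, limited to common image types.
SUPPORTED = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def encode_attachment(path: str) -> Optional[dict[str, str]]:
    """Read an image file and return the two request-body fields, or None.

    None means "do not send an image": the file type is unsupported or the
    file could not be read, and the caller should say so to the user.
    """
    p = Path(path)
    media_type = SUPPORTED.get(p.suffix.lower())
    if media_type is None:
        return None
    try:
        data = p.read_bytes()
    except OSError:
        return None
    return {
        "image": base64.b64encode(data).decode("ascii"),
        "image_media_type": media_type,
    }
```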

Test plan (after soul.c regen + build):

  1. Attach a screenshot, ask "what's in this?" → description matches the real image.
  2. Follow-up needing memory ("does this relate to what we discussed?") → uses image and memory.
  3. No image → unchanged.
  4. Unreadable file type → app says so, never invents.
tim.lingo added 1 commit 2026-06-27 18:01:47 +00:00
feat: vision in the agentic chat path (image content block)
Neuron Soul CI / build (pull_request) Failing after 23m26s
f47c92a71a
handle_chat_agentic now reads body image + image_media_type and, when present, sends the current
user turn as an Anthropic content-block array [{text},{image}] instead of a plain string — so the
model sees raw pixels alongside memory, history, and tools (parity with the CLI). Additive: no image
=> output byte-identical to before. elc-clean. Pairs with neuron-ui fix/chat-vision-attachments.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
will.anderson merged commit 9f9f271e78 into main 2026-06-28 17:12:55 +00:00
Reference: neuron-technologies/neuron#56