fix: vision in agentic chat path (image content block) #56
Reference in New Issue
Block a user
Delete Branch "fix/chat-vision-attachments"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Vision in the agentic chat path (image content block)
Why: In the desktop app, attaching a screenshot and asking "what's in this?" returned a confabulated description of a different image — the cardinal rule, broken in-product. Root cause: the app sent attachments to the soul as text path markers (
[Attached file: <path>]) with no pixels, and the only vision endpoint (/api/see) is a one-shot with no memory/tools that the app never called anyway.What this changes (soul side):
handle_chat_agenticnow readsimage(base64) +image_media_typefrom the request body. When present, the current user turn'scontentis built as an Anthropic content-block array[{type:text}, {type:image, source:{base64,…}}]instead of a plain string — so the model sees raw pixels with memory, history, and tools (parity with the CLI).Safe / additive: when
imageis empty, output is byte-identical to before — zero regression risk for text turns.elccompiles clean; a fullelbcodegen of the tree regenerateddist/chat.cwith the change cleanly.⚠️ Build note for Will (the reason this is a PR, not a direct build): the binary links from the committed inlined
dist/soul.c, which my local toolchain can't regenerate (localelcemits only the ~40 KB non-inlined unit;elbhas no amalgamate flag; the matched Artifact-Registry toolchain is gated behind the CI service account I don't have). CI itself restores the committedsoul.crather than regenerating it. So this PR changeschat.elonly —dist/soul.cneeds to be regenerated with your toolchain for the change to reach the binary.Pairs with: neuron-ui PR
fix/chat-vision-attachments(reads bytes → base64 + MIME, sendsimage/image_media_type, anti-confab guard).Test plan (after
soul.cregen + build):handle_chat_agentic now reads body image + image_media_type and, when present, sends the current user turn as an Anthropic content-block array [{text},{image}] instead of a plain string — so the model sees raw pixels alongside memory, history, and tools (parity with the CLI). Additive: no image => output byte-identical to before. elc-clean. Pairs with neuron-ui fix/chat-vision-attachments. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>