fix(reliability): llm-retry #43

Closed
will.anderson wants to merge 0 commits from improve/reliability-llm-retry into main
Owner

Reliability fixes for the LLM request path.

Fixed (EL layer):

  • Issue #5: empty LLM response detected as error in all handle_chat paths
  • Issue #9: agentic_loop max_tokens configurable via NEURON_LLM_MAX_TOKENS (default 4096)
  • Issue #10: connector_tools_json and tool_auto_approved curl timeout raised from 2s to 5s

Documented as TODO (require C runtime changes in el_runtime.c):

  • Issue #1: retry on timeout/connection errors
  • Issue #2: separate EL_LLM_TIMEOUT_MS from EL_HTTP_TIMEOUT_MS
  • Issue #3: HTTP 429 backoff
  • Issue #4: HTTP 5xx retry
  • Issue #6: secondary LLM provider (set NEURON_LLM_1_URL/KEY/FORMAT)
  • Issue #8: EL_HTTP_MAX_RESPONSE_BYTES cap

Automated reliability audit.

will.anderson added 2 commits 2026-06-22 17:02:53 +00:00
Issue #5: detect empty string from llm_extract_text() as an error in handle_chat,
handle_chat_as_soul, and handle_dharma_room_turn. The C runtime silently returns ""
when the LLM response content array is missing or all blocks fail to parse; without
this guard the empty string passes through to callers as a silent empty reply.
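The guard for Issue #5 can be illustrated with a minimal sketch. This is Python rather than EL, and `EmptyLLMResponse` and `require_reply_text` are hypothetical names that do not appear in chat.el; the only behaviour taken from the commit is that an empty string from text extraction is treated as an error instead of a valid reply.

```python
class EmptyLLMResponse(Exception):
    """Raised when the extracted reply text is empty (hypothetical error type)."""


def require_reply_text(text: str) -> str:
    # An empty string means the content array was missing or no block parsed,
    # so surface it as an error instead of returning a silent empty reply.
    if not text:
        raise EmptyLLMResponse("LLM returned no text content")
    return text
```

Each chat handler would pass the extracted text through such a check before replying, so callers see an explicit failure.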

Issue #9: make agentic_loop max_tokens configurable via the NEURON_LLM_MAX_TOKENS env
var (default 4096). The hardcoded value is marginal for long tool chains (8 iterations
x 4096 tokens); operators can now set 8192 or higher for complex multi-step tasks
without rebuilding. The non-agentic path (llm_call_system) still uses the value
hardcoded in the C runtime; that fix belongs in el_runtime.c (see the TODO block added
in this commit).
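A sketch of the env-var lookup described for Issue #9, written in Python for illustration. The variable name and the 4096 default come from the commit; the function name `llm_max_tokens` and the fallback to the default on unparsable or non-positive values are assumptions, not confirmed behaviour of chat.el.

```python
import os

DEFAULT_MAX_TOKENS = 4096  # default stated in the commit message


def llm_max_tokens(env=None) -> int:
    """Return the max_tokens setting, falling back to the default."""
    env = os.environ if env is None else env
    raw = env.get("NEURON_LLM_MAX_TOKENS", "")
    try:
        value = int(raw)
    except ValueError:
        # Assumption: unset or malformed values fall back to the default.
        return DEFAULT_MAX_TOKENS
    # Assumption: zero or negative values are also treated as unset.
    return value if value > 0 else DEFAULT_MAX_TOKENS
```

An operator would then export `NEURON_LLM_MAX_TOKENS=8192` before starting the service to raise the limit without a rebuild.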

Issue #10: increase connector_tools_json and tool_auto_approved curl --max-time from
2s to 5s to reduce false-empty tool lists when neuron-connectd is under transient
load. Graceful degradation to [] on bridge down is unchanged.
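The timeout change and the degrade-to-`[]` behaviour for Issue #10 might look roughly like the following Python sketch. The `curl --max-time 5` value is from the commit; the function name `fetch_tools_json`, the URL argument, the injectable `run` parameter, and the extra `--silent --fail` flags are illustrative assumptions.

```python
import json
import subprocess


def fetch_tools_json(url, max_time_s=5, run=subprocess.run):
    """Fetch the tool list via curl; return [] on any failure."""
    cmd = ["curl", "--silent", "--fail", "--max-time", str(max_time_s), url]
    try:
        # The subprocess timeout is a backstop slightly above curl's own limit.
        proc = run(cmd, capture_output=True, text=True, timeout=max_time_s + 1)
        if proc.returncode != 0:
            return []  # e.g. curl exit code 28 on timeout, or bridge down
        tools = json.loads(proc.stdout)
    except (OSError, subprocess.SubprocessError, ValueError):
        # curl missing, backstop timeout hit, or body was not valid JSON.
        return []
    return tools if isinstance(tools, list) else []
```

Raising the limit only changes how long a slow bridge is tolerated; every failure path still yields an empty list.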

Issues #1/#2/#3/#4/#6/#8: documented as TODO comments in chat.el. These require
targeted C runtime changes in el_runtime.c (llm_provider_request retry loop,
EL_LLM_TIMEOUT_MS separation, HTTP 429 backoff, 5xx retry, EL_HTTP_MAX_RESPONSE_BYTES
cap). Architectural decisions recorded so they are traceable to root causes.
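For orientation, the retry behaviour deferred in Issues #1, #3, and #4 could take a shape like the sketch below. It is written in Python, not C, and is not the design recorded in chat.el or planned for el_runtime.c: `request_with_retry`, `TransportError`, the attempt count, and the exponential delay schedule are all assumptions chosen for illustration.

```python
import time


class TransportError(Exception):
    """Hypothetical stand-in for a timeout or connection failure."""


def request_with_retry(send, max_attempts=3, base_delay_s=0.5, sleep=time.sleep):
    """Call send() until it yields a non-retryable status or attempts run out.

    send() must return (status, body) or raise TransportError.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            status, body = send()
        except TransportError as exc:
            last_error = exc  # timeout/connection error: retry
        else:
            if status != 429 and not 500 <= status <= 599:
                return status, body  # success or non-retryable client error
            last_error = RuntimeError(f"retryable HTTP status {status}")
        if attempt < max_attempts - 1:
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_s * (2 ** attempt))
    raise last_error
```

A production version would likely also honour a `Retry-After` header on 429 responses and add jitter; both are omitted here to keep the sketch short.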
will.anderson closed this pull request 2026-06-22 17:55:38 +00:00

Reference: neuron-technologies/neuron#43