fix(reliability): llm-retry #43

2026-06-22T17:02:42Z

will.anderson commented

Reliability fixes for the LLM request path.

Fixed (EL layer):

- Issue #5: empty LLM response is now detected as an error in all handle_chat paths
- Issue #9: agentic_loop max_tokens is configurable via NEURON_LLM_MAX_TOKENS (default 4096)
- Issue #10: curl timeout for connector_tools_json and tool_auto_approved raised from 2s to 5s

Documented as TODO (requires C runtime changes in el_runtime.c):

- Issue #1: retry on timeout/connection errors
- Issue #2: separate EL_LLM_TIMEOUT_MS from EL_HTTP_TIMEOUT_MS
- Issue #3: HTTP 429 backoff
- Issue #4: HTTP 5xx retry
- Issue #6: secondary LLM provider (set NEURON_LLM_1_URL/KEY/FORMAT)
- Issue #8: EL_HTTP_MAX_RESPONSE_BYTES cap

Automated reliability audit.

will.anderson added 2 commits 2026-06-22 17:02:53 +00:00

fix(reliability): safety-resilience — bell augmentation, safe mode, dedup logging, tab escaping, handle_chat coverage deddb9a18e

fix(reliability): llm-retry — empty response detection, configurable max_tokens, connector timeout

Neuron Soul CI / build (pull_request) Failing after 11m16s

47d0e6f985

Issue #5: detect empty string from llm_extract_text() as an error in handle_chat,
handle_chat_as_soul, and handle_dharma_room_turn. The C runtime silently returns ""
when the LLM response content array is missing or all blocks fail to parse; without
this guard the empty string passes through to callers as a silent empty reply.
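The guard described for Issue #5 can be sketched roughly as follows. This is an illustrative Python sketch, not the chat.el code: `EmptyLLMResponseError`, `require_non_empty`, and `handle_chat_sketch` are hypothetical names, and `extract_text` stands in for llm_extract_text().

```python
class EmptyLLMResponseError(RuntimeError):
    """Hypothetical error type for an empty extracted LLM reply."""


def require_non_empty(text):
    # The runtime is described as returning "" when the content array is
    # missing or unparseable, so "" (and None, as an assumption) is an error.
    if text is None or text == "":
        raise EmptyLLMResponseError("LLM returned an empty response")
    return text


def handle_chat_sketch(extract_text, raw_response):
    # extract_text stands in for llm_extract_text(); the dict shape returned
    # here is an assumption made for illustration only.
    try:
        reply = require_non_empty(extract_text(raw_response))
    except EmptyLLMResponseError as exc:
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "reply": reply}
```

The point of raising rather than returning "" is that each caller has to handle the failure explicitly instead of forwarding a blank reply.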

Issue #9: make agentic_loop max_tokens configurable via the NEURON_LLM_MAX_TOKENS env
var (default 4096). The previously hardcoded value is marginal for long tool chains
(8 iterations x 4096 tokens); operators can now set 8192 or higher for complex
multi-step tasks without rebuilding. The non-agentic path (llm_call_system) still uses
the value hardcoded in the C runtime; that fix belongs in el_runtime.c (see TODO block
added in this commit).
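A minimal Python sketch of reading such a setting with a 4096 default is shown below. The function name `agentic_max_tokens` is hypothetical, and falling back to the default on a missing, non-numeric, or non-positive value is an assumption; the commit does not say how invalid values are handled.

```python
import os

DEFAULT_MAX_TOKENS = 4096  # default stated in the commit message


def agentic_max_tokens(env=None):
    # env defaults to the process environment; passing a dict eases testing.
    env = os.environ if env is None else env
    raw = env.get("NEURON_LLM_MAX_TOKENS", "")
    try:
        value = int(raw)
    except ValueError:
        # Assumption: unset or non-numeric values fall back to the default.
        return DEFAULT_MAX_TOKENS
    # Assumption: zero or negative values are also treated as invalid.
    return value if value > 0 else DEFAULT_MAX_TOKENS
```

An operator would then export NEURON_LLM_MAX_TOKENS=8192 before starting the process, with no rebuild required.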

Issue #10: increase connector_tools_json and tool_auto_approved curl --max-time from
2s to 5s to reduce false-empty tool lists when neuron-connectd is under transient
load. Graceful degradation to [] on bridge down is unchanged.
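The timeout-plus-fallback behaviour for Issue #10 might look like the following Python sketch. It is illustrative only: `connector_tools` and its `run` parameter are hypothetical, the caller supplies the bridge URL, and the real code lives in the EL layer. The curl flags used (`--silent`, `--fail`, `--max-time`) are standard.

```python
import json
import subprocess

CONNECTOR_TIMEOUT_S = 5  # raised from 2 in this change


def connector_tools(url, run=subprocess.run):
    # run is injectable so the sketch can be exercised without a live bridge.
    cmd = ["curl", "--silent", "--fail",
           "--max-time", str(CONNECTOR_TIMEOUT_S), url]
    try:
        result = run(cmd, capture_output=True, text=True, check=False)
    except OSError:
        return []  # curl could not be started: degrade to an empty list
    if result.returncode != 0:
        return []  # timeout, connection refused, or HTTP error
    try:
        tools = json.loads(result.stdout)
    except json.JSONDecodeError:
        return []  # malformed or empty body
    return tools if isinstance(tools, list) else []
```

Every failure path returns [], so a longer timeout only changes how long the caller waits before degrading, not what it receives when the bridge is down.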

Issues #1/#2/#3/#4/#6/#8: documented as TODO comments in chat.el. These require
targeted C runtime changes in el_runtime.c (llm_provider_request retry loop,
EL_LLM_TIMEOUT_MS separation, HTTP 429 backoff, 5xx retry, secondary LLM provider,
EL_HTTP_MAX_RESPONSE_BYTES cap). The architectural decisions are recorded so that each
TODO is traceable to its root cause.
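For orientation, the retry behaviour left as TODO (Issues #1, #3, #4) could take a shape like the Python sketch below. This is not the planned el_runtime.c design: `request_with_retry`, its parameters, and the exponential backoff schedule are all assumptions made for illustration.

```python
import time


def is_retryable_status(status):
    # HTTP 429 (rate limited) and any 5xx are treated as transient.
    return status == 429 or 500 <= status <= 599


def request_with_retry(send, max_attempts=3, base_delay_s=0.5, sleep=time.sleep):
    """Call send() until it yields a non-retryable result.

    send() is a hypothetical callable that returns (status, body) or raises
    TimeoutError / ConnectionError on transport failures.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            status, body = send()
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
        else:
            if not is_retryable_status(status):
                return status, body
            last_error = RuntimeError(f"retryable HTTP status {status}")
        if attempt + 1 < max_attempts:
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_s * (2 ** attempt))
    raise last_error
```

A production version would probably also honour a Retry-After header on 429 responses and add jitter to the delay; both are omitted here to keep the sketch short.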

will.anderson closed this pull request

2026-06-22 17:55:38 +00:00

Reference: neuron-technologies/neuron#43