feat: embed k3s to run soul-demo as self-healing k8s pods #13

Merged
will.anderson merged 2 commits from feat/k3s-embedded-soul into dev 2026-05-09 17:40:46 +00:00

Summary

• Embeds k3s (v1.32.4) in the neuron-web Docker image so soul-demo runs as a managed Kubernetes Deployment instead of a bare background process
• k3s starts first in entrypoint.sh, imports the pre-bundled soul-demo:local OCI tar (no registry needed), and auto-applies the Deployment + NodePort Service + HPA from the server/manifests dir
• neuron-web only starts after the soul-demo pod reports Running — clean startup sequencing
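
The startup sequencing above could be gated with a small polling helper in entrypoint.sh. This is a minimal sketch, not the script from this PR: the `app=soul-demo` label, the `wait_for_soul_demo` name, and the `KUBECTL` and `POLL_INTERVAL` variables are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: block until the soul-demo pod phase is Running.
# KUBECTL is overridable so the loop can be exercised without a cluster.
KUBECTL="${KUBECTL:-k3s kubectl}"

wait_for_soul_demo() {
  wf_timeout="${1:-120}"              # seconds to wait before giving up
  wf_interval="${POLL_INTERVAL:-2}"   # seconds between polls
  wf_elapsed=0
  while [ "$wf_elapsed" -lt "$wf_timeout" ]; do
    # The label selector is an assumption; use whatever the Deployment sets.
    wf_phase="$($KUBECTL get pods -l app=soul-demo \
      -o 'jsonpath={.items[0].status.phase}' 2>/dev/null || true)"
    if [ "$wf_phase" = "Running" ]; then
      return 0
    fi
    sleep "$wf_interval"
    wf_elapsed=$((wf_elapsed + wf_interval))
  done
  return 1
}
```

In entrypoint.sh a call such as `wait_for_soul_demo 120 || exit 1` would sit between launching `k3s server` in the background and starting neuron-web. A pod phase of Running does not by itself mean the readiness probe has passed, so a stricter gate might check the Ready condition instead.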

Architecture changes

soul-demo: now a k3s Deployment (1–8 replicas, HPA CPU target raised from 60% to 80% by the follow-up fix commit) that restarts automatically on crash, with liveness/readiness probes on /healthz at port 7772
neuron-web: unchanged; still calls localhost:7772, now through the k3s NodePort Service
Build pipeline: build-stage.sh gains a post-build step that extracts the soul-demo binary from the just-built image, builds soul-demo:local from dist/Dockerfile.soul-demo, and saves it as dist/soul-demo-image.tar. The final image then COPYs that tar in.
Cloud Run: all deploys (stage + prod) now use --execution-environment gen2, which k3s requires because gen1 (gVisor) does not provide /dev/kmsg or the Linux capabilities k3s needs
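
The post-build step in the build pipeline item could be sketched roughly as follows. The function name, the `DOCKER` override, the source image name, the binary path inside the image, and the `dist/soul-demo` destination are all assumptions; only the Dockerfile path, the `soul-demo:local` tag, and the tar path come from this PR.

```shell
#!/bin/sh
# DOCKER is overridable so the sequence can be dry-run without a daemon.
DOCKER="${DOCKER:-docker}"

# Hypothetical sketch of the post-build step in build-stage.sh.
bundle_soul_demo() {
  bs_image="$1"      # image produced by the main build (name is an assumption)
  bs_bin_path="$2"   # soul-demo binary path inside that image (assumption)

  # Create a stopped container so the binary can be copied out of the image.
  bs_cid="$($DOCKER create "$bs_image")" || return 1
  $DOCKER cp "$bs_cid:$bs_bin_path" dist/soul-demo || {
    $DOCKER rm "$bs_cid" >/dev/null
    return 1
  }
  $DOCKER rm "$bs_cid" >/dev/null

  # Build the minimal image and save it as a tar that k3s can import offline.
  $DOCKER build -f dist/Dockerfile.soul-demo -t soul-demo:local dist || return 1
  $DOCKER save -o dist/soul-demo-image.tar soul-demo:local
}
```

A call might look like `bundle_soul_demo neuron-web:dev /usr/local/bin/soul-demo`, with both arguments replaced by the real image tag and binary location.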

New files

• dist/Dockerfile.soul-demo — minimal image for soul-demo (debian:bookworm-slim + binary + snapshot)
• dist/k3s-soul-demo.yaml — Deployment, NodePort Service (nodePort 7772), and HPA manifests
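
For orientation, the HPA portion of such a manifest might look roughly like what this helper prints. It is a sketch under stated assumptions, not the contents of dist/k3s-soul-demo.yaml: the `render_hpa` function and the `soul-demo` resource names are hypothetical, and the CPU target is taken as an argument because this PR mentions both 60% and 80%.

```shell
#!/bin/sh
# Hypothetical helper: print an autoscaling/v2 HPA for a Deployment that is
# assumed to be named soul-demo, scaling between 1 and 8 replicas.
render_hpa() {
  hpa_cpu_target="${1:?CPU target percent required}"
  cat <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: soul-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: soul-demo
  minReplicas: 1
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: ${hpa_cpu_target}
EOF
}
```

For example, `render_hpa 80 | k3s kubectl apply -f -` would apply it by hand. CPU utilization is measured against the container's CPU request, so the Deployment needs `resources.requests.cpu` set for the target to have any effect.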

Test plan

  • Run ./build-stage.sh dev locally — verify dist/soul-demo-image.tar is produced and sized reasonably
  • Smoke test stage deployment: confirm GET /healthz on the Cloud Run URL returns 200 after k3s + soul-demo start
  • Verify k3s kubectl get pods shows soul-demo Running inside the container (docker exec)
  • Kill the soul-demo process inside the container and confirm k3s restarts it within 15s
  • Confirm neuron-web chat still reaches soul-demo at localhost:7772
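
The restart check in the test plan could be scripted along these lines. This is a sketch only: the function names, the `app=soul-demo` label, and the `KUBECTL` override are assumptions, and it treats an increase in the container's restart count as the evidence that the crashed process was restarted.

```shell
#!/bin/sh
# KUBECTL is overridable so the check can be exercised without a cluster.
KUBECTL="${KUBECTL:-k3s kubectl}"

# Print the restart count of the first soul-demo pod's first container.
restart_count() {
  $KUBECTL get pods -l app=soul-demo \
    -o 'jsonpath={.items[0].status.containerStatuses[0].restartCount}'
}

# Succeed if the restart count rises above $1 within $2 seconds (default 15).
restarted_within() {
  rw_before="$1"
  rw_deadline="${2:-15}"
  rw_elapsed=0
  while [ "$rw_elapsed" -lt "$rw_deadline" ]; do
    # Fall back to the old count if the API is briefly unavailable.
    rw_now="$(restart_count 2>/dev/null || echo "$rw_before")"
    if [ "${rw_now:-0}" -gt "$rw_before" ]; then
      return 0
    fi
    sleep 1
    rw_elapsed=$((rw_elapsed + 1))
  done
  return 1
}
```

Inside the container, one would record `before="$(restart_count)"`, kill the soul-demo process, and then run `restarted_within "$before" 15`.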
will.anderson added 2 commits 2026-05-09 17:40:38 +00:00
soul-demo now runs as a k3s Deployment with HPA (1–8 replicas, 60% CPU
target) instead of a bare background process. k3s starts first in
entrypoint.sh, imports the soul-demo:local OCI tar from
/var/lib/rancher/k3s/agent/images, and auto-applies the Deployment,
NodePort Service, and HPA from the server/manifests dir. neuron-web
starts only after the soul-demo pod is Running. Cloud Run gen2 execution
environment required for k3s (provides /dev/kmsg and Linux capabilities).
fix: run k3s as root, bump HPA CPU threshold to 80%
Dev — Build & local smoke test / build-smoke (pull_request) Failing after 3m54s
c6ee45a374
k3s needs CAP_SYS_ADMIN to create network namespaces and mount cgroups.
Running as USER landing was preventing this, so k3s now runs as root.
Cloud Run gen2 is the security boundary.

60% CPU was too conservative for soul-demo — it is I/O-bound (LLM API calls),
not CPU-bound. 80% gives correct headroom before scaling kicks in.
will.anderson force-pushed feat/k3s-embedded-soul from dafa27c30c to c6ee45a374 2026-05-09 17:40:38 +00:00
will.anderson merged commit 66e3ac6321 into dev 2026-05-09 17:40:45 +00:00
Reference: neuron-technologies/neuron-web#13