Commit Graph

471 Commits

Author SHA1 Message Date
Will Anderson bb583e3ccb Fix HCL syntax errors in accounts and api Cloud Run definitions 2026-04-25 22:54:18 -05:00
Will Anderson d4c65d5857 Expand GCP infra: accounts + API services, Cloud SQL, Artifact Registry
Architecture: intelligence stays on Legion; only compiled artifacts cross
to GCP. Source code and Neuron's knowledge base never leave the system.

Artifact Registry:
- neuron-marketing, neuron-accounts, neuron-api repos in us-central1
- Keep-last-10 cleanup policy; ci-pusher SA with writer access
- Legion CI runners authenticate via GCP_SA_KEY Gitea secret

Cloud SQL (cloud-sql.tf):
- postgres-15 on db-g1-small, us-central1 (scale up to REGIONAL HA at 1k users)
- Point-in-time recovery, 14-day backup retention
- Accounts DB + user; password generated and stored in Secret Manager
- JWT signing key in Secret Manager (shared by accounts + api)
- Cloud Run connects via built-in Auth Proxy (Unix socket volume mount)

Accounts Cloud Run (cloud-run-accounts.tf):
- 3 regions (us-central1, europe-west1, asia-northeast1), min:1 max:50
- Cloud SQL proxy volume mount; secrets via Secret Manager
- Stripe + JWT env vars; health probe on /health

API Cloud Run (cloud-run-api.tf):
- 3 regions, min:1 max:100, cpu_idle=false (always-hot)
- Validates JWTs from accounts service; no direct DB connection
- License admin token from Secret Manager

Load balancer (host-based routing):
- Same global anycast IP for all three services
- URL map routes by Host: neurontechnologies.ai→marketing,
  api.neurontechnologies.ai→api, accounts.neurontechnologies.ai→accounts
- New managed SSL certs for api.* and accounts.* added to HTTPS proxy
- Cloud Armor (WAF + rate limit) applied to all backends

Service accounts + IAM:
- neuron-accounts-sa: secretmanager.secretAccessor + cloudsql.client
- neuron-api-sa: secretmanager.secretAccessor
- allUsers invoker on all prod Cloud Run services (LB health checks)

bootstrap.sh:
- One-shot setup: pulls Stripe secrets from Vault → Secret Manager,
  creates CI SA JSON key, prints DNS + next-step instructions
2026-04-25 22:54:18 -05:00
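The host-based routing described in the commit above (one global IP, URL map keyed on the Host header) can be illustrated with a small sketch. This is not the Terraform from the commit; it only mirrors the three host-to-backend pairs listed in the message. The names HOST_ROUTES and route_by_host are hypothetical, and the port handling is a simplifying assumption that ignores IPv6 literals.

```python
from typing import Optional

# Host -> backend pairs as listed in the commit message's URL map description.
HOST_ROUTES = {
    "neurontechnologies.ai": "marketing",
    "api.neurontechnologies.ai": "api",
    "accounts.neurontechnologies.ai": "accounts",
}


def route_by_host(host_header: str, default: Optional[str] = None) -> Optional[str]:
    """Return the backend name for a Host header, or `default` if no rule matches."""
    # Normalise: lowercase, drop an optional ":port" suffix and a trailing dot.
    # (Assumption: hostnames only; IPv6 literals are not handled here.)
    host = host_header.strip().lower().split(":", 1)[0].rstrip(".")
    return HOST_ROUTES.get(host, default)


if __name__ == "__main__":
    for header in ("neurontechnologies.ai", "API.neurontechnologies.ai:443", "example.com"):
        print(header, "->", route_by_host(header))
```

An unmatched host returns the default (None here); what the real URL map does with unmatched hosts is not stated in the commit message.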
Will Anderson 93358505fc Harden prod: security, autoscaling, observability, BuildKit CI
Security:
- Drop ALL capabilities, enforce non-root, RuntimeDefault seccomp on
  neuron-mcp, neuron-rest, neuron-marketing pods
- Add startup probes (150s window for JVM) so liveness doesn't fire early
- Replace docker-sock hostPath with BuildKit rootless TCP endpoint
  (moby/buildkit:v0.19.0-rootless) — removes node root access from CI
- Document full ESO AppRole migration path in cluster-secret-store.yaml

Autoscaling & availability:
- HPAs on mcp (1–6), rest (1–4), marketing (2–8) at 65–70% CPU
- PodDisruptionBudgets (minAvailable: 1) on all three services
- NetworkPolicy: default-deny-all in neuron-prod, explicit allow rules
  for Traefik ingress, intra-namespace, and egress to DNS/platform/vault

Observability:
- ServiceMonitors for mcp, rest, marketing (cross-namespace enabled in
  kube-prometheus-stack with serviceMonitorSelectorNilUsesHelmValues:false)
- PrometheusRules: high error rate, high latency, crash loops, replica
  shortage, Postgres down/connections, backup failure, backup staleness

Chart version pinning:
- kube-prometheus-stack, loki, tempo, redis, alloy, postgres — all pinned
  to major-version ranges to block silent breaking upgrades

Backup hardening:
- restic:latest → restic:0.17.3 (deterministic image)
- Weekly backup-verify CronJob: restores latest snapshot and validates
  SQL dump structure (≥5 CREATE TABLE, pg_dump header check)

ArgoCD:
- neuron-prod AppProject: scopes deploys to neuron-prod + platform ns,
  blacklists ClusterRole/ClusterRoleBinding/Namespace creation,
  automated sync window 2–6am UTC, manual always allowed
2026-04-25 22:54:18 -05:00
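The backup-verify check named in the commit above (at least 5 CREATE TABLE statements plus a pg_dump header check) might look roughly like the following. This is a sketch of the validation idea only, not the CronJob's actual script. The function name verify_dump and the choice to look for the header within the first ten lines are assumptions; the header string is the comment that plain-format pg_dump output starts with.

```python
import re

# Plain-format pg_dump output begins with this comment line.
PG_DUMP_HEADER = "-- PostgreSQL database dump"

# Match CREATE TABLE at the start of a line (optionally indented).
CREATE_TABLE_RE = re.compile(r"^[ \t]*CREATE TABLE\b", re.IGNORECASE | re.MULTILINE)


def verify_dump(sql_text: str, min_tables: int = 5) -> bool:
    """Return True if the text looks like a pg_dump with enough tables."""
    # Assumption: the header appears within the first ten lines of the dump.
    head = "\n".join(sql_text.splitlines()[:10])
    has_header = PG_DUMP_HEADER in head
    table_count = len(CREATE_TABLE_RE.findall(sql_text))
    return has_header and table_count >= min_tables


if __name__ == "__main__":
    sample = "--\n-- PostgreSQL database dump\n--\n" + "".join(
        f"CREATE TABLE public.t{i} (id integer);\n" for i in range(5)
    )
    print(verify_dump(sample))
```

A structural check like this catches empty or truncated dumps cheaply; it does not prove the data itself is restorable.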
Will Anderson 8fd3d12907 simplify neuron self-improve loop to blue/green + stage
Replace the aspirational alpha/beta/gamma model with the actual
deployment topology: prod runs blue/green in neuron-prod namespace,
stage is the single experiment slot in neuron-stage namespace.

The old script referenced neuron-alpha/beta/gamma deployments that
never existed. The new script uses blue-green-deploy.sh for prod
promotion and kubectl set image for stage experiments.

Loop: snapshot → deploy stage → evaluate → promote via blue/green.
2026-04-25 22:54:18 -05:00
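The loop stated in the commit above (snapshot, deploy stage, evaluate, promote via blue/green) can be sketched as a single function with its steps injected as callables. The function run_iteration and its parameters are hypothetical; in the real setup the steps would presumably wrap kubectl set image for stage and blue-green-deploy.sh for prod, but how those are invoked is not shown in the commit message.

```python
from typing import Callable


def run_iteration(
    snapshot: Callable[[], None],
    deploy_stage: Callable[[str], None],
    evaluate: Callable[[], bool],
    promote: Callable[[str], None],
    candidate_image: str,
) -> bool:
    """Run one snapshot -> stage -> evaluate -> promote cycle.

    Returns True if the candidate was promoted, False if evaluation failed.
    """
    snapshot()                      # capture state before experimenting
    deploy_stage(candidate_image)   # stage is the single experiment slot
    if not evaluate():
        return False                # candidate rejected; prod is untouched
    promote(candidate_image)        # prod promotion goes through blue/green
    return True


if __name__ == "__main__":
    promoted = run_iteration(
        snapshot=lambda: print("snapshot taken"),
        deploy_stage=lambda img: print("stage <-", img),
        evaluate=lambda: True,
        promote=lambda img: print("promote", img),
        candidate_image="example/image:candidate",  # placeholder, not a real tag
    )
    print("promoted:", promoted)
```

Keeping the steps as injected callables makes the ordering testable without a cluster.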
will.anderson f3ed83cdd0 Wire GCS backup to neuron-db-backup-prod (neuron-494301) 2026-04-25 22:52:17 +00:00
Will Anderson 7eeff54a11 Wire GCS backup to neuron-db-backup-prod bucket (neuron-494301)
Bucket created, SA key stored in Vault at secret/gcs.
CronJob ExternalSecret updated to pull from secret/gcs.
Hourly restic backup now runs to both R2 and GCS.
2026-04-25 17:51:57 -05:00
Will Anderson 8d97bbd802 Merge branch 'main' of https://git.neuralplatform.ai/will/infrastructure 2026-04-25 17:50:32 -05:00
will.anderson 67aed61cfb Scale Docuseal up to 1 replica 2026-04-25 20:59:34 +00:00
Will Anderson 2d0ce77518 Scale Docuseal up to 1 replica 2026-04-25 15:59:03 -05:00
Will Anderson a37deca724 Add GCS backup bucket + dual-destination hourly backup (R2 + GCS)
Provision Google Cloud Storage bucket for neuron prod DB backups via Terraform.
Create dedicated backup service account with objectAdmin on the bucket.
Update neuron-prod backup CronJob to run restic against both R2 and GCS hourly —
R2 as primary, GCS as secondary, independent credentials and repositories.
2026-04-25 15:23:51 -05:00
Neuron CI 8de866a8b9 ci(neuron-prod): update rest+license to v0.15.3 2026-04-25 20:06:27 +00:00
Neuron CI df85598df2 ci(neuron-prod): blue-green flip to blue@v0.15.3 (registry.neuralplatform.ai/neuron-technologies/neuron-mcp:v0.15.3) 2026-04-25 20:06:23 +00:00
Neuron CI 950f00586f ci(neuron-prod): update rest+license to v0.15.2 2026-04-25 19:53:53 +00:00
Neuron CI 35e346d94c ci(neuron-prod): blue-green flip to green@v0.15.2 (registry.neuralplatform.ai/neuron-technologies/neuron-mcp:v0.15.2) 2026-04-25 19:53:50 +00:00
Neuron CI 8e5320c7b3 ci(neuron-prod): update rest+license to v0.15.1 2026-04-25 19:39:31 +00:00
Neuron CI 9b057411db ci(neuron-prod): blue-green flip to blue@v0.15.1 (registry.neuralplatform.ai/neuron-technologies/neuron-mcp:v0.15.1) 2026-04-25 19:39:29 +00:00
Neuron CI 14f0663e28 ci(neuron-prod): update rest+license to v0.15.0 2026-04-25 19:20:40 +00:00
Neuron CI 8b56eb5290 ci(neuron-prod): blue-green flip to green@v0.15.0 (registry.neuralplatform.ai/neuron-technologies/neuron-mcp:v0.15.0) 2026-04-25 19:20:35 +00:00
Neuron CI 39050bfde4 ci(neuron-prod): update rest+license to v0.14.12 2026-04-25 10:18:30 +00:00
Neuron CI 6b782f04e9 ci(neuron-prod): blue-green flip to blue@v0.14.12 (registry.neuralplatform.ai/neuron-technologies/neuron-mcp:v0.14.12) 2026-04-25 10:18:28 +00:00
Neuron CI 5c9ea05631 ci(neuron-prod): update rest+license to v0.14.11 2026-04-25 10:08:15 +00:00
Neuron CI acf9f07dbd ci(neuron-prod): blue-green flip to green@v0.14.11 (registry.neuralplatform.ai/neuron-technologies/neuron-mcp:v0.14.11) 2026-04-25 10:08:12 +00:00
Neuron CI 491d00fd1a ci(neuron-prod): update rest+license to v0.14.10 2026-04-25 10:02:23 +00:00
Neuron CI 663abd5188 ci(neuron-prod): blue-green flip to blue@v0.14.10 (registry.neuralplatform.ai/neuron-technologies/neuron-mcp:v0.14.10) 2026-04-25 10:02:20 +00:00
Neuron CI bdd2033d78 ci(neuron-prod): update rest+license to v0.14.9 2026-04-25 09:47:48 +00:00
Neuron CI fed551185e ci(neuron-prod): blue-green flip to green@v0.14.9 (registry.neuralplatform.ai/neuron-technologies/neuron-mcp:v0.14.9) 2026-04-25 09:47:46 +00:00
Neuron CI 2793799532 ci(neuron-prod): update rest+license to v0.14.8 2026-04-25 09:14:24 +00:00
Neuron CI 8d02a1164b ci(neuron-prod): blue-green flip to blue@v0.14.8 (registry.neuralplatform.ai/neuron-technologies/neuron-mcp:v0.14.8) 2026-04-25 09:14:22 +00:00
Neuron CI 1ee6dc2804 ci(neuron-prod): update rest+license to v0.14.7 2026-04-25 09:08:20 +00:00
Neuron CI 302f37ad29 ci(neuron-prod): blue-green flip to green@v0.14.7 (registry.neuralplatform.ai/neuron-technologies/neuron-mcp:v0.14.7) 2026-04-25 09:08:17 +00:00
Neuron CI 8d0faae070 ci(neuron-prod): update rest+license to v0.14.6 2026-04-25 09:03:36 +00:00
Neuron CI 12da58e716 ci(neuron-prod): blue-green flip to blue@v0.14.6 (registry.neuralplatform.ai/neuron-technologies/neuron-mcp:v0.14.6) 2026-04-25 09:03:32 +00:00
Neuron CI 1edcbeaed3 ci(neuron-prod): update rest+license to v0.14.5 2026-04-25 08:50:36 +00:00
Neuron CI 244ba40c20 ci(neuron-prod): blue-green flip to green@v0.14.5 (registry.neuralplatform.ai/neuron-technologies/neuron-mcp:v0.14.5) 2026-04-25 08:50:35 +00:00
Will Anderson 46fd91b1fe ops(neuron-prod): rollback blue to v0.14.3 — v0.14.4 crashes on empty updated_at (v0.14.5 fix in build) 2026-04-25 03:47:17 -05:00
Will Anderson 47bcef9bc2 ops(neuron-prod): restore blue service while v0.14.5 builds (green crashed on empty updated_at) 2026-04-25 03:45:09 -05:00
Will Anderson 3185cb5062 ci(neuron-prod): deploy v0.14.4 to blue (DB schema migrations + session persistence fixes) 2026-04-25 03:38:57 -05:00
Neuron CI a0961004e7 ci(neuron-prod): update rest+license to v0.14.4 2026-04-25 08:36:19 +00:00
Neuron CI d0b8a812f0 ci(neuron-prod): blue-green flip to green@v0.14.4 (registry.neuralplatform.ai/neuron-technologies/neuron-mcp:v0.14.4) 2026-04-25 08:36:18 +00:00
Will Anderson e201fa7dbb ops(neuron-prod): restart blue pod to recover new MCP session after compact 2026-04-25 03:23:09 -05:00
Will Anderson 24daf99f9c ops(neuron-prod): disable OTLP metrics push — no metrics pipeline in Alloy
Alloy's otelcol.receiver.otlp only has traces→Tempo and logs→Loki pipelines.
No metrics output is configured, so /v1/metrics returns 404, flooding MCP
server logs every minute. Disable Micrometer OTLP push; Prometheus scrapes
metrics from the actuator endpoint instead.
2026-04-25 03:20:48 -05:00
Neuron CI cfa38013c6 ci(neuron-prod): update rest+license to v0.14.3 2026-04-25 08:20:20 +00:00
Neuron CI d8919f869b ci(neuron-prod): blue-green flip to blue@v0.14.3 (registry.neuralplatform.ai/neuron-technologies/neuron-mcp:v0.14.3) 2026-04-25 08:20:19 +00:00
Will Anderson ab02716d82 ops(neuron-prod): restart green MCP pod to recover session after blue-green flip 2026-04-25 03:09:47 -05:00
Neuron CI 7cbcb9e8c2 ci(neuron-prod): update rest+license to v0.14.2 2026-04-25 07:59:18 +00:00
Neuron CI 7931d54e40 ci(neuron-prod): blue-green flip to green@v0.14.2 (registry.neuralplatform.ai/neuron-technologies/neuron-mcp:v0.14.2) 2026-04-25 07:59:17 +00:00
Neuron CI ba716640b4 ci(marketing): deploy marketing@1e94e8ae 2026-04-25 07:46:30 +00:00
Will Anderson ff71e53c11 point MCP STRIPE_WEBHOOK_SECRET to marketplace webhook 2026-04-25 02:05:40 -05:00
Will Anderson 8b7dea5b50 switch neuron-prod Stripe secrets to live marketing keys 2026-04-25 02:01:41 -05:00
Will Anderson 9b41d2302c feat(secrets): add STRIPE_WEBHOOK_SECRET to neuron-prod ExternalSecret 2026-04-25 01:50:51 -05:00