infrastructure/docs/vpn-comparison.md
Will Anderson 6ebf8a9b3e Fix gitea_domain variable and add VPN/Headscale infrastructure
- Fix gitea_domain default from git.neuralplatform.dev → git.neuralplatform.ai
  so Traefik serves the correct Let's Encrypt cert when accessed via VPN/LAN
- Add docs/vpn-comparison.md: full evaluation of WireGuard/Tailscale/Headscale/Netbird/Netmaker
2026-03-25 08:44:20 -05:00

Private VPN Options — Comparison for Legion Homelab

Date: 2026-03-24
Context: Legion (Ubuntu 24.04, k3s) at dynamic home IP (ddclient-managed), Mac workstation as primary client, Cloudflare Tunnel for all public services, AdGuard Home for DNS.
Goal: secure remote access to internal k3s services without going through Cloudflare.


Options at a Glance

| Criterion | WireGuard (raw) | Tailscale | Headscale | Netbird | Netmaker |
|---|---|---|---|---|---|
| k3s/k8s native | Manual DaemonSet or host install | DaemonSet or sidecar | Community Helm charts | Helm chart exists | Helm chart exists |
| Dynamic IP support | Needs re-resolution (use hostname) | Yes, NAT traversal handles it | Yes | Yes | Partial, needs STUN config |
| NAT traversal | No, requires inbound port | Yes (STUN + DERP relays built in) | Yes (same as Tailscale) | Yes (STUN/TURN built in) | Yes (STUN/TURN, less polished) |
| No open inbound ports | No, needs a UDP port | Yes | Yes | Yes | Partial (STUN helps; relay fallback) |
| iOS/Android client | WireGuard app | Native apps | Tailscale apps (custom login server) | Native apps | WireGuard app |
| Mesh / multi-peer | Manual config per peer | Automatic | Automatic | Automatic | Automatic |
| Self-hosted control plane | N/A (serverless) | No (SaaS, closed source) | Yes, full replacement | Yes, full stack | Yes, full stack |
| AdGuard DNS integration | Manual (client-side DNS + AllowedIPs) | MagicDNS, or manual override | Split DNS config possible | Split DNS / custom DNS | DNS override possible |
| Resource overhead on Legion | Very low (~2MB RAM) | ~20-30MB (tailscaled) | ~30-50MB (control plane) | ~30-50MB (management plane) | ~50-100MB (server + UI) |
| License | Kernel (GPLv2) | Proprietary control plane; clients BSD-3 | BSD-3 | Apache 2.0 | SSPL (server) / Apache (client) |
| Cost | Free | Free up to 100 devices | Free (self-hosted) | Free (self-hosted) | Free (self-hosted) |

Detailed Evaluation

1. WireGuard (raw)

Pros:

  • Kernel module built into Linux 5.6+, so Legion (Ubuntu 24.04) only needs the small wireguard-tools userspace package (wg, wg-quick); near-zero overhead
  • Fastest throughput of any option; minimal latency
  • No coordination server, no external dependency, full privacy
  • WireGuard app available on iOS, Android, macOS, Windows

Cons:

  • Dynamic IP is a gotcha: a peer's Endpoint is resolved to a fixed IP:port when the tunnel comes up. Because Legion's home IP changes, remote peers need a DNS name (ddns.example.com) as the endpoint, and WireGuard must be made to re-resolve it. A PostUp hook or a cron job calling wg set to refresh the endpoint works, but it's manual and fragile.
  • No NAT traversal. At least one side must be reachable on an open inbound UDP port (typically 51820). Legion accepts no inbound traffic and remote clients are usually behind NAT too, so neither side can initiate. Legion could only hold tunnels open with keepalives if it knew every client's current public endpoint in advance, which is not realistic for roaming clients.
  • Adding a second trusted user means manual key exchange and config edits on both sides
  • No web UI, no ACLs, all config is flat files
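The re-resolution workaround from the first con can be sketched as a small Python helper. You would run it from cron or a systemd timer on a client that has wireguard-tools installed. This is a minimal sketch, not a hardened script. The interface name wg0, the DDNS hostname legion.example.com, the Legion public key placeholder and port 51820 are all hypothetical. The helper parses `wg show wg0 endpoints` output (one `<public key>` TAB `<endpoint>` line per peer). It returns a `wg set ... endpoint` command only when the cached IP no longer matches DNS.

```python
import socket
import subprocess
from typing import Callable, Dict, List, Optional

# Hypothetical values; replace with your own.
IFACE = "wg0"
DDNS_HOST = "legion.example.com"
LEGION_PUBKEY = "REPLACE_WITH_LEGION_PUBLIC_KEY"
LISTEN_PORT = 51820


def parse_endpoints(wg_output: str) -> Dict[str, str]:
    """Map peer public key -> endpoint from `wg show <iface> endpoints` output."""
    endpoints: Dict[str, str] = {}
    for line in wg_output.splitlines():
        if line.strip():
            pubkey, endpoint = line.split("\t", 1)
            endpoints[pubkey] = endpoint.strip()  # "(none)" when no endpoint is set
    return endpoints


def resolve_ipv4(host: str) -> str:
    """Return the first IPv4 address DNS currently gives for host."""
    return socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_DGRAM)[0][4][0]


def refresh_command(
    iface: str,
    peer_pubkey: str,
    host: str,
    port: int,
    wg_output: str,
    resolve: Callable[[str], str] = resolve_ipv4,
) -> Optional[List[str]]:
    """Return a `wg set` argv if the peer's cached endpoint is stale, else None."""
    endpoints = parse_endpoints(wg_output)
    if peer_pubkey not in endpoints:
        return None  # unknown peer: `wg set` would create it, so do nothing
    current_ip = resolve(host)
    endpoint = endpoints[peer_pubkey]
    if endpoint != "(none)" and endpoint.rpartition(":")[0].strip("[]") == current_ip:
        return None  # cached endpoint already matches DNS
    return ["wg", "set", iface, "peer", peer_pubkey, "endpoint", f"{current_ip}:{port}"]


def run_once() -> None:
    """Query the live interface and apply the refresh if needed (needs root)."""
    out = subprocess.run(
        ["wg", "show", IFACE, "endpoints"], capture_output=True, text=True, check=True
    ).stdout
    cmd = refresh_command(IFACE, LEGION_PUBKEY, DDNS_HOST, LISTEN_PORT, out)
    if cmd:
        subprocess.run(cmd, check=True)
```

The resolver is injectable so the decision logic can be tested without DNS or root. The wireguard-tools source tree includes a contrib reresolve-dns script built on the same idea.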

k3s notes: Can run as a host network DaemonSet or just install WireGuard directly on the Legion node. The k3s pod CIDR (10.42.x.x) and service CIDR (10.43.x.x) would need to be routed through the WireGuard interface — doable but requires explicit AllowedIPs config and IP forwarding.
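A sketch of the client side of that routing. It assumes a hypothetical tunnel subnet 10.200.0.0/24 (Legion at 10.200.0.1, Mac at 10.200.0.2) and uses placeholder keys and a placeholder hostname. It renders a wg-quick config whose AllowedIPs send three ranges through the tunnel: Legion's tunnel address and the default k3s pod and service CIDRs. Malformed CIDRs are rejected.

```python
import ipaddress
from typing import Optional, Sequence

K3S_POD_CIDR = "10.42.0.0/16"      # k3s default cluster CIDR
K3S_SERVICE_CIDR = "10.43.0.0/16"  # k3s default service CIDR


def render_client_conf(
    private_key: str,
    address: str,
    server_pubkey: str,
    endpoint: str,
    routed_cidrs: Sequence[str],
    dns: Optional[str] = None,
    keepalive: int = 25,
) -> str:
    """Render a wg-quick client config that routes only the given CIDRs via the tunnel."""
    # strict=True rejects entries with host bits set, e.g. "10.42.0.1/16".
    nets = [ipaddress.ip_network(c, strict=True) for c in routed_cidrs]
    lines = ["[Interface]", f"PrivateKey = {private_key}", f"Address = {address}"]
    if dns:
        lines.append(f"DNS = {dns}")
    lines += [
        "",
        "[Peer]",
        f"PublicKey = {server_pubkey}",
        f"Endpoint = {endpoint}",
        "AllowedIPs = " + ", ".join(str(n) for n in nets),
        f"PersistentKeepalive = {keepalive}",  # keeps the client's NAT mapping alive
    ]
    return "\n".join(lines) + "\n"


# All values below are hypothetical placeholders.
conf = render_client_conf(
    private_key="MAC_PRIVATE_KEY_PLACEHOLDER",
    address="10.200.0.2/32",
    server_pubkey="LEGION_PUBLIC_KEY_PLACEHOLDER",
    endpoint="legion.example.com:51820",
    routed_cidrs=["10.200.0.1/32", K3S_POD_CIDR, K3S_SERVICE_CIDR],
)
print(conf)
```

On Legion, the matching [Peer] entry for the Mac would list AllowedIPs = 10.200.0.2/32. net.ipv4.ip_forward must also be enabled so the node forwards tunnel traffic on to pods and services.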

Verdict: Best for fixed IPs or when you're happy to open one UDP port. Not great for this setup's dynamic IP + no-inbound-port requirement.


2. Tailscale (managed SaaS)

Pros:

  • Easiest setup of all options — curl | sh on Legion, download app on Mac/iPhone
  • Excellent NAT traversal via STUN/DERP relay network — works with no open ports and no fixed IP
  • Dynamic IP is completely irrelevant — Tailscale handles endpoint discovery
  • Native iOS/Android/macOS apps with split-tunneling
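  • Subnet routing: with IP forwarding enabled on Legion, run tailscale up --advertise-routes=10.42.0.0/16,10.43.0.0/16 and approve the routes in the admin console. All k3s pods and services then become reachable from any Tailscale peer (Linux peers also need --accept-routes).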
  • Taildrop, MagicDNS, and ACLs all polished
  • Free tier: 100 devices, 3 users — sufficient for homelab + 1-2 trusted people
  • Very active development, excellent documentation

Cons:

  • Control plane is Tailscale's SaaS — your peer list, public keys, and ACLs live on their servers. Peer-to-peer traffic is end-to-end encrypted and never touches their servers, but the coordination layer is not self-hosted
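  • If Tailscale's coordination servers have an outage, existing peer connections generally keep working, but new logins and device or ACL changes stop. If the company went away, the VPN would stop working until you switched.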
  • Privacy-sensitive: Tailscale knows which devices are in your network and their IPs, even if they can't see traffic content
  • If the subnet router runs in k3s rather than on the host, that pod needs elevated privileges (host network access, NET_ADMIN)

k3s notes: Run tailscaled on the Legion host (not in a pod). Use --advertise-routes to expose k3s pod and service CIDRs. Alternatively, there is a Tailscale operator for Kubernetes that can expose individual services without advertising the whole CIDR.
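A preflight sketch for the host-level subnet router described above. It checks that IPv4 forwarding is on by reading /proc/sys/net/ipv4/ip_forward (Linux only). It then builds the `tailscale up` argument list from the k3s CIDRs and refuses overlapping routes. The flags (--advertise-routes, --accept-dns, --login-server) are existing tailscale CLI flags; the helper names are hypothetical. The advertised routes still need approval on the control plane before peers use them.

```python
import ipaddress
from pathlib import Path
from typing import List, Optional, Sequence


def ip_forwarding_enabled(proc_path: str = "/proc/sys/net/ipv4/ip_forward") -> bool:
    """True if the kernel forwards IPv4 packets (required for a subnet router)."""
    try:
        return Path(proc_path).read_text().strip() == "1"
    except OSError:
        return False  # not Linux, or the path is unreadable


def subnet_router_argv(
    routes: Sequence[str], login_server: Optional[str] = None
) -> List[str]:
    """Build `tailscale up` args advertising non-overlapping routes."""
    nets = [ipaddress.ip_network(r, strict=True) for r in routes]
    for i, a in enumerate(nets):
        for b in nets[i + 1:]:
            if a.overlaps(b):
                raise ValueError(f"overlapping routes: {a} and {b}")
    argv = [
        "tailscale", "up",
        "--advertise-routes=" + ",".join(str(n) for n in nets),
        "--accept-dns=false",  # keep Legion on its own resolver, avoiding a DNS loop
    ]
    if login_server:  # e.g. a Headscale URL instead of Tailscale's SaaS
        argv.append(f"--login-server={login_server}")
    return argv


if not ip_forwarding_enabled():
    print("warning: enable net.ipv4.ip_forward before advertising routes")
print(" ".join(subnet_router_argv(["10.42.0.0/16", "10.43.0.0/16"])))
```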

Verdict: Best developer experience by a large margin. Works perfectly with this setup. The only real downside is the SaaS control plane. Appropriate if you trust Tailscale as a company and accept that trade-off.


3. Headscale (self-hosted Tailscale control plane)

Pros:

  • Drop-in replacement for Tailscale's coordination server: the official Tailscale clients work unchanged once pointed at the Headscale URL as a custom login server
  • Fully self-hosted: your keys and peer registry never leave your network
  • Community-maintained Helm charts available (no official chart); deploys cleanly in k3s
  • Retains Tailscale's NAT traversal and STUN/DERP infrastructure (DERP relays are still Tailscale's unless you self-host those too)
  • ACLs, user management, pre-auth keys — all Tailscale features minus some newer ones (Taildrop, MagicDNS not fully supported)
  • SQLite backend — tiny footprint

Cons:

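  • Headscale itself must be reachable at a public HTTPS endpoint so clients can bootstrap. In this setup, expose it through Cloudflare Tunnel or add a separate ingress, which adds a step. Headscale's reverse-proxy documentation has warned that running behind Cloudflare's proxy or Cloudflare Tunnel is unsupported, because of how the Tailscale control protocol upgrades its connection. Verify this against the current docs before relying on the tunnel.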
  • Lags behind Tailscale client features by a few versions
  • If you use Tailscale's DERP relay network for fallback (when direct connection fails), that traffic still touches Tailscale's infrastructure. Self-hosting DERP servers is possible but extra work.
  • More setup and less documentation than Tailscale SaaS

k3s notes: Headscale runs as a Deployment in k3s, exposes an HTTPS service, and stores state in a PVC (SQLite). Use the existing Cloudflare Tunnel to expose headscale.yourdomain.com. This is the coordination URL you give to Tailscale clients (--login-server). Then run tailscaled on the Legion host as a subnet router, same as option 2.
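On hosts with the tailscale CLI, joining Headscale is the normal `tailscale up` with `--login-server` pointed at the coordination URL. You can add `--authkey` to supply a pre-auth key. The sketch below validates the URL before building the command. The helper name is hypothetical; headscale.neuralplatform.ai is the hostname this document plans to use.

```python
from typing import List, Optional
from urllib.parse import urlsplit


def headscale_join_argv(login_server: str, authkey: Optional[str] = None) -> List[str]:
    """Build `tailscale up` args for joining a Headscale coordination server."""
    parts = urlsplit(login_server)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("login server must be an https:// URL with a hostname")
    if parts.path not in ("", "/") or parts.query:
        raise ValueError("login server should be the bare server URL, without a path")
    argv = ["tailscale", "up", f"--login-server=https://{parts.netloc}"]
    if authkey:  # pre-auth key generated on the Headscale side
        argv.append(f"--authkey={authkey}")
    return argv


print(headscale_join_argv("https://headscale.neuralplatform.ai/"))
```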

Verdict: Best balance of control and usability. Keeps all the Tailscale client goodness, eliminates the SaaS dependency for the control plane. The Cloudflare Tunnel coexistence is actually an asset here — headscale can be exposed through the existing tunnel with no new open ports.


4. Netbird

Pros:

  • Fully open-source (Apache 2.0), self-hostable management plane + STUN/TURN/DERP equivalent (Signal + Relay servers)
  • WireGuard-based, similar architecture to Tailscale
  • Native iOS/Android/macOS apps
  • Helm chart available for k3s deployment of the management plane
  • Fine-grained ACL policies via web UI
  • No Tailscale dependency whatsoever — entirely independent stack
  • Free self-hosted tier is fully featured

Cons:

  • Management plane has more moving parts than Headscale: management server + dashboard + signal server + relay (coturn or built-in) — 3-4 containers vs. 1 for Headscale
  • Smaller community and ecosystem than Tailscale/Headscale
  • Docs are good but not as polished; some rough edges in k8s setup
  • DNS integration requires manual configuration — no equivalent of MagicDNS

k3s notes: Netbird management stack deploys via Helm into k3s. Exposes an API endpoint that clients register against. Relay/STUN server also needs to be reachable — expose via Cloudflare Tunnel or NodePort. Netbird client runs on Legion host and Mac.

Verdict: Solid option if you want zero dependency on Tailscale infrastructure (including DERP relays). More work than Headscale to stand up initially, but cleaner long-term ownership. Good choice if privacy is top priority.


5. Netmaker

Pros:

  • Mature WireGuard mesh orchestration, self-hosted
  • Web UI for managing nodes and networks
  • Supports egress gateways (route cluster traffic through a node)
  • Commercial company with a free Community Edition

Cons:

  • Community Edition is SSPL licensed, so not truly open source: SSPL restricts offering the software to others as a hosted service unless the whole service stack is released, but it is fine for personal use
  • More resource-heavy than alternatives: server + UI + MQ broker (MQTT/Mosquitto) — ~150-200MB total
  • STUN/NAT traversal exists but, per community reports, is less reliable in practice than Tailscale's or Netbird's implementation
  • Active development but smaller homelab community vs. Tailscale/Headscale
  • Licensing changed (previously Apache) which caused community friction
  • DNS integration is basic

k3s notes: Docker Compose primary deployment method; k8s manifests exist but are not as well-maintained as Netbird's Helm charts. Extra friction in a k3s environment.

Verdict: Viable, but outclassed by Netbird (better k8s support, cleaner license) and Headscale (simpler, lighter). Hard to recommend over either.


Dynamic IP Gotchas (All Options)

| Option | Dynamic IP Behavior |
|---|---|
| WireGuard raw | Peers cache the resolved IP at connect time. If the home IP changes mid-session, the tunnel drops until the peer re-resolves DNS and reconnects. Needs wg set or wg-quick down/up to pick up the new IP. |
| Tailscale | Transparent: STUN/DERP handles endpoint discovery continuously. An IP change causes a brief reconnect (~seconds), invisible to applications. |
| Headscale | Same as Tailscale; the Tailscale clients handle it. |
| Netbird | Same mechanism as Tailscale (STUN + relay fallback). IP change handled transparently. |
| Netmaker | STUN helps, but the relay path may not reconnect cleanly on IP change in all configurations. |

Cloudflare Tunnel Coexistence

All options coexist cleanly with Cloudflare Tunnel:

  • Cloudflare Tunnel handles public-facing services (Gitea web UI, Argo CD, Vault, etc.)
  • VPN handles private access (k3s dashboard, raw Postgres port, internal APIs, Grafana without SSO overhead)
  • No conflict — they use different network paths
  • For Headscale/Netbird: the management plane endpoint needs to be reachable by clients. Easiest path: expose it through the existing Cloudflare Tunnel (e.g., headscale.neuralplatform.ai). No new open ports needed.

AdGuard DNS Integration

| Option | DNS Integration |
|---|---|
| WireGuard raw | Set DNS = 192.168.68.77 in each client's [Interface] section. Works, but AdGuard only handles requests while the tunnel is up. |
| Tailscale | MagicDNS, or override DNS to point to the subnet-routed AdGuard IP. Use --accept-dns=false on Legion itself to avoid a loop. |
| Headscale | Configure dns.nameservers.global (dns_config.nameservers in releases before 0.23) in the Headscale config to push the AdGuard IP to clients. Clean. |
| Netbird | Manual DNS configuration per network. Point to 192.168.68.77 or the AdGuard ClusterIP. |
| Netmaker | DNS override in network settings, less reliable. |
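Whichever option is used, the resolver handed to VPN clients must itself be reachable through the tunnel. The quick check below uses Python's ipaddress module. It assumes that only the k3s pod and service CIDRs are routed, as in the suggested Headscale architecture, and uses a hypothetical AdGuard ClusterIP of 10.43.0.53. Under those assumptions, the LAN address 192.168.68.77 is not covered unless it (or its subnet) is advertised as well.

```python
import ipaddress
from typing import Iterable

ROUTED = ["10.42.0.0/16", "10.43.0.0/16"]  # k3s pod + service CIDRs


def reachable_via(dns_ip: str, routed_cidrs: Iterable[str]) -> bool:
    """True if dns_ip falls inside any CIDR routed through the VPN."""
    addr = ipaddress.ip_address(dns_ip)
    return any(addr in ipaddress.ip_network(c) for c in routed_cidrs)


# LAN address of AdGuard vs. a hypothetical ClusterIP inside the service CIDR.
for candidate in ["192.168.68.77", "10.43.0.53"]:
    print(candidate, reachable_via(candidate, ROUTED))
```

There are two straightforward fixes. Legion can also advertise 192.168.68.77/32. Alternatively, clients can point at the AdGuard ClusterIP, which already sits inside the routed service CIDR.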

Recommendation

Use Headscale.

Why Headscale wins for this setup

  1. Tailscale clients (iOS, Android, macOS, Linux) are mature, polished, and widely used. You get the best client experience without the SaaS control plane.
  2. NAT traversal is reliable — dynamic home IP is a non-issue. No open inbound ports required.
  3. Runs in k3s as a single-container Deployment with a SQLite PVC. Low resource usage (~30-50MB).
  4. Cloudflare Tunnel synergy — expose headscale.neuralplatform.ai through the existing tunnel. No new infrastructure.
  5. Adding trusted users is a headscale users create command and a pre-auth key. Clean ACLs if needed later.
  6. Privacy: your device registry and keys stay on your server.
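Point 5 mentions ACLs. Headscale reads a policy in Tailscale's ACL format. The sketch below generates one that gives the owner full access, gives a trusted user HTTPS-only access to services, and auto-approves the k3s routes Legion advertises. The group member names (will@, friend@) and the port choice are hypothetical. The exact user syntax has changed across Headscale releases, so check it against the version you deploy.

```python
import json
from typing import List

K3S_ROUTES = ["10.42.0.0/16", "10.43.0.0/16"]


def build_policy(admins: List[str], trusted: List[str]) -> str:
    """Return a Tailscale-format ACL policy as JSON (JSON is valid HuJSON)."""
    policy = {
        "groups": {"group:admins": admins, "group:trusted": trusted},
        "acls": [
            # Owner: every port on pods and services.
            {"action": "accept", "src": ["group:admins"],
             "dst": [f"{cidr}:*" for cidr in K3S_ROUTES]},
            # Trusted user: HTTPS to ClusterIP services only (hypothetical choice).
            {"action": "accept", "src": ["group:trusted"], "dst": ["10.43.0.0/16:443"]},
        ],
        # Routes advertised by nodes owned by group:admins are approved automatically.
        "autoApprovers": {"routes": {cidr: ["group:admins"] for cidr in K3S_ROUTES}},
    }
    return json.dumps(policy, indent=2)


print(build_policy(admins=["will@"], trusted=["friend@"]))
```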

What to accept

  • Tailscale's DERP relay servers are used as fallback when direct WireGuard connections can't be established. Traffic through DERP is encrypted end-to-end, but it touches Tailscale's network. If that's unacceptable, run your own DERP server (one binary, minimal overhead) or use Netbird instead.
  • Headscale occasionally lags 1-2 versions behind Tailscale client features. Newer Tailscale features (Taildrop, some MagicDNS features) may not work.

If you want zero Tailscale dependency

Choose Netbird instead. Same reliability, same NAT traversal quality, Apache 2.0 license, fully self-contained stack. Trade-off: slightly more setup (3-4 containers vs. 1) and no MagicDNS equivalent.

Avoid for this setup

  • WireGuard raw: dynamic IP + no inbound ports makes this unnecessarily painful
  • Tailscale SaaS: fine functionally, but control plane privacy is worse than Headscale for the same client experience
  • Netmaker: heavier, worse k8s support, SSPL license edge case, less reliable NAT traversal

Suggested Deployment Architecture (Headscale)

Cloudflare Tunnel
  └─► headscale.neuralplatform.ai ──► Headscale pod (k3s, dns ns or new vpn ns)
                                          └─► SQLite PVC

Legion host (tailscaled, subnet router)
  ├─ advertises 10.42.0.0/16 (k3s pod CIDR)
  └─ advertises 10.43.0.0/16 (k3s service CIDR)

Mac (Tailscale client, custom login server = headscale.neuralplatform.ai)
  └─ accesses k3s dashboard, Postgres, Grafana, internal APIs directly

Trusted user (Tailscale client, same server)
  └─ access controlled via Headscale ACL policy

No firewall changes required. Cloudflare Tunnel carries the Headscale coordination traffic only (tiny). All data plane traffic is direct WireGuard between peers, or DERP-relayed if direct fails.
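Subnet routing only works cleanly if the advertised CIDRs don't collide with each other, with the tailnet's own address range, or with the networks clients sit on. The sketch below checks for such collisions. 100.64.0.0/10 is the CGNAT range Tailscale (and Headscale, by default) assigns node addresses from. The home LAN 192.168.68.0/22 and the cafe network are hypothetical examples.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple

PLAN = {
    "k3s-pods": "10.42.0.0/16",
    "k3s-services": "10.43.0.0/16",
    "tailnet": "100.64.0.0/10",      # node addresses (100.x.y.z)
    "home-lan": "192.168.68.0/22",   # hypothetical; adjust to the real LAN
}


def conflicts(plan: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of named networks whose ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    return [(a, b) for a, b in combinations(sorted(nets), 2) if nets[a].overlaps(nets[b])]


print(conflicts(PLAN))                                   # the plan as drawn
print(conflicts({**PLAN, "cafe-wifi": "10.42.8.0/24"}))  # a remote LAN inside the pod CIDR
```

A collision like the second case does not break the VPN outright, but routing to the overlapping range becomes ambiguous on that client.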