
Running Your VPN Gateway Inside the Thing It Protects Is a Bad Idea

Staff at ZSoftly
6 min read
Proxmox
Tailscale
Local DC
Private Cloud
Infrastructure
Networking
VPN
Self-Hosted
Tailscale subnet router migrated from LXC container on Proxmox to a dedicated physical ops node

Our Tailscale subnet router lived on Proxmox. As an LXC container. On the same host it was supposed to give us remote access to.

It worked fine. Until we needed to reboot the host. Then we lost VPN access to the machine we were trying to reboot.


Note: The specific values below (subnets, IPs, hostnames, container IDs, and internal tooling) reflect our own local DC environment. Substitute your own values wherever you see <PLACEHOLDER> style references or obvious environment-specific names.


TL;DR

We ran a Tailscale subnet router as an LXC container on our Proxmox host to get remote access to our management network (<YOUR_MGMT_SUBNET> in our case). It was a quick fix, but it meant any Proxmox issue took down VPN access at the same time.

We replaced it with a dedicated physical ops node running Tailscale natively as a systemd service. VPN and Proxmox are now independent.


How We Got Here

When we first stood up the local DC, we needed a way to reach the management network remotely. The fastest path was an LXC container on our existing Proxmox host with Tailscale installed, advertising the subnet.

It took about twenty minutes to spin up. It worked immediately. We moved on to building out the rest of the platform.
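For reference, the quick version looked roughly like this. The container ID, template name, and bridge are illustrative, not our actual values; substitute your own:

```bash
# On the Proxmox host: create a small Debian LXC (ID and template are examples)
pct create 105 local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst \
  --hostname tailscale-router --memory 512 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp
# Note: Tailscale needs /dev/net/tun inside the container; on an unprivileged
# LXC you must pass the device through in the container config first.
pct start 105

# Inside the container: install Tailscale and advertise the management subnet
pct exec 105 -- bash -c "curl -fsSL https://tailscale.com/install.sh | sh"
pct exec 105 -- tailscale up --advertise-routes=<YOUR_MGMT_SUBNET>
```

After approving the advertised route in the Tailscale (or Headscale) admin console, the management subnet is reachable from any device on the tailnet.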

The container quietly kept the lights on for just under two weeks.


The LXC Approach: Pros and Cons

This is not a bad pattern. It is the right call in the right context. The problem is knowing when you have outgrown it.

Pros

  • Fast to spin up. An LXC with Tailscale is a twenty-minute job. No extra hardware, no racking, no OS install. If you are in bootstrapping mode and need remote access now, this gets you there.
  • Zero additional cost. You are using existing capacity. During early build-out when you are still procuring hardware and standing up services, this matters.
  • Good enough for short windows. If your Proxmox host is stable and you are physically near the site, the blast radius of a failure is low. It is a calculated trade-off, not a mistake.
  • Disposable by design. When you are moving fast and iterating on your architecture, you do not want permanent decisions embedded in your network path. An LXC is easy to replace when you are ready.

Cons

  • Circular dependency. Your remote access tool depends on the host you need remote access to. This is the core problem.
  • Worst-case timing. The LXC goes down at the exact moment you need it, when something is wrong with Proxmox. Every failure scenario is the maximum severity scenario.
  • Container networking adds abstraction. Tailscale in an LXC works, but the container runs on the shared host kernel with an extra network-namespace layer between tailscaled and the physical NIC, which can produce subtle routing issues under load or after unexpected system events. Native systemd on a physical host is cleaner.
  • Not audit-friendly. On anything with compliance requirements, having your VPN gateway co-resident with your hypervisor is a conversation you do not want to have.

When the LXC Is the Right Call

During bootstrapping, moving fast is the priority. You are standing up CloudStack, configuring Ceph, iterating on your Ansible playbooks. Every hour you spend procuring and racking dedicated hardware is an hour you are not building.

The LXC approach is correct when:

  • You are in early build-out and need remote access before dedicated hardware arrives
  • You are physically close to the site and can recover in person if needed
  • The environment is not yet running production workloads
  • You have a clear plan to migrate once the platform stabilizes

Treat it as scaffolding. Useful while you are building, removed once the structure holds.


The Problem With Staying Too Long

The circular dependency is obvious in hindsight:

Remote Access → Tailscale → LXC on Proxmox → Proxmox host
      ↑                                           │
      └─────────────── same machine ──────────────┘

If Proxmox hung, crashed, or needed a reboot, the LXC went down with it. VPN access disappeared at exactly the moment we needed it most. Recovery required physical access to the machine, or a trip on-site.

For a local DC this is manageable in the short term. For anything running workloads you care about, it is a single point of failure you do not want to carry past bootstrap.


The Fix: A Dedicated Physical Node

We deployed a dedicated ops node (in our case a Lenovo ThinkCentre M75q Tiny, but any spare hardware on a separate power and network path works). It runs Ubuntu 24.04, sits on the management VLAN at <OPS_NODE_MGMT_IP>, has an independent Wi-Fi uplink at <OPS_NODE_WIFI_IP>, and runs Tailscale as a native tailscaled systemd service.

Key differences from the LXC approach:

|                           | LXC container      | Dedicated physical node |
|---------------------------|--------------------|-------------------------|
| Lifecycle                 | Tied to Proxmox    | Independent             |
| Tailscale                 | Containerized      | Native systemd          |
| Kernel networking         | Shared (containerized) | Full host kernel    |
| Recovery if Proxmox fails | No VPN access      | VPN stays up            |
| Subnet advertised         | <YOUR_MGMT_SUBNET> | <YOUR_MGMT_SUBNET>      |

Running Tailscale natively matters for subnet routing. A native tailscaled service on a physical host owns its TUN interface and iptables rules directly. No namespace indirection, no shared kernel path.
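As a sketch, the native setup on the ops node amounts to enabling IP forwarding and advertising the route; this follows the standard Tailscale subnet-router setup, with our subnet placeholder substituted in:

```bash
# Enable IPv4/IPv6 forwarding so the node can route traffic for the subnet
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-tailscale.conf
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
sudo sysctl -p /etc/sysctl.d/99-tailscale.conf

# Install tailscaled (runs as a systemd service) and advertise the route
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up --advertise-routes=<YOUR_MGMT_SUBNET>
```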


What We Bootstrapped

The full ops node bootstrap runs through an Ansible playbook in phases. Adapt these to match your environment and tooling:

  1. Netplan: Static IPs on both ethernet and Wi-Fi interfaces (<OPS_NODE_MGMT_IP> and <OPS_NODE_WIFI_IP>)
  2. Prerequisites: Base packages, hostname, chrony (NTP)
  3. Swap: 4GB swapfile (size to match your RAM)
  4. SSH hardening: UFW, fail2ban, your auth provider (we use Authentik with PAM/NSS), sshd config
  5. Docker Engine: For future ops tooling
  6. Tailscale: Installed and connected to your Tailscale or Headscale server, advertising <YOUR_MGMT_SUBNET>
  7. SIEM agent: Enrollment in your monitoring groups (we use Wazuh)
  8. Observability: Collector shipping to your metrics backend (we use OpenTelemetry + SigNoz)
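A minimal sketch of the netplan phase (step 1), written as a shell step the playbook template renders. Interface names, prefix lengths, SSID, and gateway are all placeholders for illustration:

```bash
# Write a minimal netplan for the dual uplinks (values are placeholders)
sudo tee /etc/netplan/01-ops-node.yaml >/dev/null <<'EOF'
network:
  version: 2
  ethernets:
    eno1:
      addresses: [<OPS_NODE_MGMT_IP>/24]
  wifis:
    wlp2s0:
      addresses: [<OPS_NODE_WIFI_IP>/24]
      access-points:
        "<YOUR_SSID>":
          password: "<YOUR_WIFI_PASSWORD>"
      routes:
        - to: default
          via: <YOUR_WIFI_GATEWAY>
EOF
sudo netplan apply
```

The independent Wi-Fi uplink is the point of the second interface: if the management switch or its upstream path fails, the ops node still has a way out.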

We approved the subnet route for <YOUR_MGMT_SUBNET> via the control plane UI. Tailscale on the new node showed connected immediately.
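A couple of sanity checks from the node itself, assuming a standard Linux tailscaled install:

```bash
tailscale status   # confirm the node is connected and peers are reachable
tailscale ip -4    # the node's Tailscale IPv4 address
ip rule show       # tailscaled adds policy rules pointing at its routing table
```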


Saying Goodbye to the LXC

The original container served us well. It came up fast when we needed it, stayed stable for two weeks, and never dropped a packet. For a quick solution while the real infrastructure was taking shape, it was exactly right.

But the job it was doing (providing reliable remote access to the local DC) deserves a node not dependent on the local DC itself.

We are decommissioning the LXC. The dedicated ops node is the new subnet router.


Need help designing on-premises infrastructure built without circular dependencies? As a Private Cloud builder, ZSoftly helps Canadian companies stand up production-ready IaaS. Talk to us