
How We Consolidated 4 PostgreSQL Clusters into 1 with CloudNativePG

Staff at ZSoftly
8 min read
Tags: Kubernetes, PostgreSQL, CloudNativePG, PgBouncer, Database, DevOps, Platform Engineering, Cost Optimization, ArgoCD, GitOps, AWS, EKS
[Figure: architecture diagram of four applications connecting to a shared PostgreSQL cluster through PgBouncer connection pooling]

We had a problem. Four applications. Four separate PostgreSQL clusters. 91MB of actual data spread across 50Gi of allocated storage.

Each cluster created its own PodDisruptionBudget. Node drains during maintenance became difficult. Karpenter struggled to consolidate workloads.

This is the story of how we consolidated everything into a single HA cluster with PgBouncer connection pooling.


TL;DR

We consolidated 4 CloudNativePG clusters into 1 shared cluster with PgBouncer. The key gotcha: ArgoCD sync waves are required when using ExternalSecrets with CNPG, or the cluster bootstraps with the wrong password.

Results:

  • 4 PDBs reduced to 1
  • 50Gi storage reduced to 30Gi (40% less)
  • High availability added (2 instances with streaming replication: primary plus one replica)
  • Connection pooling enabled via PgBouncer
  • Single backup schedule instead of 4

The Problem

Our platform ran four internal applications, each with its own CloudNativePG database:

| App | Instances | Storage | CPU | Memory | Data Size |
|-------------|-----------|---------|------|--------|-----------|
| n8n | 1 | 20Gi | 100m | 256Mi | 11 MB |
| vaultwarden | 1 | 5Gi | 100m | 256Mi | 9 MB |
| wikijs | 1 | 5Gi | 100m | 256Mi | 10 MB |
| zammad | 1 | 20Gi | 250m | 512Mi | 61 MB |
| Total | 4 pods | 50Gi | 550m | 1.3Gi | 91 MB |

91MB of data. 50Gi of provisioned storage. Four separate upgrade cycles to coordinate.

The real issue was PodDisruptionBudgets. Each CNPG cluster creates a PDB that blocks node drains until replicas are ready. With single-instance clusters, nodes could not drain at all during maintenance windows.
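
To make the PDB problem concrete, here is roughly what the operator-generated budget ends up looking like for a single-instance cluster. The name and selector below are approximations, not the literal objects CNPG created for us, but the mechanics are the same: with minAvailable: 1 and only one matching pod, the eviction API can never make progress, so the drain hangs.

# Approximate shape of the PDB a single-instance CNPG cluster ends up with
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: wikijs-db                  # illustrative name
  namespace: wikijs
spec:
  minAvailable: 1                  # one instance total, so zero voluntary disruptions are allowed
  selector:
    matchLabels:
      cnpg.io/cluster: wikijs-db   # CNPG labels instance pods with their cluster name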


Why These 4 Apps?

Before consolidating, we analyzed access patterns and compatibility:

  1. Low write frequency - All four apps have minimal concurrent writes. No risk of lock contention.
  2. Independent schemas - Each app uses its own database. No shared tables or cross-database queries.
  3. Similar SLAs - All are internal tools. A brief maintenance window is acceptable.
  4. Non-critical workloads - These support internal workflows, not customer-facing production.

Critical apps keep their own clusters. Our production databases (customer data, billing, auth) remain on dedicated CNPG clusters with separate backup schedules, stricter PDBs, and isolated failure domains. Consolidation is not a universal solution—it fits workloads with compatible access patterns and similar reliability requirements.


The Solution

One shared PostgreSQL cluster with PgBouncer connection pooling. This also gave us High Availability—something the individual single-instance clusters lacked.

| Component | Instances | Storage | CPU | Memory |
|-------------|-----------|---------|------|--------|
| platform-db | 2 (HA) | 30Gi | 500m | 1Gi |
| pgbouncer | 2 | - | 100m | 128Mi |
| Total | 4 pods | 30Gi | 700m | 1.3Gi |

Same pod count. Less storage. High availability included.
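
For reference, the shared cluster is an ordinary CNPG Cluster resource. The sketch below is a minimal approximation built from the table above, not our exact manifest; the secret name matches the ExternalSecret shown in the sync-wave section below:

# Minimal sketch of the shared cluster (approximate, values taken from the table above)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: platform-db
  namespace: platform-db
spec:
  instances: 2                    # primary + one streaming replica
  storage:
    size: 30Gi
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
  enableSuperuserAccess: true     # the init job connects as postgres
  superuserSecret:
    name: platform-db-superuser   # synced from AWS Secrets Manager by the ExternalSecret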


The Architecture

Before: 4 Separate Clusters

[Figure: four separate PostgreSQL clusters, one per application]

Each application had its own dedicated CloudNativePG cluster. Four PodDisruptionBudgets. Four backup schedules. 50Gi of storage for 91MB of data.

After: 1 Shared Cluster

[Figure: one shared PostgreSQL cluster fronted by PgBouncer]

Applications connect to platform-db-pooler.platform-db.svc on port 5432. PgBouncer handles connection pooling in transaction mode. The CNPG cluster manages replication and failover automatically.
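
The pooler itself is a CNPG Pooler resource sitting in front of the cluster; CNPG creates the platform-db-pooler Service from it. A hedged sketch (instance count from the component table, pool settings illustrative):

# Sketch of the PgBouncer pooler (pool settings illustrative)
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: platform-db-pooler
  namespace: platform-db
spec:
  cluster:
    name: platform-db
  instances: 2                # two PgBouncer pods, per the component table
  type: rw                    # route connections to the current primary
  pgbouncer:
    poolMode: transaction     # transaction pooling, as described above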


The Gotcha: Sync Waves Required

Our first deployment failed. The init job could not authenticate to the database.

FATAL: password authentication failed for user "postgres"

Root cause: ArgoCD synced all resources in parallel. The CNPG Cluster bootstrapped before the ExternalSecret finished syncing the superuser password from AWS Secrets Manager, so CNPG generated its own random password. By the time the init job ran, it read the now-synced secret, whose password no longer matched the one the cluster had bootstrapped with.

The fix: ArgoCD sync waves.

# ExternalSecrets must sync FIRST (wave -1)
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: platform-db-superuser
  annotations:
    argocd.argoproj.io/sync-wave: '-1'
# CNPG Cluster syncs AFTER secrets exist (wave 0)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: platform-db
  annotations:
    argocd.argoproj.io/sync-wave: '0'

Sync order:

  1. Wave -1: ExternalSecrets create Kubernetes secrets from AWS Secrets Manager
  2. Wave 0: CNPG Cluster reads superuserSecret (now exists with correct password)
  3. PostSync: Init job creates database users

We could not find this documented anywhere. If you use CNPG with ExternalSecrets and ArgoCD, you need sync waves.
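
For completeness, the init job mentioned in the sync order runs as an ArgoCD PostSync hook. The sketch below only shows the hook wiring; the job name, image, and command are placeholders, not our actual init script:

# PostSync hook wiring for the init job (name, image, and command are placeholders)
apiVersion: batch/v1
kind: Job
metadata:
  name: platform-db-init
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded   # remove the job once it succeeds
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: init
          image: postgres:17                                # placeholder; any image with psql works
          command: ["sh", "-c", "echo 'create app databases and users here'"]   # placeholder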


The Migration

We migrated apps in order of risk. Lowest risk first.

Migration Steps (per app)

  1. Scale down the application
  2. Dump from old cluster using postgres superuser
  3. Restore to shared cluster
  4. Grant permissions to app user
  5. Update Helm values to point to new host
  6. Push changes via GitOps
  7. Scale up and verify

Example: Wikijs Migration

# 1. Scale down
kubectl scale deployment p11-wikijs -n wikijs --replicas=0

# 2. Dump from old cluster
kubectl exec wikijs-db-1 -n wikijs -c postgres -- \
  pg_dump -U postgres -d wikijs --no-owner --no-acl > wikijs.sql

# 3. Restore to shared cluster
cat wikijs.sql | kubectl exec -i platform-db-1 -n platform-db -c postgres -- \
  psql -U postgres -d wikijs

# 4. Grant permissions
kubectl exec platform-db-1 -n platform-db -c postgres -- \
  psql -U postgres -d wikijs -c "GRANT ALL ON ALL TABLES IN SCHEMA public TO wikijs"

Then update the Helm values:

# Before
cloudnativepg:
  enabled: true
wiki:
  postgresql:
    postgresqlHost: wikijs-db-rw

# After
cloudnativepg:
  enabled: false
wiki:
  postgresql:
    postgresqlHost: platform-db-pooler.platform-db.svc

Commit, push, wait for ArgoCD sync, scale up, verify.
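
A hedged sketch of the scale-up-and-verify step for wikijs; listing the tables is just one sanity check, and your own verification will depend on the app:

# Scale back up once ArgoCD has synced the new values
kubectl scale deployment p11-wikijs -n wikijs --replicas=1
kubectl rollout status deployment p11-wikijs -n wikijs

# Sanity check: the app's tables should now exist in the shared cluster
kubectl exec platform-db-1 -n platform-db -c postgres -- \
  psql -U postgres -d wikijs -c "\dt"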

Migration Summary

| App | Tables Migrated | Result |
|-------------|-----------------|----------------|
| wikijs | 34 | 1/1 Running |
| n8n | 56 | 1/1 Running |
| vaultwarden | 30 | 1/1 Running |
| zammad | 120 | All Running |
| Total | 240 | All successful |
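
If you want to reproduce the per-database table counts above, a catalog query along these lines works (run once per database; the query form is illustrative):

# Count public-schema tables in one app database (repeat per database)
kubectl exec platform-db-1 -n platform-db -c postgres -- \
  psql -U postgres -d wikijs -tAc \
  "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public';"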

Important Notes

Use postgres superuser for dump/restore

Inside the database pod you run as the postgres OS user, and peer authentication only lets that OS user connect as the matching database role, so connections as app users fail when going through kubectl exec. Always use the postgres superuser:

# This fails
kubectl exec db-1 -c postgres -- pg_dump -U appuser -d appdb

# This works
kubectl exec db-1 -c postgres -- pg_dump -U postgres -d appdb --no-owner --no-acl

Grant permissions after restore

After restoring as postgres, the restored objects belong to postgres, so grant the app user privileges on them:

GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO appuser;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO appuser;
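
These grants only cover objects that already exist. If later restores or manual migrations will also run as postgres, default privileges save re-granting every time; a hedged addition, run as postgres:

-- Applies to future objects created by postgres in this schema
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO appuser;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO appuser;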

Wait for ArgoCD sync before deleting resources

If you manually delete Kubernetes resources that ArgoCD manages, ArgoCD recreates them on the next reconciliation. Wait for ArgoCD to sync the new values first. Then it prunes the old resources automatically.
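
One way to watch for that from the terminal, assuming the ArgoCD Application is named wikijs (the name and namespace here are illustrative), is to poll the Application's sync and health status:

# Check sync and health status of the ArgoCD Application (name/namespace illustrative)
kubectl get application wikijs -n argocd \
  -o jsonpath='{.status.sync.status}{"  "}{.status.health.status}{"\n"}'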

ArgoCD on EKS Auto Mode

If you run ArgoCD as an EKS Auto Mode capability, the CLI requires token-based authentication. Standard argocd login does not work. See ArgoCD CLI Login on EKS Auto Mode for the correct setup.


Results

| Metric | Before | After | Change |
|--------------------|--------|--------|--------|
| CNPG Clusters | 4 | 1 | -75% |
| Total Pods | 4 | 4 | Same |
| Storage | 50Gi | 30Gi | -40% |
| CPU Requests | 550m | ~250m | -300m |
| Memory | 1.3Gi | ~500Mi | -800Mi |
| PDBs | 4 | 1 | -75% |
| HA Replicas | 0 | 1 | Added |
| Backup Jobs | 4 | 1 | -75% |
| Connection Pooling | No | Yes | Added |

The consolidation reduced operational overhead without sacrificing reliability. Node drains work smoothly with a single PDB. Backups are simpler with one schedule. Connection pooling reduces database load.

We saved approximately 300m CPU and 800Mi memory. More importantly, we eliminated 3 PodDisruptionBudgets that were blocking node consolidation.


Lessons Learned

  1. Analyze access patterns first. Not all databases should be consolidated. We chose these four because they have low write frequency, independent schemas, and similar SLAs. Critical production databases stay isolated.

  2. Consolidation can add HA, not just reduce costs. Our single-instance clusters had no replicas. The shared cluster runs PRIMARY + REPLICA with automatic failover. We improved reliability while reducing overhead.

  3. Sync waves are mandatory for CNPG + ExternalSecrets. The superuser secret must exist before the Cluster resource. Use wave -1 for ExternalSecrets, wave 0 for Cluster.

  4. Reuse existing secrets. We already had per-app database credentials in AWS Secrets Manager. No need to create new ones.

  5. Migrate in order of risk. Start with the least critical app. Build confidence before touching production data.

  6. Do not manually delete ArgoCD-managed resources. Wait for the sync. Let ArgoCD prune automatically.

  7. PgBouncer transaction mode works for most apps. All four applications worked without changes to connection handling.


Running multiple PostgreSQL clusters on Kubernetes? As an AWS Partner, ZSoftly provides Kubernetes consulting and database optimization for Canadian companies. Talk to us

