High Availability
This guide explains how to deploy the AtlasAI Tenant Plane (TP) in a highly available configuration for production on-prem and BYOC environments. For SaaS and Dedicated TP customers, AtlasAI manages HA automatically — this guide is for self-hosted deployments.
How TP handles availability
The Tenant Plane is designed to be stateless. This means:
- All persistent state lives in PostgreSQL (with rate-limit counters optionally in an external Redis)
- Authentication uses JWT tokens that are verified locally on each request — no session store or sticky sessions needed
- Any replica can serve any request — the load balancer does not need session affinity
- You can run as many replicas as needed and scale them up or down without downtime
Because TP is stateless, high availability simply means: run at least 2 replicas behind a load balancer, connected to a highly available database.
Minimum HA requirements
| Component | Minimum for HA | Recommended |
|---|---|---|
| TP replicas | 2 | 3+ (across multiple nodes) |
| Database | PostgreSQL with read replica | PostgreSQL with streaming replication + auto-failover (Patroni, Aurora, CloudSQL) |
| Load balancer | Any TCP/HTTP LB (nginx, HAProxy, ALB) | Application load balancer with health checks |
| Redis | Optional (improves rate-limit consistency) | Redis Sentinel or cluster for HA rate limiting |
Architecture overview
```
   Internet / Internal Network
               │
               ▼
   ┌────────────────────────┐
   │     Load Balancer      │
   │   (ALB / nginx / k8s)  │
   │   Health: /api/health  │
   └───────────┬────────────┘
               │
         ┌─────┴─────┐
         │           │
         ▼           ▼
   ┌──────────┐ ┌──────────┐   (add more replicas as needed)
   │  TP #1   │ │  TP #2   │
   └─────┬────┘ └────┬─────┘
         │           │
         └─────┬─────┘
               │
   ┌───────────▼────────────┐
   │    PostgreSQL (HA)     │
   │   Primary + Replica    │
   └───────────┬────────────┘
               │
   ┌───────────▼────────────┐
   │    Redis (optional)    │
   │    Rate limit cache    │
   └────────────────────────┘
```

Option 1: Docker Compose with multiple replicas
For on-prem deployments not using Kubernetes, use Docker Compose with a reverse proxy.
Step 1: Configure your .env
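If OpenSSL is available on the host, the two secrets referenced in the file can be generated up front (a sketch; any source of 32 random bytes encoded as hex works equally well):

```shell
# Generate the 32-byte (64 hex character) secrets the .env below expects.
JWT_SECRET=$(openssl rand -hex 32)
ENCRYPTION_KEY=$(openssl rand -hex 32)

echo "JWT_SECRET=$JWT_SECRET"
echo "ENCRYPTION_KEY=$ENCRYPTION_KEY"
```

Paste the printed values into the .env file; all replicas read the same file, so they automatically share the same secrets.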
```bash
# Identity
TENANT_ID=acme-corp
JWT_SECRET=<generate: openssl rand -hex 32>
ENCRYPTION_KEY=<generate: openssl rand -hex 32>

# Database (shared between all replicas)
TENANT_PLANE_DATABASE_URL=postgresql://atlasusr:password@db:5432/atlas

# Optional: Redis for rate limiting consistency across replicas
REDIS_URL=redis://redis:6379

# License (on-prem)
ATLASAI_LICENSE_KEY=eyJhbGciOiJSUzI1NiJ9...
CP_LICENSE_PUBLIC_KEY=-----BEGIN PUBLIC KEY-----\n...\n-----END PUBLIC KEY-----
```

Step 2: Scale using Docker Compose
```yaml
# docker-compose.yml
services:
  tenant-plane:
    image: atlasai/tenant-plane:1.3.0
    env_file: .env
    deploy:
      replicas: 3        # run 3 instances
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 5s
      retries: 3

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      # Mount into conf.d: the file below contains upstream/server blocks,
      # which are only valid inside the http context that conf.d provides.
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
    depends_on:
      - tenant-plane
```

Step 3: Configure nginx as load balancer
```nginx
# nginx.conf
upstream tenant_plane {
    least_conn;
    server tenant-plane:3000;
    keepalive 32;
}

server {
    listen 80;
    server_name atlas.yourdomain.com;

    location /api/health {
        proxy_pass http://tenant_plane;
        access_log off;
    }

    location / {
        proxy_pass http://tenant_plane;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_read_timeout 90s;
    }
}
```

Start everything:
```bash
docker compose up -d --scale tenant-plane=3
```

Verify all replicas are healthy:
```bash
docker compose ps
# Should show 3 tenant-plane instances, all "Up (healthy)"
```

Option 2: Kubernetes with Helm (recommended for production)
Step 1: Prepare your values file
Create a values-production.yaml file:
```yaml
# Number of replicas — minimum 2 for HA, 3+ recommended
replicaCount: 3

image:
  repository: atlasai/tenant-plane
  tag: "1.3.0"
  pullPolicy: IfNotPresent

# Database — REQUIRED for HA (cannot use SQLite with multiple replicas)
db:
  url: "postgresql://atlasusr:password@rds.internal:5432/atlas"
  poolMin: 2
  poolMax: 10

# Authentication secrets — must be identical on all replicas
auth:
  jwtSecret: "your-32-char-secret-here"
  encryptionKey: "your-32-char-encryption-key"

# Tenant identity
tenant:
  id: "acme-corp"

# ─── High Availability settings ──────────────────────────────────────────────
ha:
  # Spread replicas across different nodes (preferred, not required)
  podAntiAffinity: true
  # Ensure at least 1 pod is always running during updates or node drains
  podDisruptionBudget:
    enabled: true
    minAvailable: 1
  # Spread replicas across availability zones
  topologySpreadConstraints: true

# Auto-scaling: scale based on CPU/memory load
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

# Resource requests and limits per replica
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 1Gi

# License (for on-prem/BYOC)
license:
  publicKey: "-----BEGIN PUBLIC KEY-----\n...\n-----END PUBLIC KEY-----"
  key: "eyJhbGciOiJSUzI1NiJ9..."
```

Step 2: Install
```bash
helm install atlasai-tp deploy/helm/atlasai-tp \
  -f values-production.yaml \
  --namespace atlasai \
  --create-namespace \
  --wait \
  --timeout 5m
```

Step 3: Verify
```bash
# Check all pods are running
kubectl get pods -n atlasai -l app=atlasai-tp

# Check the PodDisruptionBudget is in place
kubectl get pdb -n atlasai

# Check the horizontal autoscaler
kubectl get hpa -n atlasai

# Hit the health endpoint through the service
kubectl run test --rm -it --image=curlimages/curl --restart=Never -n atlasai -- \
  curl -s http://atlasai-tp/api/health
```

Expected pod output:
```
NAME                         READY   STATUS    RESTARTS   AGE
atlasai-tp-7d8f9b4c5-2xqmn   1/1     Running   0          5m
atlasai-tp-7d8f9b4c5-6bktv   1/1     Running   0          5m
atlasai-tp-7d8f9b4c5-r9pvw   1/1     Running   0          5m
```

Database high availability
Why PostgreSQL is required for HA
SQLite is a single-file database that cannot be shared across replicas. For multi-replica deployments you must use PostgreSQL.
Set the Postgres connection string:
```bash
TENANT_PLANE_DATABASE_URL=postgresql://username:password@hostname:5432/dbname
```

Recommended: managed Postgres with auto-failover
| Platform | Service | Notes |
|---|---|---|
| AWS | Amazon RDS Aurora PostgreSQL | Automatic failover, up to 15 read replicas |
| GCP | Cloud SQL for PostgreSQL | HA with automatic failover |
| Azure | Azure Database for PostgreSQL | Zone-redundant HA |
| Self-hosted | Patroni + etcd + HAProxy | Open-source HA stack |
| Self-hosted | Postgres Streaming Replication | Manual failover, simpler setup |
All of these work with AtlasAI. The Tenant Plane uses standard PostgreSQL wire protocol — no special extensions required beyond pgvector for AI search features.
Database connection pooling
AtlasAI automatically pools database connections. Configure the pool size per replica:
```bash
DB_POOL_MIN=2            # Minimum connections kept open (default: 2)
DB_POOL_MAX=10           # Maximum connections per replica (default: 10)
DB_CONNECT_TIMEOUT=5000  # Connection timeout in ms (default: 5000)
DB_IDLE_TIMEOUT=30000    # Idle connection timeout in ms (default: 30000)
```

Total connections formula: `replicas × DB_POOL_MAX`
Example: 3 replicas × 10 connections = 30 max connections to Postgres. Size your Postgres max_connections accordingly (default is usually 100).
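The sizing check can be scripted, for example as part of a deployment pre-flight (a sketch; the variable names are illustrative and the 20-connection headroom is an assumption, not an AtlasAI requirement):

```shell
REPLICAS=3
DB_POOL_MAX=10           # per-replica maximum, mirrors the DB_POOL_MAX setting
PG_MAX_CONNECTIONS=100   # Postgres server-side limit (max_connections)

# Peak connections TP can open: one full pool per replica.
TOTAL=$((REPLICAS * DB_POOL_MAX))
echo "Peak TP connections: $TOTAL"

# Leave headroom for superuser sessions, migrations, and monitoring tools.
if [ "$TOTAL" -gt $((PG_MAX_CONNECTIONS - 20)) ]; then
  echo "WARNING: raise max_connections or lower DB_POOL_MAX"
fi
```

If the HPA is enabled, run the calculation with `maxReplicas`, not the current replica count, since that is the worst case the database must absorb.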
Health checks and monitoring
The /api/health endpoint is the canonical health check for all monitoring:
```bash
curl https://your-tenant-plane/api/health
```

```json
{
  "status": "ok",
  "plane": "tenant",
  "version": "1.3.0",
  "uptime_seconds": 14283,
  "db": {
    "enabled": true,
    "reachable": true
  },
  "vector_db": "pgvector",
  "timestamp": "2026-03-26T10:00:00.000Z"
}
```

- `status: "ok"` — replica is healthy and ready to serve traffic
- `status: "degraded"` — replica is running but some non-critical service is unavailable
- HTTP 503 — replica is not ready; the load balancer should stop sending traffic to it
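External monitors that only need the status field can gate on it with a small script (a sketch; it assumes the JSON shape shown above and uses sed so there is no jq dependency — in real use `$RESPONSE` would come from `curl -fsS https://your-tenant-plane/api/health`):

```shell
# check_health.sh — exit 0 only when the replica reports status "ok".
# Sample response used here for illustration; replace with a curl call.
RESPONSE='{"status":"ok","plane":"tenant","db":{"enabled":true,"reachable":true}}'

# Extract the value of the top-level "status" field.
STATUS=$(printf '%s' "$RESPONSE" | sed -n 's/.*"status":"\([^"]*\)".*/\1/p')

if [ "$STATUS" = "ok" ]; then
  echo "healthy"
else
  echo "unhealthy: status=$STATUS"
  exit 1
fi
```

A non-zero exit code is what most load balancers and cron-based monitors treat as a failed check, so the script plugs directly into those.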
Kubernetes probes
The Helm chart configures liveness and readiness probes automatically. Both point to /api/health:
- Liveness probe: checked every 30 seconds; restarts pod if failing for more than 3 cycles
- Readiness probe: checked every 10 seconds; removes pod from load balancer rotation when failing
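If you manage the Deployment yourself instead of using the chart, equivalent probes look roughly like this (a sketch for the tenant-plane container spec; port 3000 and the thresholds mirror the defaults described above and may differ in your chart version):

```yaml
livenessProbe:
  httpGet:
    path: /api/health
    port: 3000
  periodSeconds: 30
  failureThreshold: 3      # restart after ~90s of consecutive failures
readinessProbe:
  httpGet:
    path: /api/health
    port: 3000
  periodSeconds: 10
  failureThreshold: 1      # drop from Service endpoints quickly
```

Keeping the readiness probe more sensitive than the liveness probe means a struggling pod stops receiving traffic well before Kubernetes considers restarting it.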
Session and authentication
TP uses stateless JWT authentication. Here is what this means for HA:
- No sticky sessions needed — any replica validates any JWT independently
- No shared session store — tokens are self-contained and verified using `JWT_SECRET`
- All replicas must share the same `JWT_SECRET` — if they differ, users logged into one replica cannot be authenticated by another

Make sure `JWT_SECRET` is identical on all replicas. In Kubernetes, this is set via the Helm secret automatically.
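To see why a shared secret is sufficient, note that a symmetric JWT signature (assuming HS256, which a shared-secret setup implies) is just an HMAC over the token's header and payload segments, so any replica holding the same secret recomputes the identical signature locally. A sketch using the openssl CLI, with illustrative header/payload values:

```shell
# base64url encoding helper, as used for JWT segments.
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }

SECRET="shared-jwt-secret"
HEADER=$(printf '%s' '{"alg":"HS256","typ":"JWT"}' | b64url)
PAYLOAD=$(printf '%s' '{"sub":"user-1"}' | b64url)

# Replica 1 signs; replica 2 independently recomputes and compares.
SIG1=$(printf '%s' "$HEADER.$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" -binary | b64url)
SIG2=$(printf '%s' "$HEADER.$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" -binary | b64url)

[ "$SIG1" = "$SIG2" ] && echo "signature matches: token accepted on any replica"
```

A replica configured with a different secret would compute a different HMAC and reject the token, which is exactly the failure mode described above.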
Redis (optional but recommended)
Redis is not required for HA but improves two things:
- Rate limiting consistency — without Redis, each replica tracks rate limits independently, meaning the effective limit is `per-replica limit × number of replicas`. With Redis, the limit is global across all replicas.
- JWT revocation — when a user’s session is forcibly terminated, Redis allows all replicas to instantly know the token is invalid.
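The effective-limit math is worth spelling out with concrete (illustrative) numbers:

```shell
PER_REPLICA_LIMIT=100   # requests/min each replica allows on its own
REPLICAS=3

# Without Redis each replica counts independently, so a client whose
# requests are spread across replicas can reach the sum of all budgets.
echo "Without Redis: up to $((PER_REPLICA_LIMIT * REPLICAS)) req/min"
echo "With Redis:    exactly $PER_REPLICA_LIMIT req/min (global counter)"
```

If you cannot run Redis, one workaround is to divide your intended global limit by the replica count when configuring the per-replica limit, accepting that autoscaling will change the effective total.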
Configure Redis:
```bash
REDIS_URL=redis://redis-hostname:6379
```

For Redis HA, use Redis Sentinel or Redis Cluster.
Recommended production topology
```
┌──────────────────────────────────────────┐
│               Your Network               │
│                                          │
│   ┌──────────────────────────────────┐   │
│   │     Load Balancer / Ingress      │   │
│   │    (nginx / ALB / k8s Ingress)   │   │
│   └────────┬────────────┬────────────┘   │
│            │            │                │
│      ┌─────▼────┐ ┌─────▼────┐           │
│      │  TP #1   │ │  TP #2   │ (+ more)  │
│      │  1 CPU   │ │  1 CPU   │           │
│      │ 1 GB RAM │ │ 1 GB RAM │           │
│      └─────┬────┘ └────┬─────┘           │
│            └─────┬─────┘                 │
│                  │                       │
│   ┌──────────────▼───────────────────┐   │
│   │  PostgreSQL (Aurora / Patroni)   │   │
│   │        Primary + Standby         │   │
│   └──────────────────────────────────┘   │
│                                          │
│   ┌──────────────────────────────────┐   │
│   │ Redis (optional, for rate limits)│   │
│   └──────────────────────────────────┘   │
└──────────────────────────────────────────┘
```

This topology handles:
- Single replica failure: remaining replicas serve 100% of traffic
- Database primary failure: standby promotes automatically (Aurora < 30s, Patroni ~ 30-60s)
- Node failure: Kubernetes reschedules pods to healthy nodes
- Traffic spikes: HPA adds replicas based on CPU/memory