High Availability
This guide explains how to deploy the AtlasAI Tenant Plane (TP) in a highly available configuration for production on-prem and BYOC environments. For SaaS and Dedicated TP customers, AtlasAI manages HA automatically — this guide is for self-hosted deployments.
How TP handles availability
The Tenant Plane is designed to be stateless. This means:
- All persistent state lives in PostgreSQL (with rate-limit counters optionally in an external Redis)
- Authentication uses JWT tokens that are verified locally on each request — no session store or sticky sessions needed
- Any replica can serve any request — the load balancer does not need session affinity
- You can run as many replicas as needed and scale them up or down without downtime
Because TP is stateless, high availability simply means: run at least 2 replicas behind a load balancer, connected to a highly available database.
Minimum HA requirements
| Component | Minimum for HA | Recommended |
|---|---|---|
| TP replicas | 2 | 3+ (across multiple nodes) |
| Database | PostgreSQL with read replica | PostgreSQL with streaming replication + auto-failover (Patroni, Aurora, CloudSQL) |
| Load balancer | Any TCP/HTTP LB (nginx, HAProxy, ALB) | Application load balancer with health checks |
| Redis | Optional (improves rate-limit consistency) | Redis Sentinel or cluster for HA rate limiting |
Architecture overview
```
   Internet / Internal Network
               │
               ▼
   ┌────────────────────────┐
   │     Load Balancer      │
   │   (ALB / nginx / k8s)  │
   │   Health: /api/health  │
   └───────────┬────────────┘
               │
         ┌─────┴─────┐
         │           │
         ▼           ▼
   ┌──────────┐ ┌──────────┐   (add more replicas as needed)
   │  TP #1   │ │  TP #2   │
   └─────┬────┘ └────┬─────┘
         │           │
         └─────┬─────┘
               │
   ┌───────────▼────────────┐
   │    PostgreSQL (HA)     │
   │   Primary + Replica    │
   └───────────┬────────────┘
               │
   ┌───────────▼────────────┐
   │    Redis (optional)    │
   │    Rate limit cache    │
   └────────────────────────┘
```

Option 1: Docker Compose with multiple replicas
For on-prem deployments not using Kubernetes, use Docker Compose with a reverse proxy.
Step 1: Configure your .env
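If OpenSSL is available on the host, the two secrets referenced in the file can be generated up front (a sketch; any source of 32 random bytes encoded as hex works equally well):

```shell
# Generate the 32-byte (64 hex character) secrets the .env below expects.
JWT_SECRET=$(openssl rand -hex 32)
ENCRYPTION_KEY=$(openssl rand -hex 32)

echo "JWT_SECRET=$JWT_SECRET"
echo "ENCRYPTION_KEY=$ENCRYPTION_KEY"
```

Paste the printed values into the .env file; all replicas read the same file, so they automatically share the same secrets.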
```bash
# Identity
TENANT_ID=acme-corp
JWT_SECRET=<generate: openssl rand -hex 32>
ENCRYPTION_KEY=<generate: openssl rand -hex 32>

# Database (shared between all replicas)
TENANT_PLANE_DATABASE_URL=postgresql://atlasusr:password@db:5432/atlas

# Optional: Redis for rate limiting consistency across replicas
REDIS_URL=redis://redis:6379

# License (on-prem)
ATLASAI_LICENSE_KEY=eyJhbGciOiJSUzI1NiJ9...
CP_LICENSE_PUBLIC_KEY=-----BEGIN PUBLIC KEY-----\n...\n-----END PUBLIC KEY-----
```

Step 2: Scale using Docker Compose
```yaml
# docker-compose.yml
services:
  tenant-plane:
    image: atlasai/tenant-plane:1.3.0
    env_file: .env
    deploy:
      replicas: 3        # run 3 instances
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 5s
      retries: 3

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      # Mount into conf.d: the file below contains upstream/server blocks,
      # which are only valid inside the http context that conf.d provides.
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
    depends_on:
      - tenant-plane
```

Step 3: Configure nginx as load balancer
```nginx
# nginx.conf
upstream tenant_plane {
    least_conn;
    server tenant-plane:3000;
    keepalive 32;
}

server {
    listen 80;
    server_name atlas.yourdomain.com;

    location /api/health {
        proxy_pass http://tenant_plane;
        access_log off;
    }

    location / {
        proxy_pass http://tenant_plane;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_read_timeout 90s;
    }
}
```

Start everything:
```bash
docker compose up -d --scale tenant-plane=3
```

Verify all replicas are healthy:
```bash
docker compose ps
# Should show 3 tenant-plane instances, all "Up (healthy)"
```

Option 2: Kubernetes with Helm (recommended for production)
Step 1: Prepare your values file
Create a values-production.yaml file:
```yaml
# Number of replicas — minimum 2 for HA, 3+ recommended
replicaCount: 3

image:
  repository: atlasai/tenant-plane
  tag: "1.3.0"
  pullPolicy: IfNotPresent

# Database — REQUIRED for HA (cannot use SQLite with multiple replicas)
db:
  url: "postgresql://atlasusr:password@rds.internal:5432/atlas"
  poolMin: 2
  poolMax: 10

# Authentication secrets — must be identical on all replicas
auth:
  jwtSecret: "your-32-char-secret-here"
  encryptionKey: "your-32-char-encryption-key"

# Tenant identity
tenant:
  id: "acme-corp"

# ─── High Availability settings ──────────────────────────────────────────────
ha:
  # Spread replicas across different nodes (preferred, not required)
  podAntiAffinity: true
  # Ensure at least 1 pod is always running during updates or node drains
  podDisruptionBudget:
    enabled: true
    minAvailable: 1
  # Spread replicas across availability zones
  topologySpreadConstraints: true

# Auto-scaling: scale based on CPU/memory load
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

# Resource requests and limits per replica
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 1Gi

# License (for on-prem/BYOC)
license:
  publicKey: "-----BEGIN PUBLIC KEY-----\n...\n-----END PUBLIC KEY-----"
  key: "eyJhbGciOiJSUzI1NiJ9..."
```

Step 2: Install
```bash
helm install atlasai-tp deploy/helm/atlasai-tp \
  -f values-production.yaml \
  --namespace atlasai \
  --create-namespace \
  --wait \
  --timeout 5m
```

Step 3: Verify
```bash
# Check all pods are running
kubectl get pods -n atlasai -l app=atlasai-tp

# Check the PodDisruptionBudget is in place
kubectl get pdb -n atlasai

# Check the horizontal autoscaler
kubectl get hpa -n atlasai

# Hit the health endpoint through the service
kubectl run test --rm -it --image=curlimages/curl --restart=Never -n atlasai -- \
  curl -s http://atlasai-tp/api/health
```

Expected pod output:
```
NAME                         READY   STATUS    RESTARTS   AGE
atlasai-tp-7d8f9b4c5-2xqmn   1/1     Running   0          5m
atlasai-tp-7d8f9b4c5-6bktv   1/1     Running   0          5m
atlasai-tp-7d8f9b4c5-r9pvw   1/1     Running   0          5m
```

Database high availability
Why PostgreSQL is required for HA
SQLite is a single-file database that cannot be shared across replicas. For multi-replica deployments you must use PostgreSQL.
Set the Postgres connection string:
```bash
TENANT_PLANE_DATABASE_URL=postgresql://username:password@hostname:5432/dbname
```

Recommended: managed Postgres with auto-failover
| Platform | Service | Notes |
|---|---|---|
| AWS | Amazon RDS Aurora PostgreSQL | Automatic failover, up to 15 read replicas |
| GCP | Cloud SQL for PostgreSQL | HA with automatic failover |
| Azure | Azure Database for PostgreSQL | Zone-redundant HA |
| Self-hosted | Patroni + etcd + HAProxy | Open-source HA stack |
| Self-hosted | Postgres Streaming Replication | Manual failover, simpler setup |
All of these work with AtlasAI. The Tenant Plane uses standard PostgreSQL wire protocol — no special extensions required beyond pgvector for AI search features.
Database connection pooling
AtlasAI automatically pools database connections. Configure the pool size per replica:
```bash
DB_POOL_MIN=2            # Minimum connections kept open (default: 2)
DB_POOL_MAX=10           # Maximum connections per replica (default: 10)
DB_CONNECT_TIMEOUT=5000  # Connection timeout in ms (default: 5000)
DB_IDLE_TIMEOUT=30000    # Idle connection timeout in ms (default: 30000)
```

Total connections formula: `replicas × DB_POOL_MAX`
Example: 3 replicas × 10 connections = 30 max connections to Postgres. Size your Postgres max_connections accordingly (default is usually 100).
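The sizing check can be scripted, for example as part of a deployment pre-flight (a sketch; the variable names are illustrative and the 20-connection headroom is an assumption, not an AtlasAI requirement):

```shell
REPLICAS=3
DB_POOL_MAX=10           # per-replica maximum, mirrors the DB_POOL_MAX setting
PG_MAX_CONNECTIONS=100   # Postgres server-side limit (max_connections)

# Peak connections TP can open: one full pool per replica.
TOTAL=$((REPLICAS * DB_POOL_MAX))
echo "Peak TP connections: $TOTAL"

# Leave headroom for superuser sessions, migrations, and monitoring tools.
if [ "$TOTAL" -gt $((PG_MAX_CONNECTIONS - 20)) ]; then
  echo "WARNING: raise max_connections or lower DB_POOL_MAX"
fi
```

If the HPA is enabled, run the calculation with `maxReplicas`, not the current replica count, since that is the worst case the database must absorb.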
Health checks and monitoring
The /api/health endpoint is the canonical health check for all monitoring:
```bash
curl https://your-tenant-plane/api/health
```

```json
{
  "status": "ok",
  "plane": "tenant",
  "version": "1.3.0",
  "uptime_seconds": 14283,
  "db": {
    "enabled": true,
    "reachable": true
  },
  "vector_db": "pgvector",
  "timestamp": "2026-03-26T10:00:00.000Z"
}
```

- `status: "ok"` — replica is healthy and ready to serve traffic
- `status: "degraded"` — replica is running but some non-critical service is unavailable
- HTTP 503 — replica is not ready; the load balancer should stop sending traffic to it
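External monitors that only need the status field can gate on it with a small script (a sketch; it assumes the JSON shape shown above and uses sed so there is no jq dependency — in real use `$RESPONSE` would come from `curl -fsS https://your-tenant-plane/api/health`):

```shell
# check_health.sh — exit 0 only when the replica reports status "ok".
# Sample response used here for illustration; replace with a curl call.
RESPONSE='{"status":"ok","plane":"tenant","db":{"enabled":true,"reachable":true}}'

# Extract the value of the top-level "status" field.
STATUS=$(printf '%s' "$RESPONSE" | sed -n 's/.*"status":"\([^"]*\)".*/\1/p')

if [ "$STATUS" = "ok" ]; then
  echo "healthy"
else
  echo "unhealthy: status=$STATUS"
  exit 1
fi
```

A non-zero exit code is what most load balancers and cron-based monitors treat as a failed check, so the script plugs directly into those.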
Kubernetes probes
The Helm chart configures liveness and readiness probes automatically. Both point to /api/health:
- Liveness probe: checked every 30 seconds; restarts pod if failing for more than 3 cycles
- Readiness probe: checked every 10 seconds; removes pod from load balancer rotation when failing
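If you manage the Deployment yourself instead of using the chart, equivalent probes look roughly like this (a sketch for the tenant-plane container spec; port 3000 and the thresholds mirror the defaults described above and may differ in your chart version):

```yaml
livenessProbe:
  httpGet:
    path: /api/health
    port: 3000
  periodSeconds: 30
  failureThreshold: 3      # restart after ~90s of consecutive failures
readinessProbe:
  httpGet:
    path: /api/health
    port: 3000
  periodSeconds: 10
  failureThreshold: 1      # drop from Service endpoints quickly
```

Keeping the readiness probe more sensitive than the liveness probe means a struggling pod stops receiving traffic well before Kubernetes considers restarting it.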
Session and authentication
TP uses stateless JWT authentication. Here is what this means for HA:
- No sticky sessions needed — any replica validates any JWT independently
- No shared session store — tokens are self-contained and verified using `JWT_SECRET`
- All replicas must share the same `JWT_SECRET` — if they differ, users logged into one replica cannot be authenticated by another

Make sure `JWT_SECRET` is identical on all replicas. In Kubernetes, this is set via the Helm secret automatically.
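To see why a shared secret is sufficient, note that a symmetric JWT signature (assuming HS256, which a shared-secret setup implies) is just an HMAC over the token's header and payload segments, so any replica holding the same secret recomputes the identical signature locally. A sketch using the openssl CLI, with illustrative header/payload values:

```shell
# base64url encoding helper, as used for JWT segments.
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }

SECRET="shared-jwt-secret"
HEADER=$(printf '%s' '{"alg":"HS256","typ":"JWT"}' | b64url)
PAYLOAD=$(printf '%s' '{"sub":"user-1"}' | b64url)

# Replica 1 signs; replica 2 independently recomputes and compares.
SIG1=$(printf '%s' "$HEADER.$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" -binary | b64url)
SIG2=$(printf '%s' "$HEADER.$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" -binary | b64url)

[ "$SIG1" = "$SIG2" ] && echo "signature matches: token accepted on any replica"
```

A replica configured with a different secret would compute a different HMAC and reject the token, which is exactly the failure mode described above.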
Redis (optional but recommended)
Redis is not required for HA but improves two things:
- Rate limiting consistency — without Redis, each replica tracks rate limits independently, meaning the effective limit is `per-replica limit × number of replicas`. With Redis, the limit is global across all replicas.
- JWT revocation — when a user’s session is forcibly terminated, Redis allows all replicas to instantly know the token is invalid.
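The effective-limit math is worth spelling out with concrete (illustrative) numbers:

```shell
PER_REPLICA_LIMIT=100   # requests/min each replica allows on its own
REPLICAS=3

# Without Redis each replica counts independently, so a client whose
# requests are spread across replicas can reach the sum of all budgets.
echo "Without Redis: up to $((PER_REPLICA_LIMIT * REPLICAS)) req/min"
echo "With Redis:    exactly $PER_REPLICA_LIMIT req/min (global counter)"
```

If you cannot run Redis, one workaround is to divide your intended global limit by the replica count when configuring the per-replica limit, accepting that autoscaling will change the effective total.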
Configure Redis:
```bash
REDIS_URL=redis://redis-hostname:6379
```

For Redis HA, use Redis Sentinel or Redis Cluster.
Recommended production topology
```
┌──────────────────────────────────────────┐
│               Your Network               │
│                                          │
│   ┌──────────────────────────────────┐   │
│   │     Load Balancer / Ingress      │   │
│   │    (nginx / ALB / k8s Ingress)   │   │
│   └────────┬────────────┬────────────┘   │
│            │            │                │
│      ┌─────▼────┐ ┌─────▼────┐           │
│      │  TP #1   │ │  TP #2   │ (+ more)  │
│      │  1 CPU   │ │  1 CPU   │           │
│      │ 1 GB RAM │ │ 1 GB RAM │           │
│      └─────┬────┘ └────┬─────┘           │
│            └─────┬─────┘                 │
│                  │                       │
│   ┌──────────────▼───────────────────┐   │
│   │  PostgreSQL (Aurora / Patroni)   │   │
│   │        Primary + Standby         │   │
│   └──────────────────────────────────┘   │
│                                          │
│   ┌──────────────────────────────────┐   │
│   │ Redis (optional, for rate limits)│   │
│   └──────────────────────────────────┘   │
└──────────────────────────────────────────┘
```

This topology handles:
- Single replica failure: remaining replicas serve 100% of traffic
- Database primary failure: standby promotes automatically (Aurora < 30s, Patroni ~ 30-60s)
- Node failure: Kubernetes reschedules pods to healthy nodes
- Traffic spikes: HPA adds replicas based on CPU/memory