Rollback & Recovery

This guide explains how to revert the Tenant Plane to a previous version after a failed or problematic upgrade. A rollback restores only the container image; database schema changes are not reversed by default.

When to roll back

Roll back when:

The health endpoint returns errors after upgrading
Users cannot log in or encounter broken functionality
An integration stopped working and you suspect the upgrade is the cause
A critical performance regression is observed after upgrading

How rollback works

The upgrade script records the previous version before every upgrade. A rollback then proceeds as follows:

The script restores the docker-compose.yml image tag to the previous version
It restarts the service with the old image
It polls the health endpoint to confirm the rollback succeeded
Database schema changes are NOT reversed — the schema stays at the migrated state

The last point is important: if the new version added a database column, that column remains after rollback. The previous version will simply ignore it. This is safe because AtlasAI migrations are purely additive — they never drop or rename columns that the old code expects.

If you need to restore the database to its pre-upgrade state, you must restore from a database backup.
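
The image-tag step in the list above can be sketched in shell. This is an illustration only, not the code inside the rollback script: the helper name set_image_tag is hypothetical, and the sketch assumes the compose file pins the image on a line of the form image: atlasai/tenant-plane:<tag>.

```shell
# Sketch only: the real rollback script may edit the file differently.
# Assumes a line of the form "image: atlasai/tenant-plane:<tag>".

# set_image_tag FILE VERSION  (hypothetical helper)
# Points the tenant-plane image at VERSION and keeps the old file as FILE.bak.
set_image_tag() {
  local file="$1" version="$2"
  cp "$file" "$file.bak" || return 1
  # Replace only the tag after the image name; every other line is copied as-is.
  sed -E "s#(image:[[:space:]]*atlasai/tenant-plane):[^[:space:]\"']+#\1:${version}#" \
    "$file.bak" > "$file"
}
```

Under these assumptions, set_image_tag docker-compose.yml 1.2.2 followed by docker compose up -d tenant-plane would restart the service on the old image. The .bak copy makes the edit itself easy to undo.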

Option 1: Automatic rollback during upgrade

If you used upgrade-tp.sh and the health check failed, the script automatically rolled back for you. Check the upgrade log to confirm:


# Find the most recent upgrade log
ls -lt /var/log/atlas/upgrade-*.log | head -n 1
# Print it (replace <timestamp> with the value from the filename above)
cat /var/log/atlas/upgrade-<timestamp>.log
 
# Check upgrade history
cat ./data/.atlas-upgrade-history
# Output example:
# 20260326T100000 1.2.2 -> 1.3.0 SUCCESS log:/var/log/atlas/upgrade-20260326T100000.log
# 20260326T120000 ROLLBACK 1.3.0 -> 1.2.2 log:/var/log/atlas/rollback-20260326T120000.log
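
If you check the history often, the newest entry can be summarized with a small helper. The sketch below assumes every history line follows one of the two layouts in the example output above; the name last_upgrade_outcome is hypothetical and not part of the shipped scripts.

```shell
# last_upgrade_outcome [HISTORY_FILE]  (hypothetical helper)
# Summarizes the newest line of the upgrade history.
# Assumed layouts (taken from the example output in this guide):
#   <timestamp> <from> -> <to> <status> log:<path>
#   <timestamp> ROLLBACK <from> -> <to> log:<path>
last_upgrade_outcome() {
  local history="${1:-./data/.atlas-upgrade-history}"
  awk '
    NF { last = $0 }                 # remember the newest non-empty line
    END {
      if (last == "") { exit 1 }     # nothing recorded yet
      split(last, f, " ")
      if (f[2] == "ROLLBACK") {
        printf "rolled back from %s to %s\n", f[3], f[5]
      } else {
        printf "upgraded from %s to %s (%s)\n", f[2], f[4], f[5]
      }
    }
  ' "$history"
}
```

A result starting with "rolled back" right after an upgrade attempt indicates that the automatic rollback ran.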

Option 2: Manual rollback (Docker Compose)

Use this when the upgrade script's automatic rollback did not run, or when you decide to roll back after an upgrade that initially seemed fine.

Step 1: Identify the previous version


# The upgrade script records this automatically
cat ./data/.atlas-previous-version
# Output: 1.2.2

If the file is missing, check the upgrade history:


cat ./data/.atlas-upgrade-history

Or list the locally available images, oldest first:


docker images atlasai/tenant-plane --format "{{.CreatedAt}}\t{{.Tag}}" | sort
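
The first two lookups in this step can be combined into one fallback chain. The following is a minimal sketch, assuming the history lines use the layout shown in the example output earlier in this guide; previous_version is a hypothetical helper name.

```shell
# previous_version [DATA_DIR]  (hypothetical helper)
# Prints the version to roll back to, or returns non-zero if it cannot tell.
previous_version() {
  local data_dir="${1:-./data}"
  local marker="$data_dir/.atlas-previous-version"
  local history="$data_dir/.atlas-upgrade-history"

  # 1. Prefer the marker file written by the upgrade script.
  if [ -s "$marker" ]; then
    tr -d '[:space:]' < "$marker"
    echo
    return 0
  fi

  # 2. Fall back to the "from" version of the newest upgrade line:
  #    "<timestamp> <from> -> <to> <status> ..." (ROLLBACK lines are skipped).
  if [ -s "$history" ]; then
    awk '$2 != "ROLLBACK" && $3 == "->" { from = $2 }
         END { if (from == "") { exit 1 }; print from }' "$history"
    return
  fi

  return 1
}
```

If both files are missing, the function fails rather than guessing, and the Docker image list remains the last resort.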

Step 2: Run the rollback script


# Auto-detect previous version
bash scripts/rollback-tp.sh
 
# Or specify the target version explicitly
ATLAS_VERSION=1.2.2 bash scripts/rollback-tp.sh

The rollback script:

Detects the previous version from .atlas-previous-version (or uses ATLAS_VERSION)
Updates the image tag in docker-compose.yml
Checks if the old image is available locally (if not, pulls it)
Restarts the service
Polls the health endpoint for up to 60 seconds
Records the rollback in .atlas-upgrade-history
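
The health-polling step can be reproduced by hand when you restart the service yourself. The sketch below is a generic retry loop, not the script's actual implementation; poll_until is a hypothetical name, and the 5-second interval in the usage note is an assumption.

```shell
# poll_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]  (hypothetical helper)
# Re-runs COMMAND until it succeeds or TIMEOUT_SECONDS have elapsed.
poll_until() {
  local timeout="$1" interval="$2" deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  while :; do
    if "$@" > /dev/null 2>&1; then
      return 0                       # probe succeeded
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1                       # still failing at the deadline
    fi
    sleep "$interval"
  done
}
```

For example, poll_until 60 5 curl -fsS --max-time 5 http://localhost:3000/api/health waits up to 60 seconds for the health endpoint to answer with a non-error HTTP status. The -f flag makes curl exit non-zero on HTTP errors, which is what lets the loop treat them as failures.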

Step 3: Verify


# Check version
docker compose exec tenant-plane cat /app/.atlas-version
# Should show the previous version
 
# Check health
curl -s http://localhost:3000/api/health
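
To turn the health check into a pass/fail test, the status value can be extracted from the response. This sketch assumes the endpoint returns a flat JSON object with a single "status" field, as the "status": "ok" item in the recovery checklist suggests; health_status is a hypothetical helper.

```shell
# health_status  (hypothetical helper)
# Reads a health response on stdin and prints the value of its "status" field.
# Assumes one "status" field; if there are several, the last one is printed.
# A text match like this is fragile; prefer a JSON parser such as jq if available.
health_status() {
  tr -d '\n' | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Usage: [ "$(curl -fsS http://localhost:3000/api/health | health_status)" = "ok" ] && echo healthy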

Option 3: Kubernetes rollback

Kubernetes retains previous Deployment revisions (up to the Deployment's revisionHistoryLimit, 10 by default), so rolling back is a single command.

Instant rollback (< 1 minute)


# Roll back to the immediately previous revision
kubectl rollout undo deployment/atlasai-tp -n atlasai
 
# Monitor the rollback
kubectl rollout status deployment/atlasai-tp -n atlasai
 
# Verify
kubectl get pods -n atlasai -l app=atlasai-tp

Roll back to a specific revision


# List available revisions
kubectl rollout history deployment/atlasai-tp -n atlasai
 
# Roll back to revision 3
kubectl rollout undo deployment/atlasai-tp --to-revision=3 -n atlasai
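
Revision numbers are not always consecutive, because a rollback re-registers the restored revision under a new number. The sketch below picks the second-newest revision from the history listing. It assumes the usual two-column REVISION / CHANGE-CAUSE table printed by kubectl rollout history; previous_revision is a hypothetical helper.

```shell
# previous_revision  (hypothetical helper)
# Reads `kubectl rollout history` output on stdin and prints the second-highest
# revision number. Returns non-zero when fewer than two revisions are listed.
previous_revision() {
  awk '$1 ~ /^[0-9]+$/ { print $1 }' \
    | sort -n \
    | awk '{ prev = last; last = $1; n++ }
           END { if (n < 2) { exit 1 }; print prev }'
}
```

Usage: kubectl rollout history deployment/atlasai-tp -n atlasai | previous_revision. Before rolling back, kubectl rollout history deployment/atlasai-tp -n atlasai --revision=<N> prints the pod template of revision N, so you can confirm that it carries the image tag you expect.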

Roll back via Helm


# List Helm release history
helm history atlasai-tp -n atlasai
 
# Roll back to the previous Helm release
helm rollback atlasai-tp -n atlasai
 
# Roll back to a specific Helm revision number
helm rollback atlasai-tp 2 -n atlasai --wait

Restoring from a database backup

Roll back the database only if:

The upgrade introduced a data corruption issue
The new schema changed data in a way that breaks the old code
You need to fully reproduce the pre-upgrade state for debugging

Warning: restoring the database overwrites all data written after the backup was taken. Any incidents, runbook executions, or configuration changes made after the upgrade will be lost.

Step 1: Stop the Tenant Plane


# Docker Compose
docker compose stop tenant-plane
 
# Kubernetes
kubectl scale deployment/atlasai-tp --replicas=0 -n atlasai
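
On Kubernetes, scaling to zero discards the current replica count, and the restart step later in this guide uses 3 only as an example. A minimal sketch for remembering the real value follows. The helper names and the REPLICA_FILE path are hypothetical, and the sketch assumes no autoscaler is managing the Deployment.

```shell
# Where the replica count is stored between the two steps (hypothetical path).
REPLICA_FILE="${REPLICA_FILE:-./data/.atlas-replicas}"

# save_replicas_and_stop  (hypothetical helper)
# Records the current replica count, then scales the Deployment to zero.
save_replicas_and_stop() {
  local replicas
  replicas="$(kubectl get deployment/atlasai-tp -n atlasai \
    -o jsonpath='{.spec.replicas}')" || return 1
  case "$replicas" in
    ''|*[!0-9]*) return 1 ;;        # refuse anything that is not a number
  esac
  # Do not overwrite a saved value if the Deployment is already stopped.
  if [ "$replicas" -gt 0 ]; then
    printf '%s\n' "$replicas" > "$REPLICA_FILE"
  fi
  kubectl scale deployment/atlasai-tp --replicas=0 -n atlasai
}

# restore_replicas  (hypothetical helper)
# Scales the Deployment back to the recorded replica count.
restore_replicas() {
  local replicas
  replicas="$(cat "$REPLICA_FILE")" || return 1
  kubectl scale deployment/atlasai-tp --replicas="$replicas" -n atlasai
}
```

Call save_replicas_and_stop in place of the plain scale command in this step, and restore_replicas once the database restore has finished.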

Step 2: Find your backup

The upgrade script creates a backup before every upgrade:


ls -lh /var/log/atlas/db-backup-*.sql.gz
# Output:
# -rw-r--r-- 1 user user 45M Mar 26 10:00 db-backup-1.2.2-20260326T100000.sql.gz
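
When several backups exist, pick the newest one for the version you are returning to and confirm that the archive is readable before stopping anything else. The sketch below assumes the db-backup-<version>-<timestamp>.sql.gz naming shown in the listing above; both helper names are hypothetical.

```shell
# latest_backup DIR VERSION  (hypothetical helper)
# Prints the newest backup for VERSION, or returns non-zero if none exists.
latest_backup() {
  local dir="$1" version="$2" newest="" f
  for f in "$dir"/db-backup-"$version"-*.sql.gz; do
    [ -e "$f" ] || continue          # the glob matched nothing
    newest="$f"                      # globs expand in sorted order, so the
  done                               # last match has the latest timestamp
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}

# backup_is_readable FILE  (hypothetical helper)
# Succeeds only if FILE is an intact gzip archive.
backup_is_readable() {
  gzip -t "$1" 2> /dev/null
}
```

Usage: f="$(latest_backup /var/log/atlas 1.2.2)" && backup_is_readable "$f" && echo "$f". The integrity test only proves that the archive decompresses; it does not validate the SQL inside.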

Step 3: Restore


# Using the rollback script with restore flag
RESTORE_DB=1 \
DB_BACKUP_FILE=/var/log/atlas/db-backup-1.2.2-20260326T100000.sql.gz \
DB_URL=postgresql://atlasusr:password@localhost:5432/atlas \
bash scripts/rollback-tp.sh
 
# Manual restore (alternative)
zcat /var/log/atlas/db-backup-1.2.2-20260326T100000.sql.gz \
  | psql postgresql://atlasusr:password@localhost:5432/atlas
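
Because a restore overwrites everything written since the backup, it can be worth dumping the current database first so the restore itself can be undone. The following is a sketch under assumptions: snapshot_before_restore and the db-pre-restore-* file name are hypothetical and not something the rollback script is documented to do, and pg_dump must be installed on the host.

```shell
# snapshot_before_restore DB_URL OUT_DIR  (hypothetical helper)
# Dumps the current database to OUT_DIR/db-pre-restore-<timestamp>.sql.gz and
# prints the resulting path. Leaves no partial file behind if the dump fails.
snapshot_before_restore() {
  local db_url="$1" out_dir="$2" stamp sql
  stamp="$(date +%Y%m%dT%H%M%S)"
  sql="$out_dir/db-pre-restore-$stamp.sql"
  if ! pg_dump "$db_url" > "$sql"; then
    rm -f "$sql"
    return 1
  fi
  gzip "$sql" && printf '%s\n' "$sql.gz"
}
```

Usage, before running either restore command: snapshot_before_restore "$DB_URL" /var/log/atlas. Dumping to a plain file first, rather than piping straight into gzip, keeps a pg_dump failure visible in the exit status.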

Step 4: Restart the Tenant Plane


# Docker Compose
docker compose up -d tenant-plane
 
# Kubernetes (3 is an example; use the replica count you had before Step 1)
kubectl scale deployment/atlasai-tp --replicas=3 -n atlasai

Recovery checklist

After any rollback:

Health endpoint returns "status": "ok"
The correct version is shown in GET /api/admin/license or on the Settings page
Users can log in
Edge agents reconnect (check last-seen timestamps in Settings → Edge Agents)
Review the upgrade log to understand what went wrong before retrying

Getting help

If rollback fails or the system is still not healthy after rollback, contact support@atlasai.com with:

Your TP version (before and after upgrade attempt)
The contents of the upgrade log (/var/log/atlas/upgrade-*.log)
The output of GET /api/health
Any relevant Docker / Kubernetes logs (docker compose logs tenant-plane or kubectl logs -n atlasai deploy/atlasai-tp)