Rollback & Recovery

This guide explains how to revert the Tenant Plane to a previous version after a failed or problematic upgrade. The rollback process restores the container image only; database schema changes are not reversed by default.


When to roll back

Roll back when:

  • The health endpoint returns errors after upgrading
  • Users cannot log in or encounter broken functionality
  • An integration stopped working and you suspect the upgrade is the cause
  • A critical performance regression is observed after upgrading

How rollback works

The upgrade script records the previous version before every upgrade. A rollback then proceeds as follows:

  1. The script restores the docker-compose.yml image tag to the previous version
  2. It restarts the service with the old image
  3. It polls the health endpoint to confirm the rollback succeeded
  4. Database schema changes are NOT reversed — the schema stays at the migrated state

The last point is important: if the new version added a database column, that column remains after rollback. The previous version will simply ignore it. This is safe because AtlasAI migrations are purely additive — they never drop or rename columns that the old code expects.

If you need to restore the database to its pre-upgrade state, you must restore from a database backup.


Option 1: Automatic rollback during upgrade

If you used upgrade-tp.sh and the health check failed, the script automatically rolled back for you. Check the upgrade log to confirm:

    # View the most recent upgrade log
    ls -lt /var/log/atlas/upgrade-*.log | head -1
    cat /var/log/atlas/upgrade-<timestamp>.log

    # Check upgrade history
    cat ./data/.atlas-upgrade-history
    # Output example:
    # 20260326T100000 1.2.2 -> 1.3.0 SUCCESS log:/var/log/atlas/upgrade-20260326T100000.log
    # 20260326T120000 ROLLBACK 1.3.0 -> 1.2.2 log:/var/log/atlas/rollback-20260326T120000.log
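The history check can also be scripted, for example in a monitoring job. The following is a minimal sketch, assuming that entries are appended one per line in chronological order and that rollback entries carry the literal word ROLLBACK as a separate field, as in the example output above. The function name is hypothetical and not part of the AtlasAI tooling.

```shell
# last_entry_is_rollback [HISTORY_FILE]
# Prints "yes" if the newest non-empty history line is a rollback entry,
# otherwise "no". Assumes the line format shown in the example output.
last_entry_is_rollback() {
  local history_file="${1:-./data/.atlas-upgrade-history}"
  local last_line

  # Ignore blank lines, then keep only the newest entry.
  last_line="$(grep -v '^[[:space:]]*$' "$history_file" | tail -n 1)"

  case "$last_line" in
    *" ROLLBACK "*) echo "yes" ;;
    *)              echo "no" ;;
  esac
}
```

Calling `last_entry_is_rollback ./data/.atlas-upgrade-history` after a failed upgrade should print `yes` if the automatic rollback was recorded.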

Option 2: Manual rollback (Docker Compose)

Use this when the upgrade script's automatic rollback did not run, or when you decide to roll back after the upgrade initially seemed fine.

Step 1: Identify the previous version

    # The upgrade script records this automatically
    cat ./data/.atlas-previous-version
    # Output: 1.2.2

If the file is missing, check the upgrade history:

    cat ./data/.atlas-upgrade-history

Or list the locally available image tags, oldest first:

    docker images atlasai/tenant-plane --format "{{.CreatedAt}}\t{{.Tag}}" | sort
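The lookup order described in this step can be captured in one helper. This is a sketch under stated assumptions, not the logic of rollback-tp.sh: it assumes the history lines look like `<timestamp> <from> -> <to> SUCCESS ...`, and it treats the "from" version of the most recent upgrade entry as the previous version, which only holds if no later upgrade or rollback has changed the installed version. The function name is hypothetical.

```shell
# resolve_previous_version [DATA_DIR]
# Prints the previous version, preferring .atlas-previous-version and
# falling back to the upgrade history. Returns non-zero if neither helps.
resolve_previous_version() {
  local data_dir="${1:-./data}"
  local version_file="$data_dir/.atlas-previous-version"
  local history_file="$data_dir/.atlas-upgrade-history"

  # Preferred source: the file written by the upgrade script.
  if [ -s "$version_file" ]; then
    tr -d '[:space:]' < "$version_file"
    printf '\n'
    return 0
  fi

  # Fallback: "from" field of the newest upgrade (non-ROLLBACK) entry.
  if [ -s "$history_file" ]; then
    awk '$2 != "ROLLBACK" && $3 == "->" { from = $2 }
         END { if (from != "") print from; else exit 1 }' "$history_file" \
      && return 0
  fi

  echo "could not determine the previous version" >&2
  return 1
}
```

The result can be passed straight to the rollback script, for example `ATLAS_VERSION="$(resolve_previous_version ./data)" bash scripts/rollback-tp.sh`.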

Step 2: Run the rollback script

    # Auto-detect previous version
    bash scripts/rollback-tp.sh

    # Or specify the target version explicitly
    ATLAS_VERSION=1.2.2 bash scripts/rollback-tp.sh

The rollback script:

  1. Detects the previous version from .atlas-previous-version (or uses ATLAS_VERSION)
  2. Updates the image tag in docker-compose.yml
  3. Checks if the old image is available locally (if not, pulls it)
  4. Restarts the service
  5. Polls the health endpoint for up to 60 seconds
  6. Records the rollback in .atlas-upgrade-history
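If the rollback script is unavailable, steps 2 and 5 above can be approximated by hand. The sketch below is illustrative only and is not the contents of scripts/rollback-tp.sh. It assumes the compose file pins the image on a line of the form `image: atlasai/tenant-plane:<tag>`, and the 5-second polling interval is an assumption (the source only states the 60-second limit). Both function names are hypothetical.

```shell
# set_image_tag COMPOSE_FILE IMAGE TAG
# Rewrites the tag on lines of the form "image: IMAGE:<old-tag>".
# The original file is kept as COMPOSE_FILE.bak.
set_image_tag() {
  local compose_file="$1" image="$2" tag="$3"
  sed -i.bak -E \
    "s|^([[:space:]]*image:[[:space:]]*${image}):[^[:space:]]+|\1:${tag}|" \
    "$compose_file"
}

# wait_for_health TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it succeeds or the timeout elapses.
# Returns 0 on the first success, 1 if the deadline passes.
wait_for_health() {
  local timeout="$1" interval="$2"
  shift 2
  local elapsed=0

  while [ "$elapsed" -lt "$timeout" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}

# Example (not executed here):
#   set_image_tag docker-compose.yml atlasai/tenant-plane 1.2.2
#   docker compose up -d tenant-plane
#   wait_for_health 60 5 curl -fsS http://localhost:3000/api/health
```

Passing `-f` to curl makes an HTTP error status count as a failed check, so the loop keeps polling until the endpoint answers successfully.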

Step 3: Verify

    # Check version
    docker compose exec tenant-plane cat /app/.atlas-version
    # Should show the previous version

    # Check health
    curl -s http://localhost:3000/api/health
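Both checks can be combined into a single pass/fail command. The sketch below assumes the health response is JSON containing `"status": "ok"` when healthy, as listed in the recovery checklist; it matches that text with grep rather than parsing the JSON. The function names are hypothetical.

```shell
# health_is_ok
# Reads a health response on stdin and succeeds if it contains
# "status": "ok" (whitespace around the colon is tolerated).
health_is_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# verify_rollback EXPECTED_VERSION
# Succeeds only if the running container reports EXPECTED_VERSION
# and the health endpoint reports ok.
verify_rollback() {
  local expected="$1" running

  # -T disables the pseudo-TTY so the output is safe to capture.
  running="$(docker compose exec -T tenant-plane cat /app/.atlas-version \
    | tr -d '[:space:]')"

  if [ "$running" != "$expected" ]; then
    echo "version mismatch: running '$running', expected '$expected'" >&2
    return 1
  fi

  curl -s http://localhost:3000/api/health | health_is_ok
}
```

For example, `verify_rollback 1.2.2 && echo "rollback verified"`.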

Option 3: Kubernetes rollback

Kubernetes retains previous Deployment revisions, so a rollback is a single command.

Instant rollback (< 1 minute)

    # Roll back to the immediately previous revision
    kubectl rollout undo deployment/atlasai-tp -n atlasai

    # Monitor the rollback
    kubectl rollout status deployment/atlasai-tp -n atlasai

    # Verify
    kubectl get pods -n atlasai -l app=atlasai-tp
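Listing the pods shows that they are running, but not which version they run. As a hedged sketch, the image tag on the Deployment can be read back after the rollback. This assumes the Tenant Plane is the first container in the pod template and that its image reference ends in `:<tag>` rather than a digest. The function names are hypothetical.

```shell
# image_tag IMAGE_REF
# Prints everything after the last colon: "repo/name:1.2.2" -> "1.2.2".
image_tag() {
  printf '%s\n' "${1##*:}"
}

# deployed_tag
# Prints the image tag currently set on the atlasai-tp Deployment.
deployed_tag() {
  local image
  image="$(kubectl get deployment atlasai-tp -n atlasai \
    -o jsonpath='{.spec.template.spec.containers[0].image}')" || return 1
  image_tag "$image"
}
```

After `kubectl rollout undo`, running `deployed_tag` should print the version you rolled back to.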

Roll back to a specific revision

    # List available revisions
    kubectl rollout history deployment/atlasai-tp -n atlasai

    # Roll back to revision 3
    kubectl rollout undo deployment/atlasai-tp --to-revision=3 -n atlasai

Roll back via Helm

    # List Helm release history
    helm history atlasai-tp -n atlasai

    # Roll back to the previous Helm release
    helm rollback atlasai-tp -n atlasai

    # Roll back to a specific Helm revision number
    helm rollback atlasai-tp 2 -n atlasai --wait

Restoring from a database backup

Roll back the database only if:

  • The upgrade introduced a data corruption issue
  • The new schema changed data in a way that breaks the old code
  • You need to fully reproduce the pre-upgrade state for debugging

Warning: restoring the database overwrites all data written after the backup was taken. Any incidents, runbook executions, or configuration changes made after the upgrade will be lost.

Step 1: Stop the Tenant Plane

    # Docker Compose
    docker compose stop tenant-plane

    # Kubernetes
    kubectl scale deployment/atlasai-tp --replicas=0 -n atlasai
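On Kubernetes, Step 4 needs the replica count the Deployment had before it was scaled to zero. One way to avoid assuming a fixed number is to record the count first. The sketch below is an illustration only: the state file path and the function names are hypothetical, and it does not account for replicas managed by an autoscaler.

```shell
# Where the original replica count is stored (hypothetical location).
REPLICA_FILE="${REPLICA_FILE:-./.atlasai-tp-replicas}"

# stop_tenant_plane
# Saves the current replica count, then scales the Deployment to zero.
stop_tenant_plane() {
  local replicas
  replicas="$(kubectl get deployment atlasai-tp -n atlasai \
    -o jsonpath='{.spec.replicas}')" || return 1
  printf '%s\n' "$replicas" > "$REPLICA_FILE"
  kubectl scale deployment/atlasai-tp --replicas=0 -n atlasai
}

# start_tenant_plane
# Scales the Deployment back to the count saved by stop_tenant_plane.
start_tenant_plane() {
  local replicas
  replicas="$(cat "$REPLICA_FILE")" || return 1
  kubectl scale deployment/atlasai-tp --replicas="$replicas" -n atlasai
}
```

Run `stop_tenant_plane` before the restore and `start_tenant_plane` once the restore has finished.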

Step 2: Find your backup

The upgrade script creates a backup before every upgrade:

    ls -lh /var/log/atlas/db-backup-*.sql.gz
    # Output:
    # -rw-r--r-- 1 user user 45M Mar 26 10:00 /var/log/atlas/db-backup-1.2.2-20260326T100000.sql.gz
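When several backups exist, the newest one for a given version can be picked by name. This sketch assumes the naming pattern shown above, `db-backup-<version>-<timestamp>.sql.gz`, with timestamps in `YYYYMMDDTHHMMSS` form so that alphabetical order equals chronological order. The function name is hypothetical.

```shell
# latest_backup BACKUP_DIR VERSION
# Prints the path of the newest backup taken for VERSION.
# Returns non-zero if no matching backup exists.
latest_backup() {
  local backup_dir="$1" version="$2"
  local newest="" f

  # The shell expands globs in sorted order, so the last match is the newest.
  for f in "$backup_dir"/db-backup-"$version"-*.sql.gz; do
    [ -e "$f" ] || continue   # the pattern matched nothing
    newest="$f"
  done

  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}
```

For example, `DB_BACKUP_FILE="$(latest_backup /var/log/atlas 1.2.2)"` selects the most recent pre-upgrade backup taken while version 1.2.2 was installed.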

Step 3: Restore

    # Using the rollback script with restore flag
    RESTORE_DB=1 \
    DB_BACKUP_FILE=/var/log/atlas/db-backup-1.2.2-20260326T100000.sql.gz \
    DB_URL=postgresql://atlasusr:password@localhost:5432/atlas \
    bash scripts/rollback-tp.sh

    # Manual restore (alternative)
    zcat /var/log/atlas/db-backup-1.2.2-20260326T100000.sql.gz \
      | psql postgresql://atlasusr:password@localhost:5432/atlas
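Because a restore overwrites live data, it may be worth confirming that the archive is readable before the manual restore touches the database. The sketch below is one possible guard, not part of the AtlasAI scripts: it tests the gzip archive first and runs psql with `ON_ERROR_STOP` so that the restore stops at the first failed statement. The function names are hypothetical.

```shell
# backup_is_intact BACKUP_FILE
# Succeeds if the file is a complete, readable gzip archive.
backup_is_intact() {
  gzip -t "$1" 2>/dev/null
}

# restore_backup BACKUP_FILE DB_URL
# Restores the dump into the database, refusing to start if the
# archive fails the integrity test.
restore_backup() {
  local backup_file="$1" db_url="$2"

  if ! backup_is_intact "$backup_file"; then
    echo "backup is missing or corrupt: $backup_file" >&2
    return 1
  fi

  # ON_ERROR_STOP=1 makes psql exit non-zero on the first SQL error.
  gunzip -c "$backup_file" | psql -v ON_ERROR_STOP=1 "$db_url"
}
```

For example, `restore_backup /var/log/atlas/db-backup-1.2.2-20260326T100000.sql.gz "$DB_URL"`.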

Step 4: Restart the Tenant Plane

    # Docker Compose
    docker compose up -d tenant-plane

    # Kubernetes (use the replica count the Deployment had before Step 1)
    kubectl scale deployment/atlasai-tp --replicas=3 -n atlasai

Recovery checklist

After any rollback:

  • Health endpoint returns "status": "ok"
  • Correct version is shown in GET /api/admin/license or on the Settings page
  • Users can log in
  • Edge agents reconnect (check last-seen timestamps in Settings → Edge Agents)
  • Review the upgrade log to understand what went wrong before retrying

Getting help

If rollback fails or the system is still not healthy after rollback, contact support@atlasai.com with:

  • Your TP version (before and after upgrade attempt)
  • The contents of the upgrade log (/var/log/atlas/upgrade-*.log)
  • The output of GET /api/health
  • Any relevant Docker / Kubernetes logs (docker compose logs tenant-plane or kubectl logs -n atlasai deploy/atlasai-tp)
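Most of the items above can be gathered into one archive before you write to support. This is a minimal sketch for a Docker Compose install, assuming the log locations and health URL used elsewhere in this guide. The function name, the output file name, and the 1000-line log limit are hypothetical choices. Review the archive for secrets before sending it.

```shell
# collect_support_bundle [LOG_DIR] [OUTPUT_FILE]
# Packs upgrade/rollback logs, the health endpoint output, and recent
# container logs into a tar.gz archive, then prints the archive path.
collect_support_bundle() {
  local log_dir="${1:-/var/log/atlas}"
  local out="${2:-atlas-support-$(date +%Y%m%dT%H%M%S).tar.gz}"
  local staging
  staging="$(mktemp -d)" || return 1

  # Upgrade and rollback logs; missing files are not fatal.
  cp "$log_dir"/upgrade-*.log "$log_dir"/rollback-*.log "$staging"/ \
    2>/dev/null || true

  # Health output (may be empty if the service is down).
  curl -s --max-time 10 http://localhost:3000/api/health \
    > "$staging/health.json" 2>/dev/null || true

  # Recent container logs; on Kubernetes use kubectl logs instead.
  docker compose logs --no-color --tail 1000 tenant-plane \
    > "$staging/tenant-plane.log" 2>&1 || true

  tar -czf "$out" -C "$staging" . || return 1
  rm -rf "$staging"
  printf '%s\n' "$out"
}
```

Running `collect_support_bundle` with no arguments writes a timestamped archive to the current directory.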