Runbook Generation

AtlasAI can generate remediation runbooks automatically from root cause analysis (RCA) results. Each generated runbook contains step-by-step actions, risk classifications, target specifications, and rollback instructions, and is ready for human review or automated execution.

How Generation Works

When you click Generate Runbook after an RCA completes, the AI:

Analyzes the root cause hypothesis — Understands what needs to be fixed and which systems are involved
Retrieves historical runbooks — Searches the RAG knowledge base for runbooks that previously resolved similar issues
Composes a step sequence — Generates an ordered list of remediation steps
Classifies risk — Each step is tagged Low, Medium, or High based on the action type and target
Adds rollback instructions — For reversible steps, generates the corresponding undo action
Inserts approval gates — Automatically places approval checkpoints before High-risk steps
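
The risk classification and approval gate steps above can be sketched in a few lines of Python. This is only an illustration, not AtlasAI's implementation: the names Risk, Step, classify, and build_runbook, the two command sets, and the "Reboot the host" step are all hypothetical. It also classifies by the command alone, whereas the product takes the target into account as well.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Step:
    title: str
    command: str
    risk: Risk = Risk.LOW
    needs_approval: bool = False


# Hypothetical rule sets: the leading program name decides the risk level.
READ_ONLY = {"top", "ps", "curl", "cat", "df"}
DESTRUCTIVE = {"rm", "reboot", "shutdown", "mkfs"}


def classify(command: str) -> Risk:
    """Assign a risk level from the first word of the command."""
    parts = command.split()
    program = parts[0] if parts else ""
    if program in DESTRUCTIVE:
        return Risk.HIGH
    if program in READ_ONLY:
        return Risk.LOW
    return Risk.MEDIUM  # anything that changes state but is not destructive


def build_runbook(commands: list[tuple[str, str]]) -> list[Step]:
    """Classify each step and place an approval gate before High-risk ones."""
    steps = []
    for title, command in commands:
        risk = classify(command)
        steps.append(Step(title, command, risk, needs_approval=(risk is Risk.HIGH)))
    return steps


if __name__ == "__main__":
    runbook = build_runbook([
        ("Check current CPU usage", "top -bn1 | head -20"),
        ("Restart the API service", "systemctl restart api-server"),
        ("Reboot the host", "reboot"),  # hypothetical High-risk step
    ])
    for step in runbook:
        gate = "approval required" if step.needs_approval else "no gate"
        print(f"[{step.risk.name}] {step.title} ({gate})")
```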

Generated Runbook Structure

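Each generated runbook has a title, a reference to the incident and RCA hypothesis it was generated from, and an ordered list of steps. Every step records its risk level, target, command, and rollback action, as in this example: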


Runbook: Resolve high CPU on prod-api-03
Generated from: INC-00042 RCA Hypothesis #1

Step 1: [LOW RISK] Check current CPU usage
  Target: prod-api-03 (Edge Agent)
  Command: top -bn1 | head -20
  Rollback: N/A (read-only)

Step 2: [LOW RISK] Identify top CPU-consuming processes
  Target: prod-api-03 (Edge Agent)
  Command: ps aux --sort=-%cpu | head -10
  Rollback: N/A (read-only)

Step 3: [MEDIUM RISK] Restart the API service
  Target: prod-api-03 (Edge Agent)
  Command: systemctl restart api-server
  Rollback: systemctl restart api-server
  ⚠️ Requires approval at L1/L2

Step 4: [LOW RISK] Verify service recovery
  Target: prod-api-03 (Edge Agent)
  Command: curl -s http://localhost:8080/health
  Rollback: N/A (read-only)
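
If you need to handle runbooks like this in your own tooling, one way to model a step is a small record type. The sketch below is an assumption about how the fields shown above could be represented; RunbookStep and render_step are hypothetical names, not part of an AtlasAI API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunbookStep:
    risk: str                       # "LOW", "MEDIUM", or "HIGH"
    title: str
    target: str
    command: str
    rollback: Optional[str] = None  # None is rendered as a read-only step
    approval: Optional[str] = None  # e.g. "L1/L2"; None means no gate


def render_step(number: int, step: RunbookStep) -> str:
    """Render one step in the text layout used by the example above."""
    lines = [
        f"Step {number}: [{step.risk} RISK] {step.title}",
        f"  Target: {step.target}",
        f"  Command: {step.command}",
        f"  Rollback: {step.rollback or 'N/A (read-only)'}",
    ]
    if step.approval:
        lines.append(f"  ⚠️ Requires approval at {step.approval}")
    return "\n".join(lines)


if __name__ == "__main__":
    step = RunbookStep(
        risk="LOW",
        title="Check current CPU usage",
        target="prod-api-03 (Edge Agent)",
        command="top -bn1 | head -20",
    )
    print(render_step(1, step))
```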

Customizing Generated Runbooks

Generated runbooks are starting points — you can edit them before saving:

Add steps — Insert additional verification or notification steps
Remove steps — Delete steps that don’t apply to your environment
Reorder steps — Drag steps to change execution order
Change risk levels — Override the AI’s risk classification if you disagree
Edit commands — Modify commands to match your specific environment
Add conditions — Insert conditional branches (e.g., “If CPU > 90%, then scale up; else, restart”)
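
To make the conditional branch example concrete, here is a minimal sketch of how such a condition could be evaluated. ConditionalStep, the cpu_percent metric name, and the scale_up and restart action names are hypothetical; the runbook editor's own representation may differ.

```python
from dataclasses import dataclass


@dataclass
class ConditionalStep:
    metric: str       # name of the metric to inspect
    threshold: float  # branch when the metric is strictly above this value
    if_above: str     # action to run when the condition holds
    otherwise: str    # action to run when it does not

    def choose(self, metrics: dict[str, float]) -> str:
        """Return the action name for the current metric values."""
        if metrics[self.metric] > self.threshold:
            return self.if_above
        return self.otherwise


# "If CPU > 90%, then scale up; else, restart"
branch = ConditionalStep("cpu_percent", 90.0, if_above="scale_up", otherwise="restart")

if __name__ == "__main__":
    print(branch.choose({"cpu_percent": 95.0}))  # scale_up
    print(branch.choose({"cpu_percent": 72.5}))  # restart
```

Note that the comparison is strict, so a reading of exactly 90% takes the restart branch.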

Saving and Reusing

After review, save the runbook to the Runbook Library. When a future incident matches the same RCA pattern, AtlasAI automatically suggests this runbook. If the AI generated it from a high-confidence RCA, the suggested version is already pre-customized for the specific incident context.
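
This page does not describe how pattern matching works internally. As a rough mental model only, the sketch below ranks saved runbooks by the overlap between their tags and the tags of a new incident. The tag sets, the Jaccard similarity measure, the 0.5 threshold, and the "Free disk space" runbook are all assumptions made for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest(library: dict[str, set[str]], incident_tags: set[str],
            min_score: float = 0.5) -> list[str]:
    """Return runbook names whose tags overlap enough, best match first."""
    scored = [(jaccard(tags, incident_tags), name) for name, tags in library.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= min_score]


if __name__ == "__main__":
    # Hypothetical library: runbook name -> tags describing the RCA pattern.
    library = {
        "Resolve high CPU on prod-api-03": {"high-cpu", "api-server"},
        "Free disk space": {"disk-full"},
    }
    print(suggest(library, {"high-cpu", "api-server", "prod"}))
```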

Generation Quality

The quality of generated runbooks improves over time as AtlasAI learns from:

Operator edits — When you modify a generated runbook, the AI learns your preferences
Execution outcomes — Successful runbook executions reinforce the step patterns; failures trigger learning
Feedback — Explicit thumbs-up/thumbs-down on generated runbooks adjusts the generation model