Your First Incident
This tutorial walks you through the complete incident lifecycle in AtlasAI, from creation to resolution. You will create an incident, attach evidence, run AI-powered root cause analysis (RCA), generate a runbook, review the trust decision, and resolve the incident.
Prerequisites
- An AtlasAI tenant (any deployment model)
- At least one data source connected (even a test Edge Agent is sufficient)
Tip: Use the command palette to jump to Incidents or Command Center quickly.
Step 1: Create an Incident
- Open the Command Center from the left sidebar
- Click New Incident in the top-right corner
- Fill in the incident details:
  - Title: “High CPU usage on production API server”
  - Severity: P2 — High
  - Category: Performance
  - Affected Service: Select a service from your CMDB, or type a name
- Click Create Incident
The incident is assigned an ID (e.g., INC-00042) and appears in the Active Incidents list.
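If it helps to think of the form as data, the sketch below models the incident you just created. The `Incident` class, its field names, the `status` value, the service name, and the `format_incident_id` helper are hypothetical and not part of AtlasAI's API. Only the field values and the `INC-00042` pattern come from this tutorial, and the five-digit zero padding is inferred from that single example.

```python
from dataclasses import dataclass


def format_incident_id(sequence: int) -> str:
    """Format a sequence number in the style of the example ID (INC-00042)."""
    if sequence < 0:
        raise ValueError("sequence must be non-negative")
    return f"INC-{sequence:05d}"


@dataclass
class Incident:
    # Field names are illustrative; they mirror the form fields in Step 1.
    title: str
    severity: str
    category: str
    affected_service: str
    incident_id: str = ""
    status: str = "active"  # assumed status for a newly created incident


incident = Incident(
    title="High CPU usage on production API server",
    severity="P2",
    category="Performance",
    affected_service="production-api",  # hypothetical service name
    incident_id=format_incident_id(42),
)
print(incident.incident_id, incident.status)
```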
Step 2: Add Evidence
Evidence gives the AI engine context to perform accurate root cause analysis. Click into your incident, then:
- Click Add Evidence in the Evidence panel
- Choose one or more evidence types:
  - Metrics: Attach a time-range snapshot (e.g., CPU metrics for the last 2 hours)
  - Logs: Attach a log query result (e.g., error logs from the affected host)
  - Alerts: Link correlated alerts that fired around the same time
  - Topology: Attach the service dependency graph for the affected service
- Click Attach for each piece of evidence
In general, the more relevant evidence you provide, the more accurate the RCA will be. If Edge Agents are installed on the affected hosts, AtlasAI can also collect evidence automatically.
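As a way to picture what gets attached, here is a minimal sketch of the four evidence types and a "last 2 hours" time window. The `Evidence` class, `EvidenceType` enum, and both helper functions are assumptions made for illustration; AtlasAI's actual data model is not described in this tutorial.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class EvidenceType(Enum):
    METRICS = "metrics"
    LOGS = "logs"
    ALERTS = "alerts"
    TOPOLOGY = "topology"


@dataclass(frozen=True)
class Evidence:
    kind: EvidenceType
    description: str
    start: datetime
    end: datetime


def last_hours(hours: int, now: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) window covering the last `hours` hours."""
    return now - timedelta(hours=hours), now


def missing_types(attached: list[Evidence]) -> list[EvidenceType]:
    """List the evidence types that have not been attached yet."""
    present = {item.kind for item in attached}
    return [kind for kind in EvidenceType if kind not in present]


# A fixed timestamp keeps the example reproducible.
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
start, end = last_hours(2, now)
attached = [
    Evidence(EvidenceType.METRICS, "CPU metrics for the last 2 hours", start, end),
    Evidence(EvidenceType.LOGS, "Error logs from the affected host", start, end),
]
print([kind.value for kind in missing_types(attached)])  # ['alerts', 'topology']
```

A check like `missing_types` is one way to spot gaps before running RCA; whether the product surfaces anything similar is not stated here.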
Step 3: Run Root Cause Analysis
With evidence attached, you are ready to run RCA:
- Click Run RCA in the incident toolbar
- AtlasAI’s reasoning engine will:
  - Analyze all attached evidence
  - Cross-reference with historical incidents via RAG
  - Walk the service topology to identify upstream/downstream impact
  - Generate a ranked list of probable root causes
- Results typically appear within 15–45 seconds
The RCA panel displays:
- Root Cause Hypothesis — The most likely cause with a confidence score
- Evidence Chain — How the AI arrived at the conclusion, with links to specific data points
- Related Incidents — Historical incidents with similar patterns
- Impact Analysis — Upstream and downstream services affected
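To illustrate the idea behind walking a service topology, the sketch below runs a breadth-first traversal over a small dependency graph. It is not AtlasAI's implementation: the graph, the service names, and the function names are all hypothetical, and it assumes "upstream" means services that call the affected one while "downstream" means services it depends on.

```python
from collections import deque

# Hypothetical dependency graph: each service maps to the services it calls.
DEPENDS_ON = {
    "web-frontend": ["production-api"],
    "production-api": ["orders-db", "cache"],
    "batch-reports": ["orders-db"],
    "orders-db": [],
    "cache": [],
}


def reachable(graph, start):
    """Breadth-first walk: every node reachable from `start`, excluding `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor != start and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def invert(graph):
    """Reverse every edge so callers can be looked up from a callee."""
    inverted = {node: [] for node in graph}
    for caller, callees in graph.items():
        for callee in callees:
            inverted.setdefault(callee, []).append(caller)
    return inverted


def impact(graph, service):
    """Return (upstream callers, downstream dependencies) of `service`."""
    return reachable(invert(graph), service), reachable(graph, service)


upstream, downstream = impact(DEPENDS_ON, "production-api")
print(sorted(upstream))    # ['web-frontend']
print(sorted(downstream))  # ['cache', 'orders-db']
```

Note that `batch-reports` is not reported as impacted: it shares a database with the API but neither calls it nor is called by it.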
Step 4: Generate a Runbook
Once you have a root cause hypothesis, AtlasAI can generate an automated remediation runbook:
- Click Generate Runbook below the RCA result
- Review the proposed steps. Each step shows:
  - Action: What will be executed (e.g., restart a service, scale a deployment)
  - Target: Which host or service the action targets
  - Risk Level: Low / Medium / High
  - Reversible: Whether the action can be rolled back
- Edit any steps if needed — you can add, remove, or reorder steps
The generated runbook is saved to your Runbook Library for future use.
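The four attributes shown for each step map naturally onto a small record. The following sketch is an assumption about how such a runbook could be represented, not AtlasAI's format: `RunbookStep`, `RiskLevel`, the helper functions, and the sample steps are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class RiskLevel(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class RunbookStep:
    action: str
    target: str
    risk: RiskLevel
    reversible: bool


def steps_needing_review(steps):
    """Flag steps that are high risk or cannot be rolled back."""
    return [s for s in steps if s.risk == RiskLevel.HIGH or not s.reversible]


def move_step(steps, from_index, to_index):
    """Return a new list with one step moved; the input list is left unchanged."""
    reordered = list(steps)
    reordered.insert(to_index, reordered.pop(from_index))
    return reordered


# Hypothetical runbook for the high-CPU incident in this tutorial.
runbook = [
    RunbookStep("scale deployment to 6 replicas", "production-api", RiskLevel.MEDIUM, True),
    RunbookStep("restart service", "api-host-01", RiskLevel.LOW, True),
    RunbookStep("purge request cache", "cache", RiskLevel.HIGH, False),
]

for step in steps_needing_review(runbook):
    print("review before approving:", step.action)
```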
Step 5: Review the Trust Decision
AtlasAI uses a progressive autonomy model (L0–L5) to determine how much authority the AI has:
| Level | Name | Behavior |
|---|---|---|
| L0 | Inform | AI reports findings, human does everything |
| L1 | Suggest | AI suggests actions, human approves each step |
| L2 | Act & Report | AI executes low-risk actions, reports results |
| L3 | Act & Alert | AI executes most actions, alerts on high-risk ones |
| L4 | Full Auto | AI handles incident end-to-end, human reviews after |
| L5 | Closed Loop | AI resolves and closes without human intervention |
For your first incident, the system defaults to L1 (Suggest). You will see each proposed action with an Approve or Reject button. Click Approve on a step to execute it, or Approve All to run the entire runbook.
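One way to read the table is as a function from autonomy level and step risk to an outcome. The sketch below encodes that reading. The names `TrustLevel`, `Decision`, and `decide` are hypothetical, and two behaviors are assumptions where the table leaves room for interpretation: at L2, anything above low risk waits for approval, and at L3, "alerts on high-risk ones" is taken to mean that high-risk steps wait for a human.

```python
from enum import Enum, IntEnum


class TrustLevel(IntEnum):
    L0_INFORM = 0
    L1_SUGGEST = 1
    L2_ACT_AND_REPORT = 2
    L3_ACT_AND_ALERT = 3
    L4_FULL_AUTO = 4
    L5_CLOSED_LOOP = 5


class Decision(Enum):
    HUMAN_EXECUTES = "human executes"
    NEEDS_APPROVAL = "needs approval"
    AUTO_EXECUTE = "auto execute"


RISK_LEVELS = ("low", "medium", "high")


def decide(level: TrustLevel, risk: str) -> Decision:
    """Map an autonomy level and a step's risk to what happens to that step."""
    if risk not in RISK_LEVELS:
        raise ValueError(f"unknown risk level: {risk!r}")
    if level == TrustLevel.L0_INFORM:
        return Decision.HUMAN_EXECUTES
    if level == TrustLevel.L1_SUGGEST:
        return Decision.NEEDS_APPROVAL
    if level == TrustLevel.L2_ACT_AND_REPORT:
        # Assumption: anything above low risk falls back to approval.
        return Decision.AUTO_EXECUTE if risk == "low" else Decision.NEEDS_APPROVAL
    if level == TrustLevel.L3_ACT_AND_ALERT:
        # Assumption: high-risk steps wait for a human at this level.
        return Decision.NEEDS_APPROVAL if risk == "high" else Decision.AUTO_EXECUTE
    return Decision.AUTO_EXECUTE  # L4 and L5


for risk in RISK_LEVELS:
    print(risk, "->", decide(TrustLevel.L1_SUGGEST, risk).value)
```

At the default L1, every step comes back as "needs approval" regardless of risk, which matches the Approve and Reject buttons you see on each proposed action.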
Step 6: Resolve the Incident
After the runbook executes successfully:
- Verify the issue is resolved by checking metrics and logs
- Click Resolve on the incident
- Add a resolution summary (the AI pre-fills one based on the actions taken)
- Set the root cause category (e.g., “Resource Exhaustion”)
- Click Close Incident
The resolution details, including the RCA result and runbook execution log, are stored in AtlasAI’s knowledge base. Future incidents with similar patterns will benefit from this resolution — the AI will reference it during RCA and suggest the same runbook with higher confidence.
What’s Next
- Using the interface — Command palette, sidebar, forms, and error handling
- Alert to resolution — Full journey with when/why for each step
- Explore the RCA Lab for advanced root cause analysis features
- Learn about Runbook automation levels
- Set up Correlation rules to auto-group related alerts into incidents
- Configure the AI Copilot for natural-language incident investigation