Your First Incident

This tutorial walks you through the complete incident lifecycle in AtlasAI, from creation to resolution. You will create an incident, attach evidence, run AI-powered root cause analysis (RCA), generate a runbook, review the trust decision, and resolve the incident.

Prerequisites

- An AtlasAI tenant (any deployment model)
- At least one data source connected (even a test Edge Agent is sufficient)

Tip: Use the command palette to jump to Incidents or Command Center quickly.

Step 1: Create an Incident

1. Open the Command Center from the left sidebar
2. Click New Incident in the top-right corner
3. Fill in the incident details:
   - Title: “High CPU usage on production API server”
   - Severity: P2 — High
   - Category: Performance
   - Affected Service: Select a service from your CMDB, or type a name
4. Click Create Incident

The incident is assigned an ID (e.g., INC-00042) and appears in the Active Incidents list.
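For readers who prefer to see the form as data, the fields above can be sketched as a small record. This is an illustrative model only, not AtlasAI's API: the `IncidentDraft` class, the `format_incident_id` helper, the five-digit padding (inferred from the INC-00042 example), and the `production-api` service name are all assumptions.

```python
from dataclasses import dataclass


def format_incident_id(sequence: int, width: int = 5) -> str:
    """Render a sequence number in the INC-00042 style (padding width is assumed)."""
    if sequence < 0:
        raise ValueError("sequence must be non-negative")
    return f"INC-{sequence:0{width}d}"


@dataclass(frozen=True)
class IncidentDraft:
    """Hypothetical mirror of the New Incident form fields."""
    title: str
    severity: str
    category: str
    affected_service: str


draft = IncidentDraft(
    title="High CPU usage on production API server",
    severity="P2",
    category="Performance",
    affected_service="production-api",  # placeholder service name
)
incident_id = format_incident_id(42)
print(incident_id, "-", draft.title)
```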

Step 2: Add Evidence

Evidence gives the AI engine context to perform accurate root cause analysis. Click into your incident, then:

1. Click Add Evidence in the Evidence panel
2. Choose one or more evidence types:
   - Metrics: Attach a time-range snapshot (e.g., CPU metrics for the last 2 hours)
   - Logs: Attach a log query result (e.g., error logs from the affected host)
   - Alerts: Link correlated alerts that fired around the same time
   - Topology: Attach the service dependency graph for the affected service
3. Click Attach for each piece of evidence

Relevant evidence from several of these types generally gives the RCA more to work with and improves its accuracy. AtlasAI can also auto-collect evidence if Edge Agents are installed on the affected hosts.
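One way to think about an evidence bundle is as a list of typed items that share a time window. The sketch below models the four evidence types listed above and reports which ones have not been attached yet. The `EvidenceKind` and `Evidence` names, the `last_hours` helper, and the fixed timestamp are hypothetical and exist only for illustration; they do not reflect how AtlasAI stores evidence.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class EvidenceKind(Enum):
    METRICS = "metrics"
    LOGS = "logs"
    ALERTS = "alerts"
    TOPOLOGY = "topology"


@dataclass(frozen=True)
class Evidence:
    kind: EvidenceKind
    description: str
    start: datetime
    end: datetime


def last_hours(hours: float, now: datetime) -> tuple[datetime, datetime]:
    """Return a (start, end) window covering the given number of hours before `now`."""
    return now - timedelta(hours=hours), now


# Fixed timestamp so the example is repeatable; a real window would end at the current time.
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
start, end = last_hours(2, now)

bundle = [
    Evidence(EvidenceKind.METRICS, "CPU metrics for the affected host", start, end),
    Evidence(EvidenceKind.LOGS, "Error logs from the affected host", start, end),
]

# Evidence kinds that have not been attached yet.
attached = {item.kind for item in bundle}
missing = sorted(kind.value for kind in EvidenceKind if kind not in attached)
print(missing)
```

A check like this is only a reminder of coverage; whether alerts or topology are worth attaching depends on the incident.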

Step 3: Run Root Cause Analysis

With evidence attached, you are ready to run RCA:

1. Click Run RCA in the incident toolbar
2. AtlasAI’s reasoning engine will:
   - Analyze all attached evidence
   - Cross-reference with historical incidents via RAG
   - Walk the service topology to identify upstream/downstream impact
   - Generate a ranked list of probable root causes
3. Results typically appear within 15–45 seconds

The RCA panel displays:

- Root Cause Hypothesis: The most likely cause with a confidence score
- Evidence Chain: How the AI arrived at the conclusion, with links to specific data points
- Related Incidents: Historical incidents with similar patterns
- Impact Analysis: Upstream and downstream services affected
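The relationship between the ranked list of probable causes and the single Root Cause Hypothesis shown in the panel can be sketched as a sort by confidence. This is a conceptual illustration, not AtlasAI's ranking logic: the `Hypothesis` class, the 0.0 to 1.0 confidence scale, and the three sample causes with their scores are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    cause: str
    confidence: float  # assumed scale: 0.0 (no support) to 1.0 (certain)


def rank(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Order hypotheses from most to least likely."""
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)


# Invented sample data for a high-CPU incident.
candidates = [
    Hypothesis("Noisy neighbor on shared host", 0.22),
    Hypothesis("Runaway worker process", 0.71),
    Hypothesis("Traffic spike from upstream service", 0.45),
]

ranked = rank(candidates)
top = ranked[0]  # what a panel would surface as the leading hypothesis
print(f"{top.cause} ({top.confidence:.0%})")
```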


Step 4: Generate a Runbook

Once you have a root cause hypothesis, AtlasAI can generate an automated remediation runbook:

1. Click Generate Runbook below the RCA result
2. Review the proposed steps. Each step shows:
   - Action: What will be executed (e.g., restart a service, scale a deployment)
   - Target: Which host or service the action targets
   - Risk Level: Low / Medium / High
   - Reversible: Whether the action can be rolled back
3. Edit any steps if needed. You can add, remove, or reorder steps

The generated runbook is saved to your Runbook Library for future use.
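The four attributes shown for each step map naturally onto a small record, and reviewing a runbook amounts to scanning those records for steps that deserve a closer look. The sketch below is a hypothetical model: `RunbookStep`, `Risk`, `needs_close_review`, `move_step`, and the sample steps and targets are assumptions, and the review rule (high risk or not reversible) is one plausible policy rather than AtlasAI's.

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class RunbookStep:
    action: str
    target: str
    risk: Risk
    reversible: bool


def needs_close_review(step: RunbookStep) -> bool:
    """Assumed policy: look harder at high-risk steps and steps that cannot be rolled back."""
    return step.risk is Risk.HIGH or not step.reversible


def move_step(steps: list[RunbookStep], src: int, dst: int) -> list[RunbookStep]:
    """Return a new list with the step at index `src` moved to index `dst`."""
    reordered = list(steps)
    reordered.insert(dst, reordered.pop(src))
    return reordered


# Invented sample steps and target names.
steps = [
    RunbookStep("Restart service", "api-server-01", Risk.MEDIUM, reversible=False),
    RunbookStep("Scale deployment", "production-api", Risk.LOW, reversible=True),
]

flagged = [s.action for s in steps if needs_close_review(s)]
scaled_first = move_step(steps, 1, 0)
print(flagged, [s.action for s in scaled_first])
```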

Step 5: Review the Trust Decision

AtlasAI uses a progressive autonomy model (L0–L5) to determine how much authority the AI has:

Level	Name	Behavior
L0	Inform	AI reports findings, human does everything
L1	Suggest	AI suggests actions, human approves each step
L2	Act & Report	AI executes low-risk actions, reports results
L3	Act & Alert	AI executes most actions, alerts on high-risk ones
L4	Full Auto	AI handles incident end-to-end, human reviews after
L5	Closed Loop	AI resolves and closes without human intervention

For your first incident, the system defaults to L1 (Suggest). You will see each proposed action with an Approve or Reject button. Click Approve on a step to execute it, or Approve All to run the entire runbook.
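To make the table concrete, the levels can be read as a function from autonomy level and step risk to an outcome. The mapping below is an interpretation of the Behavior column, not AtlasAI's implementation. In particular, it assumes that L2 holds anything above low risk for approval and that L3 hands high-risk steps to a human via an alert; the outcome strings and the `decide` function are hypothetical.

```python
from enum import Enum, IntEnum


class Autonomy(IntEnum):
    L0_INFORM = 0
    L1_SUGGEST = 1
    L2_ACT_AND_REPORT = 2
    L3_ACT_AND_ALERT = 3
    L4_FULL_AUTO = 4
    L5_CLOSED_LOOP = 5


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def decide(level: Autonomy, risk: Risk) -> str:
    """Return an assumed outcome for one runbook step at a given autonomy level."""
    if level is Autonomy.L0_INFORM:
        return "human_executes"  # AI only reports findings
    if level is Autonomy.L1_SUGGEST:
        return "await_approval"  # every step needs Approve or Reject
    if level is Autonomy.L2_ACT_AND_REPORT:
        # Assumption: only low-risk steps run without approval.
        return "auto_execute" if risk is Risk.LOW else "await_approval"
    if level is Autonomy.L3_ACT_AND_ALERT:
        # Assumption: high-risk steps are surfaced to a human instead of run silently.
        return "alert_human" if risk is Risk.HIGH else "auto_execute"
    return "auto_execute"  # L4 and L5 handle the step without prior approval


for level in Autonomy:
    print(level.name, [decide(level, risk) for risk in Risk])
```

Under this reading, the L1 default for a first incident means every step returns `await_approval`, which matches the Approve and Reject buttons described above.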

Step 6: Resolve the Incident

After the runbook executes successfully:

1. Verify the issue is resolved by checking metrics and logs
2. Click Resolve on the incident
3. Add a resolution summary (the AI pre-fills one based on the actions taken)
4. Set the root cause category (e.g., “Resource Exhaustion”)
5. Click Close Incident
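For the verification step, a simple rule of thumb is to require several consecutive healthy readings before resolving, so that one good sample does not hide a problem that is still flapping. The sketch below applies that idea to CPU percentages. The `is_recovered` function, the 80% threshold, the three-sample window, and the sample readings are illustrative assumptions, not AtlasAI defaults.

```python
def is_recovered(samples: list[float], threshold: float = 80.0, window: int = 3) -> bool:
    """Return True when the most recent `window` samples are all below `threshold`."""
    if len(samples) < window:
        return False  # not enough data to judge
    return all(value < threshold for value in samples[-window:])


# Invented CPU readings (percent), oldest first.
after_fix = [95.0, 91.0, 62.0, 58.0, 55.0]
still_flapping = [95.0, 91.0, 62.0, 85.0, 55.0]

print(is_recovered(after_fix), is_recovered(still_flapping))
```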

The resolution details, including the RCA result and runbook execution log, are stored in AtlasAI’s knowledge base. Future incidents with similar patterns will benefit from this resolution — the AI will reference it during RCA and suggest the same runbook with higher confidence.

What’s Next

- Using the interface: Command palette, sidebar, forms, and error handling
- Alert to resolution: Full journey with when/why for each step
- Explore the RCA Lab for advanced root cause analysis features
- Learn about Runbook automation levels
- Set up Correlation rules to auto-group related alerts into incidents
- Configure the AI Copilot for natural-language incident investigation