Your First Incident

This tutorial walks you through the complete incident lifecycle in AtlasAI, from creation to resolution. You will create an incident, attach evidence, run AI-powered root cause analysis (RCA), generate a runbook, and review trust decisions.

Prerequisites

  • An AtlasAI tenant (any deployment model)
  • At least one data source connected (even a test Edge Agent is sufficient)

Tip: Use the command palette to jump to Incidents or Command Center quickly.

Step 1: Create an Incident

  1. Open the Command Center from the left sidebar
  2. Click New Incident in the top-right corner
  3. Fill in the incident details:
    • Title: “High CPU usage on production API server”
    • Severity: P2 — High
    • Category: Performance
    • Affected Service: Select a service from your configuration management database (CMDB), or type a name
  4. Click Create Incident

The incident is assigned an ID (e.g., INC-00042) and appears in the Active Incidents list.
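
To make the form fields concrete, here is a minimal local sketch of the same data in Python. It is not AtlasAI's API: the `IncidentDraft` class, its `validate` method, and the `format_incident_id` helper are hypothetical names, and both the "all four fields are required" rule and the five-digit zero padding are assumptions inferred from the form and the example ID INC-00042.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentDraft:
    """Hypothetical local model of the New Incident form fields."""
    title: str
    severity: str
    category: str
    affected_service: str

    def validate(self) -> None:
        # Assumption: every form field must be non-empty.
        missing = [name for name, value in vars(self).items() if not value.strip()]
        if missing:
            raise ValueError(f"missing required fields: {', '.join(missing)}")


def format_incident_id(sequence: int) -> str:
    """Format a sequence number like the example ID (assumed 5-digit padding)."""
    if sequence < 0:
        raise ValueError("sequence must be non-negative")
    return f"INC-{sequence:05d}"


draft = IncidentDraft(
    title="High CPU usage on production API server",
    severity="P2",
    category="Performance",
    affected_service="production-api",  # hypothetical service name
)
draft.validate()
print(format_incident_id(42))
```

Keeping the draft immutable (`frozen=True`) is a design choice for the sketch only: it makes the validated values safe to pass around without later mutation.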

Step 2: Add Evidence

Evidence gives the AI engine context to perform accurate root cause analysis. Click into your incident, then:

  1. Click Add Evidence in the Evidence panel
  2. Choose one or more evidence types:
    • Metrics — Attach a time-range snapshot (e.g., CPU metrics for the last 2 hours)
    • Logs — Attach a log query result (e.g., error logs from the affected host)
    • Alerts — Link correlated alerts that fired around the same time
    • Topology — Attach the service dependency graph for the affected service
  3. Click Attach for each piece of evidence

RCA accuracy generally improves as you attach more relevant evidence. AtlasAI can also auto-collect evidence if Edge Agents are installed on the affected hosts.
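
The four evidence types can be pictured as a small data model. The sketch below is illustrative only and does not reflect AtlasAI's internal schema: `EvidenceType`, `Evidence`, and `missing_evidence_types` are hypothetical names, used here to check which of the four types an incident still lacks.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceType(Enum):
    # The four types listed in the Add Evidence dialog.
    METRICS = "metrics"
    LOGS = "logs"
    ALERTS = "alerts"
    TOPOLOGY = "topology"


@dataclass(frozen=True)
class Evidence:
    kind: EvidenceType
    description: str


def missing_evidence_types(attached: list[Evidence]) -> list[EvidenceType]:
    """Return the evidence types not yet attached, in declaration order."""
    present = {item.kind for item in attached}
    return [kind for kind in EvidenceType if kind not in present]


attached = [
    Evidence(EvidenceType.METRICS, "CPU metrics for the last 2 hours"),
    Evidence(EvidenceType.LOGS, "error logs from the affected host"),
]
for kind in missing_evidence_types(attached):
    print(f"not yet attached: {kind.value}")
```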

Step 3: Run Root Cause Analysis

With evidence attached, you are ready to run RCA:

  1. Click Run RCA in the incident toolbar
  2. AtlasAI’s reasoning engine will:
    • Analyze all attached evidence
    • Cross-reference with historical incidents via retrieval-augmented generation (RAG)
    • Walk the service topology to identify upstream/downstream impact
    • Generate a ranked list of probable root causes
  3. Results typically appear within 15–45 seconds
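
The topology walk in the list above can be sketched as a breadth-first traversal of a dependency graph. This is a generic illustration under stated assumptions, not AtlasAI's reasoning engine: the `depends_on` mapping, the service names, and the helpers `reachable` and `invert` are all hypothetical. Edges point from a service to the services it depends on; whether those are called "upstream" or "downstream" varies by convention, so the sketch uses "dependencies" and "dependents" instead.

```python
from collections import deque


def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Return every node reachable from start, excluding start itself."""
    seen: set[str] = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen or node == start:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


def invert(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Reverse every edge so that dependents can be walked the same way."""
    inverted: dict[str, list[str]] = {}
    for source, targets in graph.items():
        inverted.setdefault(source, [])
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


# Hypothetical topology: each key depends on the services in its list.
depends_on = {
    "web": ["api"],
    "api": ["db", "cache"],
    "db": [],
    "cache": [],
}

dependencies = reachable(depends_on, "api")        # services "api" relies on
dependents = reachable(invert(depends_on), "api")  # services that rely on "api"
print(sorted(dependencies), sorted(dependents))
```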

The RCA panel displays:

  • Root Cause Hypothesis — The most likely cause with a confidence score
  • Evidence Chain — How the AI arrived at the conclusion, with links to specific data points
  • Related Incidents — Historical incidents with similar patterns
  • Impact Analysis — Upstream and downstream services affected
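
A ranked list of probable root causes amounts to sorting hypotheses by confidence. The sketch below assumes confidence is a number between 0 and 1; the tutorial does not state the scale. The `Hypothesis` class, the `rank_hypotheses` function, and the sample causes and scores are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    cause: str
    confidence: float  # assumed scale: 0.0 (no support) to 1.0 (certain)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


def rank_hypotheses(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Return hypotheses ordered from most to least confident."""
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)


# Illustrative values only; these are not real RCA output.
candidates = [
    Hypothesis("connection pool exhaustion", 0.35),
    Hypothesis("runaway background job", 0.80),
    Hypothesis("noisy neighbor on shared host", 0.10),
]
ranked = rank_hypotheses(candidates)
print(ranked[0].cause)
```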

[Figure: RCA result showing the root cause hypothesis, evidence chain, and confidence score]

Step 4: Generate a Runbook

Once you have a root cause hypothesis, AtlasAI can generate an automated remediation runbook:

  1. Click Generate Runbook below the RCA result
  2. Review the proposed steps — each step shows:
    • Action — What will be executed (e.g., restart a service, scale a deployment)
    • Target — Which host or service the action targets
    • Risk Level — Low / Medium / High
    • Reversible — Whether the action can be rolled back
  3. Edit any steps if needed — you can add, remove, or reorder steps

The generated runbook is saved to your Runbook Library for future use.
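
The four attributes shown for each step map naturally onto a small record type. The following is a hypothetical sketch, not the format AtlasAI stores runbooks in: `RunbookStep`, `move_step`, and `steps_needing_scrutiny` are illustrative names, the sample steps and targets are invented, and flagging "High risk or not reversible" steps for extra review is an assumed policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunbookStep:
    action: str
    target: str
    risk_level: str  # "Low", "Medium", or "High"
    reversible: bool


def move_step(steps: list[RunbookStep], from_index: int, to_index: int) -> list[RunbookStep]:
    """Return a new list with one step moved; the input list is left unchanged."""
    reordered = list(steps)
    step = reordered.pop(from_index)
    reordered.insert(to_index, step)
    return reordered


def steps_needing_scrutiny(steps: list[RunbookStep]) -> list[RunbookStep]:
    """Assumed policy: review any step that is high risk or cannot be rolled back."""
    return [s for s in steps if s.risk_level == "High" or not s.reversible]


# Hypothetical runbook for the high-CPU example.
runbook = [
    RunbookStep("scale deployment", "production-api", "Low", True),
    RunbookStep("restart service", "api-server-1", "Medium", True),
    RunbookStep("terminate runaway process", "api-server-1", "High", False),
]
for step in steps_needing_scrutiny(runbook):
    print(f"review before approving: {step.action} on {step.target}")
```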

Step 5: Review the Trust Decision

AtlasAI uses a progressive autonomy model (L0–L5) to determine how much authority the AI has:

Level  Name          Behavior
L0     Inform        AI reports findings, human does everything
L1     Suggest       AI suggests actions, human approves each step
L2     Act & Report  AI executes low-risk actions, reports results
L3     Act & Alert   AI executes most actions, alerts on high-risk ones
L4     Full Auto     AI handles incident end-to-end, human reviews after
L5     Closed Loop   AI resolves and closes without human intervention

For your first incident, the system defaults to L1 (Suggest). You will see each proposed action with an Approve or Reject button. Click Approve on a step to execute it, or Approve All to run the entire runbook.
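
One way to read the autonomy table is as a function from (level, step risk) to a handling decision. The sketch below is an interpretation, not AtlasAI's actual policy engine: the `AutonomyLevel` and `Risk` enums, the `decide` function, and the decision strings are hypothetical. Two points are assumptions the table leaves open: at L2, anything above low risk is held for approval, and at L3, high-risk steps are held for a human rather than executed.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    INFORM = 0
    SUGGEST = 1
    ACT_AND_REPORT = 2
    ACT_AND_ALERT = 3
    FULL_AUTO = 4
    CLOSED_LOOP = 5


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def decide(level: AutonomyLevel, risk: Risk) -> str:
    """Return how a single runbook step is handled at the given level."""
    if level == AutonomyLevel.INFORM:
        return "human_executes"
    if level == AutonomyLevel.SUGGEST:
        return "needs_approval"
    if level == AutonomyLevel.ACT_AND_REPORT:
        # Assumption: only low-risk steps run without approval at L2.
        return "auto_execute" if risk == Risk.LOW else "needs_approval"
    if level == AutonomyLevel.ACT_AND_ALERT:
        # Assumption: high-risk steps are held for a human at L3.
        return "needs_approval" if risk == Risk.HIGH else "auto_execute"
    return "auto_execute"  # L4 and L5


# At the default L1, every step waits for Approve or Reject.
print(decide(AutonomyLevel.SUGGEST, Risk.LOW))
```

Using `IntEnum` keeps the levels ordered, so a caller could also compare them directly (for example, `level >= AutonomyLevel.FULL_AUTO`).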

Step 6: Resolve the Incident

After the runbook executes successfully:

  1. Verify the issue is resolved by checking metrics and logs
  2. Click Resolve on the incident
  3. Add a resolution summary (the AI pre-fills one based on the actions taken)
  4. Set the root cause category (e.g., “Resource Exhaustion”)
  5. Click Close Incident

The resolution details, including the RCA result and runbook execution log, are stored in AtlasAI’s knowledge base. Future incidents with similar patterns will benefit from this resolution — the AI will reference it during RCA and suggest the same runbook with higher confidence.

What’s Next