
AI & Reasoning Overview

AtlasAI’s AI system is purpose-built for IT operations. Unlike general-purpose AI tools, it is trained and tuned specifically for infrastructure reasoning — understanding metric anomalies, correlating log patterns, traversing service topologies, and generating safe remediation actions.

AI Architecture

The AI system consists of several interconnected components:

┌─────────────────────────────────────────────┐
│                 AI Gateway                  │
│                                             │
│  ┌─────────┐  ┌──────────┐  ┌───────────┐   │
│  │   RCA   │  │ Runbook  │  │  Copilot  │   │
│  │ Engine  │  │ Generator│  │  Engine   │   │
│  └────┬────┘  └────┬─────┘  └─────┬─────┘   │
│       │            │              │         │
│  ┌────▼────────────▼──────────────▼─────┐   │
│  │         RAG Knowledge Base           │   │
│  │  (incidents, runbooks, docs, CMDB)   │   │
│  └──────────────────────────────────────┘   │
│                                             │
│  ┌──────────────────────────────────────┐   │
│  │     Model Router & Orchestrator      │   │
│  │  (selects model, manages context)    │   │
│  └──────────────────────────────────────┘   │
└─────────────────────────────────────────────┘

Core AI Capabilities

1. Root Cause Analysis (RCA)

The RCA engine analyzes multi-signal evidence to produce ranked root cause hypotheses. It combines:

  • Signal analysis — Pattern recognition across metrics, logs, and traces
  • Topology reasoning — Graph traversal of service dependencies to trace failure propagation
  • Historical correlation — RAG retrieval of similar past incidents and their resolutions
  • Change correlation — Linking incidents to recent deployments, config changes, or infrastructure modifications
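One way to picture how these four evidence sources could feed ranked hypotheses is a weighted scoring pass. This is a minimal illustrative sketch — the weights, field names, and scoring scheme are assumptions, not AtlasAI's actual algorithm:

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    cause: str
    # Per-source evidence scores in [0, 1] (hypothetical scale).
    evidence: dict = field(default_factory=dict)

# Assumed relative weights for the four evidence sources described above.
WEIGHTS = {"signal": 0.35, "topology": 0.25, "history": 0.2, "change": 0.2}

def rank_hypotheses(hypotheses):
    """Return hypotheses sorted by weighted evidence score, highest first."""
    def score(h):
        return sum(WEIGHTS[src] * h.evidence.get(src, 0.0) for src in WEIGHTS)
    return sorted(hypotheses, key=score, reverse=True)

candidates = [
    Hypothesis("db-connection-pool-exhaustion",
               {"signal": 0.9, "topology": 0.8, "history": 0.7, "change": 0.2}),
    Hypothesis("recent-config-change",
               {"signal": 0.4, "topology": 0.3, "history": 0.2, "change": 0.9}),
]
ranked = rank_hypotheses(candidates)
```

A real engine would derive these scores from live telemetry, graph traversal, and retrieval rather than hand-assigned numbers; the sketch only shows the combination step.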

Learn more about RCA Reasoning →

2. Runbook Generation

The AI generates structured remediation playbooks based on RCA output and historical resolutions. Generated runbooks include risk-classified steps, rollback instructions, and approval gates.
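The shape of such a playbook can be sketched as a list of risk-classified steps, each carrying its own rollback and approval gate. Field names here are illustrative assumptions, not AtlasAI's actual schema:

```python
from dataclasses import dataclass

@dataclass
class RunbookStep:
    action: str
    risk: str           # assumed classes: "low" | "medium" | "high"
    rollback: str       # how to undo this step
    needs_approval: bool

# A hypothetical generated runbook for a capacity incident.
runbook = [
    RunbookStep("Scale api-gateway deployment to 6 replicas",
                risk="low", rollback="Scale back to 3 replicas",
                needs_approval=False),
    RunbookStep("Restart payment-service pods",
                risk="medium", rollback="None; pods restart in place",
                needs_approval=True),
]

# Approval gates: steps a human must sign off on before execution.
gated = [s for s in runbook if s.needs_approval]
```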

Learn more about Runbook Generation →

3. RAG Knowledge Base

Every incident resolution, runbook execution, and operator interaction is indexed into a vector knowledge base. The AI retrieves relevant context from this knowledge base during RCA and Copilot interactions.
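The retrieval step can be illustrated with cosine similarity over toy embeddings. A real deployment would use an embedding model and a vector database; this sketch only shows the lookup, and the document IDs and vectors are invented:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy index: document id -> precomputed embedding (assumed).
index = {
    "incident-142: db pool exhaustion": [0.9, 0.1, 0.0],
    "runbook: restart cache cluster":   [0.1, 0.8, 0.2],
    "doc: deploy pipeline overview":    [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=2):
    """Return the k documents most similar to the query embedding."""
    ranked = sorted(index, key=lambda d: cosine(index[d], query_vec),
                    reverse=True)
    return ranked[:k]

top = retrieve([0.85, 0.15, 0.05])
```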

Learn more about RAG & Knowledge →

4. Trust & Autonomy

AtlasAI’s progressive autonomy system (L0–L5) controls how much authority the AI has. Organizations start at a conservative level and increase autonomy as trust builds.
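A gating check against the L0–L5 scale might look like the following. The meaning assigned to each level and the per-risk thresholds are illustrative assumptions, not AtlasAI's exact definitions:

```python
from enum import IntEnum

class Autonomy(IntEnum):
    L0 = 0  # observe only (assumed)
    L1 = 1  # suggest actions (assumed)
    L2 = 2  # execute low-risk actions with approval (assumed)
    L3 = 3  # execute low-risk actions automatically (assumed)
    L4 = 4  # execute medium-risk actions automatically (assumed)
    L5 = 5  # execute most actions automatically (assumed)

# Assumed minimum level for unattended execution, per risk class.
AUTO_EXECUTE_THRESHOLD = {
    "low": Autonomy.L3,
    "medium": Autonomy.L4,
    "high": Autonomy.L5,
}

def may_auto_execute(level, risk, destructive=False):
    """Destructive actions always require approval, regardless of level."""
    if destructive:
        return False
    return level >= AUTO_EXECUTE_THRESHOLD[risk]
```

Note that the `destructive` guard mirrors the safety principle below: even at L5, destructive actions never run unsupervised.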

Learn more about Trust & Autonomy →

5. Bring Your Own LLM

Enterprise customers can connect their own LLM infrastructure (Azure OpenAI, AWS Bedrock, self-hosted models) for data sovereignty and compliance.
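Conceptually, the model router then resolves requests against the customer-provided endpoint. The config keys, endpoint URLs, and fallback below are assumptions for illustration only, not AtlasAI's actual configuration format:

```python
# Hypothetical BYO-LLM configuration (keys are illustrative).
LLM_CONFIG = {
    "provider": "azure-openai",   # or "aws-bedrock", "self-hosted"
    "endpoint": "https://example.openai.azure.com",
    "deployment": "gpt-4o-ops",
    "data_residency": "eu-west-1",
}

def resolve_endpoint(config):
    """Use the customer endpoint when configured; otherwise fall back
    to an assumed hosted default."""
    if config.get("provider") and config.get("endpoint"):
        return config["endpoint"]
    return "https://api.atlasai.example/llm"  # hypothetical default

endpoint = resolve_endpoint(LLM_CONFIG)
```

Keeping inference on customer-controlled infrastructure is what delivers the data-sovereignty guarantee: prompts and context never leave the customer's environment.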

Learn more about BYOC LLM →

AI Safety Principles

AtlasAI’s AI operates under strict safety constraints:

  • No unsupervised destructive actions — Actions that delete data or terminate production resources always require explicit approval (even at L5)
  • Explainability — Every AI decision includes an evidence chain showing how it was reached
  • Confidence thresholds — The AI only takes automated action when confidence exceeds configurable thresholds
  • Human override — Operators can always override, pause, or roll back AI decisions
  • Audit trail — Every AI action is logged with the model used, input context, and output
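Two of these principles — confidence thresholds and the audit trail — can be sketched together: an action executes automatically only above a configurable threshold, and every decision is recorded either way. The function, field names, and threshold value are illustrative assumptions:

```python
# In-memory stand-in for a persistent audit log.
audit_log = []

def decide(action, confidence, model, threshold=0.85):
    """Gate an action on confidence and record the decision."""
    approved = confidence >= threshold
    audit_log.append({
        "action": action,
        "model": model,           # which model produced the decision
        "confidence": confidence, # input context score
        "executed": approved,     # outcome
    })
    return "execute" if approved else "escalate-to-human"

r1 = decide("restart payment-service", 0.92, model="gpt-4o-ops")
r2 = decide("failover primary db", 0.60, model="gpt-4o-ops")
```

Low-confidence decisions are not dropped; they are escalated to an operator and still audited, which preserves both the human-override and audit-trail guarantees.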