Edge Agent Troubleshooting

This guide covers common issues with the Edge Agent and how to resolve them.

Agent Not Starting

Symptom: The agent service fails to start or crashes immediately.

Check the service status:

sudo systemctl status atlasai-agent

Check the agent logs:

sudo journalctl -u atlasai-agent --no-pager -n 50

Common causes:

| Issue | Solution |
| --- | --- |
| Configuration file missing | Run the installer again or create /etc/atlasai/agent.yaml manually |
| Invalid YAML syntax | Validate with atlas-agent validate-config |
| Port already in use | Check if another process is using the metrics port (9100) |
| Permission denied | Ensure the agent runs as root or a user with appropriate permissions |
| Binary not found | Verify /usr/local/bin/atlas-agent exists and is executable |
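
The file and port checks from the table can be run together before restarting the service. The following is a minimal shell sketch, not part of the agent tooling: the function names are made up for illustration, while the paths and port 9100 are the ones listed above. The port check assumes ss (from iproute2) is installed.

```shell
# Succeeds if the configuration file exists and is not empty.
config_present() {
  [ -s "$1" ]
}

# Succeeds if the agent binary is a regular file and is executable.
binary_executable() {
  [ -f "$1" ] && [ -x "$1" ]
}

# Lists TCP listeners on the given port. Run with sudo to see the owning
# process. Only the header line is printed when nothing is listening.
port_listeners() {
  ss -ltnp "sport = :$1"
}

# Example run on the agent host:
#   config_present /etc/atlasai/agent.yaml || echo "config missing or empty"
#   binary_executable /usr/local/bin/atlas-agent || echo "binary missing or not executable"
#   sudo ss -ltnp 'sport = :9100'
```

If all three checks pass and the service still fails, the journal output from the earlier step is the next place to look.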

Agent Not Connecting to Tenant Plane

Symptom: Agent is running but shows as “Disconnected” in the UI.

Diagnose connectivity:

curl -v https://<tenant-plane-url>:8443/api/health

Common causes:

| Issue | Solution |
| --- | --- |
| Wrong Tenant URL | Verify tenant.url in /etc/atlasai/agent.yaml |
| Invalid API key | Regenerate the API key in Settings → Edge Agents |
| Firewall blocking | Ensure outbound access to the Tenant Plane on port 8443 |
| TLS certificate error | Set tls_skip_verify: true temporarily for debugging, or install the CA cert |
| Proxy required | Set HTTPS_PROXY environment variable in the systemd service file |
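
The exit code of the curl command often points at one row of the table. The sketch below maps the standard curl exit codes to a likely cause; the function name and the wording of each hint are illustrative, and the mapping to causes is a rough guide rather than a guarantee.

```shell
# Maps a curl exit code to the most likely cause from the table above.
explain_curl_exit() {
  case "$1" in
    0)  echo "reachable: check the API key and tenant.url" ;;
    5)  echo "proxy host could not be resolved: check HTTPS_PROXY" ;;
    6)  echo "host could not be resolved: check tenant.url and DNS" ;;
    7)  echo "connection failed: check firewall rules for port 8443" ;;
    28) echo "timed out: check firewall rules or whether a proxy is required" ;;
    35) echo "TLS handshake failed: check TLS settings or an intercepting proxy" ;;
    60) echo "certificate not trusted: install the CA certificate" ;;
    *)  echo "unexpected curl exit code $1: see EXIT CODES in the curl manual" ;;
  esac
}

# Example run (replace the placeholder with your Tenant Plane host):
#   curl -sS -o /dev/null --max-time 10 "https://<tenant-plane-url>:8443/api/health"
#   explain_curl_exit "$?"
```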

To add proxy settings:

sudo systemctl edit atlasai-agent

Add:

[Service]
Environment="HTTPS_PROXY=http://proxy.corp.example.com:8080"

Then restart:

sudo systemctl daemon-reload
sudo systemctl restart atlasai-agent
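
After the restart, it can help to confirm that systemd actually picked up the variable. The sketch below parses the output of systemctl show; proxy_from_show is a hypothetical helper name, and it assumes the proxy URL contains no spaces.

```shell
# Reads "Environment=..." output of `systemctl show` on stdin and prints the
# HTTPS_PROXY value. Fails (non-zero status) if the variable is not set.
proxy_from_show() {
  tr ' ' '\n' | sed -n 's/^\(Environment=\)\{0,1\}HTTPS_PROXY=//p' | grep .
}

# Example run on the agent host:
#   systemctl show atlasai-agent --property=Environment | proxy_from_show \
#     || echo "HTTPS_PROXY is not set for the service"
```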

High Resource Usage

Symptom: Agent consuming more CPU or memory than expected.

Normal resource usage:

| Resource | Expected |
| --- | --- |
| CPU | < 1% average, < 5% during collection bursts |
| Memory | 30–60 MB RSS |
| Disk I/O | Minimal (log file reads + buffer writes) |
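
To compare a running agent against the memory figure above, the resident set size can be read with ps. This is a rough sketch: the function names are illustrative, and it treats the 60 MB upper bound as 60 MiB (61440 KiB), which is an assumption about how the figure is meant.

```shell
# Prints the resident set size of a PID in KiB, as reported by ps.
rss_kib_of() {
  ps -o rss= -p "$1" | tr -d ' '
}

# Classifies an RSS reading in KiB against the 60 MB upper bound above.
classify_rss_kib() {
  if [ "$1" -le 61440 ]; then   # 60 * 1024 KiB
    echo "within expected range"
  else
    echo "above expected range"
  fi
}

# Example run on the agent host:
#   pid=$(systemctl show atlasai-agent --property=MainPID --value)
#   classify_rss_kib "$(rss_kib_of "$pid")"
```

A single reading above the bound during a collection burst is not conclusive; sampling a few times over several minutes gives a better picture.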

If usage is higher:

  1. Reduce collection frequency. Increase intervals in agent.yaml:
       collectors:
         system:
           interval: 30s
         process:
           interval: 60s
  2. Limit log paths. Narrow logs.paths to only the files you need
  3. Reduce process count. Lower process.top_n from 20 to 10
  4. Enable debug logging temporarily. Set logging.level: debug to identify which collector is consuming resources, then revert

Missing Metrics

Symptom: Some expected metrics are not appearing in the Tenant Plane.

Verify collectors are enabled:

atlas-agent status

This shows the state of each collector (running, stopped, error) and the last collection timestamp.

Common causes:

| Issue | Solution |
| --- | --- |
| Collector disabled | Enable the collector in agent.yaml |
| Permission denied | The agent needs read access to /proc, /sys, and log files |
| Mount point not listed | Add the mount point to disk.mount_points or leave empty for all |
| Network interface not detected | Verify interface names match what the OS reports via ip link |
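
The permission and interface rows can be checked from a shell running as the same user as the agent. A minimal sketch follows; unreadable_paths is a hypothetical helper, and the example paths are only samples of what the collectors might read.

```shell
# Prints each given path that the current user cannot read.
unreadable_paths() {
  for p in "$@"; do
    [ -r "$p" ] || echo "$p"
  done
}

# Example run as the agent's user:
#   unreadable_paths /proc/stat /sys/class/net /var/log/syslog
#
# Interface names as the OS reports them, for comparison with agent.yaml:
#   ip -o link show | awk -F': ' '{print $2}'
```

No output from unreadable_paths means every listed path is readable by the user that ran it.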

Log Forwarding Issues

Symptom: Logs not appearing in the AtlasAI Logs module.

Check log collector status:

atlas-agent status --collector logs

Common causes:

| Issue | Solution |
| --- | --- |
| File not matching glob | Verify the file path matches a pattern in logs.paths |
| File permissions | Agent needs read access to the log files |
| File excluded | Check logs.exclude_paths |
| Multiline misconfigured | Verify multiline.pattern matches your log format |
| Buffer full | If the agent was offline, the buffer may be full; check transport.buffer_size |
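
A quick way to sanity-check a pattern from logs.paths is to let the shell expand it and see whether the expected file is among the results. This sketch assumes the agent's glob matching behaves like ordinary shell pathname expansion, which may not hold for extended syntax such as **; glob_matches_file is an illustrative name.

```shell
# Succeeds if expanding the glob pattern on this host yields the given file.
# $1 is left unquoted on purpose so the shell expands it; patterns that
# contain spaces are not handled by this sketch.
glob_matches_file() {
  for candidate in $1; do
    [ "$candidate" = "$2" ] && return 0
  done
  return 1
}

# Example run (hypothetical pattern and file):
#   glob_matches_file '/var/log/app/*.log' /var/log/app/server.log \
#     && echo "pattern matches" || echo "pattern does not match"
```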

Runbook Execution Failures

Symptom: Runbook steps fail when executed on the agent.

Check execution logs:

sudo journalctl -u atlasai-agent -g "runbook" --no-pager -n 50

Common causes:

| Issue | Solution |
| --- | --- |
| Command blocked | Check runbook_executor.blocked_commands in the config |
| Timeout | Increase runbook_executor.timeout for long-running commands |
| Permission denied | The agent’s runbook executor runs as the agent user; verify that user’s permissions |
| Executor disabled | Set runbook_executor.enabled: true |
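
To reproduce a permission error outside the agent, the failing command can be run by hand as the user the service runs under. The sketch below assumes that user is the one configured in the systemd unit; effective_service_user is a hypothetical helper name.

```shell
# Turns the output of `systemctl show <unit> --property=User --value` into a
# user name. An empty value means the unit has no User= setting, in which
# case a system service runs as root.
effective_service_user() {
  if [ -n "$1" ]; then
    echo "$1"
  else
    echo "root"
  fi
}

# Example run on the agent host:
#   agent_user=$(effective_service_user "$(systemctl show atlasai-agent --property=User --value)")
#   sudo -u "$agent_user" <command from the failing runbook step>
```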

Getting Help

If these steps don’t resolve your issue:

  1. Collect a diagnostic bundle: atlas-agent diagnostics --output /tmp/atlas-diag.tar.gz
  2. Open a support ticket at support.atlastechlab.com 
  3. Attach the diagnostic bundle to the ticket
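
Before attaching the bundle, it can be worth confirming that the archive was written completely. A small sketch, assuming the bundle is a gzip-compressed tar archive as the .tar.gz extension suggests; bundle_readable is a hypothetical helper name.

```shell
# Succeeds if the file is non-empty and tar can list its contents.
bundle_readable() {
  [ -s "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}

# Example run after collecting the bundle:
#   bundle_readable /tmp/atlas-diag.tar.gz || echo "bundle is missing or corrupt"
#   tar -tzf /tmp/atlas-diag.tar.gz    # review the file list before uploading
```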