Creating Monitoring Policies

Monitoring policies define what you monitor (targets), what you collect (events, metrics, logs, traces), and, through their alert rules, when to alert. This guide walks you through creating and managing them in an enterprise setup.

Where to find monitoring policies

Network Monitoring (or Monitoring, Infrastructure) in the left sidebar — Policies, targets, and alert rules.
Policies tab — List of monitoring policies (name, description, target type, enabled).
Targets — Per-policy list of CIs or resources being monitored (e.g., hosts, network devices).
Alert rules — Rules that evaluate metrics and create incidents when thresholds are breached.

Screenshot: Tenant Plane → CONFIGURE / Network Monitoring → Policies (public/img/monitoring-policies-list.png).

Creating a monitoring policy

Go to Network Monitoring (or Monitoring) → Policies.
Click New policy or Create monitoring policy.
Fill in:
- Name — e.g., “Production Linux servers.”
- Description — Short note on scope (e.g., “All RHEL hosts in prod”).
- Target type — What this policy applies to (e.g., host, network device, Kubernetes node). Depends on your CMDB and discovery.
- Target filters (optional) — Narrow by:
  - CI types — e.g., linux_server, network_device
  - Service IDs — Specific services
  - Environments — e.g., production, staging
- Collection config:
  - Events — Collect events (e.g., state changes).
  - Metrics — Collect metrics (CPU, memory, custom).
  - Logs — Collect logs from targets.
  - Traces — Collect traces if applicable.
  - Interval (seconds) — How often to collect (e.g., 60).
Enable the policy and save.
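
The form fields above map naturally onto a small data structure. The Python sketch below is illustrative only: the class names, field names, and validation rules are assumptions made for this guide, not the product's actual schema or API. It shows how the fields relate and which checks a form like this would plausibly run before saving.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TargetFilters:
    """Optional filters that narrow which CIs a policy applies to (hypothetical shape)."""
    ci_types: List[str] = field(default_factory=list)
    service_ids: List[str] = field(default_factory=list)
    environments: List[str] = field(default_factory=list)


@dataclass
class CollectionConfig:
    """Which signal types to collect and how often (hypothetical shape)."""
    events: bool = False
    metrics: bool = True
    logs: bool = False
    traces: bool = False
    interval_seconds: int = 60


@dataclass
class MonitoringPolicy:
    name: str
    target_type: str
    description: str = ""
    filters: TargetFilters = field(default_factory=TargetFilters)
    collection: CollectionConfig = field(default_factory=CollectionConfig)
    enabled: bool = False

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the policy passes these checks."""
        problems = []
        if not self.name.strip():
            problems.append("name is required")
        if not self.target_type.strip():
            problems.append("target type is required")
        if self.collection.interval_seconds <= 0:
            problems.append("interval must be a positive number of seconds")
        c = self.collection
        if not (c.events or c.metrics or c.logs or c.traces):
            problems.append("select at least one of events, metrics, logs, traces")
        return problems


# Values taken from the examples in the field list above.
policy = MonitoringPolicy(
    name="Production Linux servers",
    description="All RHEL hosts in prod",
    target_type="host",
    filters=TargetFilters(ci_types=["linux_server"], environments=["production"]),
    collection=CollectionConfig(events=True, metrics=True, interval_seconds=60),
    enabled=True,
)
print(policy.validate())  # prints []
```

Leaving every filter list empty in this sketch means "no narrowing", which mirrors the filters being optional in the form.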

Screenshot: Tenant Plane → Network Monitoring → Create policy (public/img/monitoring-policy-form.png).

Adding targets to a policy

Targets are the actual CIs (hosts, devices) that the policy applies to. They can be added manually or by enforcing the policy (syncing from CMDB/discovery).

Open the policy (click its name).
Go to the Targets tab or section.
Option A (manual): Click Add target. Select or enter the CI ID and the collector type (e.g., Prometheus, SNMP, WMI), then save.
Option B (enforce): Click Enforce (or Sync targets). The system matches the policy’s target type and filters against CMDB/discovery and creates or updates targets. Review the result (e.g., “5 targets created, 2 updated”).
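
To make the Enforce step concrete, here is a minimal Python sketch of how matching and syncing could work. The record layout (`id`, `target_type`, `ci_type`, `environment`), the `matches` and `enforce` functions, and the rule that a target counts as updated when its stored attributes differ from the CMDB are all assumptions for illustration. The real matching logic is not described in this guide.

```python
from typing import Dict, List, Tuple

CI = Dict[str, str]


def matches(ci: CI, target_type: str, filters: Dict[str, List[str]]) -> bool:
    """A CI matches when its target type equals the policy's and it passes every non-empty filter."""
    if ci.get("target_type") != target_type:
        return False
    for ci_field, allowed in filters.items():
        if allowed and ci.get(ci_field) not in allowed:
            return False
    return True


def enforce(
    cmdb: List[CI],
    target_type: str,
    filters: Dict[str, List[str]],
    targets: Dict[str, Dict[str, str]],
) -> Tuple[int, int]:
    """Create or update targets in place; return (created, updated) counts."""
    created = updated = 0
    for ci in cmdb:
        if not matches(ci, target_type, filters):
            continue
        desired = {"ci_type": ci["ci_type"], "environment": ci["environment"]}
        if ci["id"] not in targets:
            targets[ci["id"]] = desired
            created += 1
        elif targets[ci["id"]] != desired:
            targets[ci["id"]] = desired
            updated += 1
    return created, updated


cmdb = [
    {"id": "ci-1", "target_type": "host", "ci_type": "linux_server", "environment": "production"},
    {"id": "ci-2", "target_type": "host", "ci_type": "linux_server", "environment": "staging"},
    {"id": "ci-3", "target_type": "host", "ci_type": "linux_server", "environment": "production"},
    {"id": "ci-4", "target_type": "network_device", "ci_type": "network_device", "environment": "production"},
]
# ci-3 is already a target, but its stored attributes are stale.
existing = {"ci-3": {"ci_type": "windows_server", "environment": "production"}}
filters = {"ci_type": ["linux_server"], "environment": ["production"]}

created, updated = enforce(cmdb, "host", filters, existing)
print(f"{created} targets created, {updated} updated")  # prints "1 targets created, 1 updated"
```

Running `enforce` a second time with the same inputs reports no changes, which is the behavior you would want from a sync that is safe to repeat.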

Screenshot: Tenant Plane → Monitoring policy → Targets (public/img/monitoring-targets.png).

Creating alert rules

Alert rules define conditions that create incidents (e.g., CPU > 80% for 5 minutes).

From the policy (or the Alert rules area), click New alert rule or Add rule.
Configure:
- Name — e.g., “High CPU on Linux.”
- Metric — The metric key (e.g., cpu_pct, memory_usage). Must be one your collectors emit.
- Operator — e.g., >, <, >=, <=.
- Threshold — Numeric value (e.g., 80).
- Severity — e.g., Critical, Warning.
- Scope — This policy only, or a specific target (if supported).
Enable and save. Rules are evaluated on a schedule (e.g., every minute); when the condition is met, an incident is created (and optionally a major incident can be declared).
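
The rule fields above amount to a comparison between a collected metric value and a threshold. The following Python sketch shows one way such an evaluation could look. `AlertRule`, `evaluate`, and the incident dictionary are hypothetical names and shapes, and a missing metric is treated as "no breach" purely as an assumption.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Map the operator symbols offered in the rule form to comparison functions.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


@dataclass
class AlertRule:
    name: str
    metric: str
    op: str
    threshold: float
    severity: str
    enabled: bool = True

    def breached(self, sample: Dict[str, float]) -> bool:
        """True when the rule is enabled, the metric is present, and the comparison holds."""
        if not self.enabled or self.metric not in sample:
            return False
        return OPERATORS[self.op](sample[self.metric], self.threshold)


def evaluate(rule: AlertRule, target_id: str, sample: Dict[str, float]) -> Optional[dict]:
    """Return an incident-like record if the rule is breached for this target, else None."""
    if not rule.breached(sample):
        return None
    return {
        "title": f"{rule.name}: {target_id}",
        "severity": rule.severity,
        "metric": rule.metric,
        "value": sample[rule.metric],
        "threshold": rule.threshold,
    }


rule = AlertRule(name="High CPU on Linux", metric="cpu_pct", op=">", threshold=80, severity="Critical")
print(evaluate(rule, "host-01", {"cpu_pct": 91.5}))  # an incident-like dict
print(evaluate(rule, "host-02", {"cpu_pct": 80.0}))  # prints None, because > is strict
```

Note the boundary case: with `>`, a value exactly equal to the threshold does not fire, so choose `>=` if the threshold itself should alert.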

Screenshot: Tenant Plane → Monitoring policy → Alert rules → New rule (public/img/alert-rule-form.png).

Evaluating and testing rules

Some versions of the UI offer an Evaluate or Test action for a policy or rule. It runs the rule against current metric data and shows which targets would fire, without creating incidents.
Use this to confirm thresholds and avoid noise before enabling in production.
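
If your UI does not offer a test action, the same idea can be approximated offline. The sketch below is a hypothetical helper, not a product feature, and it uses made-up sample values. It compares several candidate thresholds against a snapshot of current readings so you can see how many targets each one would fire on.

```python
from typing import Dict, List


def would_fire(current: Dict[str, float], threshold: float) -> List[str]:
    """Return the IDs of targets whose current value is strictly above the threshold."""
    return sorted(target for target, value in current.items() if value > threshold)


# Illustrative snapshot of cpu_pct per target; replace with real readings.
current_cpu = {"host-01": 91.0, "host-02": 83.5, "host-03": 42.0, "host-04": 86.2}

for threshold in (80, 85, 90):
    firing = would_fire(current_cpu, threshold)
    print(f"cpu_pct > {threshold}: {len(firing)} of {len(current_cpu)} targets would fire {firing}")
```

With this snapshot, raising the threshold from 80 to 85 drops the firing set from three targets to two, which is the kind of trade-off worth checking before enabling a rule in production.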

Running collection

After targets exist, collection runs on a schedule (e.g., via cron or a collector runner). You can also trigger a one-off collection for a policy to verify that its targets are reachable and that metrics are being ingested:

Open the policy.
Click Run collection (or Collect now). Results show targets processed, succeeded, failed, and metrics ingested.
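
The result counters can be understood as a simple tally over the policy's targets. The Python sketch below assumes a hypothetical `collect` callable that returns a dictionary of metrics for one target or raises on failure. `CollectionResult`, `run_collection`, and `fake_collector` are illustrative names, not part of the product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CollectionResult:
    processed: int = 0
    succeeded: int = 0
    failed: int = 0
    metrics_ingested: int = 0
    errors: Dict[str, str] = field(default_factory=dict)


def run_collection(
    target_ids: List[str],
    collect: Callable[[str], Dict[str, float]],
) -> CollectionResult:
    """Collect once from every target and tally the outcome."""
    result = CollectionResult()
    for target_id in target_ids:
        result.processed += 1
        try:
            metrics = collect(target_id)
        except Exception as exc:  # a failing target should not abort the whole run
            result.failed += 1
            result.errors[target_id] = str(exc)
            continue
        result.succeeded += 1
        result.metrics_ingested += len(metrics)
    return result


def fake_collector(target_id: str) -> Dict[str, float]:
    """Stand-in collector: host-03 is unreachable, the others return two metrics."""
    if target_id == "host-03":
        raise ConnectionError("timeout")
    return {"cpu_pct": 12.0, "memory_usage": 48.0}


summary = run_collection(["host-01", "host-02", "host-03"], fake_collector)
print(summary.processed, summary.succeeded, summary.failed, summary.metrics_ingested)  # prints "3 2 1 4"
print(summary.errors)  # prints {'host-03': 'timeout'}
```

Keeping the per-target error message alongside the counters makes it easier to tell a single unreachable host from a misconfigured collector.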

Example: End-to-end monitoring policy

Create policy “Production Linux servers” — target type Host, filters: CI type linux_server, environment production. Collect metrics + events, interval 60s.
Enforce — 20 targets created from CMDB.
Add alert rule “High CPU” — metric cpu_pct, >, 85, severity Critical.
Enable the policy and its rules. When CPU exceeds 85% on any of the 20 hosts, an incident is created; from there you can run root cause analysis (RCA) and execute a runbook.

Next steps

Runbooks — Remediate when alerts fire.
Dashboard Design — Visualize the same metrics.
Command Center — Unified ops view.