
# Monitor system health

System health monitoring tells you, in one place, whether your agents are operating correctly. Instead of scanning call logs for failures, you create **monitors** that watch specific events and raise an **issue** when a threshold is crossed.

Each threshold can send alerts to a webhook, one or more email addresses, or both. This lets you route different severities to different destinations.

### What system health monitoring is for

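The Health hub has four areas:

* **Dashboard** (`/health`): a real-time summary of your system state.
* **Issues** (`/health/issues`): the filterable history of every issue raised.
* **Monitors** (`/health/monitors`): the rules that watch events and raise issues.
* **Events** (`/health/events`): the raw event feed behind system health.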

### The dashboard — `/health`

#### What it does

The dashboard gives you a real-time summary of your system state.

At the top of the page, a large status banner shows the current health state:

* **Green** when everything is operating normally.
* **Yellow** or **Red** when active warning or critical issues exist.

When issues are active, severity chips such as **3 Critical** or **2 Warning** appear beside the banner heading.

The banner also shows:

* Total calls in the last 24 hours
* Impacted calls
* Impact percentage
* Affected bots
* **Updated X ago**

Click **Refresh** to fetch the latest snapshot immediately. The page also refreshes automatically every 30 seconds.
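The impact percentage in the banner can be read as impacted calls divided by total calls. This page does not state the formula, so the sketch below assumes that definition; `impact_percentage` is a hypothetical helper, not part of the product.

```python
def impact_percentage(impacted_calls: int, total_calls: int) -> float:
    """Share of calls affected, as a percentage (assumed definition)."""
    if total_calls <= 0:
        return 0.0  # no calls in the window, so nothing can be impacted
    return 100.0 * impacted_calls / total_calls


print(impact_percentage(12, 480))  # 2.5
```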

#### How to use it

1. Open **Health** from the sidebar.
2. Check the status banner first.
3. If the banner is yellow or red, scroll to **What's Wrong?**
4. Review the issue cards in severity order.
5. Click a card body to open the issue details page.

Below the banner, the page is split into three sections:

**What's Wrong?**

This section shows up to 10 active issue cards.

Each card includes:

* A severity badge and category icon
* The issue time
* The monitor that triggered it
* The affected bot
* **Acknowledge**
* **Resolve**
* **Show Tech Details**

Click **Acknowledge** to mark the issue as seen. Click **Resolve** to mark it fixed and remove it from the active list.

Click **Show Tech Details** to expand the card. The expanded area shows:

* Raw `event_type`
* Observed value
* Trigger value
* Source JSON payload

Use this when you need to share exact details with engineering.

If more than 10 active issues exist, click **Load More Issues** to open the full list.

**What Happened Recently?**

This section summarizes recent health activity across:

* Last 1 hour
* Last 24 hours
* Last 7 days

When all three windows are clear, the section collapses into a single all-clear card.

**What Do I Do?**

This section gives you two shortcuts:

* **View All Events**
* **Manage Monitors**

Use them when you want to move from summary into investigation or configuration.

### Issues — `/health/issues`

#### What it does

The Issues page is the complete, filterable history of every issue the system has raised.

Use it to triage problems, review issue history, and process older incidents.

#### How to use it

1. Open **Issues** from the dashboard.
2. Use the filter row at the top to narrow the list.
3. Use the sort dropdown to change the order.
4. Use the action buttons in each row to update issue state.
5. Click the chevron at the end of a row to open the full issue detail page.

You can filter by:

* **Severity** — Critical, Warning, Info
* **Category** — Infrastructure, Technical, Effectiveness, Compliance
* **Status** — Active, Acknowledged, Resolved
* **Event Type**
* **Monitor**
* **Voicebot**

Click **Clear Filters** to reset the list. This button appears only when one or more filters are active.

The **Actions** column changes by issue state:

* **Active** — **Acknowledge**, **Resolve**
* **Acknowledged** — **Resolve**, **Reopen**
* **Resolved** — **Reopen**
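The state-dependent actions above behave like a small state machine. The sketch below encodes the listed actions per state. The target state of each action is an assumption (in particular, that **Reopen** returns an issue to Active), and the lowercase names are illustrative only.

```python
# Actions available in each state, as listed in the Actions column.
ALLOWED_ACTIONS = {
    "active": {"acknowledge", "resolve"},
    "acknowledged": {"resolve", "reopen"},
    "resolved": {"reopen"},
}

# Assumed target state for each action (not stated on this page).
TARGET_STATE = {
    "acknowledge": "acknowledged",
    "resolve": "resolved",
    "reopen": "active",
}


def apply_action(state: str, action: str) -> str:
    """Return the new state, or raise if the action is not offered in this state."""
    if action not in ALLOWED_ACTIONS[state]:
        raise ValueError(f"{action!r} is not available for a {state} issue")
    return TARGET_STATE[action]
```

For example, `apply_action("active", "acknowledge")` returns `"acknowledged"`, while `apply_action("resolved", "acknowledge")` raises `ValueError`.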

The list paginates 25 rows at a time. Use **Previous** and **Next** at the bottom of the page to move through the history.

### Issue detail — `/health/issues/:id`

#### What it does

The issue detail page shows the full context behind a single incident.

#### How to use it

1. Open any issue from the dashboard or issues list.
2. Start in **What Happened** to confirm the detection time, bot, and monitor.
3. Review **Impact** to compare the observed value against the configured trigger.
4. Click **Load** in **Source Events** to fetch related events from the last six hours.
5. Click **Open in Events** to jump into the Events page with the filter already applied.
6. Use the buttons at the bottom to **Acknowledge**, **Resolve**, or **Reopen** the issue.

The page includes four sections:

* **What Happened** — issue summary and remediation hint
* **Impact** — observed value, trigger value, and state change history
* **Source Events** — related events with a matching `event_type`
* **Actions** — state-aware issue controls
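As a rough illustration of what **Source Events** returns, the sketch below selects events with a matching `event_type` from the six hours before a reference time. It works on an in-memory list of dictionaries; the `time` key and the sample event type names are hypothetical, since this page does not document an event schema.

```python
from datetime import datetime, timedelta, timezone


def related_events(events, event_type, now, window=timedelta(hours=6)):
    """Return events of the given type that occurred within `window` before `now`."""
    cutoff = now - window
    return [
        e for e in events
        if e["event_type"] == event_type and cutoff <= e["time"] <= now
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample = [
    {"event_type": "example.failure", "time": now - timedelta(hours=1)},
    {"event_type": "example.failure", "time": now - timedelta(hours=7)},  # too old
    {"event_type": "example.other", "time": now - timedelta(hours=2)},    # wrong type
]
print(len(related_events(sample, "example.failure", now)))  # 1
```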

### Monitors — `/health/monitors`

#### What it does

The Monitors page lists every monitor in your organization. It also lets you create, edit, enable, disable, and delete monitors.

#### How to use it

1. Open **Monitors** from the Health hub.
2. Click **Create Monitor** to add a new rule.
3. Review the monitor table to see what each rule watches.
4. Use the **Active** toggle to pause a monitor without deleting it.
5. Use the pencil icon to edit a monitor.
6. Use the trash icon to delete a monitor.

The table shows these columns:

* **Name** — click the name to edit
* **Voicebot** — one bot or **All Voicebots**
* **Events** — event type badges
* **Active** — on or off
* **Issues** — total issues raised by that monitor
* **Actions** — edit and delete controls

If a monitor watches more than three event types, the table shows the first three and a `+N` badge for the rest.
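The badge truncation rule can be sketched as follows. `event_badges` is a hypothetical helper and the event type names are placeholders.

```python
def event_badges(event_types: list[str], visible: int = 3) -> list[str]:
    """Show the first `visible` event types, then a +N badge for the remainder."""
    shown = event_types[:visible]
    hidden = len(event_types) - len(shown)
    if hidden > 0:
        return shown + [f"+{hidden}"]
    return shown


print(event_badges(["a", "b", "c", "d", "e"]))  # ['a', 'b', 'c', '+2']
```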

When you click the trash icon, a confirmation modal opens with the monitor name. Click **Cancel** to keep it, or **Delete** to remove it.

If your organization has no monitors yet, the page shows **No monitors created**.

### Create or edit a monitor — `/health/monitors/new`

This is the main configuration screen for health alerts.

The page title shows **Create Monitor** for a new rule, or **Edit Monitor** for an existing one. A back arrow at the top returns you to the monitor list without saving.

### Step 1 — Basic info

#### What it does

This section defines the monitor name, scope, and active state.

#### How to use it

Fill in these fields:

* **Name** — required
* **Voicebot** — defaults to **All Voicebots**
* **Is Active** — enabled by default

Type a clear name in **Name**. This label appears in monitor lists, issues, and notifications.

Open **Voicebot** and choose one specific bot if you want the monitor to apply only there. Leave it on **All Voicebots** if the rule should apply across the organization.

Use **Is Active** to pause a monitor without deleting it.

### Step 2 — Pick the events to watch

#### What it does

This section tells the monitor which event types to evaluate.

#### How to use it

1. Scroll to **Event Types**.
2. Select one or more checkboxes.
3. Hover over disabled items labeled **Coming Soon** to see why they are unavailable.

Each event type shows a label and a short description. Some options are visible but semi-transparent because they are not selectable yet.

A monitor must watch at least one event type. If you leave this section empty, saving fails with a validation error.

### Step 3 — Define thresholds

#### What it does

Thresholds define when a monitor raises an issue.

Each threshold has its own:

* Severity
* Trigger rule
* Check schedule
* Webhook URL
* Alert email list

This means **Critical**, **Warning**, and **Info** alerts can each go to different destinations.

#### How to use it

When the form opens, one **Warning** threshold is added by default.

1. Click **+ Add Threshold** to add another threshold.
2. Add up to three thresholds in total. The button is disabled once the third exists.
3. Click the trash icon beside a threshold to remove it.

You can only remove a threshold when more than one threshold exists.

For each threshold, fill in the following fields:

| Field               | What it controls                       |
| ------------------- | -------------------------------------- |
| **Severity**        | Critical, Warning, or Info             |
| **Trigger Type**    | **Count** or **Percent**               |
| **Trigger Value**   | The number that raises the issue       |
| **Check Frequency** | How often the threshold is evaluated   |
| **Lookback**        | The time window used during evaluation |
| **Webhook URL**     | Optional endpoint for alert delivery   |

Use **Severity** to define the alert level. You cannot use the same severity twice in one monitor. Already-used severities appear disabled in the dropdown.

Use **Trigger Type** to choose how the threshold behaves:

* **Count** raises an issue based on the number of matching events in the lookback window.
* **Percent** raises an issue based on the percentage of calls affected in the lookback window.

Use **Trigger Value** to set the threshold number:

* For **Percent**, enter a value from 1 to 100.
* For **Count**, enter any value greater than or equal to 1.

Use **Check Frequency** to control how often the system evaluates the rule. Use **Lookback** to control the time window.

For example:

* **Trigger Type = Count**
* **Trigger Value = 10**
* **Lookback = 1 hour**

This means the system raises an issue after 10 matching events in the last hour.
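A minimal sketch of how such a check might be evaluated over one lookback window is shown below. It assumes the comparison is "greater than or equal to" the trigger value and that **Percent** is computed as affected calls over total calls; neither detail is confirmed on this page, and `threshold_breached` is a hypothetical name.

```python
def threshold_breached(trigger_type, trigger_value,
                       matching_events=0, affected_calls=0, total_calls=0):
    """Decide whether one threshold is breached for a single lookback window."""
    if trigger_type == "count":
        # Assumed: the issue is raised once the count reaches the trigger value.
        return matching_events >= trigger_value
    if trigger_type == "percent":
        if total_calls == 0:
            return False  # no calls in the window, nothing to measure
        return 100.0 * affected_calls / total_calls >= trigger_value
    raise ValueError(f"unknown trigger type: {trigger_type!r}")


print(threshold_breached("count", 10, matching_events=10))  # True
print(threshold_breached("count", 10, matching_events=9))   # False
```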

#### Webhook URL

Use **Webhook URL** to send alerts for this threshold to an external system.

1. Click the **Webhook URL** field.
2. Paste an HTTPS endpoint.
3. Save the monitor.

When this threshold is breached, the system sends a JSON alert to that endpoint. Leave the field blank if you do not want webhook delivery for that severity.
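To try out webhook delivery, a small receiver like the one below may help. It is a sketch under assumptions: it expects the alert to arrive as an HTTP POST with a JSON object body, and because the payload schema is not documented on this page it only logs the top-level keys. It serves plain HTTP, so in practice it would need to sit behind something that terminates HTTPS.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_alert(body: bytes) -> dict:
    """Decode an alert body; raises ValueError if it is not a JSON object."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):  # assumption: alerts are delivered with POST
        length = int(self.headers.get("Content-Length", 0))
        try:
            alert = parse_alert(self.rfile.read(length))
        except ValueError:
            self.send_response(400)  # malformed or non-object body
            self.end_headers()
            return
        # Schema unknown, so log only the keys that were received.
        print("alert received with keys:", sorted(alert))
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Block and handle alerts on the given local port."""
    HTTPServer(("", port), AlertHandler).serve_forever()
```

Call `serve()` to start listening, then paste the public HTTPS address that forwards to it into **Webhook URL**.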

#### Alert Emails

Use **Alert Emails** to send email alerts for this threshold.

1. Scroll below the threshold field grid.
2. Type an email address into **Alert Emails**.
3. Click **Add**, or press **Enter**.
4. Repeat for each recipient you want.
5. Click the **×** on a chip to remove a recipient.

Each email becomes a chip below the input.

Because each threshold has its own **Webhook URL** and **Alert Emails** list, you can route **Critical** alerts to one destination and **Warning** alerts to another.
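The per-threshold routing can be pictured as a small data model. The sketch below is illustrative only: the `Threshold` class, its field names, and the example URLs and addresses are hypothetical and do not reflect the product's actual configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class Threshold:
    severity: str
    webhook_url: str = ""  # blank means no webhook delivery for this severity
    alert_emails: list[str] = field(default_factory=list)


def destinations(thresholds: list[Threshold], severity: str) -> list[str]:
    """Return every destination configured for one severity."""
    for t in thresholds:
        if t.severity == severity:
            targets = [t.webhook_url] if t.webhook_url else []
            return targets + t.alert_emails
    return []


thresholds = [
    Threshold("critical", "https://example.com/hooks/pager", ["oncall@example.com"]),
    Threshold("warning", alert_emails=["team@example.com"]),
]
print(destinations(thresholds, "warning"))  # ['team@example.com']
```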

### Step 4 — Save

#### What it does

This step validates the monitor and stores the configuration.

#### How to use it

1. Review the form for validation errors.
2. Fix any issue shown in the red error card.
3. Click **Save**.
4. Wait for the save to complete.
5. Confirm the success toast, then continue from the monitors list.

If the form is invalid, a red error card appears above the action buttons. It can include problems such as:

* Missing name
* No event types selected
* Duplicate severities
* Invalid trigger values
* Invalid email addresses
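The checks listed above could be expressed roughly as follows. This is a sketch, not the form's actual validation: the dictionary layout and key names are hypothetical, and the email pattern is deliberately loose.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose shape check only


def validate_monitor(monitor: dict) -> list[str]:
    """Return a list of validation problems; an empty list means the form is valid."""
    errors = []
    if not monitor.get("name", "").strip():
        errors.append("Missing name")
    if not monitor.get("event_types"):
        errors.append("No event types selected")

    thresholds = monitor.get("thresholds", [])
    severities = [t["severity"] for t in thresholds]
    if len(severities) != len(set(severities)):
        errors.append("Duplicate severities")

    for t in thresholds:
        value = t["trigger_value"]
        # Percent accepts 1 to 100; Count accepts any value of 1 or more.
        too_high = t["trigger_type"] == "percent" and value > 100
        if value < 1 or too_high:
            errors.append("Invalid trigger values")
            break

    emails = [e for t in thresholds for e in t.get("alert_emails", [])]
    if any(not EMAIL_RE.match(e) for e in emails):
        errors.append("Invalid email addresses")
    return errors
```

A monitor with a name, one event type, and a single valid threshold yields an empty list.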

Click **Cancel** to discard your changes and return to the monitor list.

The **Save** button is disabled while the request is in progress.

### Events — `/health/events`

#### What it does

The Events page is the raw event feed behind system health.

Every meaningful system action appears here. Monitors react to these events. Use this page when you need to inspect the source data directly.

#### How to use it

1. Open **Events** from the Health hub.
2. Use the quick filter chips above the table to narrow the feed.
3. Use the main filter card to refine time, severity, and bot scope.
4. Click **Advanced** to expose deeper filters.
5. Click any row to open the event detail dialog.
6. Use **View Related** inside the dialog to follow events with the same trace.

The quick filter chips are:

* **Errors**
* **AI**
* **Webhooks**
* **Billing**

The main filters include:

* **Severity** — Debug, Info, Warn, Error, Critical
* **Voicebot**
* **Time From**
* **Time To**

Use the quick time buttons to move faster:

* **Last 24h**
* **Last 7d**
* **Clear time**

Click **Advanced** to expose:

* **Source**
* **Category**
* **Trace ID**
* **Event ID**
* **Fault Class**

Use **Trace ID** when you want to follow a single call across every related event.
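Conceptually, a **Trace ID** filter selects every event that shares one trace and reads them in time order. The sketch below shows the idea on an in-memory list; the `trace_id` and `time` keys and the sample values are hypothetical, since this page does not describe an event schema.

```python
def events_for_trace(events, trace_id):
    """Return the events that share one trace ID, oldest first."""
    matching = (e for e in events if e.get("trace_id") == trace_id)
    return sorted(matching, key=lambda e: e["time"])


sample = [
    {"trace_id": "t-1", "time": "2024-01-01T10:00:05Z", "message": "second"},
    {"trace_id": "t-2", "time": "2024-01-01T10:00:01Z", "message": "other call"},
    {"trace_id": "t-1", "time": "2024-01-01T10:00:00Z", "message": "first"},
]
print([e["message"] for e in events_for_trace(sample, "t-1")])  # ['first', 'second']
```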

Click **Clear Filters** to reset the page. Click **Refresh** to reload the current view.

The table shows:

* **Time**
* **Severity**
* **Category**
* **Source**
* **Event Type**
* **Message**

At the bottom of the page, use **Previous** and **Next** to move through paginated results.

If you opened Events from an issue using **Open in Events**, the page opens with matching filters already applied. Remove the filter chip with the **×** icon when you want to broaden the view again.

### Next steps

After completing this guide, continue with:

* [**Listen to and join live calls**](/monitoring-and-analytics/listen-to-and-join-live-calls.md): Watch active calls in real time.
* [**Read the analytics dashboard**](/monitoring-and-analytics/read-the-analytics-dashboard.md): Track performance trends across your workspace.
* [**Review call history and logs**](/monitoring-and-analytics/review-call-history-and-logs.md): Investigate individual calls in detail.

