
Agent Log Search Monitoring

Configure the Alert24 server agent to search log files, the systemd journal, and Windows Event Logs locally and alert on error spikes, error rate, log floods, and log silence — without shipping your logs.

The Alert24 server agent — the same lightweight script you install for CPU, memory, disk, and service-status monitoring — can also search your logs. Matching happens entirely on your server. Only match counts and up to 5 sample lines per search are sent to Alert24. Your raw logs never leave the host: no log shipping, no central log store, and no per-GB ingestion pricing.

When a rule fires, the matched sample lines are attached to the incident, so on-call sees what's actually in the log — not just "error count exceeded."

Plan requirement: Server agents are included on every plan (3 on the free plan, 5 per unit on paid plans). Log search monitoring requires a paid plan.

Agent update required: Log search is part of newer agent builds. If log_searches is ignored, update the agent first — see Troubleshooting.

How it works

  1. You define one or more log searches in the agent config (a glob path or system-log source plus a regex pattern).
  2. Each interval, the agent reads only the new lines added since the previous run, matches them against your patterns, and counts the hits.
  3. The agent sends the counts (and up to 5 sample matched lines) to Alert24.
  4. Alert rules evaluate those metrics and open or resolve incidents automatically.

The agent reads incrementally and handles log rotation, so it never re-alerts on historical logs. On first run it establishes a baseline at the current end of each file — it does not alert on lines written before the search existed.
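The incremental read described above can be sketched in a few lines of Python. This is a minimal illustration, not the agent's actual implementation: the function name scan_new_lines, the offsets dictionary used as a state store, and the returned field names are all assumptions made for the example, and rotation handling is left out.

```python
import re
from pathlib import Path

MAX_SAMPLES = 5  # the agent sends at most 5 sample lines per search


def scan_new_lines(path, pattern, offsets):
    """Count matches in lines appended to `path` since the previous call.

    `offsets` maps path -> byte offset already processed (hypothetical state store).
    """
    size = Path(path).stat().st_size
    if path not in offsets:
        # First run: baseline at the current end of the file and report nothing.
        offsets[path] = size
        return {"total_lines": 0, "match_count": 0, "samples": []}

    regex = re.compile(pattern)
    total = matches = 0
    samples = []
    with open(path, "rb") as fh:
        fh.seek(offsets[path])
        for raw in fh:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next interval
            line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            total += 1
            if regex.search(line):
                matches += 1
                if len(samples) < MAX_SAMPLES:
                    samples.append(line)
            offsets[path] += len(raw)  # advance only past fully processed lines
    return {"total_lines": total, "match_count": matches, "samples": samples}
```

Calling scan_new_lines once per interval with the same offsets dictionary returns only what was appended since the previous call, which is why lines written before the search existed are never counted.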

Configuring log_searches

Add a log_searches array to the agent's JSON config. Each entry needs a unique name, a source, and source-specific fields.

"log_searches": [
  { "name": "app_errors", "source": "file", "path": "/var/log/myapp/*.log", "pattern": "ERROR|FATAL" },
  { "name": "nginx_5xx", "source": "file", "path": "/var/log/nginx/access.log", "pattern": "\" 5\\d\\d " },
  { "name": "ssh_failures", "source": "journald", "unit": "sshd", "pattern": "Failed password" },
  { "name": "failed_logins", "source": "windows_event", "channel": "Security", "event_ids": [4625] },
  { "name": "system_errors", "source": "windows_event", "channel": "System", "min_level": "error" }
]

Build your config interactively: use the Log Search Config Builder to pick a source, test your pattern against sample lines, add alert rules, and copy ready-to-paste JSON, all in your browser.

Source: file (Linux, macOS, Windows)

Searches plain-text log files.

Field | Required | Description
name | Yes | Unique identifier referenced by alert rules.
source | Yes | "file".
path | Yes | File path or glob (e.g. /var/log/myapp/*.log). Every file the glob matches is read.
pattern | Yes | Regex evaluated against each new line. The agent counts the lines that match.

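Because pattern is a regex inside JSON, backslashes and double quotes must be escaped. For example, an nginx access-log 5xx status appears as a closing quote, a space, a three-digit code starting with 5, and another space. The regex for that, " 5\d\d followed by a space, is written as "\" 5\\d\\d " in JSON.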
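One way to avoid escaping by hand is to let a JSON library do it. The sketch below uses Python's standard json module to produce the escaped form and then round-trips it against a sample line. The access-log line is invented for illustration.

```python
import json
import re

# The regex as typed in a regex tester: quote, space, 5, two digits, space.
regex = r'" 5\d\d '

# json.dumps produces the escaped form that belongs in the config file.
entry = {"name": "nginx_5xx", "source": "file",
         "path": "/var/log/nginx/access.log", "pattern": regex}
encoded = json.dumps(entry)
print(encoded)

# Round-trip: parse the JSON and confirm the pattern is unchanged.
decoded = json.loads(encoded)["pattern"]
assert decoded == regex

# Illustrative access-log line (made up for this example).
sample = '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /api HTTP/1.1" 502 157'
print(bool(re.search(decoded, sample)))
```

If the round-tripped pattern differs from what you typed, the escaping in the config file is wrong.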

Source: journald (Linux)

Searches the systemd journal — no file logging configuration required.

Field | Required | Description
name | Yes | Unique identifier referenced by alert rules.
source | Yes | "journald".
unit | Yes | The systemd unit to read (e.g. sshd, nginx, myapp).
pattern | Yes | Regex evaluated against each new journal entry's message.

Source: windows_event (Windows)

Searches the Windows Event Log. Reads the System, Application, and Security channels.

Field | Required | Description
name | Yes | Unique identifier referenced by alert rules.
source | Yes | "windows_event".
channel | Yes | System, Application, or Security.
event_ids | Optional | Array of event IDs to match (e.g. [4625] for failed logins).
min_level | Optional | Minimum level: information, warning, error, or critical.
pattern | Optional | Regex evaluated against the event message for finer matching.

Provide at least one of event_ids, min_level, or pattern. An event must satisfy every criterion you specify. Common examples:

  • Failed logins: channel: "Security", event_ids: [4625]
  • System errors: channel: "System", min_level: "error"
  • Application crashes: channel: "Application", event_ids: [1000]
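The "all criteria must match" behavior can be pictured as a chain of checks that each reject an event. The following Python sketch is an assumption-laden illustration: the event dictionary shape (channel, event_id, level, message), the function name event_matches, and the numeric ordering of levels are hypothetical and are not taken from the agent.

```python
import re

# Assumed severity ordering for min_level; the four level names come from the table above.
LEVELS = {"information": 0, "warning": 1, "error": 2, "critical": 3}


def event_matches(event, search):
    """Return True only if `event` satisfies every criterion present in `search`."""
    if event["channel"] != search["channel"]:
        return False
    if "event_ids" in search and event["event_id"] not in search["event_ids"]:
        return False
    if "min_level" in search and LEVELS[event["level"]] < LEVELS[search["min_level"]]:
        return False
    if "pattern" in search and not re.search(search["pattern"], event["message"]):
        return False
    return True
```

Adding a second or third criterion to a search therefore narrows it. It never widens it.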

Alert rules

Rules turn search metrics into incidents. A rule can live in the agent config or be set in the Alert24 UI. Each rule references a log_search by name, supports a sustained-duration threshold and a severity, and auto-resolves its incident when the condition clears.

{ "metric": "log_error_rate", "log_search": "nginx_5xx", "operator": "gt", "threshold": 2, "duration_seconds": 300, "severity": "high" }

This fires a high-severity incident when more than 2% of lines match the nginx_5xx search for 5 minutes straight.

Rule fields

Field | Description
metric | One of the four metric types below.
log_search | The name of the log search this rule evaluates.
operator | Comparison operator, typically gt (greater than) or lt (less than).
threshold | The value to compare against (count, percentage, or lines-per-minute).
duration_seconds | How long the condition must hold before firing. Filters out brief blips.
severity | Incident severity, e.g. low, medium, high, critical.
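To make the interaction between operator, threshold, and duration_seconds concrete, here is a minimal Python sketch of a sustained-threshold evaluator. It is not Alert24's evaluation code: the RuleState class, the observe method, and the "open"/"resolve" return values are hypothetical names chosen for the example.

```python
import operator

OPERATORS = {"gt": operator.gt, "lt": operator.lt}


class RuleState:
    """Tracks how long a rule's condition has held (hypothetical helper)."""

    def __init__(self, rule):
        self.rule = rule
        self.breach_started = None  # timestamp when the condition first held
        self.firing = False

    def observe(self, value, now):
        """Feed one metric sample; return 'open', 'resolve', or None."""
        compare = OPERATORS[self.rule["operator"]]
        if compare(value, self.rule["threshold"]):
            if self.breach_started is None:
                self.breach_started = now
            held = now - self.breach_started
            if not self.firing and held >= self.rule["duration_seconds"]:
                self.firing = True
                return "open"
            return None
        # Condition cleared: reset the timer and auto-resolve if an incident is open.
        self.breach_started = None
        if self.firing:
            self.firing = False
            return "resolve"
        return None
```

In this sketch a single sample below the threshold resets the timer, so a brief blip never accumulates toward duration_seconds. The same class covers silence rules by using the lt operator.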

Metric types

Metric | Fires when… | Threshold means… | Example
log_match_count | Match count crosses a threshold. | Number of matching lines. | App logs ERROR|FATAL more than 20 times in the interval.
log_match_rate | Matches per minute crosses a threshold. | Matches per minute. | More than 5 failed SSH logins per minute.
log_error_rate | Matches as a percent of total lines crosses a threshold. | Percentage (e.g. 2 = 2%). | More than 2% of nginx requests are 5xx.
log_volume | Total lines per minute is too high, or too low (use lt). | Lines per minute. | A retry storm writes 50,000 lines/minute, or a chatty app goes silent.
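The four metrics are all derived from two per-interval numbers: how many lines matched and how many lines were read. The Python sketch below shows one plausible way to compute them. The function name log_metrics, the interval_seconds parameter, and the choice to report 0% when no lines were read are assumptions, not documented agent behavior.

```python
def log_metrics(match_count, total_lines, interval_seconds):
    """Derive the four rule metrics from one interval's counts (illustrative)."""
    minutes = interval_seconds / 60
    return {
        "log_match_count": match_count,                      # raw matches in the interval
        "log_match_rate": match_count / minutes,             # matches per minute
        # Percent of lines that matched; 0.0 when nothing was read (assumed convention).
        "log_error_rate": 100 * match_count / total_lines if total_lines else 0.0,
        "log_volume": total_lines / minutes,                 # total lines per minute
    }


# Example: 30 matching lines out of 1200 read during a 120-second interval.
print(log_metrics(30, 1200, 120))
```

Under these assumptions the example yields a match count of 30, a match rate of 15 per minute, an error rate of 2.5%, and a volume of 600 lines per minute.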

Error spike — log_match_count

{ "metric": "log_match_count", "log_search": "app_errors", "operator": "gt", "threshold": 20, "duration_seconds": 300, "severity": "high" }

Error rate — log_error_rate

{ "metric": "log_error_rate", "log_search": "nginx_5xx", "operator": "gt", "threshold": 2, "duration_seconds": 300, "severity": "high" }

Log volume / flood — log_volume + gt

{ "metric": "log_volume", "log_search": "app_errors", "operator": "gt", "threshold": 5000, "duration_seconds": 120, "severity": "medium" }

Log silence (deadman) — log_volume + lt

{ "metric": "log_volume", "log_search": "app_heartbeat_lines", "operator": "lt", "threshold": 1, "duration_seconds": 180, "severity": "high" }

For a silence alert, point a search at a log line your app emits continuously under normal operation, then alert when log_volume drops below a floor. If the app stops logging, it has likely stopped working — even though uptime and ping checks against the host still pass.

Sample lines in the incident

Each interval the agent includes up to 5 sample matched lines per search. When a rule opens an incident, those samples are attached, so the on-call responder sees the actual offending log lines without SSH-ing into the box. To limit exposure of sensitive data, keep patterns specific: only matching lines are ever sampled, and at most 5 of them per search.

Troubleshooting

log_searches is ignored / no log metrics appear. Your agent predates log search. Re-run the installer to update to the latest build, then restart the agent. Confirm the config file contains a top-level log_searches array (not nested under another key).

No alerts on existing errors right after setup. Expected. On first run the agent baselines at the current end of each file/journal and only evaluates new lines from that point forward. It never alerts on historical logs. Trigger a fresh log line to verify.

Log rotation. The agent detects rotation (truncation or inode change) and continues from the new file without missing lines or re-reading the rotated-out file. No configuration is needed.
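For readers curious how rotation detection of this kind generally works, here is a minimal Python sketch based on the two signals named above (inode change and truncation). It is an illustration only: the function name resume_offset and its arguments are hypothetical, and unlike the agent it does not drain remaining lines from the rotated-out file.

```python
import os


def resume_offset(path, saved_inode, saved_offset):
    """Return (inode, offset) to resume reading from, restarting at 0 after rotation."""
    st = os.stat(path)
    rotated = st.st_ino != saved_inode      # file was replaced (rename + recreate)
    truncated = st.st_size < saved_offset   # file was truncated in place
    if rotated or truncated:
        return st.st_ino, 0
    return st.st_ino, saved_offset
```

A reader that stores the returned inode and offset after each pass picks up from the start of the new file after rotation and from the saved position otherwise.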

Regex isn't matching. Remember the pattern is a regex inside JSON — escape backslashes (\\d, not \d) and quotes (\"). Test your pattern against a sample line before deploying.

Too many false positives. Increase duration_seconds so the condition must be sustained, raise the threshold, or make the pattern more specific.

No matching files for a glob. If a path glob matches no files, that search reports zero matches rather than erroring. Verify the path and the agent user's read permissions on the log files.
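A quick way to check both the glob and the permissions is to expand the pattern yourself while running as the agent's user. The helper below is a hypothetical diagnostic written with Python's standard glob and os modules. It is not part of the agent.

```python
import glob
import os


def check_glob(path_glob):
    """Map each file the glob expands to onto whether the current user can read it."""
    return {p: os.access(p, os.R_OK) for p in sorted(glob.glob(path_glob))}


# An empty result means the glob matches nothing for this user.
print(check_glob("/var/log/myapp/*.log"))
```

An empty dictionary points to a wrong path. An entry mapped to False points to a permissions problem on that file.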
