Alex Rubenstein via observium wrote on 01/10/2022 19:25:

Adam’s reply was: “We have to wait for a cron job to be run to send the alerts, because we can't do it in the syslog processing script. IIRC you can cron alerter.php to do it more frequently!”

 

This is an acceptable solution to me, but I have a couple of questions:

 

  1. Is there risk with how often this is run? If I cron’ed this for every minute, is there a risk of some sort of collision with the 5 minute poller?
The only real risk is that if you generate a lot of alerts in one go, the alerter.php may still be running when another one runs, and we don't put a /lot/ of effort into preventing overlaps because we generally expect alerter.php to be run after poller.php per-device, as that's the fastest way to do it.

If this is for a limited number of hosts, you could actually just run alerter.php -h <device_ids> to limit it to a set of hosts.

Worst case is an occasional dupe alert, I think. alerter.php is unlikely to contribute to the runaway overload scenarios you get when you swamp the MySQL server with hundreds of threads!
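
If the occasional overlap bothers you, one option is to wrap the cron'ed call in a non-blocking file lock so a new run is skipped while the previous one is still going. This is only a rough sketch, not something Observium ships: the install path, the lock file location and the device ID 42 are all assumptions to adjust for your system.

```python
import fcntl
import subprocess
import sys

# Assumptions for illustration only: adjust the path and device ID for your install.
LOCK_PATH = "/tmp/observium-alerter.lock"                     # hypothetical lock file
ALERTER_CMD = ["php", "/opt/observium/alerter.php", "-h", "42"]  # 42 is a made-up device ID


def run_exclusive(cmd, lock_path):
    """Run cmd unless another run already holds the lock.

    Returns the command's exit code, or None if the run was skipped
    because a previous invocation still holds the lock.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None  # previous run still active, skip this one
        # The lock is held until the file is closed at the end of the with block.
        return subprocess.run(cmd).returncode
```

A small script that cron starts every minute would then just call run_exclusive(ALERTER_CMD, LOCK_PATH); a skipped run simply means the next minute's run picks up the pending alerts.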

  2. Is it possible, as a feature request, to call alerter.php -h from syslog.php when an entry comes in?
We can't do this because of the rate at which syslog entries are ingested on a lot of systems (100s per second): we don't have any time to wait for alerter.php to run, and PHP is single-threaded.

You might have some luck persuading Mike to add per-device alerter.php invocation via the job queuing system. I'm not sure whether this would make sense, or whether it'd be significantly quicker, but it's probably better than just spamming alerter.php all the time.

This system allows you to queue up a job (like adding a host remotely), and it gets picked up when discovery.php -h new runs. This means it'll have a delay of up to a minute.

While I think Observium is an excellent piece of software, I wish it had just a little more ability to handle real-time alerts coming in. There are many situations that occur where an outage or event could be substantially shorter than the poller interval (like, a short UPS power loss), and I would like to have a little better quasi-real time alerting of these events.


"Real time" is the enemy of "collecting a lot of data".

The two are somewhat mutually exclusive, especially with the data collection and storage technologies we're stuck with for historical reasons (SNMP & RRD).

adam.