Hi,
I’m looking for pointers on how to implement alert checkers with the following requirements:
(A) test condition with a buffer: [ sensor_value greater @sensor_limit ] + buffer
- the [ ] part itself works, but I need to add a buffer (offset) so that some checks don’t throw false positives
- I’m aware of the delay option, but it doesn’t help in my case
- spelling out something like “sensor_value greater @sensor_limit + 5” doesn’t work for me
(B) aggregate / pool over devices: check a number of devices and only alert me if, e.g., 5 of 10 are over specific criteria. For example:
- check all internal PDU temperatures
- if all sensor_value greater X --> alert
- I know how to check each device individually: (device.device_id in 82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|98|99 AND sensor.sensor_descr regexp internal)
- how do I convert this into an “if true for all devices” statement?
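To make (A) and (B) concrete, here is a minimal Python sketch of the logic I have in mind. It is not Observium syntax: the function names, the readings dict and the numbers are made up purely for illustration.

```python
from typing import Mapping, Optional


def over_limit(value: float, limit: float, buffer: float = 0.0) -> bool:
    """(A): true only when the reading exceeds the limit plus a buffer."""
    return value > limit + buffer


def pooled_alert(
    readings: Mapping[int, float],
    limit: float,
    buffer: float = 0.0,
    min_count: Optional[int] = None,
) -> bool:
    """(B): alert when at least min_count devices are over the limit.

    With min_count=None, every device must be over ("true for all devices").
    An empty set of readings never alerts.
    """
    flags = [over_limit(v, limit, buffer) for v in readings.values()]
    needed = len(flags) if min_count is None else min_count
    return len(flags) > 0 and sum(flags) >= needed


# Hypothetical internal PDU temperatures keyed by device_id.
readings = {82: 41.0, 83: 47.5, 84: 46.2, 85: 39.9}

alert_all = pooled_alert(readings, limit=40.0, buffer=5.0)
alert_two = pooled_alert(readings, limit=40.0, buffer=5.0, min_count=2)
```

With a limit of 40 and a buffer of 5, only devices 83 and 84 count as over, so the "all devices" variant stays quiet while the "at least 2" variant fires.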
Lastly, I’m looking for documentation and examples on how to invoke scripts from alert checkers. The documentation appears to be quite sparse [1] in this regard. I’m especially interested in how to extract / grab additional data from Observium (i.e. from other devices) if specific test conditions fail.
cheers, Andreas