Hi All,
Sometimes I get a host down notification from Observium at the start of a 5-minute poll, and five minutes later a host up alert for the same host. I suspect a busy host occasionally drops Observium's 5-minute "probe", which is presumably a UDP SNMP packet. As soon as I get such a notification I check the host in question, and unless it really has gone down it is usually up, just perhaps a bit busy. [1]
Where in the code does this initial up/down check happen? I have been looking through include/polling and can see various scripts for different device types. Is there a generic host up/down check that runs for all devices? I'd like to hack in a rule along the lines of: "if a device recorded as up seems down, wait and check again a couple more times, then mark it as down".
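The retry rule described above can be sketched roughly as follows. This is not Observium code (Observium is written in PHP, so the idea would need porting), and `confirm_down`, `poll_status` and the `check` callable are hypothetical names for illustration only; `check` stands in for whatever probe the poller really uses (an SNMP get, a ping, etc.) and is assumed to return True when the host answers.

```python
import time
from typing import Callable


def confirm_down(
    check: Callable[[], bool],
    retries: int = 2,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if `check` fails on the first attempt AND on every retry.

    `check` is a hypothetical probe (not an Observium function) that returns
    True when the host responds.
    """
    if check():
        return False  # answered on the first try: host is up
    for _ in range(retries):
        sleep(delay_seconds)  # give a busy host a moment before probing again
        if check():
            return False  # a later probe succeeded: treat the first miss as transient
    return True  # every attempt failed: really down


def poll_status(
    was_up: bool,
    check: Callable[[], bool],
    retries: int = 2,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return the new status (True = up), retrying only for hosts recorded as up."""
    if not was_up:
        # Already recorded as down: one successful probe is enough to mark it up.
        return check()
    return not confirm_down(check, retries, delay_seconds, sleep)


if __name__ == "__main__":
    # Simulated busy host: drops the first probe, answers the second.
    answers = iter([False, True])
    status = poll_status(True, lambda: next(answers), sleep=lambda s: None)
    print("up" if status else "down")  # prints "up"
```

With the assumed defaults, a host that is genuinely down costs at most 2 × 10 s = 20 s of extra waiting, which fits comfortably inside a 5-minute polling window; `sleep` is injectable so the logic can be tested without real delays.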
Cheers, James.
[1] That is a separate issue I need to tackle: ensuring there is no packet loss to hosts and that they aren't too busy to respond. I don't wish to discuss that here, though.