Hi All,
Sometimes I get host down notifications from Observium at the start of
a 5 minute poll. Five minutes later I get a host up alert for the same
host. I believe that sometimes a busy host will drop the 5 minute
"probe" from Observium, which is presumably a UDP SNMP packet. As soon
as I get such a notification I check the host in question; it is
usually up (unless it really has gone down), perhaps just a bit busy.
[1]
Where is this initial checking of device up/down state happening in
the code? I have been looking through include/polling and I can see
various scripts for different device types. Is there a generic host
up/down check that happens for all devices? I'd like to hack in an "if
a device recorded as up seems down, wait and check again a couple more
times, then mark as down" check.
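
To make the idea concrete, here is a rough sketch of the retry logic I
have in mind. It is written in Python purely for illustration and is
not Observium code; confirm_down, probe, retries and delay are names I
made up, and probe stands in for whatever check the poller really does.

```python
import time
from typing import Callable


def confirm_down(probe: Callable[[], bool], retries: int = 2,
                 delay: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if the first probe and every re-check fail.

    probe   -- hypothetical callable, True when the host answers
    retries -- extra checks to make after the first failure
    delay   -- seconds to wait before each extra check
    """
    if probe():
        return False  # first probe answered, host is up
    for _ in range(retries):
        sleep(delay)  # give a busy host a moment before asking again
        if probe():
            return False  # answered on a re-check, so not really down
    return True  # every attempt failed, treat the host as down
```

With the defaults above a host is only marked down after three failed
probes spread over roughly ten seconds, which should be enough to ride
out a single dropped packet.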
Cheers,
James.
[1] That is a separate issue I need to tackle: ensuring there is no
packet loss to hosts and that they aren't too busy to respond. I
don't wish to discuss that here though.