Hi James,
Net-SNMP already waits roughly $retries * $timeout before deciding that no response was received. You can try raising those settings in your configuration file, but note that, depending on the values you choose, this may severely lengthen your polling runs when a device really is down. In that case, make sure to use poller-wrapper instead of simple parallel pollers, so that one down device does not hold up the rest of the devices being polled.
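To put a rough number on that, here is a minimal sketch of the worst-case wait for one unreachable device. It assumes every attempt waits the full timeout and that the retries setting counts re-sends after the first request (so attempts = retries + 1). The function names and sample values are illustrative only, not Observium settings.

```python
def worst_case_wait(timeout_s: float, retries: int) -> float:
    """Seconds spent on one request that never gets an answer."""
    # Assumption: one initial request plus `retries` re-sends,
    # each waiting the full timeout.
    attempts = retries + 1
    return timeout_s * attempts


def serial_poll_delay(down_devices: int, timeout_s: float, retries: int) -> float:
    """Extra time unreachable devices add to a strictly serial poll run."""
    return down_devices * worst_case_wait(timeout_s, retries)


if __name__ == "__main__":
    # Illustrative values only.
    print(worst_case_wait(timeout_s=1.0, retries=5))
    print(serial_poll_delay(down_devices=3, timeout_s=1.0, retries=5))
```

Doubling either value roughly doubles the time lost per dead device, which is why polling devices in parallel matters once the settings go up.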
The up/down code is a simple snmpget for sysDescr, I believe, executed at the start of the poll (includes/polling/functions.inc.php).
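If you want to reproduce that kind of check by hand, something like the sketch below should be close. It only builds an snmpget argument list; the helper name, host name, and community string are placeholders, and this is not the code Observium actually runs.

```python
from typing import List


def build_sysdescr_check(host: str, community: str = "public",
                         timeout_s: int = 1, retries: int = 5) -> List[str]:
    """Build an snmpget command line that asks a host for sysDescr.0."""
    return [
        "snmpget",
        "-v", "2c",            # SNMP version (assumed v2c here)
        "-c", community,       # community string (placeholder default)
        "-t", str(timeout_s),  # timeout per attempt, in seconds
        "-r", str(retries),    # number of retries
        host,
        "SNMPv2-MIB::sysDescr.0",
    ]


if __name__ == "__main__":
    # Hypothetical host name, for illustration only.
    print(" ".join(build_sysdescr_check("router1.example.net")))
```

You can pass the list to subprocess.run() and look at the exit status to see whether the device answered.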
Tom
On 08/20/2013 12:14 PM, James Bensley wrote:
Hi All,
Sometimes I get host down notifications from Observium at the start of a 5-minute poll, and five minutes later I get a host up alert for the same host. I believe that a busy host will sometimes drop the 5-minute "probe" from Observium, which is presumably a UDP SNMP packet. As soon as I get such a notification I check the host in question; unless it really has gone down, it is usually up, just a bit busy perhaps. [1]
Where in the code does this initial check of device up/down state happen? I have been looking through include/polling and I can see various scripts for different device types. Is there a generic host up/down check that happens for all devices? I'd like to hack in an "if a device recorded as up seems down, wait and check again a couple more times, then mark as down".
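Roughly this logic, sketched in Python rather than Observium's PHP. The names confirm_down and probe and the default values are made up for illustration; probe stands in for whatever the real up/down check turns out to be.

```python
import time
from typing import Callable


def confirm_down(probe: Callable[[], bool], rechecks: int = 2,
                 wait_s: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if the first probe and every recheck fail."""
    if probe():
        return False  # answered straight away: device is up
    for _ in range(rechecks):
        sleep(wait_s)  # give a busy host a moment before asking again
        if probe():
            return False  # answered on a recheck: treat as still up
    return True  # never answered: mark as down


if __name__ == "__main__":
    # Simulated host that misses the first probe, then answers.
    answers = iter([False, True])
    print(confirm_down(lambda: next(answers), wait_s=0.0))
```

The sleep function is a parameter only so the waiting can be skipped when trying the logic out.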
Cheers, James.
[1] So that is a separate issue I need to tackle, ensuring there is no packet loss to hosts and ensuring they aren't too busy to respond. I don't wish to discuss that here though.
_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium