Hi All,
Sometimes I get host down notifications from Observium at the start of a 5 minute poll, and five minutes later I get a host up alert for the same host. I believe a busy host will sometimes drop the 5 minute "probe" from Observium, which is presumably a UDP SNMP packet. As soon as I get such a notification I check the host in question. Unless it really has gone down, it is usually up, just a bit busy perhaps. [1]
Where in the code does this initial check of device up/down state happen? I have been looking through includes/polling and I can see various scripts for different device types. Is there a generic host up/down check that runs for all devices? I'd like to hack in an "if a device recorded as up seems down, wait and check again a couple more times, then mark as down".
Cheers, James.
[1] That is a separate issue I need to tackle: ensuring there is no packet loss to hosts and that they aren't too busy to respond. I don't wish to discuss that here though.
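The "wait and check again a couple more times, then mark as down" idea from the question could be sketched roughly as below. This is only an illustration in Python, not Observium's code (Observium itself is PHP): `confirm_down`, `probe`, `rechecks` and `wait_seconds` are hypothetical names, and `probe` stands in for whatever up/down check the poller really performs.

```python
import time
from typing import Callable


def confirm_down(probe: Callable[[], bool],
                 rechecks: int = 2,
                 wait_seconds: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if the probe fails on the first try and on every recheck.

    `probe` is a hypothetical callable that returns True when the device answers.
    `sleep` is injectable so the logic can be exercised without real delays.
    """
    if probe():
        return False          # answered first time: device is up
    for _ in range(rechecks):
        sleep(wait_seconds)   # give a busy host a moment before asking again
        if probe():
            return False      # answered on a recheck: treat it as up
    return True               # never answered: now mark it as down
```

The trade-off is that a device that really is down takes `rechecks * wait_seconds` longer to be reported, on top of whatever time each failed probe already spends waiting.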
Hi James,
Net-SNMP already waits before deciding it has had no response: roughly $retries * $timeout (strictly, the first attempt plus $retries retransmissions, each waiting $timeout). You can try raising those settings in your configuration file, but note that, depending on how far you raise them, this may severely lengthen your polling durations when a device really is down. Make sure to use poller-wrapper rather than simple parallel pollers in that case, so that one down device does not hold up the rest of the devices being polled.
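To get a feel for how much raising these settings can cost, here is a small back-of-the-envelope sketch in Python. It counts the first attempt as well as the retries, so the result is slightly above a plain retries * timeout. The function names are made up for illustration, the example values (1 second, 5 retries) are what the Net-SNMP man pages give as defaults as far as I know, and whether Observium applies its own values is not confirmed here.

```python
def worst_case_wait(timeout_s: float, retries: int) -> float:
    """Seconds one unanswered SNMP request can block: first try plus each retry."""
    return timeout_s * (retries + 1)


def serial_delay(down_devices: int, timeout_s: float, retries: int) -> float:
    """Extra seconds added to a poll run if down devices are tried one after another.

    Assumes (for illustration only) a single unanswered request per down device.
    """
    return down_devices * worst_case_wait(timeout_s, retries)


if __name__ == "__main__":
    print(worst_case_wait(1, 5))    # one dead device with the example values
    print(serial_delay(10, 1, 5))   # ten dead devices polled serially
    print(serial_delay(10, 3, 5))   # same, after tripling the timeout
```

The serial figure is the reason for the poller-wrapper advice: when devices are polled in parallel, a dead device mostly delays its own worker rather than everything queued behind it.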
The up/down code is a simple snmpget for sysDescr, I believe, executed at the start of the poll (includes/polling/functions.inc.php).
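As a rough picture of what such a check amounts to, the Python sketch below builds an `snmpget` command line for sysDescr with explicit `-t` (timeout) and `-r` (retries) options and treats a zero exit status as "up". This is an assumption-laden illustration, not the code in functions.inc.php: the helper names, the SNMP v2c choice, the community string and the exit-status test are all mine.

```python
import subprocess
from typing import List

SYS_DESCR_OID = "SNMPv2-MIB::sysDescr.0"


def build_snmpget_argv(host: str, community: str,
                       timeout_s: int, retries: int) -> List[str]:
    """Build an snmpget command line (hypothetical helper, SNMP v2c assumed)."""
    return [
        "snmpget",
        "-v", "2c",
        "-c", community,
        "-t", str(timeout_s),   # seconds to wait for each attempt
        "-r", str(retries),     # retransmissions after the first attempt
        host,
        SYS_DESCR_OID,
    ]


def snmp_responds(host: str, community: str,
                  timeout_s: int, retries: int) -> bool:
    """Return True if snmpget exits successfully.

    Assumes the Net-SNMP `snmpget` binary is on PATH; if it is not,
    subprocess.run raises FileNotFoundError.
    """
    argv = build_snmpget_argv(host, community, timeout_s, retries)
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode == 0
```

For example, `build_snmpget_argv("192.0.2.10", "public", 2, 3)` gives the argument list for a check that waits 2 seconds per attempt and retransmits up to 3 times.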
Tom
On 08/20/2013 12:14 PM, James Bensley wrote:
[snip]
_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
On 20 August 2013 11:18, Tom Laermans <tom.laermans@powersource.cx> wrote:
> Net-SNMP already has a $retries * $timeout delay before marking it as not receiving a response.
Hi Tom,
Many thanks for the info. That is exactly what I needed!
None of my devices have these values configured individually, and they aren't currently configured in config.php.
If I echo them out I get an undefined variable error (for $config['snmp']['timeout'] and $config['snmp']['retires']).
Do you know what the default values are for these when they are not defined? Also, do you know whether the timeout value is in seconds?
Many thanks, James.
Hey James,
Check that 'retries' is spelt correctly.
Mike Y.
-----Original Message-----
From: observium [mailto:observium-bounces@observium.org] On Behalf Of James Bensley
Sent: Tuesday, August 20, 2013 8:35 AM
To: Observium Network Observation System
Subject: Re: [Observium] Host up/down polling
[snip]
Hi Mike,
Yeah, I saw that after sending the email. I was typing it from memory, but thank you for pointing that out :)
Cheers, James.
participants (3)
- James Bensley
- Micah Young
- Tom Laermans