Device check and down alert question

I often see down notices in Observium when the SNMP check fails. I have only seen this when checking servers (Windows/Linux). The device is up, and a ping/ICMP check would show that, but because the SNMP check had a problem I get a down indication. Sometimes the cause is the SNMP service on the checked host hanging or crashing; other times I have seen load or congestion cause it.
Is there an option, or would it make sense to add a feature, that checks SNMP first and, in the event of an error, falls back to a ping/ICMP check to see whether the device is up? It would be useful to see that the server is up and to get a notice about the SNMP check failure. I could also see a check flow that checks SNMP first; if that errors, it checks with a ping; if the ping returns, it waits for the next SNMP check before sending the notice or marking the device down.
Ron Culler
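
The check flow proposed above can be sketched as a small state machine. This is only an illustration of the idea, not Observium code: `DeviceState`, `check_device`, the probe callables, and the status strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DeviceState:
    # Consecutive SNMP failures seen while the host still answered ping.
    snmp_failures: int = 0


def check_device(state: DeviceState,
                 snmp_ok: Callable[[], bool],
                 ping_ok: Callable[[], bool]) -> str:
    """Run one check cycle and return a hypothetical status string.

    'up'           SNMP answered
    'down'         SNMP failed and ping failed
    'snmp_pending' first SNMP failure, ping answered: hold the notice
    'snmp_down'    SNMP failed again on a later cycle, ping still answers
    """
    if snmp_ok():
        state.snmp_failures = 0
        return "up"
    # SNMP failed, so fall back to ping to see whether the host is alive.
    if not ping_ok():
        return "down"
    state.snmp_failures += 1
    # Wait for the next SNMP check before raising the SNMP notice.
    return "snmp_pending" if state.snmp_failures == 1 else "snmp_down"
```

With this shape, a single missed SNMP poll on a loaded server yields `snmp_pending` rather than a down notice, and only a repeated failure is escalated.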

Hi Ron,
We always ping first (unless you disable ping), because that way the poller can skip down devices faster than by waiting for SNMP to time out, which takes (retries + 1) * timeout.
Do I understand correctly that you would like to know when the device itself is not down, but SNMP on it is?
Then I have just the thing for you:
[two images, not preserved]
This will give you different alerts depending on whether ping or SNMP is the issue.
Tom
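
The ping-first ordering and the SNMP wait time described in this reply can be illustrated with a short sketch. It is not taken from the Observium source; `snmp_worst_case_seconds`, `classify`, and the returned labels are hypothetical names, and the retry and timeout values in the usage lines are made-up examples.

```python
from typing import Callable


def snmp_worst_case_seconds(retries: int, timeout: float) -> float:
    # One initial attempt plus `retries` further attempts, each waiting `timeout`.
    return (retries + 1) * timeout


def classify(ping_ok: Callable[[], bool], snmp_ok: Callable[[], bool]) -> str:
    """Ping first; only try SNMP when the host answers ping."""
    if not ping_ok():
        return "ping"   # host unreachable, SNMP is never attempted
    if not snmp_ok():
        return "snmp"   # host reachable, SNMP is the problem
    return "ok"


# Example values only: 5 retries with a 1 second timeout.
print(snmp_worst_case_seconds(5, 1.0))
print(classify(lambda: True, lambda: False))
```

Because the two failure cases return different labels, separate alerts can be attached to each of them.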
On 2015-12-03 14:44, Ron Culler wrote:
> I often see down notices in Observium when the SNMP check fails. I have only seen this when checking servers (Windows/Linux). The device is up, and a ping/ICMP check would show that, but because the SNMP check had a problem I get a down indication. Sometimes the cause is the SNMP service on the checked host hanging or crashing; other times I have seen load or congestion cause it.
> Is there an option, or would it make sense to add a feature, that checks SNMP first and, in the event of an error, falls back to a ping/ICMP check to see whether the device is up? It would be useful to see that the server is up and to get a notice about the SNMP check failure. I could also see a check flow that checks SNMP first; if that errors, it checks with a ping; if the ping returns, it waits for the next SNMP check before sending the notice or marking the device down.
> Ron Culler
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
Participants (2): Ron Culler, Tom Laermans