We have an SE installation with about 260 devices being checked.

We have alerting set up for various services and checks.

We often see false positives on the SNMP device-down alert: devices are reported as SNMP down when they are not actually down.

After a second or even third SNMP check we see a recovery, and the device uptime shows no outage.

 

I also verify this by running an snmpwalk from the Observium server while the “alert” is active.
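
A manual check like this could also be scripted to see how often a single SNMP request fails. The following is only a minimal sketch under assumptions: it assumes the Net-SNMP snmpget tool is installed, uses SNMP v2c, and the host name, community string, and helper names (snmp_probe, sample, failure_rate) are placeholders made up for illustration. It sends one request per attempt with no retries, so an occasional lost request shows up as a failure instead of being hidden.

```python
import subprocess
import time

# sysUpTime.0, a cheap object that an SNMP agent normally answers
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


def snmp_probe(host, community="public", timeout_s=1, retries=0):
    """Return True if one snmpget for sysUpTime succeeds.

    Assumes Net-SNMP's snmpget is on PATH; host and community are placeholders.
    """
    cmd = [
        "snmpget", "-v2c", "-c", community,
        "-t", str(timeout_s), "-r", str(retries),
        host, SYS_UPTIME_OID,
    ]
    try:
        # Guard against the command hanging longer than its own timeout budget.
        result = subprocess.run(
            cmd, capture_output=True, timeout=timeout_s * (retries + 1) + 5
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def sample(probe, attempts, delay_s=0.0):
    """Call probe() `attempts` times and collect the True/False outcomes."""
    results = []
    for _ in range(attempts):
        results.append(bool(probe()))
        if delay_s:
            time.sleep(delay_s)
    return results


def failure_rate(results):
    """Fraction of failed probes; 0.0 for an empty list."""
    if not results:
        return 0.0
    return results.count(False) / len(results)


if __name__ == "__main__":
    # Hypothetical device name and community string; replace with real values.
    outcomes = sample(lambda: snmp_probe("switch01.example.net", "public"),
                      attempts=20, delay_s=1.0)
    print(f"failed {outcomes.count(False)} of {len(outcomes)} probes "
          f"({failure_rate(outcomes):.0%})")
```

Running it against an affected device while the alert is active would show whether single requests are being dropped intermittently or whether every request is answered.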

 

Is there anything we can check, or anywhere we can begin to look, to find the cause? The server is not overloaded, in either memory or processor.

 

We are running 25 poller wrappers with a complete polling time of about 300 seconds.

 

 

Thanks,

 

Tim