Re: [Observium] Many false positives

23 Dec 2014

There is no time frame to report; it didn't return a ping (in time) at
the exact moment the message was logged.
These are the default ping config settings:
#$config['ping']['retries'] = 3;    // How many times to retry ping (1 - 10)
#$config['ping']['timeout'] = 500;  // Timeout in milliseconds (50 - 2000)
So it would seem a device is marked down after 3 pings that are each
either missed or slower than 500 ms.
There are currently no device-specific ping settings; this is on my
to-do list somewhere.
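If you want to experiment with less aggressive values, a minimal sketch
of overriding those defaults follows. It assumes the lines go into your
config.php like any other setting, and the numbers are only illustrative
picks inside the ranges given in the comments, not recommendations:

```php
// Illustrative values only (assumption); keep them within the noted ranges.
$config['ping']['retries'] = 5;     // How many times to retry ping (1 - 10)
$config['ping']['timeout'] = 1000;  // Timeout in milliseconds (50 - 2000)
```

Since there are no per-device settings, this applies to every device.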
You can try to enable this:
$config['ping']['debug'] = TRUE;    // If TRUE store ping errors into logs/debug.log file
and check the debug log.
Tom
On 22/12/2014 23:55, John Brown wrote:
...
Yes, I see the "Device Status changed to Down (PING)" in the log.
The problem I have with this is that it doesn't provide any more
detailed information: how many pings failed, over what time frame, etc.
I am running tcpdump on a monitor/span port that the ONOS host is
connected to, and I see ICMP packets going out to devices and their
reply packets coming back.
Over a 15-minute period a host will be reported as DOWN, yet
the ICMP packet flow shows echo_request / echo_reply pairs without
undue delay.
Other machines on the same LAN subnet as the ONOS host also show no 
dropped ICMP packets.
Hence why I'm asking about additional debugging tools within ONOS.
Thanks
On Mon, Dec 22, 2014 at 3:39 PM, Tom Laermans 
<tom.laermans@powersource.cx> wrote:
Observium... Bonitoring(?) does tell you why it's down. It doesn't
receive a reply either over ICMP echo or over SNMP; this is noted
in the event log when the host goes down.

Tom

On 22/12/2014 23:05, John Brown wrote:

...
Hi

I'm trying to troubleshoot the many false positives we are
receiving from OB.

The system will report a host as down, yet our legacy Nagios and
out-of-band Pingdom do not show the host as down.

It doesn't appear that OB records in the log what specifically is
making OB think the host is down.

I've increased the SNMP timeout value to 3 seconds (which seems
very long) and that has helped with some hosts, mostly Mikrotiks.

But I doubt that our Juniper MX480s (which are lightly loaded)
should need such a long time frame to respond.

How can I get OB to report what is the actual trigger for its
"Host Down" alerts ??

Are there tweaks for performance monitoring / testing ??

Thank you in advance..

_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
