Yes, I see the "Device Status changed to Down (PING)" entry in the log.

The problem I have with this is that it doesn't provide any more detailed information:
how many pings failed, over what time frame, etc.

I am running tcpdump on a monitor/SPAN port that the ONOS host is connected to, and I see ICMP packets going out to the devices and their reply packets coming back.

Over a 15-minute period a host will be reported as DOWN, yet the ICMP packet flow shows echo_request / echo_reply pairs without undue delay.
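
A capture like this can also be checked mechanically rather than by eye. The following is only a minimal sketch and not a feature of the monitoring system: it assumes the capture was printed with `tcpdump -tt -n icmp` (epoch timestamps, no name resolution), and the name `pair_echoes` and the regular expression are illustrative. It pairs each echo request with its reply by source, destination, id and sequence number, and reports round-trip times plus any requests that never got an answer.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed input: text lines from `tcpdump -tt -n icmp`, for example
# 1419287940.123456 IP 10.0.0.5 > 10.0.0.9: ICMP echo request, id 4321, seq 1, length 64
LINE_RE = re.compile(
    r"^(?P<ts>\d+\.\d+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"ICMP echo (?P<kind>request|reply), id (?P<id>\d+), seq (?P<seq>\d+)"
)

Key = Tuple[str, str, int, int]  # (requester, target, icmp id, icmp seq)


def pair_echoes(lines: Iterable[str]) -> Tuple[List[Tuple[str, float]], List[Key]]:
    """Return ([(target, rtt_seconds), ...], [unanswered request keys])."""
    pending: Dict[Key, float] = {}
    rtts: List[Tuple[str, float]] = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not an ICMP echo line, ignore it
        ts = float(m["ts"])
        ident, seq = int(m["id"]), int(m["seq"])
        if m["kind"] == "request":
            pending[(m["src"], m["dst"], ident, seq)] = ts
        else:
            # A reply travels in the opposite direction of its request.
            sent = pending.pop((m["dst"], m["src"], ident, seq), None)
            if sent is not None:
                rtts.append((m["src"], ts - sent))
    # Whatever is left in `pending` never saw a matching reply.
    return rtts, sorted(pending)
```

Feeding it the saved capture (for example `pair_echoes(open("icmp.txt"))`, where `icmp.txt` is a hypothetical file name) gives a list of per-target round-trip times that can be compared against the times at which a host was reported DOWN.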

Other machines on the same LAN subnet as the ONOS host also show no dropped ICMP packets.

Hence my question about additional debugging tools within ONOS.

Thanks

On Mon, Dec 22, 2014 at 3:39 PM, Tom Laermans <tom.laermans@powersource.cx> wrote:
Observium... Bonitoring(?) does tell you why it's down. It doesn't receive a reply either over ICMP echo or over SNMP; this is noted in the event log when the host goes down.

Tom

On 22/12/2014 23:05, John Brown wrote:
Hi

I'm trying to troubleshoot the many false positives we are receiving from OB.

The system will report a host as down, yet our legacy Nagios and out-of-band Pingdom do not show the host as down.

It doesn't appear that OB records in the log what specifically is making OB think the host is down.

I've increased the SNMP timeout value to 3 seconds (which seems very long), and that has helped with some hosts, mostly Mikrotiks.

But I doubt that our Juniper MX480s (which are lightly loaded) should need such a long time frame to respond.
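
One way to check how long a device really takes to answer SNMP, independently of the monitoring system, is to time a single query from the monitoring host. This is a rough sketch under stated assumptions: it assumes net-snmp's `snmpget` is installed, and the community string `public`, the address `192.0.2.10` and the helper names are placeholders, not values from this setup.

```python
import subprocess
import time
from typing import List, Tuple

SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0, a cheap value to fetch


def snmpget_command(host: str, community: str = "public", timeout_s: int = 3) -> List[str]:
    """Build a net-snmp snmpget call: SNMPv2c, one attempt, fixed timeout."""
    return [
        "snmpget",
        "-v", "2c",
        "-c", community,
        "-t", str(timeout_s),  # per-request timeout in seconds
        "-r", "0",             # no retries, so one slow answer is visible
        host,
        SYS_UPTIME_OID,
    ]


def time_command(cmd: List[str]) -> Tuple[float, int]:
    """Run cmd and return (elapsed_seconds, exit_code)."""
    start = time.monotonic()
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.monotonic() - start, result.returncode


if __name__ == "__main__":
    # Placeholder host; replace with the device under suspicion.
    elapsed, code = time_command(snmpget_command("192.0.2.10"))
    print(f"exit={code} elapsed={elapsed:.3f}s")
```

Running this in a loop against a lightly loaded router and a slower device would show whether response times ever approach the configured timeout.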

How can I get OB to report the actual trigger for its "Host Down" alerts?

Are there tweaks for performance monitoring / testing?


Thank you in advance.




_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

