In my previous testing (which was on a MikroTik, though I never found out whether it was the end device or an intermediate device), the issue occurred because the device stopped replying to pings for 10 seconds every few minutes.
This will produce frequent false alerts that even 1-second timeouts and multiple retries will not paper over.
Personally, I think we're quite justified in marking a device down if it doesn't reply to ping after 5 seconds :)
adam.
------ Original Message ------
From: "John Brown" john@citylinkfiber.com
To: "Observium Network Observation System" observium@observium.org
Sent: 12/22/2014 5:12:18 PM
Subject: Re: [Observium] Many false positives
Thanks.
I did crank the timeout and retries up to 1000ms and 5 retries prior to my original post. That didn't seem to help much.
I now have the debug value set and will watch the debug log for any hints.
Thanks
On Mon, Dec 22, 2014 at 4:04 PM, Tom Laermans tom.laermans@powersource.cx wrote:
There is no time frame to report; it didn't return a ping (in time) at the exact moment the message is logged.
These are the default ping config settings:
#$config['ping']['retries'] = 3; // How many times to retry ping (1
#$config['ping']['timeout'] = 500; // Timeout in milliseconds (50 - 2000)
So, 3 pings that are missed or take longer than 500ms, it would seem.
There are currently no device-specific ping settings; this is on my to-do list somewhere.
You can try to enable this:
$config['ping']['debug'] = TRUE; // If TRUE store ping errors into logs/debug.log file
and check the debug log.
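
For reference, the options mentioned in this thread could be combined in a config.php fragment roughly like the one below. This is only a sketch: the option names are the ones quoted above, but the 5 retries / 1000ms values are simply the ones John reported trying elsewhere in this thread, not recommendations, and the allowed range for retries is cut off in the quoted comment, so it is worth checking against your own defaults file.

```php
// Illustrative values only (assumed, taken from this thread), not recommendations.
$config['ping']['retries'] = 5;     // retry count John reported trying; default quoted above is 3
$config['ping']['timeout'] = 1000;  // milliseconds; inside the 50 - 2000 range quoted above
$config['ping']['debug']   = TRUE;  // store ping errors in logs/debug.log
```

With debug enabled, failed pings should show up in logs/debug.log, which can then be compared against the timestamps of the "Device Status changed to Down (PING)" events.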
Tom
On 22/12/2014 23:55, John Brown wrote:
Yes, I see the "Device Status changed to Down (PING)" in the log.
The problem I have with this is that it doesn't provide any more detailed information: how many pings failed, over what time frame, etc.
I am running tcpdump on a monitor/span port that the ONOS is connected to, and I see ICMP packets going out to devices and their reply packets coming back.
Over a 15-minute period a host will be reported as DOWN, yet the ICMP packet flow shows echo_request / echo_reply pairs without undue delay.
Other machines on the same LAN subnet as the ONOS host also show no dropped ICMP packets.
Hence my asking about additional debugging tools within ONOS.
Thanks