I've seen similar issues (not
necessarily via Observium) with overloaded MikroTik routerboards -
especially on the low-end/older boards.
Given RouterOS's powerful feature set vs the relatively low
CPU/memory available, it is very common to find a board sitting at
100% CPU for hours on end or running out of memory every few
minutes/hours. The only saving grace, typically, is that the board
either recovers periodically or that the watchdog timers force it
to reboot.
Of course, as you (John) are using Observium, you should be able
to tell if the MikroTik is being overloaded or not. ;)
One way to try to confirm the behaviour (assuming you're not
catching it yourself) is to monitor the problematic hosts with a
Smokeping installation. Smokeping does one thing and does it well:
it pings services and draws pretty latency/packet-loss graphs. :)
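If a full Smokeping install is more than you want for a quick check, the same idea can be roughed out in a few lines of Python. This is only a sketch and not how Smokeping works internally: it assumes a Linux iputils `ping` (the `-n`, `-c` and `-W` flags) and its usual "time=... ms" output, and the `probe` and `summarize` names are made up for illustration.

```python
import re
import statistics
import subprocess

# Matches the "time=12.3 ms" field in iputils ping output (assumed format).
RTT_RE = re.compile(r"time[=<]([\d.]+) ?ms")


def probe(host, timeout_s=1):
    """Send one ICMP echo; return the RTT in ms, or None if it was lost.

    Assumes a Linux iputils `ping` binary on PATH.
    """
    try:
        result = subprocess.run(
            ["ping", "-n", "-c", "1", "-W", str(timeout_s), host],
            capture_output=True, text=True, timeout=timeout_s + 2,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    match = RTT_RE.search(result.stdout)
    return float(match.group(1)) if result.returncode == 0 and match else None


def summarize(samples):
    """Reduce a list of RTTs (None = lost) to loss % and latency stats."""
    replies = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(replies)) / len(samples) if samples else 0.0
    return {
        "sent": len(samples),
        "loss_pct": loss_pct,
        "min_ms": min(replies) if replies else None,
        "median_ms": statistics.median(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
    }
```

Running something like `summarize([probe("192.0.2.1") for _ in range(20)])` once a minute and logging the result would show whether loss comes in short regular bursts, which is the pattern an overloaded board tends to produce.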
On 2014/12/23 09:10, Adam Armstrong wrote:
In my previous testing (which was on a MikroTik, but I didn't
find out if it was the end device or an intermediate device),
the issue occurred because the device stopped replying to pings
for 10 seconds every few minutes.
This will produce frequent false alerts even with 1 second
timeouts and multiple retries.
Personally, I think we're quite justified in marking a device
down if it doesn't reply to ping after 5 seconds :)
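To put rough numbers on that: if each attempt waits out its full timeout and the attempts run back to back (an assumption about how the poller schedules them, not something confirmed in this thread), the total window is just timeout times attempts. The helper names below are illustrative only, and "5 retries" is treated as 5 attempts in total.

```python
def retry_window_s(timeout_ms, attempts):
    """Total time spent waiting if every attempt times out, in seconds.

    Assumes attempts run back to back with no extra delay between them.
    """
    return timeout_ms * attempts / 1000.0


def outage_trips_check(outage_s, timeout_ms, attempts):
    """True if a silent period of outage_s can swallow every attempt.

    Worst case: the first attempt is sent just as the outage begins.
    """
    return outage_s >= retry_window_s(timeout_ms, attempts)


# 1000 ms timeout, 5 attempts: a 5 second window, so a 10 second
# silence can still mark the device down.
print(retry_window_s(1000, 5))           # 5.0
print(outage_trips_check(10, 1000, 5))   # True
print(outage_trips_check(10, 3000, 5))   # False (15 second window)
```

By the same arithmetic, riding out a 10 second gap would need more than 10 seconds of total waiting, which is the trade-off against the 5 second cut-off mentioned above.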
adam.
------ Original Message ------
Sent: 12/22/2014 5:12:18 PM
Subject: Re: [Observium] Many false positives
Thanks.
I did crank the timeout and retries up to 1000ms and 5
retries, prior to my original post.
That didn't seem to help much.
I now have the debug value set and will watch the debug
log for any hints.
Thanks
_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium