[Observium] Re: Capture gaps when retrieving interface traffic data …

18 Mar 2024


This is usually caused by one of three things:
1. Poller hardware is insufficient to poll all devices every 5 minutes 
consistently. This is usually easy to figure out from the massive load 
on the server.
2. Poller has insufficient threads configured to poll all devices every 
5 minutes consistently, despite having enough resources. You can tell if 
this is the case because many poller-wrapper.py processes will be 
running, but the load on the server won't be critical.
3. Some intervening device on the network is throttling/filtering 
traffic intermittently, causing periodic failed snmpwalks and missing 
data for the odd poller cycle.
The poller wrapper is given a number of threads to start (either as an 
argument in the cron job or in the web config), and it runs that many 
poller.php processes. When a poller wrapper process starts, it checks 
how many copies of itself are already running; if there are too many, it 
exits. These limits are set in the poller config section.
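As a rough illustration of that start-up check, here is a minimal Python sketch. It is not the actual poller-wrapper.py code: the function names, the example command lines and the limit value are all assumptions made for the example.

```python
def count_wrappers(cmdlines, needle="poller-wrapper.py"):
    """Count command lines that look like a running poller wrapper."""
    return sum(1 for cmd in cmdlines if needle in cmd)


def may_start(running_wrappers, max_running):
    """Return True if one more wrapper is allowed to start.

    running_wrappers: wrappers already running, not counting the new one.
    max_running: hypothetical limit, standing in for the poller config value.
    """
    return running_wrappers < max_running


# Hypothetical snapshot of a process list (example paths only).
processes = [
    "/usr/bin/python3 /opt/observium/poller-wrapper.py 8",
    "php /opt/observium/poller.php -h 12",
    "/usr/bin/python3 /opt/observium/poller-wrapper.py 8",
]

running = count_wrappers(processes)
if may_start(running, max_running=2):
    print("starting a new wrapper run")
else:
    print("too many wrappers running, skipping this cycle")
```

With two wrappers already in the list and an assumed limit of 2, the new run is refused.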
If you have too few threads or the server is too slow, the devices 
aren't all polled within the 5-minute interval, so another 
poller-wrapper starts before the previous one has finished. This isn't 
always bad; some devices just take a REALLY long time to poll. But when 
runs start overlapping multiple times, that's usually pretty dire.
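To get a feel for whether a given thread count can finish inside the 5-minute interval, the per-device poll times from the poller log can be fed into a small estimate. The sketch below is a simplified model, not how Observium schedules work internally: it assumes each thread takes the next device from a queue as soon as it is free, and the function name is made up for the example.

```python
import heapq


def estimate_wall_time(device_seconds, threads):
    """Estimate how long one full poller run takes, in seconds.

    device_seconds: per-device poll durations, in queue order.
    threads: number of parallel poller processes (must be >= 1).
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # Each heap entry is the time at which one worker becomes free.
    workers = [0.0] * threads
    heapq.heapify(workers)
    for seconds in device_seconds:
        free_at = heapq.heappop(workers)      # earliest available worker
        heapq.heappush(workers, free_at + seconds)
    return max(workers)


# Four devices at 100 s each: two threads need 200 s, four threads need 100 s.
print(estimate_wall_time([100, 100, 100, 100], threads=2))
print(estimate_wall_time([100, 100, 100, 100], threads=4))
```

Note that adding threads never brings the estimate below the slowest single device, which is why one very slow device can cause overlaps on its own.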
We prevent more than a certain number of wrapper processes from running, 
so if you already have x running because of some slow devices or 
insufficient threads or whatever, it'll refuse to start a process for 
that period, and you'll lose one period of polling data, causing a gap.
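The way overlapping runs turn into gaps can be shown with a small simulation. This is a sketch under simplifying assumptions: every wrapper run takes the same time, cron fires exactly every 300 seconds, and the limit of 2 concurrent wrappers is a made-up value, not the Observium default.

```python
def skipped_cycles(run_seconds, max_wrappers, cycles, period=300):
    """Return the indexes of polling cycles that are refused (gaps).

    run_seconds: how long each wrapper run takes (assumed constant).
    max_wrappers: hypothetical limit on concurrent wrapper processes.
    cycles: number of cron ticks to simulate.
    period: seconds between cron ticks.
    """
    end_times = []   # end times of wrapper runs still in progress
    skipped = []
    for cycle in range(cycles):
        now = cycle * period
        # Drop runs that have already finished by this tick.
        end_times = [end for end in end_times if end > now]
        if len(end_times) >= max_wrappers:
            skipped.append(cycle)            # wrapper refuses to start
        else:
            end_times.append(now + run_seconds)
    return skipped


# Runs of 700 s with at most 2 wrappers: every third cycle is lost.
print(skipped_cycles(run_seconds=700, max_wrappers=2, cycles=6))  # [2, 5]
# Runs that fit in the interval never overlap, so nothing is lost.
print(skipped_cycles(run_seconds=250, max_wrappers=2, cycles=6))  # []
```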
The network-caused stuff is much harder to diagnose, because no one ever 
wants to admit that their pet firewall platform is useless. :D
adam.
Stefan Schmidt via observium wrote on 2024-03-18 10:33:
...
Hello!
We use Observium to poll port data via SNMP from 4 WAN interfaces on a Mikrotik router (CCR2004-16G-2S+) running RouterOS 7.14.1 (Level 6), with polling and alerts enabled. However, we always have gaps in the data. Every now and then 5-10 minutes of data are missing, so that vertical lines without data appear in the RRD graphs. Of course, this only becomes clear in the 6h overview.
We are surprised because we have the impression that it only started on this device (and others) almost a year ago. We therefore replaced the hardware, reinstalled the system (Debian 12) and polled only this router (we also removed the device from the old Observium to avoid parallel double polling).
We look at https://.../pollerlog/ and see our patient there:
Device Last Polled
10.X.X.1 100% 108.67s
(...) and others of the location with 1-19%...
Why 100%? Which of these figures are the relevant ones here?
We would be happy if someone could help us adjust the Observium "Poller Wrapper" parameters if necessary. Is there room for tuning there?
Greetings from Frankfurt/Main
Stefan

Adam Armstrong