Re: [Observium] ID of graph changes and "jumps" between storage on server

6 Jul 2016


This is a fairly standard problem with SNMP on computers (as opposed to
infrastructure, where vendors know to solve it). It's caused by storage
devices appearing in a different order and getting different IDs. This
makes them really hard to identify, so entries get added and removed
and IDs change.
On our side, graphs should retain continuity, since our RRDs are named
by storage description (mount point/drive).
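To illustrate why keying on the description survives a reshuffle, here is a
minimal sketch. The two walks, the index numbers, and the mount points are
made up for the example, and this is not Observium's actual code.

```python
# Hypothetical hrStorageDescr walks (SNMP index -> description),
# taken before and after a reboot. All values are invented.
before = {31: "/", 35: "/Volumes/iSCSI", 36: "/Volumes/NFS"}
after = {31: "/", 35: "/Volumes/NFS", 36: "/Volumes/iSCSI"}


def by_description(walk):
    """Invert an index -> description walk into description -> index."""
    return {descr: index for index, descr in walk.items()}


def moved_indexes(old_walk, new_walk):
    """Return {description: (old_index, new_index)} for storage whose index changed."""
    old, new = by_description(old_walk), by_description(new_walk)
    return {d: (old[d], new[d]) for d in old.keys() & new.keys() if old[d] != new[d]}


moved = moved_indexes(before, after)
```

Anything keyed by description (such as an RRD filename) still points at the
same storage, while anything keyed by index now points at a different one.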
If this happens between alerting rebuilds, you might see alerts being
generated for an entity that has the same SNMP ID as the entity that was
supposed to be alerted on. We do run alerting rebuilds after discovery,
but these changes are picked up during polling, and it's too much of a
performance hit to run a rebuild that frequently.
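A toy model of that window, assuming an alert association that is keyed by
SNMP index and only refreshed at rebuild time. The table layout, names, and
numbers are hypothetical and only meant to show the effect.

```python
# Toy model only; not Observium's actual alerting code.
# Association from the last rebuild: SNMP index -> storage it was meant for.
alert_table = {35: "/Volumes/iSCSI"}

# What polling sees after the indexes moved:
# SNMP index -> (description, percent used). Values are invented.
polled = {35: ("/Volumes/NFS", 95), 36: ("/Volumes/iSCSI", 40)}


def fire_alerts(alert_table, polled, threshold=90):
    """Return (intended, actual) pairs for every alert that fires."""
    fired = []
    for index, intended in alert_table.items():
        if index not in polled:
            continue  # the index disappeared entirely; nothing to evaluate
        actual, used = polled[index]
        if used > threshold:
            fired.append((intended, actual))
    return fired


fired = fire_alerts(alert_table, polled)
```

Here the check intended for the iSCSI volume fires on the NFS volume's usage,
because the stale association still points at index 35.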
Personally, I would not be allowing network-attached things to appear in 
alerting, since you can't really guarantee that they aren't going to 
move around in SNMP.
It'd be nice if net-snmp had some persistence for this, but it doesn't. 
Sadly.
adam.
On 2016-07-06 17:53, Henrik Cednert (Filmlance) wrote:
...
Hello
I have an issue that bugs the heck out of me. Not sure exactly what's
going on but in short:
First of all, when I write ID I mean the ID in the url when looking at
the graph for a storage attached to a server. Like
"....//graphs/to=1467822524/id=219/type=storage_usage/from=1467736124/"
where the ID is 219.
The storage I monitor on this OSX server is connected via iSCSI, NFS,
SNFS and Avid ISIS.
What happens is that when rebooting, or if a storage is disconnected
and reconnected, the above-mentioned ID changes. Not always, but very
often and it gives some odd subsequent errors like alert checkers
being triggered for the wrong storage. Which I don't know how it's
even possible since I assume they're not based on ID in the
background? But on these two screenshots an alert was triggered for a
storage not in the alert checker.
https://www.dropbox.com/s/9ttr7b86hv28pnp/alertChecker01.png
https://www.dropbox.com/s/th7rqel88pntpvl/alertChecker02.png
It also messes with the minigraphs on the front page since they're
hardcoded to an ID. An ID that can have gone missing or changed...
It also seems to mess with the graphs, since in some situations it looks
like it can get stuck in a mode where the wrong storage is monitored and
graphed into another storage's graph. Real odd, and I'm not sure that's
really what's going on or if it's snmpd on the server that reports
wrong. But it's timed with reboots and/or the other issues above.
https://www.dropbox.com/s/rdrhetew8nod09o/graphID01.png
Has anyone had similar issues?
Cheers and thanks
--
Henrik Cednert
cto | compositor
Filmlance International | www.filmlance.se
mobile [ + 46 (0)704 71 89 54 ]
skype  [ cednert ]


Adam Armstrong