This is a fairly standard problem with SNMP on computers (as opposed to infrastructure, where vendors know to solve it). It's caused by storage devices appearing in a different order and getting different IDs. That makes them really hard to identify, so entries get added and removed and IDs change.
On our side, graphs should retain continuity, since our RRDs are named by storage description (mount point/drive).
If this happens between alerting rebuilds, you might see alerts being generated for an entity that has the same SNMP ID as the entity which was supposed to be alerted on. We do run alerting rebuilds after discovery, but these changes are picked up during polling, and it's too much of a performance hit to run a rebuild that frequently.
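To illustrate the reshuffling, here is a minimal Python sketch. It is not Observium code: the index values, the volume names, and the helper `moved_indexes` are all made up for the example. It only shows why anything keyed on the SNMP index can end up pointing at a different storage after a reboot, while a lookup by description stays stable.

```python
# Hypothetical poll results: SNMP storage index -> storage description.
# Index values and volume names are invented for illustration.
before_reboot = {31: "/", 32: "/Volumes/iSCSI01", 33: "/Volumes/NFS01"}
after_reboot = {31: "/", 32: "/Volumes/NFS01", 33: "/Volumes/iSCSI01"}


def moved_indexes(old, new):
    """Return {description: (old_index, new_index)} for storage whose index changed."""
    old_by_descr = {descr: idx for idx, descr in old.items()}
    new_by_descr = {descr: idx for idx, descr in new.items()}
    return {
        descr: (old_idx, new_by_descr[descr])
        for descr, old_idx in old_by_descr.items()
        if descr in new_by_descr and new_by_descr[descr] != old_idx
    }


moved = moved_indexes(before_reboot, after_reboot)
for descr, (old_idx, new_idx) in sorted(moved.items()):
    print(f"{descr}: index {old_idx} -> {new_idx}")
```

In this made-up case index 32 refers to the iSCSI volume before the reboot and to the NFS volume afterwards, so anything still attached to index 32 is now looking at the wrong storage.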
Personally, I would not be allowing network-attached things to appear in alerting, since you can't really guarantee that they aren't going to move around in SNMP.
It'd be nice if net-snmp had some persistence for this, but it doesn't. Sadly.
adam.
On 2016-07-06 17:53, Henrik Cednert (Filmlance) wrote:
Hello
I have an issue that bugs the heck out of me. Not sure exactly what's going on but in short:
First of all, when I write ID I mean the ID in the url when looking at the graph for a storage attached to a server. Like "....//graphs/to=1467822524/id=219/type=storage_usage/from=1467736124/" where the ID is 219.
The storage I monitor on this OSX server is connected via iSCSI, NFS, SNFS and Avid ISIS.
What happens is that when rebooting, or if a storage is disconnected and reconnected, the above-mentioned ID changes. Not always, but very often, and it gives some odd subsequent errors like alert checkers being triggered for the wrong storage. I don't know how that's even possible, since I assume they're not based on ID in the background? But in these two screenshots an alert was triggered for a storage not in the alert checker. https://www.dropbox.com/s/9ttr7b86hv28pnp/alertChecker01.png
https://www.dropbox.com/s/th7rqel88pntpvl/alertChecker02.png
It also messes with the minigraphs on the front page since they're hardcoded to an ID. An ID that may have gone missing or changed...
It also seems to mess with the graphs, since in some situations it looks like it can get stuck in a mode where the wrong storage is monitored and graphed into another storage's graph. Really odd, and I'm not sure that's really what's going on or if it's snmpd on the server that reports wrong. But it's timed with a reboot and/or the other issues above. https://www.dropbox.com/s/rdrhetew8nod09o/graphID01.png
Has anyone had similar issues?
Cheers and thanks
-- Henrik Cednert cto | compositor
Filmlance International | www.filmlance.se mobile [ + 46 (0)704 71 89 54 ] skype [ cednert ]
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium