Re: [Observium] Graph Gaps - aka I've horked something
Sure sounds like it doesn't finish in 5min.. because of "I added some devices and..." ...
Adam Armstrong adama@memetic.org wrote:
The ping graph for localhost?
That is very odd. The ping graph doesn't even use SNMP.
Are you using the poller wrapper or not?
adam.
On 2013-11-06 19:53, chip wrote:
That's what I thought at first as well, but then I noticed the issue also happening with monitoring of the localhost, including the ping response time. Looking at all the graphs, the gaps seem consistent across everything, so it's as if either everything works or nothing does. I would assume that something intermittent would let some hosts or OIDs work and some not. Very odd. My sysadmin skills aren't great, but I'll keep poking.
Attached is the log for running sudo ./poller.php -d -h localhost | tee ~/log4.txt
Thanks for all the effort!
--chip
On Wed, Nov 6, 2013 at 2:29 PM, Adam Armstrong adama@memetic.org wrote:
The fact that you get any data at all suggests that it's not a code issue.
The graph looks exactly how I'd expect for something where the SNMP queries are intermittently failing, either because of a poor network or something like a firewall with constrained sessions.
Have you tried running the poller in debug mode and seeing what output you get?
adam.
On 2013-11-06 19:12, chip wrote:
Hi all,
Yesterday afternoon I added several devices, and since then I'm getting gaps in *all* my graphs, even those for the localhost. I upgraded to the latest (commercial) version just to check, and I'm still having issues. I'm not quite sure what to make of this.
I have nfsen and Cacti running on the same box and their graphs all look fine, so it seems to be something specific to Observium. My disk I/O is a bit high but not too bad, CPU is good, RAM is fine. See the attached screenshot for an example of the gaps.
- 8 GB RAM
- Ubuntu 12.04
- 750 GB SATA disk (ST3750640NS)
- Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
*couple of notes*
- a few days ago I tweaked some mysql settings
< #table_cache = 64
> table_cache = 600
65c65
< query_cache_size = 16M
> query_cache_size = 32M
- Also, in order to group some devices together, I manually edited the observium.device.location column in the DB for some of the devices.
Thoughts? Direction?
Thanks all!
--chip
-- Just my $.02, your mileage may vary, batteries not included, etc....
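One way to confirm that the gaps are missed RRD updates rather than a graphing problem is to check when each RRD file was last written. A minimal sketch, assuming Observium keeps one RRD directory per device under the install root and that the install lives under /data/observium (both are assumptions to adjust for your layout):

```bash
#!/bin/sh
# Compare each RRD's last update time against "now"; anything much older
# than the 5-minute poll interval means updates were missed.
# OBS_HOME and the per-device directory layout are assumptions -- adjust.
OBS_HOME=/data/observium
DEVICE=localhost

NOW=$(date +%s)
for rrd in "$OBS_HOME/rrd/$DEVICE"/*.rrd; do
    LAST=$(rrdtool last "$rrd")            # unix timestamp of last update
    printf '%6d seconds ago  %s\n' "$((NOW - LAST))" "$rrd"
done | sort -n
```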
On 2013-11-06 20:45, Tom Laermans wrote:
Sure sounds like it doesn't finish in 5min.. because of "I added some devices and..." ...
He's running the poller wrapper with only a single process; that probably explains it...
:D
adam.
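For context, the wrapper's parallelism is simply the numeric argument passed to poller-wrapper.py from cron. A sketch of a typical crontab line, assuming the install lives under /data/observium (the path and the process count are placeholders to adjust):

```
# /etc/cron.d/observium (sketch -- the path and process count are assumptions)
# Run the poller wrapper every 5 minutes with 8 concurrent poller processes.
*/5 * * * *   root   /data/observium/poller-wrapper.py 8 >> /dev/null 2>&1
```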
Ok, just to circle back, I've made some changes:
1: Set the number of poller-wrapper.py processes to 25; this greatly sped things up. Thanks!
2: I still noticed I had a few devices that would take 8-12 minutes to finish a poll. Running ps kstart_time -ef | grep -E "wrap|poller.php" helps identify when processes started and which devices were taking so long (see the sketch below); the same information is in the "observium.log" file as well, but this is a bit more real-time. The devices that were taking a really long time have only a handful of interfaces but around 2500 VLANs and the associated SVI/VE interfaces. The hardware doesn't supply traffic stats on these interfaces, so I disabled a lot of polling and discovery modules. There are lots of ARP entries on these devices as well; I don't really need those, so I disabled those modules too. I looked through "includes/defaults.inc.php", put a lot of the stuff I didn't need or want into "config.php", and set it to disabled. The devices still take around 3 minutes to finish, but at least it's not 5!
I think things are much better now, thanks for the help!
--chip
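For anyone else hunting down slow devices, the same check works with ps showing elapsed time directly. A sketch (note the -E, which the alternation in the grep pattern needs):

```bash
# List poller processes oldest-first with their elapsed running time, so
# per-device polls that have been running for many minutes stand out.
# -E enables extended regex for the "…|…" alternation.
ps -eo pid,etime,args --sort=start_time \
  | grep -E "poller-wrapper\.py|poller\.php" \
  | grep -v grep
```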
25 is a very, very high number. It's probably too high if you're not running off a very expensive SSD or ramdisk.
You can see how long devices take to poll in the web interface; look under the globe menu.
adam.
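If in doubt about how many wrapper processes the disk can sustain, it helps to watch the disk while a poll cycle runs. A minimal sketch, assuming the sysstat package is installed (iostat comes from it on Ubuntu):

```bash
# Watch extended per-disk stats in 5-second samples while the poller runs.
# %util pinned near 100% (and a climbing await) during the poll cycle
# suggests the disk, not CPU or SNMP, limits how many processes make sense.
iostat -xd 5
```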
Ah, I had found some people on the mailing list using '64'. I've dropped it back to 10 and will see how things go.
One additional thing: I noticed in the defaults.inc.php file there is an option to enable the "-Cr#" flag for snmpbulkwalk, with an indication that it would greatly speed things up; the default is off, however. Doing some testing, I found a walk that took almost 6 minutes ("-m IF-MIB -M /data/observium/mibs" for ifEntry) dropped to around a minute when "-Cr5" was set. I did this on the command line, not in Observium. I'm just curious what the downside is; my google-fu isn't revealing much.
Thanks again,
--chip
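The timing comparison above is easy to repeat against any device. A sketch of that test, with the host and community as placeholders and the MIB path taken from the message above:

```bash
# Time the same IF-MIB walk with and without an explicit -Cr value.
# -Cr<N> sets GETBULK max-repetitions: how many rows the agent returns
# per response, and therefore how many round trips the walk needs.
HOST=HOSTNAME            # placeholder -- substitute the device
COMMUNITY=public         # placeholder -- substitute the SNMP community
MIBDIR=/data/observium/mibs

time snmpbulkwalk -v2c -c "$COMMUNITY" -m IF-MIB -M "$MIBDIR" "$HOST" ifEntry > /dev/null
time snmpbulkwalk -v2c -c "$COMMUNITY" -Cr5 -m IF-MIB -M "$MIBDIR" "$HOST" ifEntry > /dev/null
```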
I'm running it as standard, as are a few other people I know of.
We define the settings used there in the per-device config, so we've only added it to devices we've tested. It's not on by default yet because there may still be variants or versions of the systems we've tested where it doesn't work.
If it works fine for you when you turn it on, it's safe to use. It makes well-written SNMP stacks much faster to poll.
It doesn't work properly at all on JunOS, because JunOS is derp. It works very well on Cisco's many SNMP stacks, on Arista and on UNIX-like systems running net-snmpd. It doesn't work properly with FreeBSD's bsnmpd.
adam.
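A quick way to check whether a particular agent copes with bulk requests before turning the option on is to compare the number of rows a plain walk and a bulk walk return. A sketch, again with placeholder host and community:

```bash
# If an agent's GETBULK handling is broken, the bulk walk tends to return
# fewer rows (or loop/duplicate) compared with the plain GETNEXT walk.
HOST=HOSTNAME            # placeholder
COMMUNITY=public         # placeholder

snmpwalk     -v2c -c "$COMMUNITY" "$HOST" IF-MIB::ifDescr | wc -l
snmpbulkwalk -v2c -c "$COMMUNITY" -Cr50 "$HOST" IF-MIB::ifDescr | wc -l
```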
participants (3)

- Adam Armstrong
- chip
- Tom Laermans