I was seeing gaps in my Observium graphs, so I knew that something was wrong with my system.
The output from the Observium "poller-wrapper.py" cron job reports how long each polling cycle takes to complete. Polling is scheduled every 5 minutes (300 seconds), so each cycle has to finish within that window. Here is what I found:
INFO: poller-wrapper polled 312 devices in 1142 seconds with 20 workers
WARNING: the process took more than 5 minutes to finish, you need faster hardware or more threads
INFO: in sequential style polling the elapsed time would have been: 21663 seconds
WARNING: Consider setting a minimum of 77 threads. (This does not constitute professional advice!)
I tried increasing threads on the poller-wrapper.py cron job, but this did not speed things up. The "top" command showed me that this system was always waiting for disk I/O, which is indicated by a high value for "wa". I decided to give rrdcached a try, after reading this blog:
http://blog.best-practice.se/2014/10/using-rrdcached-with-observium.html
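For anyone who wants to check for the same I/O-wait symptom without watching "top", here is a minimal Python sketch. It assumes a Linux system where the first line of /proc/stat is the aggregate "cpu" line and iowait is the fifth counter on it. The function names are my own for illustration and are not part of Observium or rrdtool.

```python
import time


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line (first line of /proc/stat on Linux)."""
    with open(path) as f:
        return f.readline()


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' samples."""
    b = [int(x) for x in before.split()[1:]]
    a = [int(x) for x in after.split()[1:]]
    deltas = [y - x for x, y in zip(b, a)]
    # Counter order: user nice system idle iowait irq softirq steal ...
    # Only the first eight are summed; guest time is already counted in user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(5)  # sample window in seconds; an arbitrary choice
    second = read_cpu_line()
    print("iowait: %.1f%%" % iowait_percent(first, second))
```

A value that stays high while the poller is running points at the disks rather than the CPU, which is the same thing a high "wa" in top is telling you.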
With rrdcached enabled, I am finally completing my polls within 5 minutes!
INFO: poller-wrapper polled 312 devices in 51 seconds with 20 workers
Yes, it went from 1142 seconds to 51 seconds, over 22 times faster! I know people have said the web interface would be slower, but from my perspective it is loading much faster now that rrdcached is saving some disk IOPS for Apache. (Disclaimer: I have compiled and installed rrdtool 1.4.9 to try to maximize speed on the web interface.)
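As a quick sanity check on these numbers, here is a small Python sketch of the arithmetic. The worker formula is an assumption on my part: it scales the worker count by how far the run overshot the 300-second interval and adds one, which happens to reproduce the 77 in the warning, but I have not confirmed it against the poller-wrapper.py source.

```python
POLL_INTERVAL = 300  # seconds between scheduled polls (5 minutes)


def recommended_workers(elapsed, workers, interval=POLL_INTERVAL):
    # Assumed formula (not taken from the poller-wrapper.py source):
    # scale the worker count by the overrun ratio, then add one.
    return int(elapsed / interval * workers + 1)


def speedup(before, after):
    # How many times faster the second run was than the first.
    return before / after


before_s, after_s, workers = 1142, 51, 20

print(recommended_workers(before_s, workers))  # 77
print(round(speedup(before_s, after_s), 1))    # 22.4
print(after_s <= POLL_INTERVAL)                # True
```

The last line is the one that matters for the graphs: a poll has to finish inside the 300-second window, otherwise it overlaps the next scheduled run and data points get missed.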
I hope this helps people who are bottlenecked on IOPS from their disks, like I was.
Cheers,
Tristan
*Tristan Rhodes* Network Engineer Weber State University 801.626.8549