Hello everyone,
I just wanted to introduce myself to the list and compliment the authors on a fantastic tool!
As a little background on our deployment, I work for a Midwestern US ISP/NSP, and I found Observium while looking for a replacement for our severely aging 95th percentile burstable billing system. However, once I got it running, I was blown away by Observium. We're now looking at replacing several systems with it, focusing on interface/data polling. We have a separate system for SNMP trap handling that we will continue to maintain.
Right now we have just shy of 16k interfaces across 30 nodes, with another dozen or two to add to complete our Cisco L3/L2 devices. I'm interested in adding new device types for our Cisco 15454 SONET and MSTP systems (and have read the Developing/NewOS document). However, adding these devices would take our device count north of 600, with a massive increase in interfaces, so I want to make sure I've got things set up well before going down that road.
Performance has been great so far, but I'm wondering what I can do to scale the system. It currently runs as a Xen VM with 4 cores and 4 GB of RAM, and the load average is staying right around 4 with 40% average CPU usage.
Since we're not interested in interface/node up/down alerting from Observium, I've configured each device to ignore and disable alerting. I've also removed the menu items and poller modules for things we don't need, and I still need to look into interface names that can be ignored, among other things.
I haven't tried rrdcached yet, but I'd like to see what impact it has on the existing load.
Does anyone else have any recommendations or additional best practices?
I have some ideas for potential features that could be useful to other ISP users as well, but I don't want to flood the list right off the bat with an even more rambling email.
Observium is a great tool, and though I'm not a programmer, I'd love to help out however I can.
Thanks, Berant