Hello everyone,
I just wanted to introduce myself to the list and compliment the authors on
a fantastic tool!
As a little background on our deployment, I work for a Midwestern US
ISP/NSP and I found Observium by looking for a replacement solution for
our severely aging 95th percentile burstable billing system. However, I was
blown away by Observium once I got it running. We're now looking at
replacing several systems with it, focusing on interface/data polling; we
have a separate system that we will continue to maintain for SNMP trap
handling.
Right now we have just shy of 16k interfaces across 30 nodes, with another
dozen or two to add to complete our Cisco L3/L2 devices. I'm interested in
adding new device types for our Cisco 15454 SONET and MSTP systems (and
have read the Developing/NewOS document). However, if these devices were
added, it would take our device count north of 600, with a massive
increase in interfaces, so I want to make sure I've got things set up well
before going down that road.
While performance has been great thus far, I'd like to know what I can do
to scale the system. Currently the system is a Xen VM with 4 cores and
4 GB of RAM, and the load average is staying right around 4 with 40% average
CPU usage.
Since we're not interested in alerting for interface/node up/down with
Observium, I've configured each device to ignore and disable alerting. I've
also taken out the various menu items and poller modules for things that we
don't need. I still need to look into interface names that can be ignored
as well.
I've not yet tried rrdcached, but I'd like to see what impact it has on
the existing load.
Does anyone else have any recommendations or additional best practices?
I have some thoughts and ideas on potential features that could be useful
for other ISP users as well, but I don't want to flood the list right off
the bat with an even more rambling email.
Observium is a great tool, and though I'm not a programmer, I'd love to help
out however I can.
Thanks,
Berant