Re: [Observium] New user introduction - Scaling questions

11 Nov 2011


On Fri, 11 Nov 2011 12:22:42 -0500, Berant Lemmenes <berant@lemmenes.com> wrote:
> Hello everyone,
> I just wanted to introduce myself to the list and compliment the authors
> on a fantastic tool!
> As a little background on our deployment, I work for a Midwestern US
> ISP/NSP, and I found Observium while looking for a replacement for our
> severely aging 95th percentile burstable billing system. However, I was
> blown away by Observium once I got it running. We're now looking at
> replacing several systems with it, focusing on interface/data polling. We
> have a separate system that we will continue to maintain for SNMP trap
> handling.
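For readers unfamiliar with the billing model mentioned here: 95th percentile (burstable) billing is commonly computed by sampling a port's traffic rate every 5 minutes over the billing period, sorting the samples, discarding the top 5%, and billing on the highest remaining sample. The sketch below assumes that common convention; it is not taken from Observium or from Berant's system, and the function name is hypothetical.

```python
def percentile_95(samples_mbps):
    """Return the billable 95th-percentile rate from 5-minute rate samples.

    Uses the common "discard the top 5%" convention: sort the samples, drop
    the highest 5% (rounded down), and bill on the highest sample that remains.
    """
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    discard = len(ordered) // 20  # top 5% of samples are ignored
    return ordered[len(ordered) - discard - 1]


if __name__ == "__main__":
    # 20 hypothetical 5-minute samples: 19 at 100 Mbps plus one 900 Mbps burst.
    samples = [100.0] * 19 + [900.0]
    print(percentile_95(samples))  # 100.0: the single burst is in the discarded 5%
```

Because only the top 5% is discarded, under this convention a customer can burst for roughly 36 hours in a 30-day month (5% of 720 hours) without it affecting the bill.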
> Right now we have just shy of 16k interfaces across 30 nodes, with another
> dozen or two to add to complete our Cisco L3/L2 devices. I'm interested in
> adding new device types for our Cisco 15454 SONET and MSTP systems (and
> have read the Developing/NewOS document); however, adding these devices
> would take our device count north of 600, with a massive increase in
> interfaces. So I want to make sure I've got things set up well before
> going down that road.
> While performance is great so far, I'm concerned about what I can do to
> scale the system. Currently it is a Xen VM with 4 cores and 4GB of RAM,
> and the load average is staying right around 4 with 40% average CPU usage.
Firstly, don't run it in a VM. Observium scales primarily on I/O
throughput. If you have /very/ fast disks you might get away with a VM of
that size.
I tend to put large deployments into a ramdisk that is synced periodically
to a physical disk (as a .tar.Z to reduce write time).
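The ramdisk approach can be sketched as a periodic snapshot job. This is a minimal sketch under several assumptions: the RRDs live in a directory on a RAM-backed filesystem such as tmpfs, the backup file sits on the physical disk, and gzip is swapped in for the .tar.Z (compress) format mentioned above because Python's standard tarfile module cannot write .Z archives. The function name and paths are hypothetical.

```python
import os
import tarfile
import tempfile


def sync_ramdisk(rrd_dir, backup_path):
    """Snapshot an RRD directory (e.g. on a tmpfs ramdisk) into a compressed tarball.

    The archive is written to a temporary file in the destination directory and
    then renamed into place, so an interrupted sync never replaces the previous
    good backup with a truncated one.
    """
    backup_dir = os.path.dirname(os.path.abspath(backup_path))
    fd, tmp_path = tempfile.mkstemp(dir=backup_dir, suffix=".tmp")
    os.close(fd)
    try:
        # gzip instead of compress(1)'s .Z: Python's tarfile cannot write .Z archives.
        with tarfile.open(tmp_path, "w:gz") as tar:
            # Store paths relative to the RRD directory's own name so the archive
            # can be extracted back onto a freshly mounted ramdisk at boot.
            tar.add(rrd_dir, arcname=os.path.basename(os.path.normpath(rrd_dir)))
        os.replace(tmp_path, backup_path)  # atomic when both are on one filesystem
    except BaseException:
        os.unlink(tmp_path)  # never leave a half-written temporary archive behind
        raise
```

A call such as `sync_ramdisk("/mnt/rrd-ramdisk/rrd", "/var/backups/observium-rrd.tar.gz")` (hypothetical paths) could be run periodically, ideally between poller runs so no RRD is being updated mid-snapshot. Restoring is the reverse: extract the newest archive onto the ramdisk at boot, before polling resumes. Anything written since the last snapshot is lost if the host crashes, which is the trade-off for keeping RRD writes in RAM.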
> Since we're not interested in alerting for interface/node up/down with
> Observium, I've configured each device to ignore and disable alerting.
> I've also taken out the various menu items and poller modules for things
> we don't need, and I need to look into which interface names can be
> ignored, etc.
Please, please, please do *not* make any changes that won't be committed
back into SVN. Observium is designed to be updated frequently from SVN
and has database and other update scripts to make this work. Local code
changes will break this mechanism. They'll also mean you never update...
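Berant also mentions ignoring interfaces by name. Keeping such rules in configuration rather than in modified code fits the warning about local changes, and every ignored interface is one fewer RRD to write per poll. The Python below only models the matching logic (fixed substrings plus regular expressions); the rule lists and Cisco-style interface names are hypothetical examples, and the actual option names for interface ignore lists should be taken from the configuration defaults of the Observium version in use.

```python
import re

# Hypothetical ignore rules for interface names that are not worth polling.
# Plain substrings catch fixed names; regular expressions catch numbered families.
IGNORE_SUBSTRINGS = ["Null0", "-mpls layer"]
IGNORE_PATTERNS = [re.compile(r"^Virtual-Access\d+(\.\d+)?$")]


def is_ignored(if_name):
    """Return True if the interface name matches any ignore rule."""
    if any(s in if_name for s in IGNORE_SUBSTRINGS):
        return True
    return any(p.search(if_name) for p in IGNORE_PATTERNS)


def interfaces_to_poll(if_names):
    """Drop ignored interfaces, so no RRDs are created or updated for them."""
    return [name for name in if_names if not is_ignored(name)]


if __name__ == "__main__":
    names = ["GigabitEthernet0/1", "Null0", "Virtual-Access12",
             "GigabitEthernet0/1-mpls layer"]
    print(interfaces_to_poll(names))  # ['GigabitEthernet0/1']
```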
> I've not yet tried rrdcached, but I'd like to see what impact that has on
> the existing load.
In practice it makes relatively little difference. Far more impact can be
had from increasing disk I/O performance or moving the RRDs into a large
ramdisk (I've even had instances where the ramdisk has been on another host,
because that was where the spare RAM was available!).
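The idea behind rrdcached is to buffer updates in memory and write each RRD file's pending values together, trading many small writes for fewer, larger ones. The Python below is a deliberately simplified model of that batching (fixed, aligned flush windows; real rrdcached timing differs), counting file writes with and without a cache. The function name and workload are hypothetical.

```python
from collections import defaultdict


def count_disk_writes(updates, flush_interval_s):
    """Model how many file writes a stream of RRD updates causes.

    updates: iterable of (timestamp_s, rrd_file) pairs.
    flush_interval_s: 0 means every update is written immediately (no cache);
    otherwise updates to the same file are buffered and written together once
    per flush window, which is the basic idea behind rrdcached.
    """
    if flush_interval_s <= 0:
        return sum(1 for _ in updates)
    # One write per (file, flush window) pair that received at least one update.
    windows = defaultdict(set)
    for ts, rrd_file in updates:
        windows[rrd_file].add(ts // flush_interval_s)
    return sum(len(w) for w in windows.values())


if __name__ == "__main__":
    # Hypothetical load: 1000 RRD files updated every 300 s for one hour.
    updates = [(t, f"port-{i}.rrd") for t in range(0, 3600, 300) for i in range(1000)]
    print(count_disk_writes(updates, 0))     # 12000
    print(count_disk_writes(updates, 1800))  # 2000
```

The model only counts write operations; how much that saves in wall-clock I/O depends on the storage underneath, and the experience reported above is that faster disks or a ramdisk make far more difference.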
> Does anyone else have any recommendations or additional best practices?
Splitting the RRDs and MySQL across two different disks can help. I've not
scaled much past 10k interfaces on any deployment I've done, but really it's
just a function of how quickly you can write RRDs (and serve MySQL queries
whilst the disk isn't being eaten by RRD writes).
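The point that scale comes down to how quickly RRDs can be written can be put into rough numbers. Assuming the usual 5-minute (300 s) polling cycle and one RRD file per interface (both assumptions; a real install also keeps per-device and per-sensor RRDs, so these are lower bounds), every file must be updated once per cycle. As an aside, the Linux load average counts tasks blocked in uninterruptible (typically disk) sleep as well as runnable ones, so a load of about 4 on 4 cores at only 40% CPU is consistent with pollers waiting on I/O. The function name below is hypothetical.

```python
def rrd_write_budget(rrd_files, poll_interval_s=300):
    """Return (average RRD updates per second, milliseconds available per update)
    when every RRD file must be updated once within each polling interval and
    the updates are performed one after another."""
    updates_per_s = rrd_files / poll_interval_s
    ms_per_update = 1000.0 * poll_interval_s / rrd_files
    return updates_per_s, ms_per_update


if __name__ == "__main__":
    # 10k: roughly the largest deployments mentioned; 16k: the current interface count.
    for interfaces in (10_000, 16_000):
        ups, ms = rrd_write_budget(interfaces)
        print(f"{interfaces} RRDs: {ups:.1f} updates/s, {ms:.2f} ms per update")
```

This prints 33.3 updates/s (30.00 ms each) for 10k RRDs and 53.3 updates/s (18.75 ms each) for 16k. Each added interface shrinks the per-update time budget, so growth toward 600+ devices is mainly a question of disk write performance.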
> I have some thoughts and ideas on potential features that could be useful
> for other ISP users as well; however, I don't want to flood the list right
> off the bat with an even more rambling email.
One of the major things we're missing is good ideas on how to present
information; we're especially interested in ideas from other SPs.
adam.

Adam Armstrong