My 2 cents... SAN and virtualisation should still be considered; it just depends on your budget and your requirements, along with your experience/skill/knowledge and your ability to quickly obtain what you don't have.

The other 98 cents,

If you're running Observium and it's not revenue-supporting in a tangible way, or you're simply doing a proof of concept with full load, then sure; a dedicated second-hand box from eBay is probably the way to go.

Morten sounds like he's looking at running Observium in service provider land - 4000+ devices, 100,000+ ports, with a high potential for growth - so it's likely going to be revenue-supporting in a tangible way, either because support staff are going to be relying on it to troubleshoot things, or because customers will be logging into it directly.

Single box == single point of failure, whether due to motherboard/backplane/chassis failure or due to physical location. Either way, if it becomes critical to you in some way, you really need to consider spending more time/money ensuring that you can keep it running as much as possible, and can recover from any failure quickly and with minimal impact.

Virtualisation with SAN based storage is an option that should be considered.

re "Don't use a SAN" - performance won't be an issue on a modern SAN with pure SSD or tiered storage options, a good storage plan/layout (if applicable), and multiple high speed 8/10Gbps+ interfaces. However, yes, it will be a waste of money if you're just going to run a dedicated Linux box anyway. It's easy enough these days to provide dedicated spindles and SSD then tier them yourself using the Linux based options available, one of which is now natively within LVM.

re "VM I/O overhead is a pain" - Correctly sizing resource allocations and segmenting VM operation to appropriate hosts can be a pain when you first deal with the concept. But VM storage I/O is negligible to the point of not existing on all modern hypervisors.

The reason I/O appears to be painful is generally contention for CPU resources, exacerbated by high vCPU allocations to one or more VMs on the host, which impacts everything in the VM, including requests for I/O operations.

VMware has a whole swath of useful doco regarding this. Hit Google. Take a look at what the recommendations are for workloads like SQL Server, Oracle and Exchange.

If you need access to lots of physical CPU then yes, you will need to allocate more than just 1 or 2 vCPUs. In that case, because of how scheduling and locking of pCPUs works, the best practice is actually to run VMs with a high vCPU allocation on hosts without any contention, e.g. run 4 x 16-vCPU VMs on a 64-pCPU system with no other VMs.
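To put rough numbers on that, here's a quick Python sketch of the oversubscription check - every figure in it is a hypothetical example, not a recommendation for any particular host:

# Back-of-envelope check of vCPU oversubscription on a host.
# Every number here is a hypothetical example, not a recommendation.

host_pcpus = 64                # physical cores on the host (assumption)
vm_vcpus = {                   # vCPU allocation per VM on that host (assumptions)
    "observium-poller": 16,
    "observium-web": 8,
    "other-vm": 4,
}

total_vcpus = sum(vm_vcpus.values())
ratio = total_vcpus / host_pcpus
print(f"{total_vcpus} vCPUs on {host_pcpus} pCPUs -> ratio {ratio:.2f}")

# Keeping this ratio at or below ~1.0 means a wide VM can always be
# scheduled without waiting on busy physical cores - the "no contention"
# situation described above.
if ratio > 1.0:
    print("Oversubscribed: expect CPU ready time, which shows up as 'slow I/O'.")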

That ups your cost to run each VM of course, basically to "cost of a cheap physical box + virtualization software cost/licenses/support fees". The total is likely to be more than what you could get away with by buying a bunch of pizza boxes to start with, but you do get all the processing capability you need, along with all the benefits of an abstraction layer between the software & O/S and the hardware it's running on. For example, in VMware land:

1. GUI tools

2. Snapshots and rollback, which can be integrated as part of a change control process

3. Flexibility to move/migrate the system between storage, possibly while running, but dependent upon shared storage and licensing options.

4. Flexibility to move/migrate the system between hosts, possibly while running, but dependent upon shared storage and licensing options.

5. Flexibility to replicate the entire VM to another location, e.g. your own Disaster Recovery site, or cloud-based storage where the VM can then be started up on someone else's cloud computing platform only when needed. Dependent upon VMware licensing options, and possibly SAN features/licensing if you have to do block-level replication there.

6. A whole swath of VM-based backup software that can integrate with snapshots, along with doing dedupe and compression. Can be part of 3. Other products required.

7. CLI tools and APIs to automate a whole lot of 2, 3, 4, 5 and 6.

8. Optional extra GUI tools so you can skip straight past doing the things in 7 yourself. Dependent upon licensing options and products purchased.

It might be good to start a discussion involving people who are actually running Observium within VMs? Some documented pointers on the website, with links to reference documentation/architectures for large virtualised environments, wouldn't hurt.

-Colin


On 12 November 2014 07:23, Adam Armstrong <adama@memetic.org> wrote:
Don't use a SAN. Observium is the perfect storm of worst use-cases for SANs. It has lots of tiny writes all over the disk, and Observium will eat up the performance of your SAN far quicker than its sticker price might indicate. You're far better off with a few SSDs or even a RAM disk, if you can fit it in.
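As a rough illustration of how those tiny writes add up at the scale in question - the per-port RRD file count and the 5-minute polling interval below are assumptions for the sake of the arithmetic, not measured figures:

# Rough illustration of the RRD write pattern at ~100,000 ports.
# The files-per-port count and 300 s polling interval are assumptions.

ports = 100_000          # from the original question
rrds_per_port = 2        # e.g. traffic + errors per port (assumption; varies)
poll_interval_s = 300    # assumed 5-minute polling cycle

updates_per_cycle = ports * rrds_per_port
avg_write_iops = updates_per_cycle / poll_interval_s
print(f"~{updates_per_cycle:,} small RRD updates every cycle, "
      f"~{avg_write_iops:,.0f} random write IOPS averaged across the cycle")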
 
The ports page doesn't use as much RAM as it once did, so that requirement isn't there anymore. Mostly what you need to do is keep up I/O throughput and CPU throughput to handle enough parallel threads to poll all of your devices quickly enough.
 
I would aim to run without rrdcached, and only look at using it if you need to. It adds additional CPU and latency to the equation, which is not usually desired.
 
One of the major problems of modern servers, IMO, is that the single-core clock speeds are relatively slow. For web-ui performance, you want the fastest single core speed you can get. For poller performance, you want as many cores as you can efficiently spread your poller load over. 4,000 devices might require more than 12 cores, especially if they're only 2GHz cores.
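A very rough sizing sketch of the poller parallelism this implies - the per-device poll time and the 5-minute window here are assumptions, not measurements from a real network:

# Very rough poller sizing sketch; poll time per device is assumed.

devices = 4000
avg_poll_seconds = 15      # assumed average wall-clock time to poll one device
poll_window_seconds = 300  # every device needs re-polling within this window

device_seconds = devices * avg_poll_seconds
workers_needed = device_seconds / poll_window_seconds
print(f"~{workers_needed:.0f} poller processes running concurrently")

# Pollers spend much of their time waiting on SNMP responses, so several
# workers can share one core, but faster cores finish each device sooner
# and reduce the concurrency (and scheduling overhead) you need.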
 
Don't try to run Observium on a VM. The VM I/O overhead is a pain, and you'll ruin the host system for any other application. You want a high-core, high-memory, high-I/O dedicated server.

Something like:
 
http://www.ebay.co.uk/itm/Refurbished-HP-ProLiant-DL585-G2-Web-Server-4-x-Quad-Core-16-Core-128GB-RAM-/121427534352

Put a couple of SSDs in that and it /should/ suffice. Though you might want faster cores, and you might want 256GB of RAM so you can keep the RRDs in RAM.
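For a feel of whether the RRDs would actually fit in RAM, here's a rough estimate - the file count per port and the RRD file size are assumptions; measure your own with du before trusting any of this:

# Rough estimate of how much space/RAM the RRDs might occupy.

ports = 100_000
rrds_per_port = 2             # assumption
rrd_size_bytes = 300 * 1024   # ~300 KiB per RRD file (assumption)

total_gib = ports * rrds_per_port * rrd_size_bytes / 2**30
print(f"~{total_gib:.0f} GiB of RRD files")   # ~57 GiB with these numbers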
 
It's difficult to gauge performance requirements on that scale because it depends upon how the devices behave and what's monitor(able/ed) on them.
 
Oh, and split MySQL off onto a separate server with fewer, faster cores. It's not worth doing this with the web gui because of the latency involved in dealing with RRDs over the network, but it's definitely worth doing with MySQL.
 
adam.
 
------ Original Message ------
From: "Morten Guldager" <morten.guldager@gmail.com>
To: "Observium Network Observation System" <observium@observium.org>
Sent: 11/11/2014 2:11:05 PM
Subject: Re: [Observium] Performance
 
Yeah, I read that page too. But I'm uncertain how linearly Observium scales. 10 cores will be doable, but how much RAM will it take then? I guess I will have to use rrdcached to keep the disk I/Os at a manageable level. The server guys will probably complain if I suck every available I/O op out of their SAN.


 /Morten

On Tue, Nov 11, 2014 at 4:22 PM, Spencer Gaw <spencerg@frii.net> wrote:
I'm not sure how current this information is but it may answer some of your questions: http://www.observium.org/wiki/Hardware_Scaling

Regards,

SG


On 11/11/2014 5:55 AM, Morten Guldager wrote:
Aloha!

I'm in the process of evaluating Observium for my organisation's needs. We have some instances running already, but my task is to do the evaluation in a more structured way. I have some questions, which I will keep in separate posts to keep the threads clean.

Performance:

We are looking at a network currently consisting of 4000 devices with close to 100,000 ports. These devices are well known to Observium, and a subset of them got auto-discovered just fine. But how about performance? Will it require vast amounts of computing power? Also, our network grows quite rapidly; it might be 50% bigger 12 months from now.

I found an old thread from July 2013 where Joe Hoh describes a complex multi-server setup to scale Observium to something approximately 2-3 times my current needs. Will I have to go through similar "struggles" to get it working? Or has Observium changed enough that it scales differently than it did 1.5 years ago?

Pointers to information regarding scaling Observium are most welcome.

