On 2013-07-29 03:30, Joe Hoh wrote:
Thanks in advance to anyone who can help.
We are deploying Observium in a very large environment. We are upwards of 6,000 devices and 300,000 ports.
You seem to be the largest live installation that I know about.
I don't think we are going to be able to poll this in 5 minutes. This is what we have done so far:
Distributed polling
- Polling - 8 VMs - 4 core, 8GB RAM each
I would not do this in VMs. You're losing a few percent of CPU performance and making your RAM much less efficient. I assume all of these VMs are on two or more hosts running nothing else? Why not run the pollers directly on the hosts themselves?
VMware seems to have brainwashed the entire planet into thinking things get faster inside VMs, or that they make more efficient use of resources. Very odd!
I'd be trying quite hard to size a single poller server to accommodate the entire platform, both to remove complexity and to reduce the likelihood that at some point we'll change something which collapses your house of cards of custom modifications.
The poller needs aggregate throughput rather than single-core speed, so you would want to look for dual or quad socket 6/8/12-core systems.
- MySQL - 4 core, 8GB RAM
MySQL is the easiest place to make performance gains by offloading it to another server; removing I/O and CPU contention makes a lot of difference.
- Web Interface - 4 core, 8GB RAM
Normally I'd put the webui on the same device as the pollers, but in your case you're going to have a couple of really slow-to-render pages which will benefit from a small number of very fast cores. You want the fastest single-threaded CPU you can get in here; a high-clock i7 would do nicely.
Probably just mounting it over NFS directly from the poller host would work.
- NFS Server (for centralized /opt/observium - except for the MIBS
directory, which is copied across each server) - moving to an EMC VNX in 2 weeks.
NFS adds a fair bit of overhead to the entire process. I would very much be trying to work out ways of fitting the whole storage subsystem directly on to the polling server. Lots of 2.5" SAS disks in the poller host might suffice, or some form of high-throughput, low-latency external storage medium.
I assume you already have some idea of how your existing NFS server copes with the load; do you think you could scale it up to 300k interfaces?
You could look at running off decent quality SSDs. I know of a few large installs which do this. Below I mention RRD structure options to minimise writes, which may be useful if you drop the whole install across a few very fast, large SSDs.
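To put very rough numbers on the spindle question (everything below is a guess: roughly one RRD file per port, a few physical I/Os per update once caches stop helping, and ~175 random IOPS from a single 10k SAS disk):

ports          = 300000
poll_interval  = 300        # seconds, the standard 5-minute cycle
ios_per_update = 4          # guess: header read + data write + consolidation
sas_iops       = 175        # rough figure for one 10k SAS spindle

updates_per_sec = ports / float(poll_interval)        # ~1000 RRD updates/s
iops_needed     = updates_per_sec * ios_per_update     # ~4000 random IOPS
spindles        = iops_needed / sas_iops               # ~23 disks, before RAID overhead

print("~%.0f updates/s, ~%.0f IOPS, ~%.0f spindles" %
      (updates_per_sec, iops_needed, spindles))

Even if my guesses are off by a factor of two in either direction, that's a lot of mechanical disk, which is why SSDs start to look attractive at your scale.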
- rrdcached being implemented (any assistance here is helpful)
We found that RRDcached didn't add very much. I'm also not sure how safe it is to use in a multi-poller environment.
If you can fit them all on to a single host, you might gain a bit of i/o throughput by using it, though.
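If you do end up testing it, one way to see whether it's actually doing anything useful is to ask the daemon for its statistics over its control socket. This is generic rrdcached behaviour, nothing Observium-specific, and the socket path below is just an example - use whatever you pass to -l:

import socket

SOCK_PATH = "/var/run/rrdcached.sock"   # example path, match your -l option

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.connect(SOCK_PATH)
s.sendall(b"STATS\n")

f = s.makefile()
header = f.readline()                    # e.g. "9 Statistics follow"
count = int(header.split()[0])
stats = dict(f.readline().strip().split(": ", 1) for _ in range(count))
s.sendall(b"QUIT\n")
s.close()

# If QueueLength sits near zero and DataSetsWritten tracks UpdatesReceived
# one-for-one, the daemon isn't coalescing writes and you're gaining little.
for key in ("QueueLength", "UpdatesReceived", "DataSetsWritten"):
    print("%s: %s" % (key, stats.get(key, "?")))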
- Modified poller-wrapper.py to distribute the devices that poll
within 120s across multiple instances of poller-wrapper.py running on multiple hosts. Devices that poll in more than 120s are polled on separate servers at 15 minute intervals.
Why? I'm assuming it's related to the order that devices are polled in. We should remove the ordering by poller time and poll in device-added order; this would provide a more even load.
- poller-wrapper has been modified to allow for multiple instances
just like poller.php with MODULUS distribution
If you can build a single server large enough, you'd run a single poller-wrapper with 128 instances :)
- Each instance of poller-wrapper.py gets an instance number and the
total number of instances.
- All of the devices with a last poll time < 120 seconds are selected by taking the device_id MOD the total number of instances and comparing the result to the instance number: device_id MOD total_instances = this_instance
- Tuning threads in each poller-wrapper.py - currently at 16 threads
and 2 instances on each 4 vCPU server for 32 threads running at once or 8 threads per core
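For what it's worth, here's roughly how I read your selection logic - a minimal sketch only, not your actual poller-wrapper.py changes, and the column names are from memory so they may not match the schema exactly:

import sys
import MySQLdb   # the same driver poller-wrapper.py uses

this_instance   = int(sys.argv[1])    # 0 .. total_instances - 1
total_instances = int(sys.argv[2])

db = MySQLdb.connect(host="localhost", user="observium",
                     passwd="secret", db="observium")
cursor = db.cursor()

# "Fast" devices only: last completed poll took under 120 seconds.
cursor.execute("""SELECT device_id FROM devices
                  WHERE disabled = 0 AND last_polled_timetaken < 120""")

my_devices = [device_id for (device_id,) in cursor.fetchall()
              if device_id % total_instances == this_instance]

print("instance %d of %d will poll %d devices"
      % (this_instance, total_instances, len(my_devices)))

The obvious weakness of a plain modulus is that it knows nothing about how expensive each device is to poll, which is presumably why you had to split the slow devices out onto separate servers.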
- The DC is on the west coast and that presents latency problems. We
may need to address with distributed polling
The issues we find with long-distance polling tend to come from network stability rather than latency. Some devices can be so slow to respond that we end up overlapping polling of a single host, though (think a fully-loaded 6500 that's 300ms away).
We do have some ideas about how to help solve the UDP-over-huge-distance problem involving HTTP-based proxying of requests, but that's quite a big job to rewrite our code to handle, so isn't something we're likely to get done in the near future.
- We are at 1024MB in php.ini
Seems a little excessive to need this much RAM for a PHP process. You may have reached the point at which our in-PHP sorting system becomes unusable. This stuff might need to be rewritten for your size of install.
- We are using xcache (tuning help is appreciated - or should we just
turn it off)?
Oh god don't turn it off! Your web interface would become unusable! :)
My questions are:
- How can I change the default RRD behavior to use 15 minute intervals instead of 5 minute intervals?
- 15 minutes x 2 weeks
- 2 hours x 5 weeks
- 4 hours x 12 months
- We want to keep the max/min/average/87.5th percentile (since only 8
measurements per 2 hours)
At the moment this isn't possible as the poller frequency is hard-coded in places around the code, but we could perhaps change that in the future.
I'm not sure what you mean by keeping a percentile, but in any case this isn't at all possible due to RRD's limitations.
It should be noted here that if you're storing your data on a rotational medium, where space isn't an issue, you might be better off aggregating as little as possible. A large amount of RRD's i/o load comes from aggregating high-resolution data into low-resolution data.
If you can afford the disk space, it might help you to store, say, 6 months of 5 minute data and then aggregate to 1 year of 2 hour data. This means that you're only generating a single aggregated data point every 2 hours, if you see what I mean?
Our RRDs are sized at the moment for my preference to run observium out of a RAM disk. You have long since passed the point where this is viable and are going to have to use a *lot* of spindles to get enough IOPS capacity, so perhaps removing the aggregation would work for you.
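To make that concrete, here's a rough sketch of what the "6 months at 5 minutes, 1 year at 2 hours" layout looks like in plain rrdtool terms. This is illustrative only - it is not how Observium actually builds its RRDs, and the DS definitions are made up:

import subprocess

STEP      = 300               # 5-minute polling interval
ROWS_5MIN = 288 * 183         # ~6 months of full-resolution rows
ROWS_2H   = 12 * 365          # 1 year of 2-hour consolidated rows

# RRD only offers AVERAGE/MIN/MAX/LAST consolidation functions,
# which is why storing a percentile isn't possible.
rras = []
for cf in ("AVERAGE", "MIN", "MAX"):
    rras.append("RRA:%s:0.5:1:%d" % (cf, ROWS_5MIN))    # 1 step   = 5 minutes
    rras.append("RRA:%s:0.5:24:%d" % (cf, ROWS_2H))     # 24 steps = 2 hours

subprocess.check_call(
    ["rrdtool", "create", "example-port.rrd", "--step", str(STEP),
     "DS:INOCTETS:DERIVE:600:0:U", "DS:OUTOCTETS:DERIVE:600:0:U"]
    + rras)

The second RRA in each pair is the only place aggregation work happens, and it only produces one new row every 2 hours.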
- I don't see the configuration items for that.
There aren't any yet :)
- Would we be better with a few big boxes rather than small VMs?
I think you'd be better off trying to size a single poller with a *lot* of cores and fast I/O, and supplementing this with a fast external MySQL server and a very high clock-speed webui server.
As with all free software projects, we listen to suggestions and requests, but at the end of the day, all that ever really gets implemented is what the individual development team members want for their own instances. Most of our installs are probably around the 10k ports mark, so we rarely work on things that would help the platform to scale to your size of installation.
We have special arrangements with a few large organisations where they sponsor development to add features and make changes specifically for their requirements; this might be useful for you too. :)
adam.