![](https://secure.gravatar.com/avatar/ad75924960b592aa6ab7c9aaf5944678.jpg?s=120&d=mm&r=g)
Thanks in advance to anyone who can help.
We are deploying Observium in a very large environment. We have upwards of 6,000 devices and 300,000 ports.
I don't think we are going to be able to poll this in 5 minutes. This is what we have done so far:
- Distributed polling
- Polling - 8 VMs, 4 core, 8GB RAM each
- MySQL - 4 core, 8GB RAM
- Web interface - 4 core, 8GB RAM
- NFS server (for a centralized /opt/observium, except for the MIBS directory, which is copied to each server) - moving to an EMC VNX in 2 weeks
- rrdcached being implemented (any assistance here is helpful)
- Modified poller-wrapper.py to distribute the devices that poll within 120s across multiple instances of poller-wrapper.py running on multiple hosts; devices that take more than 120s to poll are polled on separate servers at 15-minute intervals
- poller-wrapper.py has been modified to allow multiple instances, just like poller.php, using modulus distribution (see the sketch after this list): each instance of poller-wrapper.py gets an instance number and the total number of instances, and every device with a last poll time < 120 seconds is assigned by comparing device_id MOD total_instances to the instance number (device_id MOD total_instances = this_instance)
- Tuning threads in each poller-wrapper.py - currently 16 threads and 2 instances on each 4 vCPU server, i.e. 32 threads running at once, or 8 threads per core
- The DC is on the west coast, which presents latency problems; we may need to address this with distributed polling
- We are at 1024MB in php.ini
- We are using xcache (tuning help is appreciated - or should we just turn it off?)
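For concreteness, here is a minimal Python sketch of the modulus split described above. It is not the actual modified poller-wrapper.py: the thread pool is replaced by a serial loop, the credentials are placeholders, and apart from device_id and poller.php's -h argument, the column names (disabled, last_polled_timetaken) are assumptions.

```python
#!/usr/bin/env python
"""Minimal sketch of the modulus-based device split described above.

Not the actual modified poller-wrapper.py: the thread pool is replaced by a
serial loop, and the column names other than device_id are assumptions.
"""
import subprocess
import sys

import MySQLdb  # the stock wrapper talks to the Observium MySQL database


def devices_for_instance(cursor, instance_number, total_instances, max_poll_seconds=120):
    """Return the device_ids this poller instance is responsible for.

    A device belongs to this instance when device_id MOD total_instances equals
    the instance number. Devices whose last poll took longer than
    max_poll_seconds are skipped here; they go to the separate 15-minute
    pollers (last_polled_timetaken is an assumed column name).
    """
    cursor.execute(
        "SELECT device_id FROM devices "
        "WHERE disabled = 0 AND last_polled_timetaken < %s",
        (max_poll_seconds,),
    )
    return [device_id for (device_id,) in cursor.fetchall()
            if device_id % total_instances == instance_number]


if __name__ == "__main__":
    # Usage: ./poller_split_sketch.py <instance_number> <total_instances>
    instance_number, total_instances = int(sys.argv[1]), int(sys.argv[2])
    db = MySQLdb.connect(host="localhost", user="observium",
                         passwd="changeme", db="observium")
    for device_id in devices_for_instance(db.cursor(), instance_number, total_instances):
        # The real wrapper runs these concurrently in a thread pool.
        subprocess.call(["/usr/bin/php", "/opt/observium/poller.php", "-h", str(device_id)])
```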
My questions are:
- How can I change the default RRD behavior to use 15-minute intervals instead of 5-minute intervals? We would like (a sketch of the corresponding RRAs follows this list):
  - 15 minutes x 2 weeks
  - 2 hours x 5 weeks
  - 4 hours x 12 months
  - We want to keep the max/min/average/87.5th percentile (since there are only 8 measurements per 2 hours). I don't see the configuration items for that.
- Would we be better off with a few big boxes rather than small VMs?
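As Adam notes further down the thread, Observium currently hard-codes the 5-minute step, so there is no configuration for this today. Purely as an illustration of the arithmetic, here is a small Python helper that expresses the retention scheme above as rrdtool --step/RRA definitions; the 15-minute base step and xff of 0.5 are assumptions, and RRD has no percentile consolidation function, so only AVERAGE/MIN/MAX appear.

```python
"""Illustration only: the retention scheme above expressed as rrdtool RRAs.

This is not an Observium configuration option (the 5-minute step is currently
hard-coded); it just shows the arithmetic. RRD has no percentile consolidation
function, so only AVERAGE/MIN/MAX are generated, and xff=0.5 is an assumption.
"""

STEP = 15 * 60  # assumed base step: one primary data point every 15 minutes

# (consolidated resolution in seconds, retention in seconds)
WANTED = [
    (15 * 60,   14 * 86400),     # 15 minutes for 2 weeks
    (2 * 3600,  5 * 7 * 86400),  # 2 hours for 5 weeks
    (4 * 3600,  365 * 86400),    # 4 hours for 12 months
]


def rra_definitions(step=STEP, wanted=WANTED, cfs=("AVERAGE", "MIN", "MAX")):
    rras = []
    for resolution, retention in wanted:
        steps_per_row = resolution // step   # primary points per consolidated row
        rows = retention // resolution       # consolidated rows to keep
        for cf in cfs:
            rras.append("RRA:%s:0.5:%d:%d" % (cf, steps_per_row, rows))
    return rras


if __name__ == "__main__":
    print("--step %d" % STEP)
    for rra in rra_definitions():
        print(rra)
    # AVERAGE rows come out as 1344 (15 min x 2 weeks), 420 (2 h x 5 weeks)
    # and 2190 (4 h x 12 months); MIN and MAX mirror them.
```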
![](https://secure.gravatar.com/avatar/b3a546cd599e8024ed2790e548f4c63b.jpg?s=120&d=mm&r=g)
So given what you have done so far, what issues are you currently experiencing? I would assume your IO/NFS is just gone to shit?
I have had a few random thoughts about horizontal scaling, but I haven't gone any further than thinking about it…
Some thoughts (not in any particular order)
1. Distributed pollers/RRD storage - someone wrote an interesting patch at one stage that would allow decoupling the RRD location from the aggregation/presentation layer (i.e. your web-UI nodes could pull in data from all your cluster nodes) -- https://lists.oetiker.ch/pipermail/rrd-developers/2008-May/002203.html
2. Most of the RRD access code seems pretty well encapsulated; it may be possible to replace it with a backend in something like OpenTSDB and a front end in JavaScript (flot?)
3. There is an interesting article which suggests the actual limits of a single-node solution could potentially be pushed out with some uber tuning -- http://code.google.com/p/epicnms/wiki/Scaling
4. Quasi-similar to (1): have a look at using a distributed filesystem (gluster?)
5. Cache in code, potentially in a layer, so that single-node instances use a simple in-PHP cache and multi-node instances hit something like redis, for things you don't really want to persist in your database but only want to keep for a short lifetime (a rough sketch follows below)
Just some random thoughts.
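As a rough illustration of idea 5 above: a single cache facade that keeps short-lived values in-process on a single-node install and switches to redis when the install spans several nodes. Observium itself is PHP, so this Python sketch only shows the pattern, and every name in it is invented.

```python
"""Rough sketch of idea 5 above: one cache facade, two backends.

Observium itself is PHP, so this only illustrates the pattern; every name here
is invented for the example. Short-lived values live in an in-process dict on a
single-node install, or in redis when the install spans several nodes.
"""
import time

try:
    import redis  # only needed for the multi-node case
except ImportError:
    redis = None


class ShortLivedCache(object):
    def __init__(self, redis_url=None, default_ttl=300):
        self.default_ttl = default_ttl
        self._local = {}  # key -> (expiry timestamp, value)
        self._redis = redis.Redis.from_url(redis_url) if (redis_url and redis) else None

    def set(self, key, value, ttl=None):
        ttl = ttl or self.default_ttl
        if self._redis:
            self._redis.setex(key, ttl, value)  # shared cache with expiry
        else:
            self._local[key] = (time.time() + ttl, value)

    def get(self, key):
        if self._redis:
            return self._redis.get(key)
        expiry, value = self._local.get(key, (0, None))
        return value if expiry > time.time() else None


# Single node:  cache = ShortLivedCache()
# Multi node:   cache = ShortLivedCache(redis_url="redis://cache-host:6379/0")
```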
![](https://secure.gravatar.com/avatar/ad75924960b592aa6ab7c9aaf5944678.jpg?s=120&d=mm&r=g)
We are having problems with the NFS performance, yes. We are also having problems getting the polling nodes to be more utilized.
Nothing is working very hard, but throughput (measured in devices that complete polling per second) is extremely low.
We are going to try some of the RRD filesystem tuning recommended in the article - that is an EXCELLENT read. Giving the NFS server more RAM and tuning the cache should help, unless its use as NFS obviates that improvement.
What's the largest Observium deployment that you know of? 6,000 devices and 300,000 ports seems MASSIVE!
![](https://secure.gravatar.com/avatar/b3a546cd599e8024ed2790e548f4c63b.jpg?s=120&d=mm&r=g)
You could attempt to set the 'norrd' config option and see how your pollers handle the run times of your load… I assume circa 1,000 devices, or 50,000 ports, per poller… that quantity of nodes is not a 'small' amount from what I can see in the mailing list history.
It might be worth approaching the authors about some professional-services involvement; I think Tobi (the rrdtool author) was also looking for some funding for his RRDtool 2.x work (could be wrong).
You could also point your rrdcached at the central 'NFS' host, which might suck less than attempting to do those RRD updates over NFS (you could measure that, I assume), leaving reads/creates via NFS…
From a design perspective, having a jumbo super NFS box is really just like jamming it all on a single node; it doesn't really 'scale' horizontally (i.e. add more NMS nodes, get more capacity), and that's a pretty tricky thing with RRDtool (see many Cacti installations, etc.)…
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
On 2013-07-29 03:30, Joe Hoh wrote:
Thanks in advance for anyone who can help.
We are deploying Observium in a very large environment. We are upwards of 6,000 devices and 300,000 ports.
You seem to be the largest live installation that I know about.
I don't think we are going to be able to poll this in 5 minutes. This is what we have done so far:
- Distributed polling
- Polling - 8 VMs - 4 core, 8GB RAM each
I would not do this in VMs. You're losing a few percent of CPU performance and making your RAM much less efficient. I assume all of these VMs are on two or more hosts running nothing else? Why not run the pollers directly on the hosts themselves?
VMware seems to have brainwashed the entire planet into thinking things get faster inside VMs, or that they make more efficient use of resources. Very odd!
I'd be trying quite hard to size a single poller server to accommodate the entire platform, both to remove complexity and to reduce the likelihood that at some point we'll change something which collapses your house of custom-modification cards.
The poller needs aggregate throughput rather than single-core speed, so you would want to look for dual or quad socket 6/8/12-core systems.
- MySQL - 4 core, 8GB RAM
MySQL is the easiest place to make performance gains by offloading it to another server. I/O and CPU contention makes a lot of difference.
- Web Interface - 4 core, 8GB RAM
Normally I'd put the web UI on the same device as the pollers, but in your instance you're going to have a couple of really slow-to-render pages which will benefit from a small number of very fast cores. You want the fastest single-process CPU performance you can get in here; a high-clock i7 would do nicely.
Probably just mounting it over NFS directly from the poller host would work.
- NFS Server (for centralized /opt/observium - except for the MIBS directory, which is copied across each server) - moving to an EMC VNX in 2 weeks.
NFS adds a fair bit of overhead to the entire process. I would very much be trying to work out ways of fitting the whole storage subsystem directly on to the polling server. Lots of 2.5" SAS disks in the poller host might suffice, or some form of high-throughput, low-latency external storage medium.
I assume you already have some idea of how your existing NFS server copes with the load; do you think you could scale it up to 300k interfaces?
You could look at running off decent quality SSDs. I know of a few large installs which do this. Below I mention RRD structure options to minimise writes, which may be useful if you drop the whole install across a few very fast, large SSDs.
- rrdcached being implemented (any assistance here is helpful)
We found that RRDcached didn't add very much. I'm also not sure how safe it is to use in a multi-poller environment.
If you can fit them all on to a single host, you might gain a bit of i/o throughput by using it, though.
- Modified poller-wrapper.py to distribute the devices that poll within 120s across multiple instances of poller-wrapper.py running on multiple hosts. Devices that poll in more than 120s are polled on separate servers at 15 minute intervals.
Why? I'm assuming it's related to the order that devices are polled in. We should remove the ordering by poller time and poll in device-added order; this would provide more even load.
- poller-wrapper has been modified to allow for multiple instances just like poller.php with MODULUS distribution
If you can build a single server large enough, you'd run a single poller-wrapper with 128 instances :)
- Each instance of poller-wrapper.py gets an instance number and the total number of instances.
- All of the devices with the last poll time < 120 seconds are MOD'ed with the device_id and the total number of instances and compared to the instance number - device_id MOD total_instances = this_instance
- Tuning threads in each poller-wrapper.py - currently at 16 threads and 2 instances on each 4 vCPU server for 32 threads running at once or 8 threads per core
- The DC is on the west coast and that presents latency problems. We may need to address with distributed polling
The issues we find with long-distance polling tend to come from network stability rather than latency. Some devices can be so slow to respond that we end up overlapping polling of a single host though (think a fully-loaded 6500 300ms away)
We do have some ideas about how to help solve the UDP-over-huge-distance problem involving HTTP-based proxying of requests, but that's quite a big job to rewrite our code to handle, so isn't something we're likely to get done in the near future.
- We are at 1024MB in php.ini
Seems a little excessive to need this much RAM for a PHP process. You may have reached the point at which our in-PHP sorting system becomes unusable. This stuff might need to be rewritten for your size of install.
- We are using xcache (tuning help is appreciated - or should we just turn it off)?
Oh god don't turn it off! Your web interface would become unusable! :)
My questions are:
- How can I change the default RRD behavior to use 15 minute intervals instead of 5 minute intervals
- 15 minutes x 2 weeks
- 2 hours x 5 weeks
- 4 hours x 12 months
- We want to keep the max/min/average/87.5th percentile (since only 8 measurements per 2 hours)
At the moment this isn't possible as the poller frequency is hard-coded in places around the code, but we could perhaps change that in the future.
I'm not sure what you mean by keeping a percentile, but in any case this isn't at all possible due to RRD's limitations.
It should be noted here that if you're storing your data on a rotational medium where speed isn't an issue, you might be better off aggregating as little as possible. A large amount of RRD's i/o load comes from when it aggregates high resolution data into low resolution data.
If you can afford the disk space, it might help you to store, say, 6 months of 5 minute data and then aggregate to 1 year of 2 hour data. This means that you're only generating a single aggregated data point every 2 hours, if you see what I mean?
Our RRDs are sized at the moment for my preference to run observium out of a RAM disk. You have long since passed the point where this is viable and are going to have to use a *lot* of spindles to get enough IOPS capacity, so perhaps removing the aggregation would work for you.
- I don't see the configuration items for that.
There aren't any yet :)
- Would we be better with a few big boxes rather than small VMs?
I think you'd be better off trying to size a single poller with a *lot* of cores and fast I/O, and supplementing this with a fast external MySQL server and a very high clock-speed webui server.
As with all free software projects, we listen to suggestions and requests, but at the end of the day, all that ever really gets implemented is what the individual development team members want for their own instances. Most of our installs are probably around the 10k ports mark, so we rarely work on things that would help the platform to scale to your size of installation.
We have special arrangements with a few large organisations where they sponsor development to add features and make changes specifically for their requirements; this might be useful for you too. :)
adam.
participants (3)
- Adam Armstrong
- Joe Hoh
- Peter Childs