This brings up an interesting question. Are there any plans to implement a master/slave configuration like Opsview or Nagios uses?
Thanks, Scott Brawner
-----Original Message-----
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Jeremy Custenborder
Sent: Thursday, February 06, 2014 12:35 PM
To: Observium Network Observation System
Subject: Re: [Observium] Scaling Observium with Rancid, Smokeping, and Syslog-ng
I think this is one I can speak to, at least for running Observium on NFS. We have a commercial product for Rancid, Syslog-ng belongs to another team, and we don't use Smokeping.
I use Observium to monitor about 700 remote locations and a few data centers, ~7k devices resulting in 300k ports. These are mostly Cisco devices; another team monitors server-level metrics. Right off the bat I would recommend keeping the environment as small as possible if you can. As long as you can poll all of your devices in 5 minutes, I would not break it out. If you are not finishing in 5 minutes, then consider a path like this, but know that it greatly increases the complexity of the environment and will put you in an unsupported configuration. Given our network is spread all over the country, our biggest problem is latency, not raw hardware.
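If you want a quick way to see whether you're inside the 5-minute budget, you can ask the database directly. Something like this works for me; the column name is from my schema, so check yours:

    # Which devices blew the 5-minute polling budget on their last run?
    # Assumes the devices table has a last_polled_timetaken column;
    # verify against your schema version.
    mysql -N observium -e "SELECT hostname, last_polled_timetaken
      FROM devices
      WHERE last_polled_timetaken > 300
      ORDER BY last_polled_timetaken DESC;"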
We queue all checks through RabbitMQ. Cron jobs determine which devices need to be polled or discovered and write them to a queue.
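The cron side of that is nothing fancy. Roughly like this, using rabbitmqadmin from the management plugin (the queue name is illustrative, and adjust the device query to your schema):

    #!/bin/bash
    # Cron, every 5 minutes: enqueue every enabled device for polling.
    # Each poller runs a worker that pops a hostname off this queue
    # and runs ./poller.php -h <hostname>.
    mysql -N observium -e "SELECT hostname FROM devices WHERE disabled = 0" |
    while read -r host; do
        rabbitmqadmin publish exchange=amq.default \
            routing_key=observium.poll payload="$host"
    done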
We're running 100% on VMware on a Cisco UCS stack. The OS is CentOS 6.5.

1x DB server (4 cores, 16 GB RAM)
1x NFS server (4 cores, 32 GB RAM, Fusion-io disk, XFS filesystem tuned for heavy write cache)
1x web server (4 cores, 16 GB RAM; also runs RabbitMQ and the cron tasks that populate it)
16x pollers (2 cores, 8 GB RAM each)
1x Nagios server (specific to this infrastructure)
We also have a Nagios environment with about 100k checks against this infrastructure that are generated from the Observium database based on IOS feature types, etc.
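The generation is just queries against the Observium tables plus templating. A stripped-down sketch of the idea (the real script keys off IOS feature sets; the paths and template name here are illustrative):

    #!/bin/bash
    # Emit a Nagios host definition for every enabled IOS device known
    # to Observium. The real version picks check commands per feature set.
    mysql -N observium -e \
        "SELECT hostname FROM devices WHERE os = 'ios' AND disabled = 0" |
    while read -r host; do
        printf 'define host {\n    use generic-switch\n    host_name %s\n    address %s\n}\n\n' \
            "$host" "$host"
    done > /etc/nagios/conf.d/observium_hosts.cfg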
Observations:
1.) Pollers spend an insane amount of I/O reading the MIBs. We were seeing 4-10k NFS operations a second per poller when loading from NFS; even though these files were in cache, every read still resulted in a stat against NFS. We have a job that syncs the MIBs from NFS to a cache directory on each poller's local disk, and we override mibdir in the config to point at that directory (rsync sketch after this list). This dropped our operations per second by 90%.
2.) You have to monitor the hell out of this. We watch for stale processes, the number of processes, last write times, and network availability (example checks after this list).
3.) The NFS box is going to be 98% of your tuning effort. Our database stayed around 8 GB as long as we truncated the discovery and poller log tables (per Adam they are not used for anything); they grew by a couple million rows a day, so we truncate them hourly (cron entry after this list). We give MySQL 12 GB of RAM and leave 4 GB to the OS. This seems to work fine.
4.) Mount options for the pollers are your friend. We used this:

    mount -o 'rw,async,noatime,nodiratime,noacl,rsize=32768,wsize=32768' nfsbox:/mount/path /mnt/something
5.) Consistent configuration is huge. Use Puppet or Chef.
6.) When the database has 300k ports, page loads are bad. Sometimes we saw 15-30 second page load times; a PHP profiler trace of the splash page was 100 MB. :) All of the port-level access checks cause hundreds of thousands of calls to the database. We patched the code to skip port-level security, since we don't have a use for it.
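To put some meat on observation 1, the sync job is just rsync on a cron plus the config override (the paths are whatever you pick):

    # Cron on each poller: mirror the MIBs from NFS to local disk.
    rsync -a --delete /mnt/observium/mibs/ /var/cache/observium/mibs/

    # Then point Observium at the local copy in config.php:
    #   $config['mibdir'] = "/var/cache/observium/mibs";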
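For observation 2, most of our watchdogs boil down to one-liners like these (thresholds and paths are whatever makes sense for your environment):

    # How many pollers are actually running right now?
    pgrep -fc poller.php

    # Alert if nothing has written an RRD in the last 10 minutes.
    find /opt/observium/rrd -name '*.rrd' -mmin -10 | grep -q . \
        || echo "WARNING: no RRD updates in the last 10 minutes"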
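And the hourly truncate from observation 3 is a single cron entry. I won't swear to the table names from memory, so confirm them in your schema before you blow anything away:

    # Hourly: clear the poller/discovery log tables (per Adam, unused).
    # Table names are placeholders -- check your schema first.
    0 * * * * mysql observium -e "TRUNCATE TABLE poller_log; TRUNCATE TABLE discovery_log"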
On Tue, Feb 4, 2014 at 9:27 AM, Chip Pleasants <wpleasants@gmail.com> wrote:
Does anyone have suggestions or real-world experiences they could share about scaling Observium with Rancid, Smokeping, and Syslog-ng? I have about 1000 routers/switches in my network, which equates to about 30k interfaces. Half the interfaces will probably be down, although I'm not sure that makes a difference.
Basically I'm wondering if I should break out my tools to individual VMs and use something like NFS so Observium can see the Rancid, Smokeping, and Syslog-ng data. Feel free to shoot me a private note. I appreciate any feedback.
-Chip
_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium