Hi Adam,
Thanks for taking the time to provide feedback. We love Observium. I hope to break out the database to another VM very soon, as well as double the cores to 8 on my current VM. We currently have about 20k ports across 218 devices, but with all our devices I'm guessing we'll have about 60k ports total. Does it help reduce CPU/polling load if the ports are admin down or down down versus shutdown? Right now I only have VMs at my disposal, so I'm attempting to maximize the environment.
-Chip
On Tue, Apr 8, 2014 at 1:06 PM, Adam Armstrong adama@memetic.org wrote:
Hi Chip,
The number of devices isn't so important; the number of ports is generally a more valid method of gauging requirements (100 Linux servers will probably poll faster than 10 Cisco 6509s).
We have no clean mechanism for running multiple poller systems: you always need to share storage and MySQL, which tends to remove most of the benefits, especially as it's not terribly difficult to get 24-core systems these days.
You will likely have 4 main scalability issues:
i) MySQL throughput
ii) Disk I/O throughput
iii) CPU time, mostly to run PHP and parse MIBs
iv) SNMP response latency
There are a few strategies you can use to help mitigate these, and they're probably effective beyond the point that 99% of users will be trying to scale to:
i) Move MySQL to separate hardware, ask a MySQL guy how to make MySQL go superwooshfast.
ii) Move RRD storage to a ram disk
    Reduce the amount of averaging in the RRD structure to reduce I/O at the expense of disk space
    Try rrdcached (never seems to make much difference for us though, tbh)
    Add more (faster) spindles to your system. More disks means more I/O capacity.
    Move RRD storage to a separate system with more, faster disks and better caching (adds latency)
iii) Move to a faster system. We don't think it's worth the hassle to move polling to multiple systems until you're past the point that a single system can't comfortably scale. 12 core systems are easy.
iv) Run multiple pollers in parallel. You can relatively easily scale through 4, 8, 12, 16 cores. Once we have more than a single user who needs 24 cores, maybe then it's worth looking at separate pollers.
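As a rough way to reason about point iv), the number of parallel poller processes needed to keep up with a 5-minute cycle can be estimated from per-device poll times. The following is only a sketch, not Observium code: the function name `min_pollers` and the device timings are hypothetical, and the estimate ignores MySQL, disk I/O and SNMP latency contention, so it is a lower bound rather than a sizing rule.

```python
import math

POLL_INTERVAL = 300  # seconds in one 5-minute polling cycle


def min_pollers(device_seconds, interval=POLL_INTERVAL):
    """Lower bound on parallel poller processes needed so that the
    summed per-device poll time fits inside one polling interval."""
    total = sum(device_seconds)
    return max(1, math.ceil(total / interval))


if __name__ == "__main__":
    # Hypothetical fleet: 200 devices at 10 s each plus 18 slow ones at 60 s.
    timings = [10] * 200 + [60] * 18
    print(min_pollers(timings))
```

With real numbers, the per-device poll durations would come from whatever timing data the poller already records; the values above are made up for illustration.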
A lot of people get hung up on how long it takes to finish polling a device. This isn't so important, so long as each part of each poller run is ~5 minutes apart. It's generally a good idea to run as many poller instances as your system will accommodate, accounting for CPU, MySQL and I/O. If a system takes 15 minutes to poll because it's far away or replies slowly, that's no problem: the pollers are started 5 minutes apart, so the CPU module will be run 5 minutes apart, the ports module will be run 5 minutes apart, and so on.
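The timing argument above can be checked with a small simulation. This is an illustrative sketch, not Observium code: the module names, the per-module offsets and the roughly 15-minute run length are assumptions, and it further assumes every run reaches each module after the same delay. Only the 5-minute start interval comes from the message.

```python
# Illustrative sketch only, not Observium code.
START_INTERVAL = 300  # seconds between poller starts (from the message)

# Hypothetical offsets (seconds into a run) at which each module executes
# for one slow device whose full run takes about 15 minutes.
MODULE_OFFSETS = {"cpu": 40, "ports": 400, "sensors": 850}


def module_times(module, runs):
    """Absolute times at which `module` executes over `runs` consecutive runs."""
    offset = MODULE_OFFSETS[module]
    return [run * START_INTERVAL + offset for run in range(runs)]


def gaps(times):
    """Differences between consecutive timestamps."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for module in MODULE_OFFSETS:
        print(module, gaps(module_times(module, 5)))
```

Every module shows gaps of 300 seconds even though a single run lasts far longer than that, which is why the sample spacing matters more than the total run time. If the delay before a module varies from run to run, the gaps vary by the same amount.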
adam.
On 2014-04-04 05:45, Chip Pleasants wrote:
Thank you for the ping tuning. I've added those to the config. However, as soon as I turn on the external poller I get SNMP down alerts from the poller that lives on the all-in-one server. I'm assuming the alert emails get sent locally by the poller? I'm not really sure where to go from here if I want to use Observium for 800 more devices. I plan to break out the database, which shouldn't be difficult and should give the server some relief for polling, but if I can't get the external poller working it may be a show stopper for me. If it is possible, I'd really like to know more about the multiple pollers. Any suggestions and time are appreciated.
Thanks, Chip
On Wed, Apr 2, 2014 at 6:47 PM, Mike Stupalov mike@observium.org wrote:
in includes/defaults.inc.php:
$config['fping'] = "/usr/bin/fping";
$config['fping6'] = "/usr/bin/fping6";
// PING Settings - Retries/Timeouts
#$config['ping']['retries'] = 3; // How many times to retry ping (1 - 10)
#$config['ping']['timeout'] = 500; // Timeout in milliseconds (50 - 2000)
On Thu, Apr 3, 2014 at 1:50 AM, Chip Pleasants wpleasants@gmail.com wrote:
When there are multiple pollers via NFS, how do they pick the devices to poll? Basically, how do they not step all over each other polling the same nodes? I'm seeing SNMP and ping alerts, 15 or so every hour, that didn't come in when it was a single-server solution. It does seem to be the same 20 or so devices. These particular devices typically take around 60 sec to poll. Looking at the devices that generated alerts, their CPU, SNMP response time, and ping time were all over the place: CPU doubled (10% to 20%), ping times went up to 200ms from 10ms, and average SNMP response time went from 50ms to 556ms. I'm wondering if I was polling these devices multiple times within 5 minutes? Would this be related to NFS I/O issues? I reverted back to a single-server solution for now. Any assistance is greatly appreciated.
-Chip
Config Snippet
$config['alerts']['email']['enable'] = TRUE;
$config['poller-wrapper']['alerter'] = TRUE;
$config['snmp']['timeout'] = 6;
$config['snmp']['retries'] = 3;
$config['snmp']['max-rep'] = 10;
$config['fping'] = "/usr/sbin/fping -t2000";
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium [1]
-- Mike Stupalov http://observium.org/ [2]
Links:
[1] http://postman.memetic.org/cgi-bin/mailman/listinfo/observium [2] http://observium.org/