On 2013-10-09 12:35, F.Reenders@utwente.nl wrote:
Hi,
I'm having problems with the performance of the poller. We have 200 switches with almost 16000 ports included, and the poller is having trouble getting them all checked within 5 minutes. With 20 threads it works, with a load of 15. When I add more threads, it takes longer to get the data from the switches. With fewer threads it also doesn't check all switches in 5 minutes. I'm using new, fast hardware.
I've implemented all the performance tuning tips on the Observium site. I also disabled the checking of fdb-table, arp-table and mac-accounting. When I debug the poller, the ports check takes the longest: about 90% of the total time. One switch takes about 8 to 10 seconds now.
Is there a way to speed it up? Maybe I can extend the check cycle time to 10 minutes? Distributed poller instances?
Your bottleneck is almost certainly I/O. We write a *lot* of data.
The solution is either RAM disks or SSD.
20k ports should fit in 48GB of RAM. There are instructions on how to do this properly on the Wiki. With a RAM disk you don't worry about I/O at all, only CPU and network performance.
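As a rough back-of-envelope check, the figure above can be scaled to your port count. This is only a sketch: it assumes storage grows linearly with the number of ports, which is an assumption and not something measured here, and the function name is made up for illustration.

```python
# Back-of-envelope RAM disk sizing.
# Assumption: storage scales linearly with port count (not measured).
REFERENCE_PORTS = 20_000   # port count from the rule of thumb above
REFERENCE_RAM_GB = 48      # RAM figure from the rule of thumb above


def estimated_ramdisk_gb(ports: int) -> float:
    """Scale the 48GB-per-20k-ports figure linearly to `ports`."""
    return ports * REFERENCE_RAM_GB / REFERENCE_PORTS


if __name__ == "__main__":
    # 16000 is the approximate port count from the original mail.
    print(f"{estimated_ramdisk_gb(16_000):.1f} GB")
```

Under that assumption, roughly 16000 ports would need a bit under 40GB, so the same 48GB of RAM should leave some headroom.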
Alternatively you can use an SSD, which has much higher throughput than a hard disk (still not unlimited, and write speeds can be a bit slow).
adam.