On 10/10/2013 11:36 PM, F.Reenders@utwente.nl wrote:
I've got 2 x Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz, 64 GB RAM and 2 x RAID 1 arrays of 10k SAS disks (4 disks in total). It is an HP DL380G8. Load is now 200 :) but that's because the 5-minute poll interval is not enough.
I will try the noatime also.
I'm running on a pretty similar system: a Dell R720 with the same CPUs & RAM, and 2 x 7 15K SAS in RAID 10. We also V2P-ed our server when we upgraded to this machine, and I was pretty underwhelmed by the performance improvement, considering the machine is a bit overkill for our environment (details below).
Then I started digging into the poller stats and found that some of my remote Linux servers (which run the PPPoE for each branch's ADSL connection) were taking around 200-300 seconds per poll. When I ran the poller in debug mode, I found that the interfaces poll was taking a huge proportion of the poller's run time, even though these servers only have 3 NICs, plus ppp0 for ADSL and tun0 for OpenVPN. The cause: the kernel gives both ppp0 and tun0 a new interface index every time the connection goes down and comes up again, and net-snmp reports each one as a new interface (duly noting a warning in syslog). Over time we were ending up with hundreds of interfaces per server, and net-snmp seems to be particularly inefficient at reporting them (or perhaps Observium is trying to poll too much data from non-existent ports?).
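To check whether a host has this problem, one option is to count how often each interface name appears in an ifDescr walk. The sketch below is only an illustration: it assumes the walk output has the usual net-snmp form (`IF-MIB::ifDescr.N = STRING: name`) and that the stale entries keep the same name as the live interface. The function names are made up for this example.

```python
import re
from collections import Counter

# Matches lines such as "IF-MIB::ifDescr.17 = STRING: ppp0"
# (assumed snmpwalk output format; adjust if your output differs).
IFDESCR_LINE = re.compile(r"ifDescr\.(\d+)\s*=\s*STRING:\s*(\S+)")


def count_interface_names(walk_output):
    """Return a Counter mapping interface name -> number of ifDescr rows."""
    counts = Counter()
    for line in walk_output.splitlines():
        match = IFDESCR_LINE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def duplicated_interfaces(walk_output):
    """Return only the names that are reported more than once."""
    counts = count_interface_names(walk_output)
    return {name: n for name, n in counts.items() if n > 1}
```

Feeding it the saved output of an ifDescr walk for one host should show at a glance whether ppp0 or tun0 has piled up many entries.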
Regardless, I rolled out a script with Puppet to restart net-snmp every time ppp0 or tun0 comes up. Now poll times for all those hosts are under 20 seconds and the load on our server doesn't go over about 1.5, even during polls with 32 concurrent pollers.
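For anyone wanting to try the same workaround, here is a rough sketch of what such a hook could look like; it is not the actual script from this setup. It assumes the hook is called with the interface name as its first argument (as pppd's ip-up scripts and OpenVPN's `up` script are), and that the agent is restarted with `service snmpd restart`. The service name and restart command differ between distributions, so treat both as placeholders.

```python
import subprocess
import sys

# Interfaces whose reconnects leave stale entries behind (from the post above).
WATCHED_INTERFACES = {"ppp0", "tun0"}

# Assumption: the agent runs as the "snmpd" service; change this for your distro.
RESTART_COMMAND = ["service", "snmpd", "restart"]


def should_restart(ifname):
    """True if the interface that just came up is one we care about."""
    return ifname in WATCHED_INTERFACES


def handle_interface_up(ifname, run=subprocess.call):
    """Restart the SNMP agent if a watched interface came up.

    `run` is injectable so the logic can be exercised without
    actually restarting anything.
    """
    if not should_restart(ifname):
        return False
    run(RESTART_COMMAND)
    return True


if __name__ == "__main__":
    # The interface name is expected as the first argument.
    if len(sys.argv) > 1:
        handle_interface_up(sys.argv[1])
```

Restarting the agent makes it rebuild its interface table, which is why the stale entries disappear. Poll gaps during the restart are the obvious trade-off.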
Regards, Paul
P.S. Device/port counts from our Observium installation:
           Total     Up   Down   Ignored   Disabled
Devices      187    165      3         5         14
Ports       2614   1044     12      1275        173 (shutdown)

Devices: http://observium.buq.org.au/devices/
Ports: http://observium.buq.org.au/ports/