Hi,

 

I solved my problem with the slow performance!

 

The problem was the PHP CLI. While searching on Google I noticed that this is a known problem that was already an issue in 2007.

You can check for it by running php -i on the command line. If it takes some time to display anything and then waits 5 seconds before giving the prompt back, you have this problem.

It can be solved by removing modules from PHP one at a time to find out which one is causing the problem on your system.

In my case it was the snmp.so module. With that module removed, a simple PHP command such as php -i outputs everything within 1 second.
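To make this check repeatable, here is a minimal shell sketch for timing the PHP CLI. The helper name elapsed_seconds is made up for illustration; php -n (ignore all ini files) and php -m (list loaded modules) are standard PHP CLI options. It assumes the slow module is loaded through an ini file, which is usually the case for snmp.so on packaged installs.

```shell
# Print the whole seconds a command takes to run and exit, discarding its output.
# elapsed_seconds is a hypothetical helper name, not part of PHP or Observium.
elapsed_seconds() {
    es_start=$(date +%s)
    "$@" >/dev/null 2>&1
    es_end=$(date +%s)
    echo $(( es_end - es_start ))
}

# Example use, assuming php is on the PATH:
#   elapsed_seconds php -i       # normal configuration, all modules loaded
#   elapsed_seconds php -n -i    # -n ignores ini files, so ini-loaded modules are skipped
#   php -m                       # list the modules that are currently loaded
```

If the first command takes several seconds and the second returns right away, a module loaded from the ini files is the likely cause. Disable them one at a time and re-run the first command to find it.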

 

When I run the poller now, all 412 of my devices with 32000 ports are checked in under 3 minutes.

 

Thanks for all the ideas for solving this.

 

Regards,

 

Frederik

 

From: observium [mailto:observium-bounces@observium.org] On Behalf Of Paul Gear
Sent: Saturday, 12 October 2013 23:24
To: Observium Network Observation System
Subject: Re: [Observium] poller performance

 

On 10/10/2013 11:36 PM, F.Reenders@utwente.nl wrote:

I've got 2 x Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz, 64 GB RAM and 2 x RAID 1 SAS arrays (4 disks, 10k RPM).
It is an HP DL380 G8.
The load is now 200 :) but that's because 5 minutes is not enough.
 
I will also try noatime.


I'm running on a pretty similar system; a Dell R720 with the same CPUs & RAM, and 2 x 7 15K SAS in RAID 10.  We also V2P-ed our server when we upgraded to this machine, and I was pretty underwhelmed at the performance improvement considering it's a bit overkill for our environment (details below).

Then I started digging into the poller stats and found that some of my remote Linux servers (which run the PPPoE for the branch's ADSL connection) were taking around 200-300 seconds per poll.  When I ran the poller in debug mode I found that the interfaces poll was taking a huge proportion of the poller's run time, even though these servers only have 3 NICs, plus ppp0 for ADSL and tun0 for OpenVPN.  The kernel gives both ppp0 and tun0 a new interface id every time the connection goes down & up again, and net-snmp was reporting each one as a new interface (duly noting a warning in syslog).  Over time we were ending up with hundreds of interfaces per server, and net-snmp seems to be particularly inefficient at reporting them (or perhaps Observium is trying to poll too much data from non-existent ports?).

Regardless, I rolled out a script with puppet to restart net-snmp every time ppp0 or tun0 comes up.  Now poll times for all those hosts are under 20 seconds and the load on our server doesn't go over about 1.5, even during polls with 32 concurrent pollers.
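The script itself is not included in this thread. As a rough sketch only, a link-up hook along the following lines could do the job. The function names, the RESTART_CMD variable and the snmpd service name are all assumptions made for illustration, not the script that was actually deployed.

```shell
# Interfaces whose reconnects leave stale entries in net-snmp (taken from the message).
WATCHED_IFACES="ppp0 tun0"

# Assumed default; the service name and init command differ between distributions.
RESTART_CMD=${RESTART_CMD:-"service snmpd restart"}

# Succeed if the interface name in $1 is one of the watched interfaces.
is_watched_iface() {
    for iw_name in $WATCHED_IFACES; do
        [ "$iw_name" = "$1" ] && return 0
    done
    return 1
}

# Call this from whatever hook fires when a link comes up.
on_iface_up() {
    if is_watched_iface "$1"; then
        $RESTART_CMD
    fi
}
```

How the hook gets called depends on the setup. On Debian-style systems pppd runs the scripts in /etc/ppp/ip-up.d with the interface name in PPP_IFACE, and OpenVPN can run an up script with the device name in the dev environment variable, so such a script could end with on_iface_up "$PPP_IFACE" or on_iface_up "$dev".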

Regards,
Paul

P.S. Device/port counts from our Observium installation:

          Total   Up     Down   Ignored   Disabled
Devices   187     165    3      5         14
Ports     2614    1044   12     1275      173 (shutdown)