Problem very LONG poll times

Zach Underwood

22 Jun 2013

11:06 a.m.

Last night when I went to bed, the average polling time per device was 5-50 sec; this morning it is 300-600 sec per device. My system is a VMware VM with 4 vCPUs, 4 GB RAM, and 15k drives in a SAN. I have checked the easy stuff like SAN load and CPU load, and both look fine. Please help.

-- Zach Underwood (RHCE,RHCSA,RHCT) My website http://zachunderwood.me My photos http://zunder1990.openphoto.me

Adam Armstrong

22 Jun 2013

12:37 p.m.

I'm not really sure what to suggest.

Do you have netscalers?

adam.

observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

Zach Underwood

12:57 p.m.

I am using the current version from SVN. I am running CentOS 6.4 x64 (no OS updates were run last night). I have no NetScalers. The devices I have are HP switches, Extreme switches, VMware hosts, Ubiquiti APs, and MikroTik routers.

Zach Underwood

1:13 p.m.

OK, some more info. Here are the disk stats:

Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.36     2.30     2.57     2.38    65.42    37.43    20.79     0.01     2.29     0.96     0.48
dm-0         0.00     0.00     2.78     4.68    64.35    37.43    13.65     0.02     3.17     0.63     0.47
dm-1         0.00     0.00     0.03     0.00     0.28     0.00     8.00     0.00     1.79     1.06     0.00

Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00    16.40     0.00     9.20     0.00   204.80    22.26     0.02     2.26     0.59     0.54
dm-0         0.00     0.00     0.00    25.60     0.00   204.80     8.00     0.06     2.50     0.22     0.56
dm-1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00     0.00     0.00     0.20     0.00     1.60     8.00     0.00     0.00     0.00     0.00
dm-0         0.00     0.00     0.00     0.20     0.00     1.60     8.00     0.00     0.00     0.00     0.00
dm-1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

Here is a link to a debug I ran on one of the devices. http://zachunderwood.me/debug.txt

Adam Armstrong

2:02 p.m.

The debug doesn't really help.

You have pretty low I/O load, so it's either cpu time related or network related.

What else is it doing?

adam.
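
The "low I/O load" reading can be checked mechanically. The following is only a sketch: it parses one report in the extended iostat layout quoted in this thread and lists devices whose %util reaches a threshold. The sample rows are copied from the first report above; the function names and the 50% threshold are illustrative assumptions, not anything Observium defines.

```python
# Sketch: flag block devices whose iostat %util reaches a chosen threshold.
# The sample rows are copied from the first iostat report in this thread;
# the 50% default threshold is an arbitrary illustrative value.

IOSTAT_SAMPLE = """\
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.36 2.30 2.57 2.38 65.42 37.43 20.79 0.01 2.29 0.96 0.48
dm-0 0.00 0.00 2.78 4.68 64.35 37.43 13.65 0.02 3.17 0.63 0.47
dm-1 0.00 0.00 0.03 0.00 0.28 0.00 8.00 0.00 1.79 1.06 0.00
"""


def parse_iostat(report):
    """Return {device: {column: value}} for one extended iostat report."""
    lines = [line.split() for line in report.splitlines() if line.strip()]
    header = lines[0][1:]  # drop the leading "Device:" label
    stats = {}
    for fields in lines[1:]:
        stats[fields[0]] = dict(zip(header, map(float, fields[1:])))
    return stats


def busy_devices(stats, util_threshold=50.0):
    """List devices whose %util is at or above the threshold."""
    return [dev for dev, cols in stats.items() if cols["%util"] >= util_threshold]


if __name__ == "__main__":
    stats = parse_iostat(IOSTAT_SAMPLE)
    for dev, cols in stats.items():
        print(f"{dev}: await={cols['await']} ms, util={cols['%util']}%")
    print("busy devices:", busy_devices(stats))
```

With every device well under 1% utilisation, the list of busy devices comes back empty, which matches the conclusion that the disks are not the bottleneck.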

Zach Underwood

2:38 p.m.

Here is top, taken during a poll. There is almost no network traffic. This VM is only used for Observium.

[root@observium init.d]# top
top - 14:12:01 up 3:34, 4 users, load average: 0.00, 0.01, 0.00
Tasks: 143 total, 1 running, 142 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.4%us, 0.1%sy, 0.0%ni, 99.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3924572k total, 591700k used, 3332872k free, 42380k buffers
Swap: 4063224k total, 0k used, 4063224k free, 282400k cached
PID to kill:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10878 root      20   0  116m 3044 1656 S  1.0  0.1   0:00.06 /usr/bin/snmpget -v1 -c -Otv -M /opt/observium/mibs udp:p
10862 root      20   0 15024 1344 1012 R  0.5  0.0   0:00.06 top
    1 root      20   0 19356 1572 1268 S  0.0  0.0   0:01.65 /sbin/init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.01 [kthreadd]
    3 root      RT   0     0    0    0 S  0.0  0.0   0:00.01 [migration/0]
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.09 [ksoftirqd/0]

Adam Armstrong

3:22 p.m.

Even less of an idea now.

What happens if you try to run one of the snmp queries it runs?

Have you noticed any particular module being slow?
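
One hedged way to act on the first question is to time a single query outside the poller. The sketch below wraps any command in a wall-clock timer; the helper name is made up for illustration, and the snmpget line in the comment uses placeholder values (COMMUNITY, switch.example.net) that do not come from this thread.

```python
import subprocess
import sys
import time


def time_command(argv, timeout=120):
    """Run argv once and return (elapsed_seconds, exit_code)."""
    start = time.monotonic()
    result = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return time.monotonic() - start, result.returncode


if __name__ == "__main__":
    # Harmless stand-in command so the sketch runs anywhere. To time a real
    # query, replace it with something like (placeholder values):
    #   ["snmpget", "-v1", "-c", "COMMUNITY", "switch.example.net", "sysUpTime.0"]
    argv = [sys.executable, "-c", "pass"]
    elapsed, code = time_command(argv)
    print(f"{' '.join(argv)}: {elapsed:.3f}s (exit {code})")
```

If a single snmpget against a nearby switch takes seconds rather than milliseconds, the delay is probably outside the poller itself (network path, name resolution, or the device's SNMP agent).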

Zach Underwood

4:21 p.m.

OK, I picked one network switch and disabled about 75% of the unused modules on it, and I got the poll down to 109 sec. This same 24-port switch used to take only 10 sec to poll. Watching the debug poll on a few devices, it looks like all of the modules are slow.

Zach Underwood

8:28 p.m.

If it is not better by the morning, I will install it on new hardware. Are there any docs that show how to move to a new server?

Cameron Daniel

9:09 p.m.

What about if you snmpwalk your switch from the server? Does it respond quickly, or is it slower than you'd expect, i.e. arriving line by line from a host <10 ms away?

That should help narrow down whether or not it's a local issue.
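
The "line by line" check can be made a little more objective by recording how long each output line takes to arrive. This is a rough sketch, not a definitive tool: the function names are invented for illustration, and the demo uses a stand-in command rather than a real snmpwalk.

```python
import subprocess
import sys
import time


def line_gaps(lines, clock=time.monotonic):
    """Return the delay in seconds before each line arrived from an iterable."""
    gaps = []
    last = clock()
    for _ in lines:
        now = clock()
        gaps.append(now - last)
        last = now
    return gaps


def command_line_gaps(argv):
    """Run argv and measure the gap before each line of its stdout."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        return line_gaps(proc.stdout)


def summarize(gaps):
    """Return (line_count, total_seconds, slowest_gap_seconds)."""
    if not gaps:
        return 0, 0.0, 0.0
    return len(gaps), sum(gaps), max(gaps)


if __name__ == "__main__":
    # Stand-in command; substitute an snmpwalk invocation for a real test.
    demo = [sys.executable, "-u", "-c", "print('a'); print('b')"]
    count, total, slowest = summarize(command_line_gaps(demo))
    print(f"{count} lines in {total:.3f}s, slowest gap {slowest:.3f}s")
```

One caveat: many programs buffer their output when writing to a pipe, so lines may arrive in bursts and individual gaps are only approximate. The total time and line count remain meaningful either way.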
