Problem: very LONG poll times
Last night when I went to bed, the average polling time per device was 5-50 seconds; this morning it is 300-600 seconds per device. My system is a VMware VM with 4 vCPUs, 4 GB of RAM, and 15k drives in a SAN. I have checked the easy stuff like SAN load and CPU load, and both look fine. Please help.
I'm not really sure what to suggest.
Do you have netscalers?
adam.
On 2013-06-22 16:06, Zach Underwood wrote:
Last night when I went to bed, the average polling time per device was 5-50 seconds; this morning it is 300-600 seconds per device. My system is a VMware VM with 4 vCPUs, 4 GB of RAM, and 15k drives in a SAN. I have checked the easy stuff like SAN load and CPU load, and both look fine. Please help.
-- Zach Underwood (RHCE,RHCSA,RHCT) My website [1]
My photos [2]
Links:
[1] http://zachunderwood.me [2] http://zunder1990.openphoto.me
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
I am using the current version from SVN. I am running CentOS 6.4 x64 (no OS updates were run last night). I have no netscalers. The devices I have are HP switches, Extreme switches, VMware hosts, Ubiquiti APs, and MikroTik routers.
On Sat, Jun 22, 2013 at 12:37 PM, Adam Armstrong adama@memetic.org wrote:
I'm not really sure what to suggest.
Do you have netscalers?
adam.
OK, some more info. Here are the disk stats:
Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.36     2.30     2.57     2.38    65.42    37.43    20.79     0.01     2.29     0.96     0.48
dm-0         0.00     0.00     2.78     4.68    64.35    37.43    13.65     0.02     3.17     0.63     0.47
dm-1         0.00     0.00     0.03     0.00     0.28     0.00     8.00     0.00     1.79     1.06     0.00

Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00    16.40     0.00     9.20     0.00   204.80    22.26     0.02     2.26     0.59     0.54
dm-0         0.00     0.00     0.00    25.60     0.00   204.80     8.00     0.06     2.50     0.22     0.56
dm-1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00     0.00     0.00     0.20     0.00     1.60     8.00     0.00     0.00     0.00     0.00
dm-0         0.00     0.00     0.00     0.20     0.00     1.60     8.00     0.00     0.00     0.00     0.00
dm-1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
Here is a link to a debug I ran on one of the devices. http://zachunderwood.me/debug.txt
On Sat, Jun 22, 2013 at 12:57 PM, Zach Underwood zunder1990@gmail.com wrote:
I am using the current version from SVN. I am running CentOS 6.4 x64 (no OS updates were run last night). I have no netscalers. The devices I have are HP switches, Extreme switches, VMware hosts, Ubiquiti APs, and MikroTik routers.
The debug doesn't really help.
You have pretty low I/O load, so it's either cpu time related or network related.
What else is it doing?
adam.
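The "pretty low I/O load" reading can be checked mechanically against the iostat samples posted earlier in this thread. The following is a minimal sketch, not anything Observium ships: `parse_iostat` and `busiest` are hypothetical helper names, and the parser assumes the extended layout shown above, where each sample starts with a `Device:` header row.

```python
# The three iostat samples from this thread, one row per device per sample.
SAMPLES = """\
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.36 2.30 2.57 2.38 65.42 37.43 20.79 0.01 2.29 0.96 0.48
dm-0 0.00 0.00 2.78 4.68 64.35 37.43 13.65 0.02 3.17 0.63 0.47
dm-1 0.00 0.00 0.03 0.00 0.28 0.00 8.00 0.00 1.79 1.06 0.00
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 16.40 0.00 9.20 0.00 204.80 22.26 0.02 2.26 0.59 0.54
dm-0 0.00 0.00 0.00 25.60 0.00 204.80 8.00 0.06 2.50 0.22 0.56
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.20 0.00 1.60 8.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.20 0.00 1.60 8.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
"""


def parse_iostat(text):
    """Parse iostat-style text into a list of per-device dicts (hypothetical helper)."""
    rows, columns = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device:":
            # Header row: remember the column names for the rows that follow.
            columns = fields[1:]
            continue
        if columns and len(fields) == len(columns) + 1:
            row = {"device": fields[0]}
            row.update({name: float(value) for name, value in zip(columns, fields[1:])})
            rows.append(row)
    return rows


def busiest(rows, column):
    """Return the row with the highest value in the given column."""
    return max(rows, key=lambda row: row[column])


if __name__ == "__main__":
    rows = parse_iostat(SAMPLES)
    for column in ("%util", "await"):
        top_row = busiest(rows, column)
        print(f"highest {column}: {top_row['device']} at {top_row[column]}")
```

On these samples the highest utilisation is dm-0 at 0.56 %util and the highest await is dm-0 at 3.17 ms, which is consistent with the disks being nearly idle.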
On 2013-06-22 18:13, Zach Underwood wrote:
ok some more info. Here is the disk stats
Here is a link to a debug I ran on one of the devices. http://zachunderwood.me/debug.txt
Here is top. This was taken during a poll. There is almost no network traffic. This VM is only used for Observium.

[root@observium init.d]# top
top - 14:12:01 up 3:34, 4 users, load average: 0.00, 0.01, 0.00
Tasks: 143 total, 1 running, 142 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.4%us, 0.1%sy, 0.0%ni, 99.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3924572k total, 591700k used, 3332872k free, 42380k buffers
Swap: 4063224k total, 0k used, 4063224k free, 282400k cached
PID to kill:
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
10878 root  20   0  116m 3044 1656 S  1.0  0.1  0:00.06 /usr/bin/snmpget -v1 -c -Otv -M /opt/observium/mibs udp:p
10862 root  20   0 15024 1344 1012 R  0.5  0.0  0:00.06 top
    1 root  20   0 19356 1572 1268 S  0.0  0.0  0:01.65 /sbin/init
    2 root  20   0     0    0    0 S  0.0  0.0  0:00.01 [kthreadd]
    3 root  RT   0     0    0    0 S  0.0  0.0  0:00.01 [migration/0]
    4 root  20   0     0    0    0 S  0.0  0.0  0:00.09 [ksoftirqd/0]
Even less of an idea now.
What happens if you try to run one of the snmp queries it runs?
Have you noticed any particular module being slow?
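One way to act on that suggestion is to copy a single snmpget command line out of the poller's debug output and time it by hand. The sketch below is only an illustration and not part of Observium: `time_command` is a hypothetical helper name, and COMMUNITY and HOST in the usage comment are placeholders you would replace with your own values.

```python
import subprocess
import sys
import time


def time_command(argv, timeout=60.0):
    """Run argv once and return (elapsed_seconds, return_code, stdout_text)."""
    start = time.monotonic()
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    elapsed = time.monotonic() - start
    return elapsed, proc.returncode, proc.stdout


if __name__ == "__main__":
    # Hypothetical usage: pass the query's command line after the script name, e.g.
    #   python time_query.py snmpget -v1 -c COMMUNITY -Otv HOST 1.3.6.1.2.1.1.3.0
    # (1.3.6.1.2.1.1.3.0 is sysUpTime.0; any OID the poller requests would do.)
    if len(sys.argv) > 1:
        elapsed, code, out = time_command(sys.argv[1:])
        print(f"exit={code} elapsed={elapsed:.3f}s")
        print(out, end="")
```

If a single get against a nearby device takes seconds rather than milliseconds, that points away from the poller modules and towards the network path, name resolution, or the device's SNMP agent.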
On 2013-06-22 19:38, Zach Underwood wrote:
Here is top. This was taken during a poll. There is almost no network traffic. This VM is only used for Observium.
OK, I picked one network switch and disabled about 75% of the unused modules on it, and I got the poll time down to 109 seconds. This same 24-port switch was previously taking only 10 seconds to poll. From watching the debug poll on a few devices, it looks like all of the modules are slow.
On Sat, Jun 22, 2013 at 3:22 PM, Adam Armstrong adama@memetic.org wrote:
Even less of an idea now.
What happens if you try to run one of the snmp queries it runs?
Have you noticed any particular module being slow?
If it is not better by the morning, I will install it on new hardware. Are there any docs that show how to move to a new server?
On Sat, Jun 22, 2013 at 4:21 PM, Zach Underwood zunder1990@gmail.com wrote:
OK, I picked one network switch and disabled about 75% of the unused modules on it, and I got the poll time down to 109 seconds. This same 24-port switch was previously taking only 10 seconds to poll. From watching the debug poll on a few devices, it looks like all of the modules are slow.
What about if you snmpwalk your switch from the server? Does it respond quickly, or is it slower than you'd expect, i.e. line by line on a host <10 ms away?
That should help narrow down whether or not it's a local issue.
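To put numbers on "line by line", the walk can be run through a small wrapper that records how long each output line took to arrive. This is a rough sketch under assumptions, not a tested tool: `line_gaps` and `summarize` are hypothetical helper names, and COMMUNITY and HOST in the usage comment are placeholders. If the walked command buffers its output when writing to a pipe, lines will arrive in bursts, so treat the total time and line count as the more reliable figures.

```python
import subprocess
import sys
import time


def line_gaps(argv):
    """Run argv and return the delay in seconds before each output line arrived."""
    gaps = []
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True, bufsize=1) as proc:
        last = time.monotonic()
        for _ in proc.stdout:
            now = time.monotonic()
            gaps.append(now - last)
            last = now
    return gaps


def summarize(gaps):
    """Reduce a list of per-line delays to a few headline numbers."""
    if not gaps:
        return {"lines": 0, "total": 0.0, "mean": 0.0, "worst": 0.0}
    total = sum(gaps)
    return {"lines": len(gaps), "total": total, "mean": total / len(gaps), "worst": max(gaps)}


if __name__ == "__main__":
    # Hypothetical usage:
    #   python walk_timing.py snmpwalk -v1 -c COMMUNITY HOST 1.3.6.1.2.1.2.2
    # (1.3.6.1.2.1.2.2 is ifTable; any subtree the poller walks would do.)
    if len(sys.argv) > 1:
        stats = summarize(line_gaps(sys.argv[1:]))
        print(
            f"{stats['lines']} lines in {stats['total']:.2f}s, "
            f"mean {stats['mean'] * 1000:.1f} ms/line, worst {stats['worst'] * 1000:.1f} ms"
        )
```

Running the same walk from a second machine on the same network segment and comparing the totals would help separate a problem on the Observium server from one on the switch or the path to it.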
On 2013-06-23 6:21 am, Zach Underwood wrote:
OK, I picked one network switch and disabled about 75% of the unused modules on it, and I got the poll time down to 109 seconds. This same 24-port switch was previously taking only 10 seconds to poll. From watching the debug poll on a few devices, it looks like all of the modules are slow.
participants (3)
- Adam Armstrong
- Cameron Daniel
- Zach Underwood