Polling Taking Longer than 5 minutes for a few devices
I have a few devices that take longer than 5 minutes to poll. Looking at the polling information Observium provides, it reports 999.99s for two of the three and 975.37s for the third under Last Polled. I believe taking too long to poll is what causes the breaks I see in my graphs for only these three devices.
These boxes have about 150 VLAN interfaces and are fully populated 6509s. I have adjusted poller-wrapper.py to 10 and set $config['snmp']['max-rep'] = 10 for now. The 4-processor VM averages about 60% CPU utilization constantly. The server's disk I/O averages about 110 ops/sec. Today I only have 52 out of 1300 devices loaded into it.
Any suggestions to speed up polling on these few boxes, or is the only solution to add another polling server and split the devices odd/even or something of that nature? Maybe there is another issue I'm not seeing? Thanks in advance for any assistance/direction.
-Chip
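For context on the max-rep setting mentioned above: with SNMP GETBULK, max-repetitions caps how many rows come back per request, so it roughly sets how many round trips a table walk needs. A minimal sketch of that relationship, assuming a hypothetical row count (the thread gives no actual table sizes) and ignoring response-size limits and the extra request needed to detect the end of the table:

```python
import math


def min_bulk_requests(rows: int, max_rep: int) -> int:
    """Lower bound on GETBULK requests needed to walk `rows` rows of one column."""
    if rows < 0 or max_rep < 1:
        raise ValueError("rows must be >= 0 and max_rep must be >= 1")
    return math.ceil(rows / max_rep)


# Assumption for illustration only; not a figure from the thread.
HYPOTHETICAL_ROWS = 500

for max_rep in (1, 10, 50):
    requests = min_bulk_requests(HYPOTHETICAL_ROWS, max_rep)
    print(f"max-rep={max_rep}: at least {requests} requests")
```

A higher value means fewer round trips, but whether that helps depends on how quickly the device can fill each response.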
Hi Chip,
We noticed with some devices that polling via SNMP v1 can take dramatically longer than polling via SNMP v2/3. I'm not sure if this helps in your scenario.
Cheers,
Nathan
IIRC the 6500 series is really slow to populate the ARP/FDB tables via SNMP, so you may want to disable these modules to speed things up.
I confirmed I'm only polling using v2c. Can I disable features per device? I'm guessing it would be under Device Details and Poller Modules (fdb-table and arp-table)?
-Chip
On 2014-03-03 17:41, Chip Pleasants wrote:
I confirmed I'm only polling using v2c. Can I disable features per device? I'm guessing it would be under Device Details and Poller Modules (fdb-table and arp-table)?
Yes. On 6500s you probably want to disable these two modules, as the switch seems to query its linecards to collect that data, which isn't fast.
adam.
You can see which modules take how long in the device's time reports; compare notes to Nikolay's email and go from there :-)
Thanks for the input! I didn't even know the per-module time reports were there! The downside is that the ports module took 67% (581.9524s) and the fdb-table module took 14% (122.0170s) to complete. It doesn't look like disabling the fdb-table module will do much good if the ports module is taking almost 600s. These are almost fully populated 6509s, but should they really be taking 10 minutes to poll all the interfaces? Manually polling with snmpbulkwalk shows the slow response from these devices. The odd part is that CPU isn't high on the RP, nor are the SNMP Engine or IP SNMP processes showing high CPU. They are running pretty new code, 122-33.SXJ6.
-Chip
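A quick back-of-the-envelope check of the figures above, as a sketch: it assumes the reported percentages are rounded shares of one total poll time and that the polling interval is 300 seconds (the 5 minutes from the subject line). The variable names are illustrative only.

```python
POLL_INTERVAL_S = 300.0  # assumed: the 5-minute interval from the subject line

# (seconds, reported share of total poll time) as quoted in the message above
reported = {
    "ports": (581.9524, 0.67),
    "fdb-table": (122.0170, 0.14),
}

# Each module yields its own estimate of the total; they differ slightly
# because the reported percentages are rounded.
total_estimates = {name: secs / share for name, (secs, share) in reported.items()}
total_s = min(total_estimates.values())  # take the smaller, more conservative estimate

# Estimated poll time if the fdb-table module were disabled.
without_fdb_s = total_s - reported["fdb-table"][0]

print(f"estimated total poll time: {total_s:.0f}s")
print(f"estimated without fdb-table: {without_fdb_s:.0f}s")
print("fits in one polling interval:", without_fdb_s <= POLL_INTERVAL_S)
```

On these numbers alone, the ports module by itself would still exceed the interval, which is the concern raised above. The estimate assumes module times are independent of each other, which may not hold on a real device.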
Turns out disabling the fdb-table module did the trick on these specific devices. It looks like the fdb-table module uses BRIDGE-MIB to gather information? Is BRIDGE-MIB used for anything else in Observium? Does BRIDGE-MIB correspond to a "show mac address-table" on the boxes? I'm wondering which specific OIDs it's using in BRIDGE-MIB, as I have another application gathering MAC forwarding tables and I'm wondering if it is causing the same issue. Also, could someone let me know what functionality in Observium I'm losing by disabling fdb-table polling? I'm assuming I won't be able to see/search for MACs under the Ports/ARP/NDT Tables? Thank you in advance for any feedback and for your time answering my barrage of questions.
Thanks, Chip
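On the BRIDGE-MIB question: the standard forwarding table in that MIB is dot1dTpFdbTable (1.3.6.1.2.1.17.4.3), whose rows are indexed by MAC address, one decimal sub-identifier per octet. Whether Observium's fdb-table module walks exactly this table is not confirmed in this thread, so treat the following as a sketch for comparing against what another application polls. The helper name and the sample OID are made up for illustration.

```python
# dot1dTpFdbPort column of BRIDGE-MIB's dot1dTpFdbTable; its value is the bridge port number.
DOT1D_TP_FDB_PORT = "1.3.6.1.2.1.17.4.3.1.2"


def mac_from_fdb_oid(oid: str, column: str = DOT1D_TP_FDB_PORT) -> str:
    """Decode the MAC address encoded in the index of a dot1dTpFdbTable OID."""
    oid = oid.lstrip(".")
    prefix = column + "."
    if not oid.startswith(prefix):
        raise ValueError(f"OID is not under {column}")
    parts = oid[len(prefix):].split(".")
    if len(parts) != 6:
        raise ValueError("expected a 6-octet MAC address index")
    octets = [int(p) for p in parts]
    if any(not 0 <= o <= 255 for o in octets):
        raise ValueError("index sub-identifier out of octet range")
    return ":".join(f"{o:02x}" for o in octets)


# Hypothetical OID as it might appear in numeric walk output.
sample_oid = ".1.3.6.1.2.1.17.4.3.1.2.0.17.34.51.68.85"
print(mac_from_fdb_oid(sample_oid))
```

If the other application's walks start under the same 1.3.6.1.2.1.17.4.3 subtree, it is at least reading the same table and may be worth timing on these devices.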
participants (5)
- Adam Armstrong
- Chip Pleasants
- Nathan Phelan
- Nikolay Shopik
- Tom Laermans