slow polling of Cisco 4500 VSS switches
![](https://secure.gravatar.com/avatar/a48a43e509ce8b6f7655db66a387c343.jpg?s=120&d=mm&r=g)
Hi all,
We are seeing very long poll times on some of our devices. We have a few Cisco 4500 switches with a lot of interfaces. For example, for a Cisco 4506 with four 48-port line cards, the poll time for the device is: [inline graph: image001.png]
As the graph shows, the poll time changed around 28/29 July; I disabled the device around 31 July.
We also have Cisco 4500 VSS setups with several hundred interfaces. I have disabled these devices for now because the poller couldn't keep up with them.
Today I found some time to troubleshoot and found the statistics above. I also ran a poll with debug on. It "hangs" for a very long time on every interface of every device it polls. Looking into that, it seems to collect all interfaces for each metric it polls. I think that is the problem on devices with a large number of interfaces, because it collects all statistics again for every interface.
How can I fix this? Is this by design, or is it a bug? Or is it something in my setup?
Regards,
Bastiaan Topper Korton Group BV
![](https://secure.gravatar.com/avatar/cdc8df7ce0f887b1f21d656c6097ac23.jpg?s=120&d=mm&r=g)
Are you running fairly new software? I've found that some of the older 2010 code has problems handling SNMP polling, which would even bring down the supervisor on occasion. The polling was from appliances other than Observium.
![](https://secure.gravatar.com/avatar/15c8f30b33d128d9c098441662abee8b.jpg?s=120&d=mm&r=g)
I ran into a similar issue with 3750s in a stack of 7, which took similar times to poll. Here is what I found my issues to be:
1) Make sure you're using poller-wrapper and not just a series of poller.php entries in your cron.
2) In your config.php, check your configuration options (http://observium.org/wiki/Configuration_Options#Poller_and_Discovery_Modules) for ports. Try removing all of them to see whether that improves performance, then add them back one by one.
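The "remove everything, then add back one by one" approach in point 2 can be sketched as a small loop. This is only an illustration of the procedure, not Observium code: `added_cost_per_option`, `fake_measure`, and the option labels `module_a` to `module_c` are hypothetical, and the timings are invented. In practice `measure` would be whatever you use to time one poll of the slow device with a given set of options enabled.

```python
from typing import Callable, Dict, Sequence, Tuple


def added_cost_per_option(
    options: Sequence[str],
    measure: Callable[[Tuple[str, ...]], float],
) -> Dict[str, float]:
    """Enable options one at a time and record how much each adds to the poll time."""
    enabled = []
    previous = measure(tuple(enabled))  # baseline: every optional module off
    deltas = {}
    for option in options:
        enabled.append(option)
        current = measure(tuple(enabled))
        deltas[option] = current - previous
        previous = current
    return deltas


# Stand-in for a real timed poll; labels and numbers are made up for the demo.
FAKE_COSTS = {"module_a": 2.0, "module_b": 1.0, "module_c": 400.0}


def fake_measure(enabled: Tuple[str, ...]) -> float:
    """Pretend poll time in seconds: fixed base cost plus the cost of each enabled option."""
    return 5.0 + sum(FAKE_COSTS[name] for name in enabled)


deltas = added_cost_per_option(list(FAKE_COSTS), fake_measure)
slowest = max(deltas, key=deltas.get)
print(slowest, deltas[slowest])
```

The option with the largest delta is the one to leave disabled while investigating further.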
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Bastiaan Topper Sent: Wednesday, August 28, 2013 4:58 AM To: observium@observium.org Subject: [Observium] slow polling of Cisco 4500 VSS switches
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
Your polling time seems to be pretty absurdly high. This definitely isn't normal behaviour.
This seems like a 4500 VSS bug, probably related to fetching interface counters from the other VSS member. Please make sure you're running the latest IOS.
I know of several 6500 VSS working without issues, but I've never encountered a 4500 VSS.
adam.
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
![](https://secure.gravatar.com/avatar/a48a43e509ce8b6f7655db66a387c343.jpg?s=120&d=mm&r=g)
Hi Adam,
The problem is not limited to VSS switches. The screenshot is of a switch without VSS (a normal 4500 switch, but with many interfaces). I also have 6500 switches (and even a 6500 VSS); these don't have the problem, but they have far fewer interfaces. The 4500 switches don't all run the same IOS version. The VSS switches run two different versions; the non-VSS switch has a different supervisor and a much older IOS.
One thing I noticed is that the port polling collects all metrics for all interfaces for every interface it polls. For big switches, this is a lot of data each time it polls an interface. When I run a debug, it shows very long lists of collected metrics (the output of all the snmpbulkwalk commands per interface), each time for all interfaces. Could the problem be somewhere in there, or am I interpreting the output wrong?
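To put a rough number on that suspicion: if the whole interface table really were walked once per interface instead of once per device, the amount of data fetched would grow with the square of the port count. The sketch below only illustrates that hypothesis. The function name `values_fetched` is hypothetical and the figure of 10 metrics per interface is an assumption, not something taken from the debug output.

```python
def values_fetched(num_interfaces: int, metrics_per_interface: int,
                   walk_per_interface: bool) -> int:
    """Count the values retrieved in one poll of a device.

    walk_per_interface=False: the full table is walked once per device.
    walk_per_interface=True:  the full table is walked again for every interface.
    """
    table_size = num_interfaces * metrics_per_interface
    walks = num_interfaces if walk_per_interface else 1
    return table_size * walks


PORTS = 4 * 48              # the 4506 with four 48-port line cards
METRICS_PER_PORT = 10       # assumed value, for illustration only

once = values_fetched(PORTS, METRICS_PER_PORT, walk_per_interface=False)
repeated = values_fetched(PORTS, METRICS_PER_PORT, walk_per_interface=True)
print(once, repeated, repeated // once)
```

Under these assumptions the repeated walk fetches as many times more data as the device has ports, which is why a per-interface walk would hurt large switches far more than small ones.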
I am using the poller wrapper. The thing is that it polls every 5 minutes, but the poll itself takes much longer. After a few hours my server crashes because it is starting more polls than it finishes.
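The pile-up described here can be illustrated with a toy model: a new poll starts every interval, all running polls share a fixed amount of server capacity, and the backlog grows without bound once each poll needs more work than the server can complete per interval. This is a simplified sketch, not a model of the actual poller-wrapper. The function `running_polls` and all numbers other than the 5-minute interval are hypothetical.

```python
from typing import List


def running_polls(interval: int, work: float, capacity: float, horizon: int) -> List[int]:
    """Simulate in 1-second steps; return the number of unfinished polls after each step.

    interval: seconds between poll starts (300 for a 5-minute cron)
    work:     seconds of full-speed work one poll needs
    capacity: how many polls the server can run at full speed at once
    """
    remaining: List[float] = []   # work left for each running poll
    samples: List[int] = []
    for t in range(horizon):
        if t % interval == 0:
            remaining.append(float(work))          # cron starts another poll
        if remaining:
            # Running polls share the capacity equally, capped at full speed.
            share = min(1.0, capacity / len(remaining))
            remaining = [r - share for r in remaining]
            remaining = [r for r in remaining if r > 1e-9]   # drop finished polls
        samples.append(len(remaining))
    return samples


healthy = running_polls(interval=300, work=100, capacity=2, horizon=7200)
overloaded = running_polls(interval=300, work=900, capacity=2, horizon=7200)
print(max(healthy), overloaded[-1])
```

In the healthy case every poll finishes before the next one starts. In the overloaded case the count of running polls keeps climbing, which matches a server that eventually runs out of resources.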
Regards, Bastiaan
-----Original message----- From: Adam Armstrong [mailto:adama@memetic.org] Sent: Wednesday 28 August 2013 17:10 To: Observium Network Observation System Subject: Re: [Observium] slow polling of Cisco 4500 VSS switches
![](https://secure.gravatar.com/avatar/15c8f30b33d128d9c098441662abee8b.jpg?s=120&d=mm&r=g)
Can you post what your config.php looks like (sanitized for things like DB credentials, if you'd rather not share those)?
-----Original Message----- From: observium [mailto:observium-bounces@observium.org] On Behalf Of Bastiaan Topper Sent: Wednesday, August 28, 2013 12:29 PM To: Observium Network Observation System Subject: Re: [Observium] slow polling of Cisco 4500 VSS switches
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
On 2013-08-28 17:28, Bastiaan Topper wrote:
One thing I noticed is that the port polling collects all metrics for all interfaces for every interface it polls. For big switches, this is a lot of data each time it polls an interface. When I run a debug, it shows very long lists of collected metrics (the output of all the snmpbulkwalk commands per interface), each time for all interfaces. Could the problem be somewhere in there, or am I interpreting the output wrong?
It's not actually all that much information, and 250 ports isn't really a lot.
The issue is almost certainly a bug with counter collection on the 4500.
Cisco doesn't seem to bother giving many shits about bugs on "enterprise" kit, they only seem to fix issues on "service provider" kit.
This just further cements my view that the 4500 is a largely pointless device that people only buy when they've been tricked by Cisco's sales and marketing splurge.
adam.
![](https://secure.gravatar.com/avatar/a48a43e509ce8b6f7655db66a387c343.jpg?s=120&d=mm&r=g)
Hi Adam,
Thanks for your help. I found the problem, and it has nothing to do with the 4500 switches. It was the PoE polling option, which is not working efficiently, so I have disabled it for now.
Polling times have now dropped to a few seconds.
Regards, Bastiaan
-----Original message----- From: Adam Armstrong [mailto:adama@memetic.org] Sent: Wednesday 28 August 2013 19:39 To: Observium Network Observation System Subject: Re: [Observium] slow polling of Cisco 4500 VSS switches
participants (4)
- Adam Armstrong
- Bastiaan Topper
- Darius Seroka
- Michael Sweikata