Alerting switch stack status
I want to alert whenever one switch in a stack drops out, and I need help composing a checker.
Specifically, I have stacks of Cisco 2960-X switches linked via FlexStack modules such that they appear as a single switch for purposes of pinging or SNMP. On occasion, one switch in the stack will fail, but the rest of the stack stays online and continues to respond to pings, etc.
I believe I want to watch the Status entity "Stack is redundant", CISCO-STACKWISE-MIB::cswRingRedundant.0: https://mibs.observium.org/mib/CISCO-STACKWISE-MIB/#cswRingRedundant
Each stacking module has two cables, and the cables are run so that all the switches form one big loop. cswRingRedundant is a boolean that indicates whether that loop is intact, so it will be false if a switch dies or simply if a cable is unplugged (also a desirable alert).
If the switch is a standalone with no stacking modules installed, you get: "No Such Object available on this agent at this OID"
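To make the three cases above concrete (loop intact, loop broken, standalone switch), here is a minimal Python sketch that classifies one line of snmpget output for cswRingRedundant.0. The function name, the state labels, and the sample lines are my own assumptions for illustration; the only hard facts used are that a TruthValue encodes true as 1 and false as 2, and the "No Such Object" text quoted above.

```python
import re

# SNMPv2-TC TruthValue: true is encoded as 1, false as 2.
TRUTH_VALUE = {1: True, 2: False}


def classify_ring_redundant(snmpget_line: str) -> str:
    """Map one line of snmpget output for cswRingRedundant.0 to a state label.

    The labels returned here are hypothetical names chosen for this sketch.
    """
    # A standalone switch without stacking modules does not expose the object.
    if "No Such Object" in snmpget_line or "No Such Instance" in snmpget_line:
        return "not-stacked"

    # Accept both "INTEGER: true(1)" (MIB loaded) and "INTEGER: 1" (numeric).
    match = re.search(r"INTEGER:\s*(?:\w+\()?(\d+)\)?", snmpget_line)
    if match is None:
        return "unknown"

    value = TRUTH_VALUE.get(int(match.group(1)))
    if value is None:
        return "unknown"
    return "redundant" if value else "ring-broken"


if __name__ == "__main__":
    # Sample lines are assumed output formats, not captured from a real device.
    samples = [
        "CISCO-STACKWISE-MIB::cswRingRedundant.0 = INTEGER: true(1)",
        "CISCO-STACKWISE-MIB::cswRingRedundant.0 = INTEGER: false(2)",
        "CISCO-STACKWISE-MIB::cswRingRedundant.0 = No Such Object available on this agent at this OID",
    ]
    for line in samples:
        print(classify_ring_redundant(line), "<-", line)
```

Only the "ring-broken" state is the one I would want an alert for; "not-stacked" should simply be excluded by the associations.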
I've created a Status entry:
<?xml version="1.0"?>
<templates>
  <template type="alert" description="Autogenerated observium template" version="0.91" created="Wed, 01 Apr 2020 11:51:32 -0400" observium="20.2.10302" id="fbe29724e648671f6e2095e9425d7515">
    <entity_type>status</entity_type>
    <name>2960_stack_loop_redundancy</name>
    <message>Stack loop redundancy has failed (might be a missing stack switch)</message>
    <severity>crit</severity>
    <suppress_recovery>0</suppress_recovery>
    <delay>0</delay>
    <conditions_and>1</conditions_and>
    <conditions>status_name == Stack is redundant</conditions>
    <conditions>status_event == warn</conditions>
    <conditions_complex>status_name == Stack is redundant AND status_event == warn</conditions_complex>
  </template>
</templates>
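For anyone who wants to double-check what an exported template actually contains, here is a small Python sketch that parses a trimmed copy of the export above with the standard library and lists its conditions. The helper name summarize_template is hypothetical, and reading conditions_and = 1 as "all conditions must match" is my assumption, based only on the AND in conditions_complex.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the exported template (attributes shortened for brevity).
TEMPLATE = """<?xml version="1.0"?>
<templates>
  <template type="alert" version="0.91">
    <entity_type>status</entity_type>
    <name>2960_stack_loop_redundancy</name>
    <severity>crit</severity>
    <conditions_and>1</conditions_and>
    <conditions>status_name == Stack is redundant</conditions>
    <conditions>status_event == warn</conditions>
  </template>
</templates>"""


def summarize_template(xml_text: str) -> dict:
    """Return the checker name, entity type, severity and parsed conditions."""
    template = ET.fromstring(xml_text).find("template")
    conditions = []
    for node in template.findall("conditions"):
        # Each condition is "<attribute> <operator> <value>"; the value may
        # contain spaces, so split at most twice.
        attribute, operator, value = node.text.split(" ", 2)
        conditions.append((attribute, operator, value))
    return {
        "name": template.findtext("name"),
        "entity_type": template.findtext("entity_type"),
        "severity": template.findtext("severity"),
        # Assumption: 1 means every condition has to match (logical AND).
        "match_all": template.findtext("conditions_and") == "1",
        "conditions": conditions,
    }


if __name__ == "__main__":
    summary = summarize_template(TEMPLATE)
    for attribute, operator, value in summary["conditions"]:
        print(f"{attribute!r} {operator} {value!r}")
```

Printing the value with repr makes it easy to spot stray quotes or trailing spaces in the status_name that would stop an exact match.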
It doesn't appear to be working (I have one switch with a cable unplugged that it should alert on).
Do I just have to wait 5 minutes? Or does it read the status from the database?
Is that the correct "status_name"? I copied it from the "Description" column of the Health --> Status page. Should it be quoted?
The export XML doesn't include the associations, but I included: device.hardware regexp /.*WS-C2960X-.*/
I don't really know what the value of "device.hardware" might be, but I'm assuming it's the same as the field labeled "Vendor/Hardware" shown on the front page for a given switch?
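As a sanity check on the association pattern itself, here is a short Python sketch that runs it against a few hardware strings. The sample strings are illustrative guesses at what device.hardware might contain, not values read from Observium, and Python's re module is only standing in for whatever regex engine Observium uses.

```python
import re

# The association pattern from the checker, and a shorter equivalent.
ASSOCIATION = re.compile(r".*WS-C2960X-.*")
SHORT = re.compile(r"WS-C2960X-")


def matches(pattern: re.Pattern, hardware: str) -> bool:
    """True if the pattern is found anywhere in the hardware string."""
    return pattern.search(hardware) is not None


if __name__ == "__main__":
    # Hypothetical device.hardware values, chosen for illustration only.
    samples = [
        "WS-C2960X-48FPD-L",
        "Cisco WS-C2960X-24TS-L",
        "WS-C2960S-48TS-L",
        "WS-C3850-24T",
    ]
    for hardware in samples:
        print(hardware, matches(ASSOCIATION, hardware), matches(SHORT, hardware))
```

Under unanchored (search) matching the leading and trailing .* add nothing, so the two patterns select the same strings; whether Observium anchors the match is something I have not verified.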
Thanks for your time.
Hi
This is how my alert looks for when a stack member fails or is removed.
And the conditions for it:
Tested and works fine for our Cisco devices running IOS-XR and plain old IOS.
Regards Daniel
On 2020-04-01 18:37, Eric W. Bates via observium wrote:
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
Thank you.
Where did you find entPhysicalClass? It does appear as an option when creating a checker; but doesn't appear on any of my Status pages.
Hi
If you click on any of the statuses, you can see which class it belongs to just above the graphs. I don't have the specific info for the one you are looking for, but the picture below shows where to look (the yellow marking).
Regards Daniel
participants (2)
- Daniel Johansson
- Eric W. Bates