Hi

This is how my alert for a failed or removed stack member looks:



And the conditions for it:

Tested, and it works fine on our Cisco devices running IOS-XR and plain old IOS.


Regards
Daniel




On 2020-04-01 18:37, Eric W. Bates via observium wrote:
I want to alert whenever one switch in a stack drops out, and I need help composing a checker.

Specifically, I have stacks of Cisco 2960-X switches linked via FlexStack modules so that they appear as a single switch for pinging and SNMP. Occasionally one switch in the stack fails, but the rest of the stack stays online and keeps responding to pings, etc.

I believe I want to watch the Status entity:
    Stack is redundant
    CISCO-STACKWISE-MIB::cswRingRedundant.0
    https://mibs.observium.org/mib/CISCO-STACKWISE-MIB/#cswRingRedundant

Each stacking module takes two cables, and the cables are run so that all the switches form one big loop. cswRingRedundant is a boolean that indicates whether that loop is intact, so it will be false if a switch dies or simply if a cable is unplugged (also a desirable alert).

If the switch is standalone, with no stacking module installed, the query returns: "No Such Object available on this agent at this OID"
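To make the three possible outcomes explicit, here is a minimal sketch of how a raw query result could be classified. It assumes cswRingRedundant uses the standard SNMPv2-TC TruthValue encoding (true = 1, false = 2); the helper name `classify_ring_state` and the accepted input spellings are hypothetical, not part of Observium.

```python
# Hypothetical helper: classify the raw result of querying
# CISCO-STACKWISE-MIB::cswRingRedundant.0.
# Assumption: the object is a TruthValue, where true(1) and false(2)
# are the only defined values.
NO_SUCH_OBJECT = "No Such Object available on this agent at this OID"


def classify_ring_state(raw: str) -> str:
    """Map a raw snmpget-style value to a stack-ring state.

    Accepts the numeric TruthValue ("1"/"2"), its textual forms
    ("true"/"false", "true(1)"/"false(2)"), or the agent's
    "No Such Object" message for a standalone switch.
    """
    value = raw.strip().lower()
    if value == NO_SUCH_OBJECT.lower():
        return "standalone"  # no stacking module: nothing to alert on
    if value in ("1", "true", "true(1)"):
        return "redundant"   # loop intact
    if value in ("2", "false", "false(2)"):
        return "broken"      # switch down or cable unplugged: alert
    raise ValueError(f"unexpected cswRingRedundant value: {raw!r}")


print(classify_ring_state("false(2)"))  # prints "broken"
```

Treating the standalone case separately keeps single switches from raising false alarms, which is the same reason the association below narrows the checker to 2960-X hardware.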


I've created an alert checker for the Status entity:

<?xml version="1.0"?>
<templates>
   <template type="alert" description="Autogenerated observium template" version="0.91" created="Wed, 01 Apr 2020 11:51:32 -0400" observium="20.2.10302" id="fbe29724e648671f6e2095e9425d7515">
     <entity_type>status</entity_type>
     <name>2960_stack_loop_redundancy</name>
     <message>Stack loop redundancy has failed (might be a missing stack switch)</message>
     <severity>crit</severity>
     <suppress_recovery>0</suppress_recovery>
     <delay>0</delay>
     <conditions_and>1</conditions_and>
     <conditions>status_name == Stack is redundant</conditions>
     <conditions>status_event == warn</conditions>
     <conditions_complex>status_name == Stack is redundant AND status_event == warn</conditions_complex>
   </template>
</templates>

It doesn't appear to be working: I have one switch with a cable unplugged that it should alert on.

Do I just have to wait 5 minutes? Or does it read the status from the database?

Is that the correct "status_name"? I copied it from the "Description" column on the Health --> Status page. Should it be quoted?

The exported XML doesn't include the associations, but I added this one:
device.hardware regexp /.*WS-C2960X-.*/

I don't really know what the value of "device.hardware" might be, but I'm assuming it's the same as the field labeled "Vendor/Hardware" shown on the front page for a given switch?
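One way to sanity-check the association pattern is to run it against candidate hardware strings before relying on it. This is a minimal sketch assuming Observium applies the /.../ regexp as a search-style match (as PHP's preg_match does); the sample strings are illustrative model names, not values read from these devices, and `matches_association` is a hypothetical helper.

```python
import re

# The association rule from above, written as a Python pattern.
# Assumption: the rule is a search-style regexp match against device.hardware.
HARDWARE_PATTERN = re.compile(r".*WS-C2960X-.*")


def matches_association(hardware: str) -> bool:
    """Return True if a device.hardware value would satisfy the rule."""
    return HARDWARE_PATTERN.search(hardware) is not None


# Illustrative hardware strings only; compare them with what the device page shows.
samples = ["WS-C2960X-48FPD-L", "WS-C2960X-24PS-L", "WS-C3850-24T", "WS-C2960-24TT-L"]
for hw in samples:
    print(hw, matches_association(hw))
```

With a search-style match, the leading and trailing `.*` are redundant; `WS-C2960X-` alone selects the same devices. Note that a plain 2960 (no "X") does not match, which is the intended behaviour here.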

Thanks for your time.


_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium