Hi

If you click on the individual statuses, you can see which class each one belongs to, just above the graphs. I don't have the specific information you are looking for, but the picture below shows where to look (the yellow marking).



Regards
Daniel



From: "observium" <observium@observium.org>
To: "observium" <observium@observium.org>
Cc: "Daniel Johansson" <daz@voodoo-people.com>
Sent: Thursday, April 2, 2020 4:00:13 PM
Subject: Re: [Observium] Alerting switch stack status

Hi

This is what my alert looks like for when a stack member fails or is removed.



And the conditions for it:

Tested, and it works fine for our Cisco devices running IOS-XR and plain old IOS.


Regards
Daniel




On 2020-04-01 18:37, Eric W. Bates via observium wrote:
I want to alert whenever one switch in a stack drops out, and I need help composing a checker.

Specifically, I have stacks of Cisco 2960-X switches linked via FlexStack modules so that they appear as a single switch for the purposes of pinging or SNMP. On occasion, one switch in the stack will fail, but the rest of the stack stays online and continues to respond to pings, etc.

I believe I want to watch the Status entity:
    Stack is redundant
    CISCO-STACKWISE-MIB::cswRingRedundant.0
    https://mibs.observium.org/mib/CISCO-STACKWISE-MIB/#cswRingRedundant

Each stacking module takes two cables, and the cables are run so that all the switches form one big loop. cswRingRedundant is a boolean that indicates whether that loop is intact, so it will be false if a switch dies or simply if a cable is unplugged (also a desirable alert).

If the switch is standalone with no stacking module installed, querying the OID returns: "No Such Object available on this agent at this OID"
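The three cases above can be sketched as a small classifier. It assumes cswRingRedundant uses the standard SNMPv2-TC TruthValue encoding (true(1), false(2)) and treats the "No Such Object" reply as "not a stack, don't alert". The helper name ring_state and the "ok"/"alert" labels are hypothetical and are not Observium's internal status states.

```python
from typing import Optional, Union

# SNMPv2-TC TruthValue encoding (assumed to apply to cswRingRedundant.0)
TRUTH_TRUE = 1
TRUTH_FALSE = 2

# Hypothetical marker standing in for the "No Such Object" SNMP reply
NO_SUCH_OBJECT = "noSuchObject"


def ring_state(value: Union[int, str]) -> Optional[str]:
    """Classify a cswRingRedundant.0 reading (hypothetical helper).

    Returns "ok" for an intact loop, "alert" for a broken loop
    (dead member or unplugged stack cable), and None for a standalone
    switch with no stacking module, which should not be alerted on.
    """
    if value == NO_SUCH_OBJECT:
        return None
    if value == TRUTH_TRUE:
        return "ok"
    if value == TRUTH_FALSE:
        return "alert"
    raise ValueError(f"unexpected TruthValue: {value!r}")


if __name__ == "__main__":
    for reading in (TRUTH_TRUE, TRUTH_FALSE, NO_SUCH_OBJECT):
        print(reading, "->", ring_state(reading))
```

Returning None for the standalone case keeps single switches out of the alert entirely instead of flagging them as broken stacks.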


I've created an alert checker for the Status entity:

<?xml version="1.0"?>
<templates>
   <template type="alert" description="Autogenerated observium template" version="0.91" created="Wed, 01 Apr 2020 11:51:32 -0400" observium="20.2.10302" id="fbe29724e648671f6e2095e9425d7515">
     <entity_type>status</entity_type>
     <name>2960_stack_loop_redundancy</name>
     <message>Stack loop redundancy has failed (might be a missing stack switch)</message>
     <severity>crit</severity>
     <suppress_recovery>0</suppress_recovery>
     <delay>0</delay>
     <conditions_and>1</conditions_and>
     <conditions>status_name == Stack is redundant</conditions>
     <conditions>status_event == warn</conditions>
     <conditions_complex>status_name == Stack is redundant AND status_event == warn</conditions_complex>
   </template>
</templates>

It doesn't appear to be working: I have one switch with a stack cable unplugged that it should alert on.

Do I just have to wait 5 minutes, or does it read the status from the database?

Is that the correct "status_name"? I copied it from the "Description" column on the Health --> Status page. Should it be quoted?
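A toy model of the conditions shows why the exact strings matter. With conditions_and set to 1, every condition has to hold, so one mismatched value silences the whole checker. This is not Observium's code: the matches function and both entity dictionaries are hypothetical, and "alert" is only an assumed example of an event string other than "warn".

```python
from typing import Dict, List, Tuple

# A condition is (field, expected value); only "==" is modelled here.
Condition = Tuple[str, str]

CONDITIONS: List[Condition] = [
    ("status_name", "Stack is redundant"),
    ("status_event", "warn"),
]


def matches(entity: Dict[str, str], conditions: List[Condition],
            require_all: bool = True) -> bool:
    """Toy evaluator: require_all=True mirrors <conditions_and>1</conditions_and>."""
    results = [entity.get(field) == expected for field, expected in conditions]
    return all(results) if require_all else any(results)


# Hypothetical entities: the event strings are assumptions, not values read from Observium.
broken_warn = {"status_name": "Stack is redundant", "status_event": "warn"}
broken_alert = {"status_name": "Stack is redundant", "status_event": "alert"}

print(matches(broken_warn, CONDITIONS))   # True
print(matches(broken_alert, CONDITIONS))  # False: "alert" != "warn"
```

The practical takeaway is to compare each condition against the values the status entity actually reports (for example, the class and event shown above its graphs) rather than against guessed strings.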

The exported XML doesn't include the associations, but I added this one:
    device.hardware regexp /.*WS-C2960X-.*/

I don't really know what the value of "device.hardware" might be, but I'm assuming it's the same as the "Vendor/Hardware" field shown on the front page for a given switch. Is that right?
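One way to sanity-check the association pattern is to run it against sample hardware strings. This sketch assumes the checker performs an unanchored regular-expression search, as PHP's preg_match does, where the slashes are delimiters and not part of the pattern. The sample strings are assumed examples; the switch's own "Vendor/Hardware" field is the authority.

```python
import re

# Pattern from the association, with the PHP-style /.../ delimiters removed.
PATTERN = re.compile(r".*WS-C2960X-.*")

# Assumed sample values, not read from a real device.
samples = ["WS-C2960X-48FPD-L", "WS-C2960X-24PS-L", "WS-C3850-24T"]

for hw in samples:
    print(f"{hw}: {'match' if PATTERN.search(hw) else 'no match'}")
```

Because the search is unanchored, the leading and trailing ".*" are redundant: "WS-C2960X-" on its own matches the same strings.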

Thanks for your time.


_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium