Hi All,

 

I have a Cisco 2960G with a failed internal PSU that has switched to its backup RPS supply. Observium is correctly identifying this as ‘critical’ under the Status Indicators for the device in question (image below):

I have created a “Hardware Fault” alert checker which /should/ have matched this, I believe:

(I’ve added the match for value = critical as a test; it wasn’t there originally.)

 

This alert checker correctly matches a whole load of devices and status checkers on many devices (1691 in total, all green):

However, for this particular type of sensor, it does not see any sensors which match the “Hardware Fault” checker. As you can see, I have already removed all the “Device Match” and “Entity Match” criteria from the checker itself to rule that out. So essentially this checker is matching every type of sensor that there is (except, obviously, the one which I want it to).

 

When I click the link on the graph for the “Sw1, PS1 Critical, RPS Normal” sensor, I get the following URL, which may be of use in helping diagnose this: http://observium-server/device/device=343/tab=health/metric=status/

 

After doing some more checking, I’ve found that it is not matching any other kind of sensor with a similar class either; for example, it does not match the WS-CAC-4000W-IN power supplies in any of the Cat6k chassis.

 

In fact, I think it’s actually failing to match sensors which have ‘normal’ and ‘critical’ type statuses, instead of ‘true’ and ‘false’.
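To make that guess concrete, here is a toy model of what I suspect is happening. This is only a sketch of my hypothesis, not Observium's actual matching code: the names (Sensor, matches_boolean_only, matches_any_state) are made up, the "Fan 1 OK" sensor is invented for contrast, and the ‘normal’ state on the WS-CAC-4000W-IN entry is an assumption.

```python
# Toy model of the suspected behaviour. NOT Observium code.
# All class and function names here are hypothetical.
from dataclasses import dataclass


@dataclass
class Sensor:
    descr: str
    state: str  # state name as shown in the UI, e.g. "true" or "normal"


def matches_boolean_only(sensor: Sensor) -> bool:
    """Suspected behaviour: only true/false style states are recognised."""
    return sensor.state in {"true", "false"}


def matches_any_state(sensor: Sensor) -> bool:
    """Expected behaviour: normal/critical style states are recognised too."""
    return sensor.state in {"true", "false", "normal", "critical"}


sensors = [
    Sensor("Fan 1 OK", "true"),                           # invented example
    Sensor("Sw1, PS1 Critical, RPS Normal", "critical"),  # the 2960G sensor
    Sensor("WS-CAC-4000W-IN", "normal"),                  # state assumed
]

# Sensors the boolean-only matcher would silently skip.
missed = [s.descr for s in sensors if not matches_boolean_only(s)]
print(missed)
```

If the checker behaved like matches_boolean_only, both the 2960G power supply sensor and the Cat6k power supplies would fall out of the match list, which is the pattern I am seeing.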

 

So the following sensors, highlighted in yellow, are also not being matched by this generic alerting config:

So I believe it’s a wider issue and not just related to this one device.

 

Can someone confirm what I’m doing wrong and/or maybe check to see if your alerting configurations can match sensors which have an OK state of ‘normal’ instead of ‘true’?

 

Cheers!

Robert Williams
Custodian Data Centre
Email: Robert@CustodianDC.com
http://www.CustodianDC.com