Hi all,
I've just noticed a strange problem with alarm reporting on the ASR9001 (IOS XR 5.3.4).
A lab router was reloaded, and when it came back up there was a “Major” chassis alarm because one of its two PSUs had failed. However, Observium detects neither the PSU failure nor the ‘MAJ’ alarm status.
Even after a full discovery and poll, everything still looks green and happy in Observium. In fact, it now shows only one PSU, even though two are installed and one of them is in a failed state.
(All the other 9001s have two “ASR-9001 AC Power Supply” entries here.)
The console output confirms the fault and the major alarm. It also shows a few other indicators (such as the loss of N+1 protection and the zero ‘Capacity’ on the failed module) that could be used to detect this issue, if they are exposed somewhere in Cisco-SNMP-land, that is.
#show env power
R/S/I Modules Capacity Status
(W)
0/PS0/M0/*
host PM 750 Ok
0/PS0/M1/*
host PM 0 Failed <<<<<<<<
#show env power
<snip>
N+1 Supply Protected Capacity Available: Not Protected <<<<<<<<
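As a stopgap, the `show env power` output above could be scraped for the same signals. A rough Python sketch, assuming the CLI output has already been captured as text (e.g. over SSH, not shown); the function names are made up for illustration, and the line format is taken only from the output pasted here:

```python
import re

# Hypothetical helper: list power modules in captured "show env power" text
# whose Status column is anything other than "Ok".
def failed_power_modules(output: str) -> list[str]:
    failed = []
    location = None
    for line in output.splitlines():
        line = line.strip()
        if re.fullmatch(r"\d+/PS\d+/M\d+/\*", line):
            location = line  # location line, e.g. "0/PS0/M1/*"
        elif location and line.startswith("host PM"):
            fields = line.split()  # ["host", "PM", capacity_W, status, ...]
            if len(fields) >= 4 and fields[3] != "Ok":
                failed.append(location)
            location = None
    return failed

# Hypothetical helper: True if the N+1 line reports "Not Protected".
def n_plus_one_lost(output: str) -> bool:
    m = re.search(r"N\+1 Supply Protected Capacity Available:\s*(.+)", output)
    return bool(m) and m.group(1).strip().startswith("Not Protected")
```

Run against the output from this router, this should flag `0/PS0/M1/*` and report N+1 protection as lost.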
#show environment leds
R/S/I Modules LED Status
0/RSP0/*
host Critical-Alarm Off
host Major-Alarm On <<<<<<<<
host Minor-Alarm Off
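The alarm LEDs give a simpler, chassis-wide signal. A minimal sketch that returns whichever alarm LEDs are lit, assuming the `show environment leds` output is captured as text and follows the format above; `lit_alarm_leds` is a hypothetical name:

```python
import re

# Hypothetical helper: names of alarm LEDs reported as "On" in captured
# "show environment leds" output (format as pasted above).
def lit_alarm_leds(output: str) -> list[str]:
    lit = []
    for line in output.splitlines():
        m = re.match(r"\s*host\s+(\S+-Alarm)\s+(\S+)", line)
        if m and m.group(2) == "On":
            lit.append(m.group(1))
    return lit
```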
RP/0/RSP0/CPU0:RTR-123(admin)#show env power states
Wed Nov 15 16:59:15.135 GMT
R/S/I State MaxPower Time Count
(1-ON/2-OFF) (W) (YY:WK:DD:HH:MIN:SS)
----------------------------------------------------------------------
0/PS0/M0/*
1 750 00:00:00:00:59:27 1
2 0 00:00:00:00:00:00 0
----------------------------------------------------------------------
0/PS0/M1/*
1 0 00:00:00:00:00:00 0
2 0 00:00:00:00:59:27 1 <<<<<< ‘2 = Off’
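The power-states table can be read the same way. A sketch, assuming (as read above) that a non-zero Time counter on a module's state-2 (OFF) row means the module has been powered off; that interpretation and the function name are assumptions, not documented behaviour:

```python
import re

# Hypothetical helper for captured "show env power states" output.
# ASSUMPTION: a non-zero Time counter on the state-2 (OFF) row means the
# module has been off, matching the reading of the output above.
def modules_reporting_off(output: str) -> list[str]:
    off = []
    location = None
    for line in output.splitlines():
        line = line.strip()
        if re.fullmatch(r"\d+/PS\d+/M\d+/\*", line):
            location = line
            continue
        fields = line.split()  # State, MaxPower(W), Time, Count, ...
        if (location and len(fields) >= 4 and fields[0] == "2"
                and re.fullmatch(r"[\d:]+", fields[2])):
            # Time is YY:WK:DD:HH:MIN:SS; any non-zero field counts.
            if any(int(part) for part in fields[2].split(":")):
                off.append(location)
    return off
```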
So... can anything be done to tune up the detection of issues like this? In this case it was pure chance that I noticed the fault after the reload: I had to be physically in front of the router to patch something in. Otherwise I wouldn't have known.
Any ideas? Cheers!