
Hi all,
Just noticed a strange problem with alarm reporting on the ASR9001 (IOS XR 5.3.4).
A lab router was reloaded, and when it came back up there was a "Major" chassis alarm because one of its two PSUs had failed. However, neither the PSU failure nor the 'MAJ' alarm status is detected by Observium.
Even after a full discovery and poll, everything still looks green and happy in Observium. In fact, it now shows only one PSU, even though two are installed and one is in a failed state.
(All the other 9001s have two "ASR-9001 AC Power Supply" entries here.)
The console output confirms the fault and the presence of the major alarm. It also shows a few other indicators (such as N+1 resilience being lost and the reduced 'Capacity') that could be used to detect this issue, if they are exposed somewhere in Cisco-SNMP-land, that is.

#show env power
R/S/I          Modules   Capacity   Status
                         (W)
0/PS0/M0/*     host PM   750        Ok
0/PS0/M1/*     host PM   0          Failed   <<<<<<<<

#show env power
<snip>
N+1 Supply Protected Capacity Available: Not Protected   <<<<<<<<
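As a stopgap until this can be picked up over SNMP, the failed module and the lost N+1 protection could be screen-scraped from the "show env power" output. Below is a minimal Python sketch; check_power and both regexes are hypothetical, and they assume the output is laid out exactly as in the snippet above (real IOS XR output may differ in spacing or columns). Collecting the output from the router (e.g. over SSH) is not shown, and none of this is an Observium feature.

```python
import re

# Hypothetical screen-scraping helpers; the line formats are assumed from the
# "show env power" output quoted in this post, not from any Cisco spec.
PSU_LINE = re.compile(
    r"^(?P<rsi>\d+/PS\d+/M\d+/\*)\s+host\s+PM\s+(?P<watts>\d+)\s+(?P<status>\S+)"
)
N1_LINE = re.compile(r"N\+1 Supply Protected Capacity Available:\s*(?P<value>.+?)\s*$")


def check_power(output: str) -> dict:
    """Return failed power modules and whether N+1 protection still holds."""
    failed = []
    n_plus_1_ok = None  # None: the N+1 line was not found at all
    for line in output.splitlines():
        psu = PSU_LINE.match(line.strip())
        if psu and psu.group("status") != "Ok":
            failed.append((psu.group("rsi"), int(psu.group("watts")), psu.group("status")))
        n1 = N1_LINE.search(line)
        if n1:
            # Assumption: any value other than "Not Protected" means protected.
            n_plus_1_ok = n1.group("value").lower() != "not protected"
    return {"failed": failed, "n_plus_1_ok": n_plus_1_ok}


if __name__ == "__main__":
    sample = """\
R/S/I          Modules   Capacity   Status
                         (W)
0/PS0/M0/*     host PM   750        Ok
0/PS0/M1/*     host PM   0          Failed
N+1 Supply Protected Capacity Available: Not Protected
"""
    print(check_power(sample))
    # {'failed': [('0/PS0/M1/*', 0, 'Failed')], 'n_plus_1_ok': False}
```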

#show environment leds
R/S/I       Modules   LED               Status
0/RSP0/*    host      Critical-Alarm    Off
            host      Major-Alarm       On      <<<<<<<<
            host      Minor-Alarm       Off
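The alarm LEDs could be scraped the same way. This rough Python sketch (lit_alarm_leds and the regexes are hypothetical, and the column layout is assumed from the "show environment leds" output above) returns every Critical/Major/Minor alarm LED that is lit, so a Major-Alarm that is On could be flagged even when nothing else looks wrong.

```python
import re

# Hypothetical parser for "show environment leds"; the column layout is assumed
# from the output quoted in this post.
LOCATION = re.compile(r"^(?P<rsi>\d+/\S+/\*)\s")
ALARM_LED = re.compile(r"\bhost\s+(?P<led>(?:Critical|Major|Minor)-Alarm)\s+(?P<state>On|Off)\b")


def lit_alarm_leds(output: str) -> list:
    """Return (location, LED) pairs for every chassis alarm LED that is On."""
    lit = []
    location = None
    for line in output.splitlines():
        loc = LOCATION.match(line)
        if loc:
            location = loc.group("rsi")  # indented continuation rows inherit this
        led = ALARM_LED.search(line)
        if led and led.group("state") == "On":
            lit.append((location, led.group("led")))
    return lit


if __name__ == "__main__":
    sample = """\
R/S/I       Modules   LED               Status
0/RSP0/*    host      Critical-Alarm    Off
            host      Major-Alarm       On
            host      Minor-Alarm       Off
"""
    print(lit_alarm_leds(sample))
    # [('0/RSP0/*', 'Major-Alarm')]
```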

RP/0/RSP0/CPU0:RTR-123(admin)#show env power states
Wed Nov 15 16:59:15.135 GMT
R/S/I        State          MaxPower   Time                   Count
             (1-ON/2-OFF)   (W)        (YY:WK:DD:HH:MIN:SS)
----------------------------------------------------------------------
0/PS0/M0/*   1              750        00:00:00:00:59:27      1
             2              0          00:00:00:00:00:00      0
----------------------------------------------------------------------
0/PS0/M1/*   1              0          00:00:00:00:00:00      0
             2              0          00:00:00:00:59:27      1    <<<<<< '2 = Off'

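The "show env power states" counters might also be usable, assuming the Time column is the cumulative time each module has spent in each state since the reload. Under that assumption, a module whose OFF (state 2) time keeps growing between two polls is currently off. A rough Python sketch follows; the function names are hypothetical, the row layout is taken from the output above, and the second poll in the example is made up purely for illustration.

```python
import re

# Hypothetical parser for "show env power states"; the row layout is assumed
# from the output quoted in this post (1 = ON, 2 = OFF per the annotation).
MODULE_ROW = re.compile(r"^(?P<rsi>\d+/PS\d+/M\d+/\*)\s+(?P<rest>.*)$")
STATE_ROW = re.compile(
    r"^\s*(?P<state>\d)\s+(?P<watts>\d+)\s+(?P<time>\d+(?::\d+){5})\s+(?P<count>\d+)"
)
OFF_STATE = 2


def parse_power_states(output: str) -> dict:
    """Map each power module to {state: (YY, WK, DD, HH, MIN, SS)}."""
    states = {}
    module = None
    for line in output.splitlines():
        mod = MODULE_ROW.match(line)
        if mod:
            module = mod.group("rsi")
            line = mod.group("rest")  # the first state row shares the module's line
        row = STATE_ROW.match(line)
        if row and module is not None:
            elapsed = tuple(int(x) for x in row.group("time").split(":"))
            states.setdefault(module, {})[int(row.group("state"))] = elapsed
    return states


def modules_currently_off(before: dict, after: dict) -> list:
    """Modules whose accumulated OFF time grew between two polls.

    Tuples compare field by field, which orders the durations correctly as
    long as each field is normalised (e.g. SS < 60).
    """
    off = []
    for module, now in after.items():
        prev = before.get(module, {})
        if now.get(OFF_STATE, ()) > prev.get(OFF_STATE, ()):
            off.append(module)
    return sorted(off)


if __name__ == "__main__":
    first_poll = """\
0/PS0/M0/*   1   750   00:00:00:00:59:27   1
             2   0     00:00:00:00:00:00   0
0/PS0/M1/*   1   0     00:00:00:00:00:00   0
             2   0     00:00:00:00:59:27   1
"""
    # Hypothetical second poll five minutes later, for illustration only.
    second_poll = first_poll.replace("00:00:00:00:59:27", "00:00:00:01:04:27")
    print(modules_currently_off(parse_power_states(first_poll),
                                parse_power_states(second_poll)))
    # ['0/PS0/M1/*']
```

Comparing two polls rather than a single snapshot avoids flagging a PSU that was off at some earlier point but has since recovered.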
So... can anything be done to improve the detection of issues like this? In this case it was pure chance that I noticed the fault after the reload, as I had to be physically in front of the router to patch something in. Otherwise I wouldn't have known.
Any ideas? Cheers!
Robert Williams
Custodian Data Centres
https://www.CustodianDC.com