There was one more outlier case we came across today: nodes with OMSA installed but not working correctly. hdarray picks up omreport but then incorrectly reports the controller status as failed. This has been fixed now - the script sanity-checks omreport before trusting it, and falls through to perccli if it turns out to be broken. Full code below if anyone cares:
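In isolation, the sanity check boils down to "only trust a tool if it both exists and exits cleanly on a probe run". A minimal sketch of that pattern, using /bin/true and /bin/false as stand-ins for a working and a broken omreport:

```shell
#!/usr/bin/env bash
# Gate on a tool actually working, not just being installed.
# BIN is a stand-in here; the real script probes
# "/opt/dell/srvadmin/bin/omreport storage controller" the same way.
tool_status() {
    local BIN=$1
    if [ -x "$BIN" ] && "$BIN" >/dev/null 2>&1; then
        echo "use $BIN"       # binary exists and the probe run succeeded
    else
        echo "fall through"   # missing or broken - try the next CLI
    fi
}

tool_status /bin/true    # prints: use /bin/true
tool_status /bin/false   # prints: fall through
```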
#!/usr/bin/env bash
#
# hdarray - Observium agent script for Dell PERC RAID controllers
# Supports: Dell OMSA (omreport), LSI MegaCli64, Adaptec (arcconf), perccli/storcli
#
# Updated: 17 April 2026
# - Added perccli/storcli support for newer Dell PowerEdge servers
# - Fixed stdin closed issue with perccli (</dev/null on all CLI calls)
# - Added omreport sanity check to fall through to perccli if omreport
#   is installed but non-functional (e.g. OMSA installed but not configured)
echo '<<<hdarray>>>'
############### Dell OMSA (omreport) #############
# Note: test that omreport actually works before using it - on some nodes
# OMSA is installed but broken, and we should fall through to perccli instead
if [ -x /opt/dell/srvadmin/bin/omreport ] && \
   /opt/dell/srvadmin/bin/omreport storage controller >/dev/null 2>&1; then
    CONTROLLER=$(/opt/dell/srvadmin/bin/omreport storage controller)
    CINFO=$(echo "$CONTROLLER" | grep Status)
    CINFOSTAT=$(echo "$CINFO" | cut -d':' -f2)
    CINFOSTAT=${CINFOSTAT:1}
    echo "Controller Status=$CINFOSTAT"
    IFS=$'\n'   # split the ssv output on newlines only
    set -f
    DRIVES=$(/opt/dell/srvadmin/bin/omreport storage pdisk controller=0 -fmt ssv | sed -n '/[0-1].*/p')
    for line in $DRIVES; do
        DRIVEINFO=$(echo "$line" | cut -d';' -f1)
        DRIVESTATUS=$(echo "$line" | cut -d';' -f2)
        echo "Drive $DRIVEINFO=$DRIVESTATUS"
    done
    set +f
    unset IFS
############### Dell LSI MegaCli64 ##############
elif [ -x /opt/MegaRAID/MegaCli/MegaCli64 ]; then
    RAID=/tmp/lsi.$$
    CTRLSTAT=Ok
    echo "Controller Status=$CTRLSTAT"
    /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL -NoLog > $RAID
    let iLOGICALDEV=0
    grep "Virtual Drive:" $RAID | while read DEVICE; do
        STATUS="$(grep -A 5 "$DEVICE" $RAID | tail -1 | cut -d: -f2)"
        STATUS=${STATUS:1}
        [ "$STATUS" = "Optimal" ] && STATUS=Ok
        echo "Logical Drive $iLOGICALDEV=$STATUS"
        let iLOGICALDEV++
    done
    /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL -NoLog > $RAID
    let iDEVICE=0
    grep "Device Id" $RAID | while read DEVICE; do
        STATUS="$(grep -A 12 "$DEVICE" $RAID | tail -1 | cut -d: -f2)"
        STATUS=${STATUS:1}
        [ "$STATUS" = "Online, Spun Up" ] && STATUS=Ok
        echo "Drive $iDEVICE=$STATUS"
        let iDEVICE++
    done
    rm -f $RAID
############### Dell perccli / storcli ##############
elif [ -x /opt/MegaRAID/perccli/perccli64 ] || [ -x /opt/MegaRAID/storcli/storcli64 ]; then
    # Find the right binary
    if [ -x /opt/MegaRAID/perccli/perccli64 ]; then
        CLI=/opt/MegaRAID/perccli/perccli64
    else
        CLI=/opt/MegaRAID/storcli/storcli64
    fi

    RAID=/tmp/perccli.$$

    # Get number of controllers
    NUMCTRL=$($CLI show nolog </dev/null 2>/dev/null | grep "Number of Controllers" | awk '{print $NF}')
    [ -z "$NUMCTRL" ] && NUMCTRL=1

    for ctrl in $(seq 0 $((NUMCTRL - 1))); do
        # Controller status
        $CLI /c${ctrl} show nolog > $RAID 2>&1 </dev/null
        CTRLSTAT=$(grep "^Status" $RAID | awk -F'= ' '{print $2}' | tr -d ' \r\n')
        [ "$CTRLSTAT" = "Success" ] && CTRLSTAT=Ok
        echo "Controller Status=$CTRLSTAT"

        # Virtual drive status
        $CLI /c${ctrl}/vall show nolog > $RAID 2>&1 </dev/null
        iLOGICALDEV=0
        while IFS= read -r line; do
            # Lines with VD info have format: DG/VD TYPE State ...
            if echo "$line" | grep -qE "^[0-9]+/[0-9]+"; then
                STATE=$(echo "$line" | awk '{print $3}')
                [ "$STATE" = "Optl" ] && STATE=Ok
                [ "$STATE" = "Dgrd" ] && STATE=Degraded
                [ "$STATE" = "Pdgd" ] && STATE=Degraded
                [ "$STATE" = "OfLn" ] && STATE=Offline
                echo "Logical Drive $iLOGICALDEV=$STATE"
                iLOGICALDEV=$((iLOGICALDEV + 1))
            fi
        done < $RAID

        # Physical drive status
        $CLI /c${ctrl}/eall/sall show nolog > $RAID 2>&1 </dev/null
        iDEVICE=0
        while IFS= read -r line; do
            # Lines with drive info have format: EID:Slt DID State ...
            if echo "$line" | grep -qE "^[0-9]+:[0-9]+"; then
                EID_SLT=$(echo "$line" | awk '{print $1}')
                STATE=$(echo "$line" | awk '{print $3}')
                [ "$STATE" = "Onln" ] && STATE=Ok
                [ "$STATE" = "GHS" ] && STATE=Ok
                [ "$STATE" = "DHS" ] && STATE=Ok
                [ "$STATE" = "UGood" ] && STATE=Ok
                [ "$STATE" = "Offln" ] && STATE=Offline
                [ "$STATE" = "UBad" ] && STATE=Failed
                [ "$STATE" = "Rbld" ] && STATE=Rebuilding
                echo "Drive $EID_SLT=$STATE"
                iDEVICE=$((iDEVICE + 1))
            fi
        done < $RAID
    done
    rm -f $RAID
#################### Adaptec controllers ####################
elif [ -x /usr/StorMan/arcconf ]; then
    RAID=/tmp/adaptec.$$
    /usr/StorMan/arcconf getconfig 1 al > $RAID
    CTRLSTAT=$(grep "Controller Status" $RAID | cut -d: -f2 | tr -d ' ')
    [ "$CTRLSTAT" = "Optimal" ] && CTRLSTAT=Ok
    echo "Controller Status=$CTRLSTAT"
    let iLOGICALDEV=0
    grep "Status of logical device" $RAID | cut -d: -f2 | while read STATUS; do
        STATUS=$(echo $STATUS)
        [ "$STATUS" = "Optimal" ] && STATUS=Ok
        echo "Logical Drive $iLOGICALDEV=$STATUS"
        let iLOGICALDEV++
    done
    let iDEVICE=0
    grep "Device #" $RAID | while read DEVICE; do
        STATUS=$(grep -A 2 "$DEVICE" $RAID | tail -1 | cut -d: -f2)
        STATUS=$(echo $STATUS)
        [ "$STATUS" = "Online" ] && STATUS=Ok
        echo "Drive $iDEVICE=$STATUS"
        let iDEVICE++
    done
    rm -f $RAID
fi
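For anyone who wants to see the stdin failure mode mentioned in the script header without a PERC controller to hand: it reproduces with any program that needs fd 0 open. A hedged sketch - `probe` below is a stand-in for perccli64, which behaves the same way; it relies on bash's documented behaviour that a failed redirection on a bare `exec` returns status 1 rather than exiting the shell:

```shell
#!/usr/bin/env bash
# Demonstrate the closed-stdin failure mode generically.
# probe stands in for perccli64: it fails if fd 0 cannot be duplicated,
# i.e. if stdin was closed with "exec <&-" as the Observium agent does.
probe() {
    if { exec 3<&0; } 2>/dev/null; then
        exec 3<&-                  # stdin is open - close the duplicate
        echo "Status = Success"
    else
        echo "Status = Failure"    # stdin closed - dup failed with EBADF
    fi
}

( exec 0<&- ; probe )   # agent conditions: prints Status = Failure
probe </dev/null        # the fix: prints Status = Success
```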
From: Chris James
Sent: 16 April 2026 09:39
To: observium@lists.observium.org
Subject: Fix for hdarray
We have continued to have an issue where PERC controller status is flagged as failed on practically all our PowerEdge Rx20, Rx30, Rx40, Rx50 and Rx60 servers. We finally got to the bottom of it today.
Please can the following be corrected.
Root Cause:

perccli64 requires an open stdin to function correctly. When it runs with stdin closed (<&-), it returns "Status = Failure" instead of querying the controller. The Observium agent closes stdin as a security measure (exec <&- 2>/dev/null) before running all local scripts, so every time the agent ran our hdarray script via the socket, perccli silently failed.

The Fix:

Add </dev/null to all four perccli calls in the hdarray script so the binary gets a valid (empty) stdin rather than a closed file descriptor:

$CLI show nolog </dev/null 2>/dev/null
$CLI /c${ctrl} show nolog > $RAID 2>&1 </dev/null
$CLI /c${ctrl}/vall show nolog > $RAID 2>&1 </dev/null
$CLI /c${ctrl}/eall/sall show nolog > $RAID 2>&1 </dev/null

What made this so hard to diagnose:

The script worked perfectly in every direct test (setsid, bash -x, env -i, running as root via ssh) because all those methods leave stdin open. It only failed via the agent socket, where stdin was closed - exactly replicating production conditions was key.

Many thanks,
Chris