There was one more outlier case we came across today: nodes with OMSA installed but not working correctly. hdarray picks up omreport but incorrectly reports the controller status as failed. This has been fixed now.
Full code below if anyone cares:
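For anyone who just wants the guard without the full script, the check boils down to this pattern: don't trust omreport merely because the binary exists, make it actually query the controller first. A minimal sketch (the OMSA path is the standard install location, but verify it on your nodes):

```shell
#!/usr/bin/env bash
# A broken OMSA install passes the -x test but fails the query,
# so test both before parsing any omreport output.
OMREPORT=/opt/dell/srvadmin/bin/omreport
if [ -x "$OMREPORT" ] && "$OMREPORT" storage controller >/dev/null 2>&1; then
    OMSA_OK=yes   # safe to parse omreport output
else
    OMSA_OK=no    # fall through to perccli/storcli instead
fi
echo "OMSA usable: $OMSA_OK"
```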
#!/usr/bin/env bash
# hdarray - Observium agent script for Dell PERC RAID controllers
# Supports: Dell OMSA (omreport), LSI MegaCli64, Adaptec (arcconf), perccli/storcli
#
# Updated: 17 April 2026
# - Added perccli/storcli support for newer Dell PowerEdge servers
# - Fixed stdin closed issue with perccli (</dev/null on all CLI calls)
# - Added omreport sanity check to fall through to perccli if omreport
# is installed but non-functional (e.g. OMSA installed but not configured)
echo '<<<hdarray>>>'
############### Dell OMSA (omreport) #############
# Note: test that omreport actually works before using it - on some nodes
# OMSA is installed but broken, and we should fall through to perccli instead
if [ -x /opt/dell/srvadmin/bin/omreport ] && \
   /opt/dell/srvadmin/bin/omreport storage controller >/dev/null 2>&1; then
    CONTROLLER=$(/opt/dell/srvadmin/bin/omreport storage controller)
    CINFO=$(echo "$CONTROLLER" | grep Status)
    CINFOSTAT=$(echo "$CINFO" | cut -d':' -f2)
    CINFOSTAT=${CINFOSTAT:1}
    echo "Controller Status=$CINFOSTAT"
    IFS=$'\n'
    set -f
    DRIVES=$(/opt/dell/srvadmin/bin/omreport storage pdisk controller=0 -fmt ssv | sed -n '/[0-1].*/p')
    for line in $DRIVES; do
        DRIVEINFO=$(echo "$line" | cut -d';' -f1)
        DRIVESTATUS=$(echo "$line" | cut -d';' -f2)
        echo "Drive $DRIVEINFO=$DRIVESTATUS"
    done
    set +f
    unset IFS
############### Dell LSI MegaCli64 ##############
elif [ -x /opt/MegaRAID/MegaCli/MegaCli64 ]; then
    RAID=/tmp/lsi.$$
    CTRLSTAT=Ok
    echo "Controller Status=$CTRLSTAT"
    /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL -NoLog > "$RAID"
    iLOGICALDEV=0
    grep "Virtual Drive:" "$RAID" | while read -r DEVICE; do
        STATUS="$(grep -A 5 "$DEVICE" "$RAID" | tail -1 | cut -d: -f2)"
        STATUS=${STATUS:1}
        [ "$STATUS" = "Optimal" ] && STATUS=Ok
        echo "Logical Drive $iLOGICALDEV=$STATUS"
        iLOGICALDEV=$((iLOGICALDEV + 1))
    done
    /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL -NoLog > "$RAID"
    iDEVICE=0
    grep "Device Id" "$RAID" | while read -r DEVICE; do
        STATUS="$(grep -A 12 "$DEVICE" "$RAID" | tail -1 | cut -d: -f2)"
        STATUS=${STATUS:1}
        [ "$STATUS" = "Online, Spun Up" ] && STATUS=Ok
        echo "Drive $iDEVICE=$STATUS"
        iDEVICE=$((iDEVICE + 1))
    done
    rm -f "$RAID"
############### Dell perccli / storcli ##############
elif [ -x /opt/MegaRAID/perccli/perccli64 ] || [ -x /opt/MegaRAID/storcli/storcli64 ]; then
    # Find the right binary
    if [ -x /opt/MegaRAID/perccli/perccli64 ]; then
        CLI=/opt/MegaRAID/perccli/perccli64
    else
        CLI=/opt/MegaRAID/storcli/storcli64
    fi
    RAID=/tmp/perccli.$$
    # Get number of controllers
    NUMCTRL=$($CLI show nolog </dev/null 2>/dev/null | grep "Number of Controllers" | awk '{print $NF}')
    [ -z "$NUMCTRL" ] && NUMCTRL=1
    for ctrl in $(seq 0 $((NUMCTRL - 1))); do
        # Controller status
        $CLI /c${ctrl} show nolog > "$RAID" 2>&1 </dev/null
        CTRLSTAT=$(grep "^Status" "$RAID" | awk -F'= ' '{print $2}' | tr -d ' \r\n')
        [ "$CTRLSTAT" = "Success" ] && CTRLSTAT=Ok
        echo "Controller Status=$CTRLSTAT"
        # Virtual drive status
        $CLI /c${ctrl}/vall show nolog > "$RAID" 2>&1 </dev/null
        iLOGICALDEV=0
        while IFS= read -r line; do
            # VD rows have the format: DG/VD TYPE State ...
            if echo "$line" | grep -qE "^[0-9]+/[0-9]+"; then
                STATE=$(echo "$line" | awk '{print $3}')
                [ "$STATE" = "Optl" ] && STATE=Ok
                [ "$STATE" = "Dgrd" ] && STATE=Degraded
                [ "$STATE" = "Pdgd" ] && STATE=Degraded
                [ "$STATE" = "OfLn" ] && STATE=Offline
                echo "Logical Drive $iLOGICALDEV=$STATE"
                iLOGICALDEV=$((iLOGICALDEV + 1))
            fi
        done < "$RAID"
        # Physical drive status
        $CLI /c${ctrl}/eall/sall show nolog > "$RAID" 2>&1 </dev/null
        iDEVICE=0
        while IFS= read -r line; do
            # Drive rows have the format: EID:Slt DID State ...
            if echo "$line" | grep -qE "^[0-9]+:[0-9]+"; then
                EID_SLT=$(echo "$line" | awk '{print $1}')
                STATE=$(echo "$line" | awk '{print $3}')
                [ "$STATE" = "Onln" ] && STATE=Ok
                [ "$STATE" = "GHS" ] && STATE=Ok
                [ "$STATE" = "DHS" ] && STATE=Ok
                [ "$STATE" = "UGood" ] && STATE=Ok
                [ "$STATE" = "Offln" ] && STATE=Offline
                [ "$STATE" = "UBad" ] && STATE=Failed
                [ "$STATE" = "Rbld" ] && STATE=Rebuilding
                echo "Drive $EID_SLT=$STATE"
                iDEVICE=$((iDEVICE + 1))
            fi
        done < "$RAID"
    done
    rm -f "$RAID"
#################### Adaptec controllers ####################
elif [ -x /usr/StorMan/arcconf ]; then
    RAID=/tmp/adaptec.$$
    /usr/StorMan/arcconf getconfig 1 al > "$RAID"
    CTRLSTAT=$(grep "Controller Status" "$RAID" | cut -d: -f2 | tr -d ' ')
    [ "$CTRLSTAT" = "Optimal" ] && CTRLSTAT=Ok
    echo "Controller Status=$CTRLSTAT"
    iLOGICALDEV=0
    grep "Status of logical device" "$RAID" | cut -d: -f2 | while read -r STATUS; do
        # read already trims the surrounding whitespace
        [ "$STATUS" = "Optimal" ] && STATUS=Ok
        echo "Logical Drive $iLOGICALDEV=$STATUS"
        iLOGICALDEV=$((iLOGICALDEV + 1))
    done
    iDEVICE=0
    grep "Device #" "$RAID" | while read -r DEVICE; do
        STATUS=$(grep -A 2 "$DEVICE" "$RAID" | tail -1 | cut -d: -f2)
        STATUS=$(echo $STATUS)   # squeeze whitespace
        [ "$STATUS" = "Online" ] && STATUS=Ok
        echo "Drive $iDEVICE=$STATUS"
        iDEVICE=$((iDEVICE + 1))
    done
    rm -f "$RAID"
fi
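For reference, here is roughly what a healthy node emits on stdout. The <<<hdarray>>> header is what lets the Observium agent route the section; the drive labels depend on which CLI branch ran (the values below are made up for illustration):

```shell
#!/usr/bin/env bash
# Illustrative output for a healthy perccli-managed node with one
# virtual drive over two physical disks; real values come from the
# controller, these are invented.
sample_output='<<<hdarray>>>
Controller Status=Ok
Logical Drive 0=Ok
Drive 32:0=Ok
Drive 32:1=Ok'
printf '%s\n' "$sample_output"
```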
From: Chris James
Sent: 16 April 2026 09:39
To: observium@lists.observium.org
Subject: Fix for hdarray
We have continued to have an issue where the PERC controller status is flagged as failed on pretty much all of our PowerEdge Rx20, Rx30, Rx40, Rx50 and Rx60 servers. We finally got to the bottom of it today. Please can the following be corrected.
Root Cause: perccli64 requires an open stdin to function correctly. When it runs with stdin closed (<&-), it returns Status = Failure instead of querying the controller. The Observium agent script closes stdin as a security measure (exec <&- 2>/dev/null) before running all local scripts, so every time the agent ran our hdarray script via the socket, perccli silently failed.
The Fix: Add </dev/null to all four perccli calls in the hdarray script so the binary gets a valid (empty) stdin rather than a closed file descriptor:

$CLI show nolog </dev/null 2>/dev/null
$CLI /c${ctrl} show nolog > $RAID 2>&1 </dev/null
$CLI /c${ctrl}/vall show nolog > $RAID 2>&1 </dev/null
$CLI /c${ctrl}/eall/sall show nolog > $RAID 2>&1 </dev/null
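The failure mode is easy to reproduce without a PERC controller at all: any program that touches its stdin misbehaves the same way when fd 0 is closed. A minimal sketch, with cat standing in for perccli64:

```shell
#!/usr/bin/env bash
# With stdin closed (as under 'exec <&-'), reading fd 0 fails and the
# program errors out; with </dev/null it just sees EOF and exits cleanly.
closed=$(bash -c 'exec <&- 2>/dev/null; cat 2>&1; echo "exit=$?"')
fixed=$(bash -c 'exec <&- 2>/dev/null; cat </dev/null 2>&1; echo "exit=$?"')
echo "closed stdin:   $closed"
echo "/dev/null stdin: $fixed"
```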
What made this so hard to diagnose: The script worked perfectly in every direct test (setsid, bash -x, env -i, running as root via ssh) because all those methods leave stdin open. It only failed via the agent socket, where stdin was closed; exactly replicating production conditions was key.
Many thanks,
Chris