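We came across one more outlier case today: nodes where OMSA is installed but not working correctly. On those nodes the hdarray script finds omreport and uses it, but then incorrectly reports the controller status as failed. This has now been fixed: the script checks that omreport actually runs successfully before using it, and otherwise falls through to the other tools. Full code below if anyone cares: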

#!/usr/bin/env bash
# hdarray - Observium agent script for Dell PERC RAID controllers
# Supports: Dell OMSA (omreport), LSI MegaCli64, Adaptec (arcconf), perccli/storcli
#
# Updated: 17 April 2026
#   - Added perccli/storcli support for newer Dell PowerEdge servers
#   - Fixed stdin closed issue with perccli (</dev/null on all CLI calls)
#   - Added omreport sanity check to fall through to perccli if omreport
#     is installed but non-functional (e.g. OMSA installed but not configured)

echo '<<<hdarray>>>'

############### Dell OMSA (omreport) #############
# Note: test that omreport actually works before using it - on some nodes
# OMSA is installed but broken, and we should fall through to perccli instead
if [ -x /opt/dell/srvadmin/bin/omreport ] && \
   /opt/dell/srvadmin/bin/omreport storage controller >/dev/null 2>&1; then
    CONTROLLER=$(/opt/dell/srvadmin/bin/omreport storage controller)
    CINFO=$(echo "$CONTROLLER" | grep Status)
    CINFOSTAT=$(echo "$CINFO" | cut -d':' -f2)
    CINFOSTAT=${CINFOSTAT:1}
    echo "Controller Status=$CINFOSTAT"
    IFS='
'
    set -f
    DRIVES=$(/opt/dell/srvadmin/bin/omreport storage pdisk controller=0 -fmt ssv | sed -n '/[0-1].*/p')
    for line in $DRIVES; do
        DRIVEINFO=$(echo "$line" | cut -d';' -f1)
        DRIVESTATUS=$(echo "$line" | cut -d';' -f2)
        echo "Drive $DRIVEINFO=$DRIVESTATUS"
    done
    set +f
    unset IFS

############### Dell LSI MegaCli64 ##############
elif [ -x /opt/MegaRAID/MegaCli/MegaCli64 ]; then
    RAID=/tmp/lsi.$$
    CTRLSTAT=Ok
    echo "Controller Status=$CTRLSTAT"
    /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL -NoLog > $RAID
    let iLOGICALDEV=0
    grep "Virtual Drive:" $RAID | while read DEVICE; do
        STATUS="$(grep -A 5 "$DEVICE" $RAID | tail -1 | cut -d\: -f2)"
        STATUS=${STATUS:1}
        [ "$STATUS" = "Optimal" ] && STATUS=Ok
        echo "Logical Drive $iLOGICALDEV=$STATUS"
        let iLOGICALDEV++
    done
    /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL -NoLog > $RAID
    let iDEVICE=0
    grep "Device Id" $RAID | while read DEVICE; do
        STATUS="$(grep -A 12 "$DEVICE" $RAID | tail -1 | cut -d\: -f2)"
        STATUS=${STATUS:1}
        [ "$STATUS" = "Online, Spun Up" ] && STATUS=Ok
        echo "Drive $iDEVICE=$STATUS"
        let iDEVICE++
    done
    rm -f $RAID

############### Dell perccli / storcli ##############
elif [ -x /opt/MegaRAID/perccli/perccli64 ] || [ -x /opt/MegaRAID/storcli/storcli64 ]; then
    # Find the right binary
    if [ -x /opt/MegaRAID/perccli/perccli64 ]; then
        CLI=/opt/MegaRAID/perccli/perccli64
    else
        CLI=/opt/MegaRAID/storcli/storcli64
    fi

    RAID=/tmp/perccli.$$

    # Get number of controllers
    NUMCTRL=$($CLI show nolog </dev/null 2>/dev/null | grep "Number of Controllers" | awk '{print $NF}')
    [ -z "$NUMCTRL" ] && NUMCTRL=1

    for ctrl in $(seq 0 $((NUMCTRL - 1))); do
        # Controller status
        $CLI /c${ctrl} show nolog > $RAID 2>&1 </dev/null
        CTRLSTAT=$(grep "^Status" $RAID | awk -F'= ' '{print $2}' | tr -d ' \r\n')
        [ "$CTRLSTAT" = "Success" ] && CTRLSTAT=Ok
        echo "Controller Status=$CTRLSTAT"

        # Virtual drive status
        $CLI /c${ctrl}/vall show nolog > $RAID 2>&1 </dev/null
        iLOGICALDEV=0
        while IFS= read -r line; do
            # Lines with VD info have format: DG/VD TYPE State ...
            if echo "$line" | grep -qE "^[0-9]+/[0-9]+"; then
                STATE=$(echo "$line" | awk '{print $3}')
                [ "$STATE" = "Optl" ] && STATE=Ok
                [ "$STATE" = "Dgrd" ] && STATE=Degraded
                [ "$STATE" = "Pdgd" ] && STATE=Degraded
                [ "$STATE" = "OfLn" ] && STATE=Offline
                echo "Logical Drive $iLOGICALDEV=$STATE"
                iLOGICALDEV=$((iLOGICALDEV + 1))
            fi
        done < $RAID

        # Physical drive status
        $CLI /c${ctrl}/eall/sall show nolog > $RAID 2>&1 </dev/null
        iDEVICE=0
        while IFS= read -r line; do
            # Lines with drive info have format: EID:Slt DID State ...
            if echo "$line" | grep -qE "^[0-9]+:[0-9]+"; then
                EID_SLT=$(echo "$line" | awk '{print $1}')
                STATE=$(echo "$line" | awk '{print $3}')
                [ "$STATE" = "Onln" ] && STATE=Ok
                [ "$STATE" = "GHS"  ] && STATE=Ok
                [ "$STATE" = "DHS"  ] && STATE=Ok
                [ "$STATE" = "UGood" ] && STATE=Ok
                [ "$STATE" = "Offln" ] && STATE=Offline
                [ "$STATE" = "UBad"  ] && STATE=Failed
                [ "$STATE" = "Rbld"  ] && STATE=Rebuilding
                echo "Drive $EID_SLT=$STATE"
                iDEVICE=$((iDEVICE + 1))
            fi
        done < $RAID
    done
    rm -f $RAID

#################### Adaptec controllers ####################
elif [ -x /usr/StorMan/arcconf ]; then
    RAID=/tmp/adaptec.$$
    /usr/StorMan/arcconf getconfig 1 al > $RAID
    CTRLSTAT=$(grep "Controller Status" $RAID | cut -d\: -f2 | tr -d ' ')
    [ "$CTRLSTAT" = "Optimal" ] && CTRLSTAT=Ok
    echo "Controller Status=$CTRLSTAT"
    let iLOGICALDEV=0
    grep "Status of logical device" $RAID | cut -d\: -f2 | while read STATUS; do
        STATUS=$(echo $STATUS)
        [ "$STATUS" = "Optimal" ] && STATUS=Ok
        echo "Logical Drive $iLOGICALDEV=$STATUS"
        let iLOGICALDEV++
    done
    let iDEVICE=0
    grep "Device #" $RAID | while read DEVICE; do
        STATUS=$(grep -A 2 "$DEVICE" $RAID | tail -1 | cut -d\: -f2)
        STATUS=$(echo $STATUS)
        [ "$STATUS" = "Online" ] && STATUS=Ok
        echo "Drive $iDEVICE=$STATUS"
        let iDEVICE++
    done
    rm -f $RAID
fi

From: Chris James
Sent: 16 April 2026 09:39
To: observium@lists.observium.org
Subject: Fix for hdarray

 

We have continued to have an issue where the PERC controller status is flagged as failed on pretty much all our PowerEdge Rx20, Rx30, Rx40, Rx50 and Rx60 servers. We finally got to the bottom of it today.

Please can the following be corrected.

Root Cause: perccli64 requires an open stdin to function correctly. When it runs with stdin closed (<&-), it returns Status = Failure instead of querying the controller. The Observium agent script closes stdin as a security measure (exec <&- 2>/dev/null) before running all local scripts, so every time the agent ran our hdarray script via the socket, perccli silently failed.

The Fix: Add </dev/null to all four perccli calls in the hdarray script so the binary gets a valid (empty) stdin rather than a closed file descriptor:

$CLI show nolog </dev/null 2>/dev/null
$CLI /c${ctrl} show nolog > $RAID 2>&1 </dev/null
$CLI /c${ctrl}/vall show nolog > $RAID 2>&1 </dev/null
$CLI /c${ctrl}/eall/sall show nolog > $RAID 2>&1 </dev/null
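
One way to make sure no call is missed in future is to route every invocation through a small wrapper that applies the redirect in one place. This is only a sketch: run_cli is a hypothetical helper name, not part of the script above, and the default path is simply the perccli64 location the script already checks for.

```bash
#!/usr/bin/env bash
# Sketch only: run_cli is a hypothetical helper, not part of the posted script.
# CLI defaults to the perccli64 path used in hdarray; override it as needed.
CLI=${CLI:-/opt/MegaRAID/perccli/perccli64}

# Run the RAID CLI with stdin attached to /dev/null, so the binary always
# gets a valid (empty) stdin even when the caller has closed fd 0.
run_cli() {
    "$CLI" "$@" </dev/null
}

# Possible usage inside the script (commented out so this block is inert):
#   run_cli "/c${ctrl}" show nolog > "$RAID" 2>&1
#   run_cli "/c${ctrl}/vall" show nolog > "$RAID" 2>&1
```

With a wrapper like this, any call added later picks up the redirect automatically instead of relying on someone remembering to append it.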

What made this so hard to diagnose: The script worked perfectly in every direct test (setsid, bash -x, env -i, running as root via ssh) because all those methods leave stdin open. It only failed via the agent socket, where stdin was closed, so exactly replicating production conditions was key.
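
For anyone wanting to reproduce the agent's conditions from an ordinary shell, closing stdin explicitly when launching the script should be enough. The sketch below assumes bash. stdin_state is a hypothetical helper used only to show the difference between a closed descriptor and one attached to /dev/null, and the ./hdarray path in the last comment is an assumption about where the script lives.

```bash
#!/usr/bin/env bash
# Sketch only: stdin_state is a hypothetical helper for illustration.
# It reports whether file descriptor 0 is open by trying to duplicate it
# onto fd 3; the duplication fails if fd 0 is closed.
stdin_state() {
    if { true 3<&0; } 2>/dev/null; then
        echo open
    else
        echo closed
    fi
}

stdin_state </dev/null   # stdin attached to /dev/null
stdin_state <&-          # stdin closed, as when run by the agent

# To exercise the real script under the same condition (path assumed):
#   ./hdarray <&-
```

Running the script with <&- from an interactive shell gives perccli the same closed descriptor it sees under the agent, without having to go through the socket each time.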

Many thanks,

 

Chris