Monitor HP Smart Array from NRPE / cron on FreeBSD

Synopsis

I wrote this small check script for NRPE/Nagios to report the status of every RAID volume in a box and to list the failed volumes, if any exist.

Syntax

$path/check_smartarray.sh [email] [email]

If no arguments are specified, the script assumes it is being run by NRPE. If one or more email addresses are specified, the script sends an email whenever an array reports an error.
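
The two modes described above can be illustrated with a minimal sketch. This is not the code from check_smartarray.sh: the function name report, its parameters, and the mail subject are hypothetical, and it assumes the standard mail(1) command is available.

```sh
#!/bin/sh
# Sketch only: "report" and its parameters are hypothetical names,
# not taken from check_smartarray.sh.
# usage: report STATUS_LINE EXIT_CODE [email ...]
report() {
    status_line="$1"
    exit_code="$2"
    shift 2
    if [ "$#" -eq 0 ]; then
        # No addresses given: assume NRPE and print the status line.
        printf '%s\n' "$status_line"
    elif [ "$exit_code" -ne 0 ]; then
        # Addresses given and an array reported a problem: mail each one.
        for addr in "$@"; do
            printf '%s\n' "$status_line" |
                mail -s "Smart Array problem on $(hostname)" "$addr"
        done
    fi
    return "$exit_code"
}
```

Called as report "da0: ok" 0 it prints the line for NRPE; called with one or more addresses it stays silent unless the exit code is non-zero.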

Output

da1: DEGRADED / da2: rebuilding / da0: ok / da3: ok

Failed/rebuilding volumes will always be first in the output string, to help diagnose the problem when receiving the output via pager/SMS.
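
As a rough illustration of that ordering, the sketch below builds the output string from "device state" pairs and puts every volume that is not ok first. The function name format_status and the input format are assumptions made for this example; the real script may assemble the string differently.

```sh
#!/bin/sh
# Sketch only: format_status is a hypothetical helper.
# stdin: one "device state" pair per line, e.g. "da1 DEGRADED".
# stdout: a single line with all non-ok volumes listed first.
format_status() {
    awk '
    {
        dev = $1
        state = $0
        sub(/^[^ ]+ +/, "", state)   # drop the device name, keep the state text
        entry = dev ": " state
        if (state == "ok")
            ok = ok (ok == "" ? "" : " / ") entry
        else
            bad = bad (bad == "" ? "" : " / ") entry
    }
    END {
        if (bad != "" && ok != "")
            print bad " / " ok
        else
            print bad ok
    }'
}

printf '%s\n' 'da0 ok' 'da1 DEGRADED' 'da2 rebuilding' 'da3 ok' | format_status
```

With the four sample volumes this prints the same line as shown under Output.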

Output Examples

output              description
ok                  The device is reported as ok by the Smart Array controller.
DEGRADED            The RAID volume is degraded. It still works, but without the safety of RAID, and in some cases with severe performance loss.
rebuilding          The RAID is rebuilding; it will return to ok when done.
expanding           The RAID is expanding; it will return to ok when done.
ready for recovery  The RAID is ready for recovery, but not recovering. This can happen if automatic recovery is disabled, and on some smaller versions of the Smart Array controllers where only one RAID volume can be rebuilt at a time.
unknown state       The volume is in an unknown state. Please report this to me (soren at klintrup.dk) so I can update the script, and include the output of the following commands: camcontrol devlist and camcontrol inquiry da0 -D (run the inquiry for every volume on the system).
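
To gather the output requested for an unknown state, something like the sketch below could be used. The helper names list_da_devices and collect_report are hypothetical, and the sketch assumes that camcontrol devlist ends each line with the device names in parentheses, for example (pass0,da0).

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Read "camcontrol devlist" output on stdin and print the daN names.
# The [(,] prefix keeps ATA disks such as ada0 from matching.
list_da_devices() {
    sed -n -E 's/.*[(,](da[0-9]+)[,)].*/\1/p'
}

# Print the device list, then the inquiry data for every da volume.
collect_report() {
    camcontrol devlist
    for dev in $(camcontrol devlist | list_da_devices); do
        camcontrol inquiry "$dev" -D
    done
}
```

Running collect_report > report.txt as root on the affected machine would produce a file to attach to the mail.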

Compatibility

The script should work on all Smart Array controllers. If you test it on another controller, whether it works or not, I would like to know; please mail me at soren at klintrup.dk.

I have tested the script on the following controllers

  • HP Smart Array 6i
  • HP Smart Array 5i
  • HP Smart Array P400
  • HP Smart Array P410
  • HP Smart Array P420
  • HP Smart Array P800

Download

Latest version

Latest version 1.7 check_smartarray.sh

Old versions

Changelog

1.7

     o Updated to work with FreeBSD 10.1

1.6

     o HP finally changed the SCSI output of their latest Smart Array controllers; updated the script to be compatible with both versions
       Thanks to Paul Yates for reporting this and providing sample output

1.5

     o Can now email an address of choice, just use email address(es) as arguments to shellscript
     o check if camcontrol binary exists on system before running script

1.4.5

     o Problems with status of ADG (Advanced Data Guarding) Volumes fixed.
       Thanks to Peter Larsen for reporting this

1.4.4

     o Added online expansion
       Thanks to Mikael Antonsen for reporting this :)

1.4.3

     o Changed tr A-Z a-z to tr [:upper:] [:lower:] to prevent problems with various locales.
       Thanks to Oliver Fromme for reporting this :)

1.4.2

     o The nagios web interface would only show one RAID volume, it seems nagios blocks "|" in the input and throws everything after that away.
       Changed the "|" to a "/"
       Thanks to Kai Gallasch for reporting this :)

1.4.1

     o Patch by Christoph Schug applied to replace two cut calls with a single sed call when getting DEVICESTRING.
     o Added quotes in various places for consistency
     o Don't set state to unknown if state is already critical (for code added in 1.4)
     o unset $ERR before doing anything to avoid problems if the variable is already set

1.4

     o If a volume didn't have a known state, it just wouldn't show that volume; it now exits with exit code 3 and outputs "unknown state"

1.3.1

     o Using tr to replace the string-output from camcontrol, for a more human-readable script, no changes in functionality.

1.3

     o Initial public release