
How to detect a RAID failure with Linux?

Mike_811
Advisor

Hi guys,
I have a DL380 running CentOS Linux. The OS is installed on disks mirrored by the hardware RAID controller (RAID 0+1).
How can I set up the OS so that an alarm is raised (SNMP trap, email, log entry...) when the mirror breaks?
So far I can't see anything in iLO or in the logs.
Thanks for your help.
Heironimus
Honored Contributor

On a ProLiant you should install the HP tools for the Red Hat Enterprise Linux version that corresponds to your installed CentOS version. You'll probably need to edit /etc/redhat-release so that it looks like an official RHEL release to get the tools installed and configured, but once they're set up you can restore the original CentOS file and they should run just fine.

I think the HP agents send SNMP traps, but you will need some other tool to handle those traps and do something useful.
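One common choice for that "other tool" is Net-SNMP's snmptrapd. Its `traphandle` directive in snmptrapd.conf runs an external program for each received trap and passes the trap on standard input: the first line is the sender's hostname, the second its transport address, and each further line one varbind (the OID, then its value). The sketch below is a hypothetical handler, assuming it is registered with a line such as `traphandle default /usr/local/sbin/trap2log.py hp-agents` (script name and `hp-agents` tag are made up for illustration). On Net-SNMP 5.3 and later the community also needs execute rights, e.g. `authCommunity log,execute,net public`. It only logs a summary to syslog and does not decode the HP-specific OIDs.

```python
#!/usr/bin/env python3
"""Hypothetical snmptrapd traphandle target: log a one-line summary of each trap."""
import sys
import syslog


def parse_trap(lines):
    """Split snmptrapd's stdin format into (hostname, address, varbinds).

    Line 1 is the sender's hostname, line 2 its transport address, and each
    further line is one varbind: the OID, whitespace, then the value.
    """
    lines = [line.rstrip("\n") for line in lines]
    hostname = lines[0] if len(lines) > 0 else ""
    address = lines[1] if len(lines) > 1 else ""
    varbinds = []
    for line in lines[2:]:
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        varbinds.append((parts[0], parts[1].strip() if len(parts) > 1 else ""))
    return hostname, address, varbinds


def summarize(hostname, address, varbinds):
    """Build a one-line summary suitable for syslog or an email subject."""
    details = "; ".join(f"{oid}={value}" for oid, value in varbinds)
    return f"SNMP trap from {hostname} ({address}): {details}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # The argument given on the (hypothetical) traphandle line becomes the syslog tag.
    syslog.openlog(sys.argv[1], 0, syslog.LOG_USER)
    syslog.syslog(syslog.LOG_WARNING, summarize(*parse_trap(sys.stdin)))
```

Logging to syslog keeps the handler trivial; whatever already watches the system log (or mails from it) can then pick the traps up.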
Ivan Ferreira
Honored Contributor

The HP Insight Management Agents can send you an email when something goes wrong with the hardware.

Here is one I received recently:

Trap-ID=3034

Logical Drive Status Change: Slot 0, Drive: 1.
Status is now Rebuilding.
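If such mails are processed by a script, for example to open a ticket only for particular status changes, the fields can be pulled out with a few regular expressions. A minimal sketch, assuming the mail body is worded exactly like the sample above; the `DriveEvent` type is hypothetical, and treating every status other than `OK` as worth an alert is an assumption, not HP's documented list of statuses.

```python
import re
from typing import NamedTuple, Optional


class DriveEvent(NamedTuple):
    trap_id: int
    slot: int
    drive: int
    status: str


# Patterns written against the sample mail only; other trap types or
# agent versions may word their text differently.
TRAP_RE = re.compile(r"Trap-ID=(\d+)")
LOCATION_RE = re.compile(r"Slot (\d+), Drive: (\d+)")
STATUS_RE = re.compile(r"Status is now (\w+)")


def parse_drive_event(body: str) -> Optional[DriveEvent]:
    """Return the fields of a 'Logical Drive Status Change' mail, or None."""
    trap = TRAP_RE.search(body)
    location = LOCATION_RE.search(body)
    status = STATUS_RE.search(body)
    if not (trap and location and status):
        return None
    return DriveEvent(
        trap_id=int(trap.group(1)),
        slot=int(location.group(1)),
        drive=int(location.group(2)),
        status=status.group(1),
    )


def needs_attention(event: DriveEvent, healthy=("OK",)) -> bool:
    """Assumption: any status not listed as healthy deserves an alert."""
    return event.status not in healthy


sample = """Trap-ID=3034

Logical Drive Status Change: Slot 0, Drive: 1.
Status is now Rebuilding."""

event = parse_drive_event(sample)
print(event, needs_attention(event))
# → DriveEvent(trap_id=3034, slot=0, drive=1, status='Rebuilding') True
```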
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
dirk dierickx
Honored Contributor

I always put my money on Linux's own software RAID.
skt_skt
Honored Contributor

I've heard of RAID application CDs that can be used to install RAID management software, but I'm not sure whether they are available for Linux.
Ralph Grothe
Honored Contributor

Sorry if this doesn't apply to your hardware RAID layout, but I would also second Dirk's preference for Linux's own software RAID.
We have been running all our Linux servers on it for several years now and have never experienced any data loss due to a RAID failure.
I think the Linux MD layer is ultra stable and very easy to set up and administer.
The same goes for mdadm's monitor mode, which makes it simple to plug in your own alerting or event-handling scripts. (I, for instance, have it send passive check results to my Nagios server, which has always notified me in time when a disk needed to be replaced.)
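A rough sketch of such an event handler in Python (the mdevent.pl script mentioned in this post is Perl and not shown, so this is not it). In monitor mode mdadm runs the configured program with the event name, the md device and, for some events, the affected component device. The sketch maps the event to a Nagios state and formats a `PROCESS_SERVICE_CHECK_RESULT` external command. The event-to-state mapping and the service name `RAID` are assumptions for illustration; delivering the line to Nagios (its command file locally, or NSCA from a remote host, which uses its own tab-separated input format) is site-specific and not shown.

```python
#!/usr/bin/env python3
"""Hypothetical mdadm --monitor PROGRAM handler that formats a Nagios passive check result."""
import socket
import sys
import time

# Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed mapping of mdadm monitor events to Nagios states; adjust to taste.
EVENT_STATE = {
    "Fail": CRITICAL,
    "FailSpare": CRITICAL,
    "DegradedArray": CRITICAL,
    "DeviceDisappeared": CRITICAL,
    "SparesMissing": WARNING,
    "RebuildStarted": WARNING,
    "RebuildFinished": OK,
    "SpareActive": OK,
    "NewArray": OK,
    "MoveSpare": OK,
    "TestMessage": OK,
}


def event_state(event):
    """Map an mdadm event to a Nagios state; RebuildNN progress events count as WARNING."""
    if event.startswith("Rebuild") and event[len("Rebuild"):].isdigit():
        return WARNING
    return EVENT_STATE.get(event, UNKNOWN)


def passive_check_line(host, service, event, md_device, component=None, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    output = f"{event} on {md_device}"
    if component:
        output += f" ({component})"
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{event_state(event)};{output}")


if __name__ == "__main__" and len(sys.argv) >= 3:
    # mdadm passes: event name, md device, and optionally the component device.
    event, md_device = sys.argv[1], sys.argv[2]
    component = sys.argv[3] if len(sys.argv) > 3 else None
    # "RAID" is a placeholder service name; it must match a passive service in Nagios.
    # Printing stands in for the site-specific delivery step.
    print(passive_check_line(socket.gethostname(), "RAID", event, md_device, component))
```

Running `mdadm --monitor --scan --oneshot --test` is a convenient way to trigger a `TestMessage` event for each array and check that the handler is wired up.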
On RHEL an mdmonitor init script is already provided, so all that's left is to set PROGRAM in /etc/mdadm.conf to point to your custom event handler, e.g.:
# grep ^PROGRAM /etc/mdadm.conf
PROGRAM /usr/local/sbin/mdevent.pl
Madness, thy name is system administration