

Detecting failed disks in a Raid

Hi

I just installed Debian Sarge Linux on a ProLiant ML350 G4p server, and everything works great!

It's a web server, so I didn't install any GUI (X Window System) at all; the only change I made from the standard install was to use the 2.6 kernel.

The only thing I'm missing is a way to get an alert if a hard disk fails. I saw there are some monitoring tools available for Linux, but they seemed to have quite a few requirements.

So I looked around in /proc for a way to find out whether the RAID is OK, but couldn't find one.

One looked promising:

cat /proc/driver/cciss/cciss0
cciss0: HP Smart Array 642 Controller
Board ID: 0x409b0e11
Firmware Version: 2.58
IRQ: 201
Logical drives: 1
Current Q depth: 0
Current # commands on controller: 0
Max Q depth since init: 159
Max # commands on controller since init: 261
Max SG entries since init: 31
Sequential access devices: 0

cciss/c0d0: 72.83GB RAID 1(1+0)

I then tried removing one of the hot-swap disks, but saw no difference in that output.

Does anyone know where to look for RAID status information? I could then write my own little script to send an alert if a hard disk fails, along the lines of the sketch below.
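
Just to show what I mean, here is a rough cron-driven sketch (purely hypothetical: it assumes a failure would actually show up in that proc file, which is exactly what I can't confirm, and that local mail delivery works):

#!/bin/sh
# Poll the controller's proc entry and mail root when it changes.
# STATUSFILE is a placeholder for wherever the status really lives.
STATUSFILE=/proc/driver/cciss/cciss0
LAST=/var/tmp/raid-status.last

touch "$LAST"
if ! cmp -s "$STATUSFILE" "$LAST"; then
    mail -s "RAID status changed on $(hostname)" root < "$STATUSFILE"
    cat "$STATUSFILE" > "$LAST"
fi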

Best regards,
Guttorm Fjørtoft
Andrew Cowan
Honored Contributor

Re: Detecting failed disks in a Raid

The Red Hat commands are:

mdadm                Manage software RAID (mdadm --detail /dev/md0)
partprobe [-s]       Inform the OS of partition table changes
cat /proc/mdstat     Show RAID status (or "watch cat /proc/mdstat" to monitor it)

lsraid -a /dev/md0   View an array's status (raidtools)
raidstart /dev/md0   Start the array
mkraid               Create the array
Also see /etc/raidtab (a quick health check is shown below)
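
For example, a quick health check on a software array looks like this; a failed member shows up as an underscore in the [UU] column of /proc/mdstat:

mdadm --detail /dev/md0      # full state of the array and its members
cat /proc/mdstat             # kernel's summary, e.g. [UU] vs [U_]
watch cat /proc/mdstat       # re-run the check every two seconds

The sample /etc/raidtab below is for the older raidtools package, which mdadm has largely replaced.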

Example:
Here's a sample configuration file:

#
# sample raiddev configuration file
#
raiddev /dev/md0
raid-level 0
nr-raid-disks 2 # Specified below
persistent-superblock 0
chunk-size 8

# device #1:
#
device /dev/hda1
raid-disk 0

# device #2:
#
device /dev/hdb1
raid-disk 1

# A new section always starts with the
# keyword 'raiddev'

raiddev /dev/md1
raid-level 5
nr-raid-disks 3 # Specified below
nr-spare-disks 1 # Specified below
persistent-superblock 1
parity-algorithm left-symmetric

# Devices to use in the RAID array:
#
device /dev/sda1
raid-disk 0
device /dev/sdb1
raid-disk 1
device /dev/sdc1
raid-disk 2

# The spare disk:
device /dev/sdd1
spare-disk 0

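Once /etc/raidtab describes the arrays, the raidtools workflow is roughly this (a sketch; double-check against the raidtools man pages before running, since mkraid overwrites the member disks):

mkraid /dev/md0      # build the array described in /etc/raidtab
raidstart /dev/md0   # start it, if not started automatically at boot
cat /proc/mdstat     # watch the initial resync and the array state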


Re: Detecting failed disks in a Raid

Thank you, but this won't work. I am using hardware RAID, not software.

The Linux kernel can use the RAID controller just fine, and rebuilding also works, but I can't find any way to ask the driver for the RAID status.

Unplugging a disk and putting it back in produces nothing in the logs, which I think is kind of strange. But maybe I'm looking in the wrong places?
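
For the record, these are the places I have been checking:

dmesg | grep -i cciss              # kernel ring buffer
grep -i cciss /var/log/syslog      # Debian's catch-all syslog
grep -i cciss /var/log/kern.log    # kernel messages
cat /proc/driver/cciss/cciss0      # the proc entry quoted above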

Re: Detecting failed disks in a Raid

Someone pointed me to

http://www.ussg.iu.edu/hypermail/linux/kernel/0302.0/1066.html

There seems to be a way :)

Now I'll try to get those rpms installed.

Re: Detecting failed disks in a Raid

Sorry for spamming the forum by replying to myself. But in case there are other Debian users out there, the simple solution is:

apt-get install cpqarrayd

Hope this helps someone else :)
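
To check that the daemon came up and is reporting (it reports through syslog, as far as I can tell, so the exact log file may differ on other setups):

ps ax | grep cpqarrayd             # is the daemon running?
grep -i cpqarrayd /var/log/syslog  # any status messages so far?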
Ralph Grothe
Honored Contributor

Re: Detecting failed disks in a Raid

As you are using hardware RAID, I would suggest looking at your RAID controller's driver documentation for ways to check the disks' state.
If any custom utilities exist, they should be mentioned there.
It should also state in what form the RAID disks' state appears in procfs (or even sysfs, if your kernel is that current).
Maybe you are lucky and the driver's source code is available, in which case you can look for comments and implementation details.
If you haven't done so yet, install the kernel sources and search them for anything relating to your controller's driver.
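
For instance, in a 2.6 tree the cciss driver lives under drivers/block, so something like:

less /usr/src/linux/drivers/block/cciss.c      # read the driver itself
grep -ri cciss /usr/src/linux/drivers/ | less  # every mention in the tree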
Madness, thy name is system administration