P400 RAID scrubbing/verifying, howto?

Occasional Advisor

On a DL380G5 with P400-controller, running Linux (CentOS5).
Is it possible to make the RAID controller verify all the data/disks in the RAID? I'm looking for some kind of verify/scrubbing functionality.

I failed to find any such feature in HPACUCLI.
Are there any other P400-tools that I should be aware of?

Regards,
Mattias
Exalted Contributor

Re: P400 RAID scrubbing/verifying, howto?

Shalom,

How about dd from the operating system?

dd if=/dev/urandom of=/dev/dsk bs=1024 count=1000000

That will pretty much render the data on any RAID array useless (count is optional).

Note that the NSA could probably still reconstruct the disk after this, so we have ours crushed to ensure they are not reused.

That does, however, lower the resale value of the disk by 100%.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Occasional Advisor

Re: P400 RAID scrubbing/verifying, howto?

I probably was a bit unspecific in my initial post.
Basically, what I would like to do on a regular basis is verify the RAID sets with respect to data corruption, both silent and non-silent, and have checksum mismatches reported. I would also like the spare disks to be verified.
It would also be nice to have each of the disks in the RAID set (or at least the spare disks) checked in a way similar to how the program badblocks works (hence my use of the word "scrubbing").

/Mattias
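A badblocks-style pass can be roughly approximated from the host with a read-only sketch like the one below. It assumes the P400 logical drive appears as /dev/cciss/c0d0 (the usual cciss driver naming on CentOS 5; adjust for your system), and `read_check` is a hypothetical helper name. Reading the whole logical drive makes the controller fetch every data block, so unreadable sectors should surface as I/O errors. Normal reads typically do not touch parity or spare disks, however, so this is not a substitute for a controller-level verify.

```shell
#!/usr/bin/env bash
# read_check: hypothetical helper that reads every block of a device (or file)
# and discards the data. Read-only: nothing is written to the device.
# Assumption: the P400 logical drive is /dev/cciss/c0d0 (cciss driver naming).
read_check() {
  local dev=$1
  if [ ! -r "$dev" ]; then
    echo "read_check: cannot read $dev" >&2
    return 2
  fi
  # Large sequential reads; dd stops and exits non-zero on a read (I/O) error.
  dd if="$dev" of=/dev/null bs=1M 2>/dev/null
}
```

Usage: `read_check /dev/cciss/c0d0 && echo "no read errors reported"`, then look in dmesg for any controller or driver messages. `badblocks -sv /dev/cciss/c0d0` does a similar read-only scan (read-only is badblocks' default mode) with a progress indicator.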
Honored Contributor

Re: P400 RAID scrubbing/verifying, howto?

You can try the Array Diagnostics Utility at http://h18007.www1.hp.com/support/files/server/us/download/27057.html

IMO your desire is a bit of overkill; the hardware manages that sort of thing pretty transparently to the operating system. For example, you may issue a command at the driver level to read a "track" of a hard drive, but because the drive geometry presented to the OS differs from the actual physical geometry, you are really reading the end of one physical track and the beginning of another. And, by the way, during manufacturing a sector in that sequence may have been marked bad and transparently replaced by a spare sector remapped from another section of the disk; something similar may have happened on the fly to another sector during a recent operation.
Occasional Advisor

Re: P400 RAID scrubbing/verifying, howto?

Thanks, the hpadu utility provides more information about the status of the RAID.

As for verifying the disks, I'm not interested in doing manual "low-level disk access". But I would like the RAID controller to be able to verify the integrity of its RAID sets, i.e. verify that the disks do indeed return the same data that was written to them, and to inform me of any errors or anomalies encountered.
Perhaps I'm paranoid, but silent data corruption seems to be a fact of life, and it would be nice to have it detected. :)

/m
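If the controller cannot be made to report silent corruption, one host-side mitigation is to keep checksums of the files themselves and re-verify them periodically. The sketch below assumes GNU coreutils/findutils (`sha256sum`, `sort -z`, `xargs -r`) and mostly static data, since legitimate edits also show up as mismatches. `make_manifest` and `verify_manifest` are hypothetical helper names, and the manifest should live outside the directory tree being checked.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: record SHA-256 checksums for every file under DIR,
# then later re-check them to catch content that changed underneath you.
# Requires GNU coreutils/findutils. Keep MANIFEST outside DIR.

# make_manifest DIR MANIFEST
make_manifest() {
  local dir=$1 manifest=$2
  # The output redirection is opened in the caller's directory, before the cd.
  ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}

# verify_manifest DIR MANIFEST: prints only failing files, non-zero on mismatch
verify_manifest() {
  local dir=$1 manifest=$2
  # The manifest is read on stdin, so a relative MANIFEST path still works.
  ( cd "$dir" && sha256sum --quiet -c - ) < "$manifest"
}
```

Run `make_manifest` once after the data is written and `verify_manifest` from cron afterwards; any file reported as FAILED changed without the checksum being updated. This only detects corruption at the file level and says nothing about which disk was at fault.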
Occasional Advisor

Re: P400 RAID scrubbing/verifying, howto?

I checked the documentation more thoroughly; I should have done that from the beginning, sorry.
I guess the functionality I'm inquiring about is "Surface Scan Analysis"*.

1) Is there any way to get a status report showing how much has been scanned?

2) How/where are encountered errors logged?

3) It says SSA is performed only on configured logical drives. Does that include the designated spare disks?

/m

*)
"Surface Scan Analysis will scan drives whenever the controller is otherwise idle, and stops whenever I/O requests are received from the operating system. The amount of delay or idle time that the controller waits before resuming Surface Scan Analysis is configurable and measured in seconds. The default value is 15 seconds for scans that occur after the initial scan on newly-created, never scanned volumes."
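The quoted idle-delay rule can be made concrete with a toy model. This is a hypothetical sketch, not HP's firmware logic: it assumes 1-second ticks, that the scan resumes only after DELAY consecutive idle seconds, and that any host I/O pauses it immediately; `scan_seconds` is an illustrative name.

```shell
#!/usr/bin/env bash
# Toy model of the quoted idle-delay rule (NOT HP firmware logic):
# the scan runs only once the controller has been idle for DELAY seconds,
# and pauses as soon as host I/O arrives. Time advances in 1-second ticks,
# starting just after I/O has stopped.
# Usage: scan_seconds DELAY TOTAL_SECONDS [SECOND_WITH_IO ...]
scan_seconds() {
  local delay=$1 total=$2
  shift 2
  local -a io=()
  local t idle=0 scanned=0
  for t in "$@"; do
    io[t]=1                          # mark the seconds in which host I/O occurs
  done
  for (( t = 0; t < total; t++ )); do
    if [[ -n ${io[t]:-} ]]; then
      idle=0                         # host I/O: scan stops, idle timer restarts
    else
      if (( idle >= delay )); then
        scanned=$(( scanned + 1 ))   # idle long enough: scan during this second
      fi
      idle=$(( idle + 1 ))
    fi
  done
  echo "$scanned"
}
```

In this model, with host I/O every 10 seconds over one minute, `scan_seconds 3 60 0 10 20 30 40 50` gives 36 seconds of scanning, while the default 15-second delay (`scan_seconds 15 60 0 10 20 30 40 50`) gives 0: if the gaps between I/O bursts are always shorter than the delay, the scan never makes progress. With no I/O at all, the delay only costs the first DELAY seconds.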
Occasional Advisor

Re: P400 RAID scrubbing/verifying, howto?

Hate to reply to myself... :)
Maybe someone will find our results interesting.

We have done a few tests.

Setup: HP DL380G5 with a P400 controller and battery backup. Eight 146 GB 10k SAS disks arranged in a RAID 6. A LUN on that array formatted with ext2.

Time to "Surface Scan" the full array is approx. 2 days when absolutely _no_ disk I/O is performed on the LUN. The only parameter I could find to control the Surface Scan behaviour was "Surface Scan Delay", which was set to 3 seconds (the value, I guess, should have no impact on the scan time if there is no disk I/O).
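To put "approx. 2 days" in perspective, here is the arithmetic as a tiny shell/awk helper. It assumes 48 hours, decimal units (1 GB = 1000 MB), and that each disk's full 146 GB is read once, in parallel with the other disks; `scan_rate_mb_s` is a hypothetical name.

```shell
#!/usr/bin/env bash
# scan_rate_mb_s SIZE_GB HOURS: average read rate in MB/s,
# assuming decimal units (1 GB = 1000 MB).
scan_rate_mb_s() {
  awk -v gb="$1" -v h="$2" 'BEGIN { printf "%.2f\n", gb * 1000 / (h * 3600) }'
}

scan_rate_mb_s 146 48    # per disk, disks scanned in parallel: prints 0.84
scan_rate_mb_s 1168 48   # all eight disks together: prints 6.76
```

Roughly 0.84 MB/s per disk is far below what a 10k SAS disk can read sequentially, which suggests the controller deliberately paces the scan rather than running it flat out, even on an idle LUN.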

The surface scan seems to scan all disks synchronously.
During a surface scan, the controller does not check RAID parity*, so silent data corruption on a disk will not generate any notifications, warnings, or error messages (dmesg, system event log (sel list), HP integrated management logging).
This behaviour was explicitly tested by altering data blocks on one of the disks in the array (without the controller knowing anything about it, of course).

According to hpadu, there seems to be a parameter called "Force Scan Complete", but I have no idea yet how to invoke that feature (if it is a feature).

*) Or, if it does, it silently just updates the parity, trusting that the data read from disk is correct.
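The difference between checking parity and merely rewriting it can be illustrated with a toy model of XOR ("P") parity, the parity used by RAID 5 and the first of RAID 6's two parities. This is a simplification under stated assumptions: RAID 6's second ("Q") parity uses Galois-field arithmetic and is not modelled, blocks are small integers standing in for whole data blocks, and `xor_parity`, `check_stripe` and `repair_stripe` are hypothetical names, not anything the P400 exposes.

```shell
#!/usr/bin/env bash
# Simplified model: XOR ("P") parity only; RAID 6's Q parity is not modelled.
# Blocks are small integers standing in for whole data blocks.

# xor_parity BLOCK...: print the XOR of all given data blocks.
xor_parity() {
  local p=0 b
  for b in "$@"; do
    p=$(( p ^ b ))
  done
  echo "$p"
}

# check_stripe STORED_PARITY BLOCK...: "check" mode, compare and report.
check_stripe() {
  local stored=$1
  shift
  local computed
  computed=$(xor_parity "$@")
  if [ "$stored" -eq "$computed" ]; then
    echo ok
  else
    echo "mismatch (stored=$stored computed=$computed)"
    return 1
  fi
}

# repair_stripe BLOCK...: "repair" mode, trust the data and rewrite parity.
repair_stripe() {
  xor_parity "$@"
}
```

For example, `xor_parity 1 2 4` prints 7. If the middle block silently changes to 3, `check_stripe 7 1 3 4` reports `mismatch (stored=7 computed=6)` and returns 1, whereas `repair_stripe 1 3 4` just yields the new parity 6, after which the corrupted stripe checks as consistent. The second path corresponds to the behaviour suspected in the footnote.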