ProLiant Servers (ML,DL,SL)

tomguk
Occasional Advisor

DL380 disk fail continuously

Hi there,

I have a server with 6 disks running off a Smart Array 6402. One channel has 2 disks running C:, and the other channel has 4 disks for the other partitions. It is RAID 5, with one logical drive that contains the various partitions.

I had a hardware failure on one of the C: drives. I bunged in a new hard drive, which rebuilt and then, once the rebuild finished, failed again. I did this twice: first with a working disk from an old server and then with a brand new spare (HP U320 146.4GB).

I have tried using imaging software to grab the existing data partitions, but each time the app starts to read the partition data (when it is about to offer me a choice of what I would like to back up) it fails and the server restarts.

I've tried SmartStart, but that hangs as it's loading.


So, I have a situation where I need to get the existing partitions (E, F, etc.) back online somehow. I can lose the C drive and rebuild, but the physical disks are not staying alive in that physical slot.

(I managed to run ADU but have no floppy port to save the report to.)

I am wondering whether it could be the backplane instead of the disks, given that I put in 2 disks and both failed after rebuilding.

I have an older DL380 G4. Would it be possible to build a fresh C drive in there and then add the other 4 existing disks? Would they rebuild themselves and present themselves to the OS for use?

I can either move the existing 6402 to the older chassis or use another 6402 that I have spare.

I don't have enough experience with array technology to know how to keep the data safe.

Please would anyone have a suggestion or guidance for me?
10 REPLIES
TTr
Honored Contributor

Re: DL380 disk fail continuously

It looks like booting fails in general, both from the C: drive and from the imaging software (I assume this is a self-booting CD/DVD). It could be the backplane or even the system board. The RAID configuration is stored on the disks, so if you hook up the disks to another server with the same or a different 6402 (preferably a different one, to eliminate any problems with the old 6402), the controller will read the RAID configuration from the disks, bind the disks in the last saved RAID configuration, and you will see the partitions. The usual recommendation is to take a backup before moving any RAID disks, but in your case you can't run a backup.
tomguk
Occasional Advisor

Re: DL380 disk fail continuously

Thanks for that - I was hoping someone could reassure me that I can try dropping the disks into another chassis.

I think that it is a combo of the drive not being bootable any more and either the backplane being duff or some bad data on one of the disks stopping the rebuild from completing safely. I guess the red light on the failing disk could be a red herring...
tomguk
Occasional Advisor

Re: DL380 disk fail continuously

I was wondering if you had a bit of further guidance:

If I were to drop the disks into another server, I was thinking of booting it straight into Acronis to grab the partitions I want.

Or is it going to take a while first for the disks to re-array themselves?
tomguk
Occasional Advisor

Re: DL380 disk fail continuously

Oh gosh, and another query.

I have the spare server with a 6402 array card in it.

I am loath to spin up the existing disk array more than necessary until I know whether it's back in working condition or not.

What, in anyone's opinion, would be the way to go:

- try putting all 6 disks in the array slots and seeing whether it boots

- or configuring another 2 new disks for the C drive and then adding the 4 disks containing the partitions I want access to in the remaining slots.

I don't know whether the info on the 4 disks will be enough for them to rebuild or whether I need all 6 original disks to do that.

to recap - 6 disks configured RAID5, with 4 partitions.
One channel had 2 disks with c drive on it, the other channel had the remaining 4 drives with 3 data partitions on.
TTr
Honored Contributor

Re: DL380 disk fail continuously

> One channel for 2 disks running C: - the other channel running 4 disks for other partitions

> to recap - 6 disks configured RAID5, with 4 partitions.
> One channel had 2 disks with c drive on it, the other channel had the remaining 4 drives with 3 data partitions on.

This cannot be. Either:
A. You have two RAID arrays: one RAID 1 (a two-disk mirrored pair) for drive C and one 4-disk RAID 5 array for data, or
B. All six disks are in RAID 5. In this case drive C is spread across all six disks, and the rest of the partitions are on all six disks as well.

> I had a hardware fail on one of the c: drives
Since the failure seems to affect drive C only, it looks like you have the first setup, "A", that I outlined above. You need to verify this.
tomguk
Occasional Advisor

Re: DL380 disk fail continuously

I have setup B.

One raid card / 6 drives / all RAID 5.
One logical drive partitioned into 4 partitions.

Maybe I am using the wrong terminology. The OS would not boot, hence my reference to the C drive.
I managed to restore an older Acronis image to the C partition a couple of days ago, and the server was up again. It tanked after around 4 hours though.

Thanks for going through this with me. I can see now that my logic was a bit off: a disk failing and the server not booting led me to think that the failed drive was "C", so to speak.

I thought putting a fresh disk in would be the solution, but now that the fresh disk has failed too, I am not sure what to do.

tomguk
Occasional Advisor

Re: DL380 disk fail continuously

From the ACU:

Controller Slot 2

scsi bus 0: device ID 2 3 4 5
scsi bus 1: device ID 0 1

Raid 5 distributed data guarding
logical capacity 683.6Gb

Upon boot I was getting the message "SCSI Port 2 SCSI ID 0 needs auto data recovery, run ADU".

The machine continues to boot and the disk goes red.

The first time I put the fresh disk in, I hit one of the F key combos to go into the ROM-based utilities and then left it to rebuild. I didn't actually use the utilities, to keep disk spin to a minimum. The drive stopped blinking green after a few hours, so it looked like it had rebuilt.

I ran the ADU and the disk tanked. Red light on the front.
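
The reported logical capacity can be used as a rough cross-check on which layout is in play. RAID 5 gives roughly (number of disks minus one) times the disk size. The sketch below is only illustrative: it assumes the 146.4 GB figure is decimal gigabytes, that the utility reports in binary units, and it ignores any space the controller reserves. The function name is made up for this example.

```python
GB = 10**9    # decimal gigabyte, as printed on drive labels (assumption)
GIB = 2**30   # binary gigabyte, as many utilities report (assumption)

def raid5_usable_bytes(disk_count, disk_bytes):
    """Usable space of a RAID 5 set: one disk's worth goes to parity."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disk_count - 1) * disk_bytes

disk_bytes = 146_400_000_000  # 146.4 GB, the size quoted in the thread

# Setup A: 2-disk mirror for C: plus a separate 4-disk RAID 5 for data
setup_a_data = raid5_usable_bytes(4, disk_bytes)
# Setup B: all six disks in a single RAID 5
setup_b = raid5_usable_bytes(6, disk_bytes)

print(f"4-disk RAID 5: {setup_a_data / GIB:.1f} GiB")
print(f"6-disk RAID 5: {setup_b / GIB:.1f} GiB")
```

Under those assumptions the six-disk figure comes out near 680, close to the 683.6 reported above, while a four-disk set would be far smaller. That is consistent with a single six-disk RAID 5 rather than a mirror plus a four-disk array.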

TTr
Honored Contributor

Re: DL380 disk fail continuously

In this case you would need all six disks to transfer the RAID volume to another server. It definitely sounds like it is isolated to ID 0 on SCSI bus 1 (SCSI port 2). It could be the disk cage slot or it could still be the disks.

There may be something else wrong here. Even with a failed disk, the RAID 5 volume should have stayed up and running. Might there be another failed disk or a problem with the controller?
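
As a rough illustration of why a RAID 5 volume normally survives one failed disk: the parity block in each stripe is the XOR of the data blocks, so any single missing block can be recomputed from the survivors. This is a simplified sketch, not how the Smart Array firmware is implemented. Real controllers rotate parity across the disks, and the block contents and function names here are invented for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 6-disk set: 5 data blocks plus 1 parity block
data = [b"blk-0", b"blk-1", b"blk-2", b"blk-3", b"blk-4"]
stripe = data + [xor_blocks(data)]

def rebuild(stripe, missing):
    """Recompute the block at index `missing` from the other five."""
    survivors = [blk for i, blk in enumerate(stripe) if i != missing]
    return xor_blocks(survivors)

# Any single lost block (data or parity) can be recovered
for lost in range(len(stripe)):
    assert rebuild(stripe, lost) == stripe[lost]
print("every single-disk loss in this stripe is recoverable")
```

With two blocks missing from the same stripe there is only one equation for two unknowns, so the XOR no longer determines either block. That is why a second misbehaving disk would explain the volume going down.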
tomguk
Occasional Advisor

Re: DL380 disk fail continuously

Yeah, there is definitely something afoot.

Thanks for the clarification about the 6 disks. I realised that after your last reply, cheers.

It could be the firmware, as I upgraded the RAID card and hard drives the day before. Sometimes upon restart I get SCSI devices 3 and 0 reported as having failed and now being online.

I can't keep a clear line of process because there are too many random errors: sometimes the eternal disk 0, randomly disk 3 (this seems to occur when I try Acronis, which now always fails upon getting to the screen that shows what you want to back up, i.e. reading the disks / partitions).

As a last attempt, I'm thinking of taking all 6 disks, putting them into a DL380 G2, and seeing what happens. That will show whether the backplane is at fault. I'm not sure whether to use a different controller, as you said, or to keep the existing one so that only the backplane changes.

I was also thinking about trying to downgrade the firmware on the RAID controller (if I can find out how to) as another thing to try.

I'm hating that all this troubleshooting is muddying the waters, i.e. trying so many separate paths is not really the most logical way to troubleshoot.