
DL360 G2 Smart Array 5i data loss and corruption

Derek_31
Valued Contributor

Today I experienced a very strange and disturbing drive problem on my DL360 G2. I have a RAID-1 setup with 18GB drives. All was fine for the last 2 years until this AM.

Windows 2000 crashed on me, rebooted, and then crashed again. By the time I realized what was happening, the server was powered on, the red lights on both HDs were on, and there was no console response.

So I powered it off, and when it came back up, I got a Smart Array warning about a bad disk. So I replaced the disk and tried to boot into Windows. I didn't get to the full login screen before I got the BSOD. So I pulled out the new drive and booted with just one. Windows came up, but just after I logged in I got a 'hard error' dialog box and Windows froze. Oh, and at least once during those attempts, the red lights came on on both drives and Windows hit a BSOD.

Finally, I let the system sit at the BIOS screen while the rebuild ran (disk light flashing). I let that run for an hour and then attempted to boot into Windows. I was able to run ACU, and it showed the rebuild 17% complete. It was set for low-priority rebuild, so I kicked that up to High. A minute later the red lights flashed on both drives, I got the 'hard error' message, and Windows died.

I rebooted again, and this time I got a login screen. I logged in and ran ACU, and it showed no rebuild in progress. This was about 2 minutes after I set it to High, when it was only at 17%... but 'instantly' it's done?

The mirror was divided into two partitions, about 9GB each. The D partition was toasted in Windows and I had to reformat it. The system log had numerous NTFS corruption entries before the reformat. I restored the little data that was on there and have seen no more errors.

I'm still unsure what filesystem damage was done to the C drive. I looked at the IML, and it just showed two drive failures, a non-responsive physical drive, and nothing else except the BSODs.

I've had drives fail before, and it was always transparent to the OS. The Smart Array was at 2.38 (I know 2.58 is out).

What really bugs me is:

1) A drive failure took down the OS
2) Several times during the rebuild both drives died and crashed the OS
3) I'm not sure the rebuild really did get done
4) The D partition was clearly toasted.

What the heck happened? When I first found the dead server the power lights were on, NICs on, red lights on BOTH drives, and no video or keyboard. Something seriously went wrong.

I'm thinking something with the SCSI bus or SCSI backplane is hosed and that I should get the guts replaced. Data loss of any type is unacceptable. Thankfully I could restore from a backup. But it caused my users 2+ hours of downtime today.

Any ideas?



2 REPLIES
Derek_31
Valued Contributor

Re: DL360 G2 Smart Array 5i data loss and corruption

Today the server failed two more times with the same symptoms. So I put the old drives into a different DL360 G2 and will get the system board replaced on the failing server.
e4services
Honored Contributor

Re: DL360 G2 Smart Array 5i data loss and corruption

I would take the RAID controller out of the equation. It sounds to me as if it is causing the errors. You are lucky that you have only a RAID 1 and should be able to use one of the drives to start the system from, but it sounds as if that does not matter to you, since you lost some data and the Windows install also sounds like it has been corrupted.
Restore to a drive on the onboard SCSI to get going and test for stability.