Disk Enclosures

Motherboard Replacement

Barry Elliott

Motherboard Replacement

I have a ProLiant ML350 with a 439399-001 (395566-002) motherboard. The array is four 72 GB drives in RAID 10 and two 500 GB drives mirroring each other, for a total of six drives. Drive 1 (the first drive) keeps failing, and I have been told that the embedded E200i is failing. I have put three different drives in the bay and I get the same error. I was advised to get a replacement motherboard. I have it, but it is a little different: 461081-001 (395566-003). Now, with the first drive unable to rebuild, I want to swap out the motherboard, but I am concerned about the RAID setup. I have been told that the array information is stored on the drives, so I can just accept the changes when I reboot and no data will be lost. I have also been told that the array information is on the chip and all data will be lost. The data is backed up, but I need to know whether to line up resources for a full array rebuild and data restore, or whether I can just swap the motherboard and accept the changes. Any info would be appreciated. Thanks.
Barry Elliott

Re: Motherboard Replacement

Well, I changed the motherboard, and after three reboots all logical drives were back and almost everything is good, except that the original problem remains: the drive in slot 1 will not rebuild. I have replaced it three times. Here are the events:

"Due to an unrecoverable read error, the recovery of logical drive 1 configured on array controller [Embedded] was aborted while rebuilding a physical drive.

The physical drive which was being rebuilt is located in bay 1 of box 1 which is connected to port 1I of array controller [Embedded] .

The physical drive that reported the read error is located in bay 3 of box 1 which is connected to port 1I of array controller [Embedded]."

Then after that I get:

"Drive Array Logical Drive Status Change. Logical drive number 1 on the array controller in Slot 0 has a new status of 6.
(Logical Drive status values: 1=other, 2=ok, 3=failed, 4=unconfigured, 5=recovering, 6=readyForRebuild, 7=rebuilding, 8=wrongDrive, 9=badConnect, 10=overheating, 11=shutdown, 12=expanding, 13=notAvailable, 14=queuedForExpansion)"

and then I get:

"Drive Array Physical Drive Status Change. The physical drive in Slot 0, Port 1I Box 1 Bay 1 with serial number "DQA7P7A00TLM0741", has a new status of 2.
(Drive status values: 1=other, 2=ok, 3=failed, 4=predictiveFailure)"
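To read the two status-change events side by side, here is a small Python sketch that maps the numeric codes to the names listed in the messages themselves. It assumes the event text always follows the wording quoted above; `decode_event` and the regular expression are hypothetical helpers for illustration, not part of any HP tool.

```python
import re

# Status codes exactly as listed in the event messages above.
LOGICAL_DRIVE_STATUS = {
    1: "other", 2: "ok", 3: "failed", 4: "unconfigured", 5: "recovering",
    6: "readyForRebuild", 7: "rebuilding", 8: "wrongDrive", 9: "badConnect",
    10: "overheating", 11: "shutdown", 12: "expanding", 13: "notAvailable",
    14: "queuedForExpansion",
}
PHYSICAL_DRIVE_STATUS = {1: "other", 2: "ok", 3: "failed", 4: "predictiveFailure"}

# Hypothetical pattern: assumes the wording matches the events quoted above.
STATUS_RE = re.compile(r"new status of (\d+)")


def decode_event(message: str) -> str:
    """Return the status name for a Drive Array status-change message."""
    match = STATUS_RE.search(message)
    if match is None:
        raise ValueError("no status code found in message")
    code = int(match.group(1))
    if message.startswith("Drive Array Logical Drive"):
        table = LOGICAL_DRIVE_STATUS
    elif message.startswith("Drive Array Physical Drive"):
        table = PHYSICAL_DRIVE_STATUS
    else:
        raise ValueError("unrecognized event type")
    return table.get(code, f"unknown({code})")


logical = ("Drive Array Logical Drive Status Change. Logical drive number 1 "
           "on the array controller in Slot 0 has a new status of 6.")
physical = ('Drive Array Physical Drive Status Change. The physical drive in Slot 0, '
            'Port 1I Box 1 Bay 1 with serial number "DQA7P7A00TLM0741", has a new status of 2.')
print(decode_event(logical))   # readyForRebuild
print(decode_event(physical))  # ok
```

Decoded this way, the two events are consistent with the first one: the replacement drive in bay 1 reports itself as ok, while logical drive 1 falls back to waiting for a rebuild because the previous rebuild was aborted by the read error on bay 3.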

I have replaced the motherboard (with the embedded E200i) and tried three different drives.
Any help would be appreciated.
Marino Meloni_1
Honored Contributor

Re: Motherboard Replacement

A RAID array keeps redundant data spread across your disks.
That means that if you lose one disk, the data is still there: some of it is stored directly and some can be recovered from the redundant copy (parity or mirror), so you still have everything.
When you start a rebuild, the controller reads the data on the remaining disks and writes the rebuilt data onto the replacement disk.

If one of the disks being read has a defective block, a CRC error, or any other read error (a physical block whose contents do not match its CRC when the rebuild reads it), the source data is inconsistent, and the controller cannot rebuild the array, because it cannot invent the data that was on that block.

You will never be able to rebuild a RAID array if the remaining disks have a read error in data that the rebuild needs.
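The explanation above describes a three-disk array with parity (RAID 5 style). The following Python sketch is a simplified model under that assumption: parity is the XOR of the data blocks, so any one missing block can be recomputed from the others, but a single unreadable source block stops the rebuild. Blocks are treated as raw bytes, whether or not they hold file data. The function names are hypothetical. Barry's logical drive 1 is RAID 10, where the surviving copy is a mirror rather than parity, but the consequence is the same: if the only remaining copy cannot be read, there is nothing to rebuild from.

```python
from functools import reduce
from typing import List, Optional


class RebuildAborted(Exception):
    """Raised when a source block needed for the rebuild cannot be read."""


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(surviving: List[Optional[bytes]]) -> bytes:
    """Recompute the missing block of one stripe from the surviving blocks.

    None marks a surviving block that could not be read (e.g. a CRC error).
    """
    if any(block is None for block in surviving):
        # The controller cannot invent the missing bytes, so the rebuild stops.
        raise RebuildAborted("unrecoverable read error on a source disk")
    return xor_blocks(surviving)


# One stripe on a three-disk parity array: two data blocks and their parity.
d0 = b"\x10\x20\x30\x40"
d1 = b"\x0f\x0e\x0d\x0c"
parity = xor_blocks([d0, d1])

# The disk holding d1 fails: d1 is rebuilt from d0 and the parity block.
print(rebuild_block([d0, parity]) == d1)  # True

# The disk holding d1 fails AND d0 has a read error: the rebuild aborts.
try:
    rebuild_block([None, parity])
except RebuildAborted as err:
    print("rebuild aborted:", err)
```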

What you can do:

You should know that the RAID controller works at the block level, so it does not care whether the defective block holds data or not.

The steps to perform are the following:
Back up all your data to tape or to other storage. As I just said, all your data may still be readable (if the defective block is in an area without data), or you may have a single damaged file.

Reformat the disks and recreate the RAID.

Then restore the data.

Some people have had success running tools that detect and remap defective blocks, but try this only after you have finished the backup.
Barry Elliott

Re: Motherboard Replacement

Thanks for the advice. I do have a couple questions if you don't mind:

1) I have four disks in "logical drive 1" in a RAID 10. I think I know the answer, but can I pull out a second disk (slot 3) while slot 1 is already out, put in a new one, let it rebuild, and then put a new one in slot 1?

2) I am also getting a temperature warning from bay 1 in the enclosure, even with no drive in it. Do you think the backplane might have a problem?

3) Do you know the best way or software to back up and restore? The server runs Windows Server 2003.

4) There is a second logical drive made of two 500 GB drives. Is there a way for me to move them to another ML350 G5 without losing data? I have three of these servers in total.

I know I am asking a lot, but I am understaffed (it is only me) and already over budget. It may be time to bring in a pro.
Marino Meloni_1
Honored Contributor

Re: Motherboard Replacement

First of all, I have not worked directly with the ProLiant platform for several years, so you may get more help with this question if you also ask in the ProLiant forum and contact HP. Even if your server is not under contract, I think they will answer your questions.

Answering your questions:

1) RAID 10 can survive the loss of two drives, but not in every combination, so I do not think that pulling out another drive will let you achieve your goal, especially since I suspect the problem is on the disk holding the only remaining copy of the data, not on one whose mirror is still active.

2) The backplane could have a problem, but that would not explain the rebuild failure reported in the error message: "Due to an unrecoverable read error, the recovery of logical drive 1 configured on array controller [Embedded] was aborted while rebuilding a physical drive."

3) You can either clone the volume with any tool you prefer, or use a backup application to save all the data and then restore it. The first way is easier: booting from a USB key or a CD-ROM, you can create an image on the LAN. The second method takes more time, because after the full backup and after recreating the RAID with the new disk, you need to reinstall the OS, reinstall the backup application, and then restore the data.

4) As you said in your first post, the RAID information is stored on the disks, so you can move them, but you may have to change the array configuration when you add those disks. I cannot give you more details because I no longer have these skills, but you can contact HP, and they should be able to confirm the steps if this is supported.
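Answer 1 can be made concrete with a small Python model of a four-drive RAID 10: two mirrored pairs striped together, where the volume survives as long as every pair keeps at least one readable drive. The pairing of bay 1 with bay 3 is an assumption inferred from the event log (the rebuild of bay 1 was reading from bay 3); check the controller's configuration tool for the real layout. `MIRROR_PAIRS` and `survives` are hypothetical names.

```python
from itertools import combinations

# ASSUMPTION: bay 1 mirrors bay 3 and bay 2 mirrors bay 4 (inferred from the
# event log, not confirmed by HP documentation).
MIRROR_PAIRS = [("bay1", "bay3"), ("bay2", "bay4")]


def survives(failed: set) -> bool:
    """RAID 10 keeps its data while every mirror pair has one working drive."""
    return all(not (a in failed and b in failed) for a, b in MIRROR_PAIRS)


drives = sorted(d for pair in MIRROR_PAIRS for d in pair)
for failed in combinations(drives, 2):
    status = "survives" if survives(set(failed)) else "data lost"
    print(", ".join(failed), "->", status)
```

Under this assumed layout, four of the six two-drive combinations survive, and pulling bay 3 while bay 1 is out is one of the two that do not, which matches the warning in answer 1.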