ProLiant ML370G5 array rebuild problem.

 
J. T. Kirk
Advisor

Hi all,
I'm experiencing a weird problem on 4 servers, as described in the subject.

Each server has 2 Smart Array P400 controllers with 512 MB of cache. The controller in slot 1 manages drives 1 to 8 in the first cage (140 GB 2.5" SAS 10K); the controller in slot 5 manages drives 9 to 16 in the second cage (140 GB 2.5" SAS 10K).
Array A (RAID 5 across 7 disks plus a spare) is configured on the first controller with 2 logical drives; array B (RAID 5 across 7 disks plus a spare) is configured on the second controller with 1 logical drive.
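
For reference, the layout described above can be written down as a small Python sketch. The `Cage` class, the `LAYOUT` list and the `cage_for_bay` helper are illustrative names only, not part of any HP tool. The usable-capacity figure is the nominal RAID 5 value (one disk's worth of parity) and ignores formatting overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cage:
    controller_slot: int  # PCI slot of the Smart Array P400 driving this cage
    first_bay: int        # first drive bay in the cage
    last_bay: int         # last drive bay in the cage
    array_name: str       # "A" or "B", as named in the post
    data_disks: int = 7   # RAID 5 members, per the post
    spares: int = 1       # one spare per array, per the post
    disk_gb: int = 140    # nominal disk size in GB

    def raid5_usable_gb(self) -> int:
        # RAID 5 spends one disk's worth of space on parity.
        return (self.data_disks - 1) * self.disk_gb


# Layout as described in the post (assumed identical on all 4 servers).
LAYOUT = [
    Cage(controller_slot=1, first_bay=1, last_bay=8, array_name="A"),
    Cage(controller_slot=5, first_bay=9, last_bay=16, array_name="B"),
]


def cage_for_bay(bay: int) -> Cage:
    """Return the cage (and therefore the controller) that owns a drive bay."""
    for cage in LAYOUT:
        if cage.first_bay <= bay <= cage.last_bay:
            return cage
    raise ValueError(f"bay {bay} is not in any cage")
```

With this mapping, a drive in bay 14 resolves to the controller in slot 5 and array B, which is the cage where the problem shows up.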

If I pull a drive from the first cage, the spare comes online and the rebuild starts and completes correctly.

If I pull a drive from the second cage, the spare doesn't come online and the server stops responding at the OS level, forcing me to power it down. When I power it on again, the automatic rebuild starts and completes correctly.

All the servers have been updated to the latest firmware available on Maintenance CD 7.91.

Thanks in advance.
KarloChacon
Honored Contributor

Hi J. T. Kirk,

Yes, it is weird to see the same behaviour on 4 servers.

Are they all running Windows, all Linux, or a mix?

Which firmware do you have on those controllers, 4.06 or 4.12?

I suppose you also have the latest drivers, from 7.91 as well?

regards
Didn't your momma teach you to say thanks!
J. T. Kirk
Advisor

Hello Karlo,

On these 4 servers the OS is VMware ESX 3.0.2 with HP Management Agents 7.9.0.

The controller firmware is the version included on the HP Firmware Maintenance CD 7.9.1 (4.06).

Since you mentioned 4.12, I searched further and found the additional 4.12 firmware for the Maintenance CD. I think I'll try the update.

Thanks for the suggestion.

I'll keep you updated on the results.

Best regards, JTK.
J. T. Kirk
Advisor

:-(

Hi all,

Unfortunately the problem persists even with P400 firmware 4.12. Same behaviour: array B is rebuilt automatically only after the server is powered down and back up.

J.T.K.
KarloChacon
Honored Contributor

hi

I really don't know much about VMware, but let me ask: is any error logged when this happens?

Regards
J. T. Kirk
Advisor

Hello Karlo,

The only log I have is this:

[root@VMML370C1P01 log]# cat messages | grep cmai
Jan 27 20:10:45 VMML370C1P01 hpasm: Shutting down Storage Agents (cmastor): cmaeventd cmaidad cmafcad cmaided cmascsid cmasasd
Jan 27 20:10:46 VMML370C1P01 hpasm: Shutting down IDA agent (cmaidad):
Jan 27 20:10:48 VMML370C1P01 hpasm: Shutting down IDE agent (cmaided):
Jan 27 20:17:01 VMML370C1P01 hpasm: Starting Storage Agents (cmastor): cmaeventd cmaidad cmafcad cmaided cmascsid cmasasd
Jan 28 09:51:15 VMML370C1P01 hpasm: Starting Storage Agents (cmastor): cmaeventd cmaidad cmafcad cmaided cmascsid cmasasd
Jan 28 09:51:20 VMML370C1P01 cmaidad[2101]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now Rebuilding.
Jan 28 09:51:20 VMML370C1P01 cmaidad[2101]: Physical Drive Status Change: Slot 5 Port 1I Box 1 Bay 14. Status is now Failed.
Jan 28 09:55:08 VMML370C1P01 cmaidad[2101]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now OK.
Jan 28 09:55:08 VMML370C1P01 cmaidad[2101]: Spare Drive Status Change: Slot 5 Port 1I Box 1 Bay 16. Status is now Active.
Jan 28 10:01:24 VMML370C1P01 cmaidad[2101]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now Rebuilding.
Jan 28 10:01:24 VMML370C1P01 cmaidad[2101]: Physical Drive Status Change: Slot 5 Port 1I Box 1 Bay 14. Status is now OK.
Jan 28 10:06:28 VMML370C1P01 cmaidad[2101]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now OK.
Jan 28 10:06:28 VMML370C1P01 cmaidad[2101]: Spare Drive Status Change: Slot 5 Port 1I Box 1 Bay 16. Status is now Inactive.
Jan 28 10:13:20 VMML370C1P01 hpasm: Starting Storage Agents (cmastor): cmaeventd cmaidad cmafcad cmaided cmascsid cmasasd
Jan 28 10:13:26 VMML370C1P01 cmaidad[2099]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now Rebuilding.
Jan 28 10:13:26 VMML370C1P01 cmaidad[2099]: Physical Drive Status Change: Slot 5 Port 2I Box 1 Bay 9. Status is now Failed.
Jan 28 10:17:14 VMML370C1P01 cmaidad[2099]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now OK.
Jan 28 10:17:14 VMML370C1P01 cmaidad[2099]: Spare Drive Status Change: Slot 5 Port 1I Box 1 Bay 16. Status is now Active.
Jan 28 10:19:14 VMML370C1P01 cmaidad[2099]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now Rebuilding.
Jan 28 10:19:15 VMML370C1P01 cmaidad[2099]: Physical Drive Status Change: Slot 5 Port 2I Box 1 Bay 9. Status is now OK.
Jan 28 10:24:18 VMML370C1P01 cmaidad[2099]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now OK.
Jan 28 10:24:18 VMML370C1P01 cmaidad[2099]: Spare Drive Status Change: Slot 5 Port 1I Box 1 Bay 16. Status is now Inactive.
[root@VMML370C1P01 log]#

As you can see, the log starts with the rebuild rather than with the disk failure, because the server locks up as soon as I unplug a disk from the second cage.
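
A log like this is easier to read once the `cmaidad` status lines are split into fields. The following is a minimal Python sketch for doing that, assuming the line format is exactly what appears in the excerpt above. The names `StatusChange`, `parse_line` and `parse_log` are made up for illustration and are not part of the HP agents.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class StatusChange(NamedTuple):
    timestamp: str  # e.g. "Jan 28 09:51:20" (syslog omits the year)
    kind: str       # "Logical", "Physical" or "Spare"
    slot: int       # controller slot
    target: str     # "Drive: 1" or "Port 1I Box 1 Bay 14"
    status: str     # e.g. "Rebuilding", "Failed", "OK", "Active"


# Pattern modelled on the cmaidad lines shown in the post.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ cmaidad\[\d+\]: "
    r"(?P<kind>Logical|Physical|Spare) Drive Status Change: "
    r"Slot (?P<slot>\d+),? (?P<target>.+?)\. "
    r"Status is now (?P<status>.+?)\.$"
)


def parse_line(line: str) -> Optional[StatusChange]:
    """Parse one syslog line; return None if it is not a cmaidad status change."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return StatusChange(m["ts"], m["kind"], int(m["slot"]), m["target"], m["status"])


def parse_log(lines: Iterable[str]) -> List[StatusChange]:
    """Keep only the status-change records, in log order."""
    return [ev for ev in map(parse_line, lines) if ev is not None]
```

Filtering the resulting records by `target` makes the per-bay sequence easier to follow: drive Failed, spare Active, drive OK, spare Inactive.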

I'll also try upgrading the server BIOS, which is not the latest version. I'll keep you updated.

Best Regards, JTK.
J. T. Kirk
Advisor

Sad update.
I flashed the BIOS to version 11/13/2007.
The first rebuild after the reboot runs correctly. Once it finishes, unplugging any drive from cage 2 locks up the server and no logs are produced.

Jan 28 19:07:59 VMML370C1P01 cmaidad[2008]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now Rebuilding.
Jan 28 19:08:00 VMML370C1P01 cmaidad[2008]: Physical Drive Status Change: Slot 5 Port 2I Box 1 Bay 9. Status is now Failed.
Jan 28 19:08:00 VMML370C1P01 cmaidad[2008]: Spare Drive Status Change: Slot 5 Port 1I Box 1 Bay 16. Status is now Building.
Jan 28 19:08:09 VMML370C1P01 cmaeventd[1983]: Hot-plug drive removed: Port 2I Box 1 Bay 9 of Array Controller in slot 5.
Jan 28 19:08:09 VMML370C1P01 cmaeventd[1983]: Physical drive failed: Port 2I Box 1 Bay 9 of Array Controller in slot 5.
Jan 28 19:08:12 VMML370C1P01 cmaeventd[1983]: Logical drive 1 of Array Controller in slot 5, has changed from status OK to Interim Recovery
Jan 28 19:08:12 VMML370C1P01 cmaeventd[1983]: Logical drive 1 of Array Controller in slot 5, has changed from status Interim Recovery to Ready For Rebuild
Jan 28 19:08:12 VMML370C1P01 cmaeventd[1983]: Logical drive 1 of Array Controller in slot 5, has changed from status Ready For Rebuild to Rebuilding

Jan 28 19:13:04 VMML370C1P01 cmaidad[2008]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now OK.
Jan 28 19:13:04 VMML370C1P01 cmaidad[2008]: Spare Drive Status Change: Slot 5 Port 1I Box 1 Bay 16. Status is now Active.
Jan 28 19:13:12 VMML370C1P01 cmaeventd[1983]: Logical drive 1 of Array Controller in slot 5, has changed from status Rebuilding to OK
Jan 28 19:19:34 VMML370C1P01 cmaidad[2008]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now Rebuilding.
Jan 28 19:19:34 VMML370C1P01 cmaidad[2008]: Physical Drive Status Change: Slot 5 Port 2I Box 1 Bay 9. Status is now OK.
Jan 28 19:19:42 VMML370C1P01 cmaeventd[1983]: Hot-plug drive inserted: Port 2I Box 1 Bay 9 of Array Controller in slot 5.
Jan 28 19:19:44 VMML370C1P01 cmaeventd[1983]: Logical drive 1 of Array Controller in slot 5, has changed from status OK to Ready For Rebuild
Jan 28 19:19:44 VMML370C1P01 cmaeventd[1983]: Logical drive 1 of Array Controller in slot 5, has changed from status Ready For Rebuild to Rebuilding
Jan 28 19:24:44 VMML370C1P01 cmaeventd[1983]: Logical drive 1 of Array Controller in slot 5, has changed from status Rebuilding to OK
Jan 28 19:24:53 VMML370C1P01 cmaidad[2008]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now OK.
Jan 28 19:24:53 VMML370C1P01 cmaidad[2008]: Spare Drive Status Change: Slot 5 Port 1I Box 1 Bay 16. Status is now Inactive.
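
The `cmaeventd` transitions above carry enough information to work out how long each rebuild took. Here is a minimal Python sketch of that calculation, assuming the message wording shown in this excerpt. The function name `rebuild_durations` is hypothetical, and `assumed_year` is a guess needed only because syslog timestamps omit the year.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Pattern modelled on the cmaeventd "has changed from status X to Y" lines.
EVENT_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ cmaeventd\[\d+\]: "
    r"Logical drive (?P<ld>\d+) of Array Controller in slot (?P<slot>\d+), "
    r"has changed from status (?P<old>.+) to (?P<new>.+)$"
)


def rebuild_durations(
    lines: Iterable[str], assumed_year: int = 2008
) -> List[Tuple[int, int, timedelta]]:
    """Return (slot, logical drive, duration) for each completed rebuild."""
    started = {}  # (slot, logical drive) -> time the rebuild began
    results = []
    for line in lines:
        # search() rather than match() tolerates stray characters
        # in front of the timestamp.
        m = EVENT_RE.search(line.strip())
        if m is None:
            continue
        when = datetime.strptime(
            f"{assumed_year} {m['ts']}", "%Y %b %d %H:%M:%S"
        )
        key = (int(m["slot"]), int(m["ld"]))
        if m["new"] == "Rebuilding":
            started[key] = when
        elif m["old"] == "Rebuilding" and key in started:
            results.append((key[0], key[1], when - started.pop(key)))
    return results
```

Fed the `cmaeventd` lines from the excerpt above, this reports two rebuilds on slot 5, logical drive 1, each lasting five minutes (19:08:12 to 19:13:12 and 19:19:44 to 19:24:44).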



JTK.
J. T. Kirk
Advisor

Hi all,
Unfortunately I haven't been able to solve the problem. I moved the controller from slot 5 to slot 4, but the problem persists.

When the controller stops working, LED ID 1 (CR14: Controller lockup LED) is on.

Does anyone have further suggestions?

Thanks in advance.

JTK.
KarloChacon
Honored Contributor

hi

Did you resolve this issue?

I have 2 workarounds, but maybe you have already fixed it.

regards
J. T. Kirk
Advisor

Hello Karlo,

I haven't fixed the problem yet, but I'm 99% sure it is a hardware problem on the controller.

I swapped the controller for a P400 with 256 MB of cache and the problem did not show up.

My next step is to ask for a replacement Smart Array controller. If that confirms the fix, I'll ask for replacements for the other 3 controllers.

However, I'd still like to hear about the workarounds you mentioned.

Best regards, JTK.
KarloChacon
Honored Contributor

hi

Hmm, you have already made an important finding.

I mention this because a few days ago someone had an issue similar to yours: the rebuild did not start when it should have. So it's up to you whether you want to give it a try.

I have heard of two ways, but I have not had time to test the second one.

First:
Power off the server
Remove the HDDs from the second controller
Boot from the SmartStart CD (SSCD)
Go to Maintain Server and run the erase utility with the HDDs out
Power off the server
Insert the HDDs again

Second (not tested yet):
Power off the server
Remove the HDDs from the second controller
Boot from the SSCD again
Go to Maintain Server
Open the Array Configuration Utility (ACU)
Erase array B / its logical drive
Power off the server
Insert the HDDs again

regards
J. T. Kirk
Advisor

Hello Karlo, what you said is really interesting; it could save me from opening 4 RMAs. I have to find the time to try it on a standby server.
I'll keep you updated.

Best regards, JTK.
J. T. Kirk
Advisor

Hello Karlo,

I first tried the 2nd solution, but without the disks inserted it is not possible to erase the array configuration.

I also tried the 1st solution, but the problem was still there.

I'm opening an RMA call.

Thank you very much for your support.

Best regards, JTK.
KarloChacon
Honored Contributor

hi

As I said, I had not tested the second one yet.

Someone told me about it and I did not believe it would work, for the same reason you ran into. Now I can say that person was wrong.

regards