ProLiant Servers (ML,DL,SL)

Re: ProLiant ML370G5 array rebuild problem.

 
J. T. Kirk
Advisor

ProLiant ML370G5 array rebuild problem.

Hi all,
I'm experiencing a weird problem on 4 servers, as described in the subject.

Each server has 2 SmartArray P400 controllers with 512 MB of cache. The controller in slot 1 manages drives 1 to 8 in the first cage (140GB 2.5" SAS 10K); the controller in slot 5 manages drives 9 to 16 in the second cage (140GB 2.5" SAS 10K).
Array A (RAID 5 of 7 disks plus spare) is configured on the first controller with 2 logical drives; array B (RAID 5 of 7 disks plus spare) is configured on the second controller with 1 logical drive.

If I detach a drive in the first cage, the spare disk comes online and the rebuild begins and ends correctly.

If I detach a drive in the second cage, the spare disk doesn't come online and the server stops responding at the O.S. level, forcing me to power it down. When I power it on again, the automatic rebuild starts and ends correctly.

All the servers have been updated to the latest firmware available with Maintenance CD 7.91.

Thanks in advance.
KarloChacon
Honored Contributor

Re: ProLiant ML370G5 array rebuild problem.

hi J. T. Kirk

Yes, I agree it's weird to see the same condition on 4 servers.

Are all of them running Windows or Linux, or a mix?

Which firmware do you have on those controllers, 4.06 or 4.12?

I suppose you also have the latest drivers, from 7.91 as well?

regards
Didn't your momma teach you to say thanks!
J. T. Kirk
Advisor

Re: ProLiant ML370G5 array rebuild problem.

Hello Karlo,

on these 4 servers the O.S. running is VMware ESX 3.0.2 with HP MGMT Agents 7.9.0.

The firmware on the controller is the one included in HP Firmware Maintenance CD 7.9.1 (4.06).

Since you mentioned 4.12, I searched more deeply and found the additional 4.12 firmware for the Maintenance CD. I think I'll try the update.

Thanks for the suggestion.

I'll keep you updated on the results.

Best regards, JTK.
J. T. Kirk
Advisor

Re: ProLiant ML370G5 array rebuild problem.

:-(

Hi all,

unfortunately the problem still persists even with P400 firmware 4.12. Same behaviour: array B is rebuilt automatically only after powering the server down and up again.

J.T.K.
KarloChacon
Honored Contributor

Re: ProLiant ML370G5 array rebuild problem.

hi

I really don't know about VMware, but let me ask:
is there any error logged when that scenario takes place?

Regards
Didn't your momma teach you to say thanks!
J. T. Kirk
Advisor

Re: ProLiant ML370G5 array rebuild problem.

Hello Karlo,

the only log I have is this:

[root@VMML370C1P01 log]# cat messages | grep cmai
Jan 27 20:10:45 VMML370C1P01 hpasm: Shutting down Storage Agents (cmastor): cmaeventd cmaidad cmafcad cmaided cmascsid cmasasd
Jan 27 20:10:46 VMML370C1P01 hpasm: Shutting down IDA agent (cmaidad):
Jan 27 20:10:48 VMML370C1P01 hpasm: Shutting down IDE agent (cmaided):
Jan 27 20:17:01 VMML370C1P01 hpasm: Starting Storage Agents (cmastor): cmaeventd cmaidad cmafcad cmaided cmascsid cmasasd
Jan 28 09:51:15 VMML370C1P01 hpasm: Starting Storage Agents (cmastor): cmaeventd cmaidad cmafcad cmaided cmascsid cmasasd
Jan 28 09:51:20 VMML370C1P01 cmaidad[2101]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now Rebuilding.
Jan 28 09:51:20 VMML370C1P01 cmaidad[2101]: Physical Drive Status Change: Slot 5 Port 1I Box 1 Bay 14. Status is now Failed.
Jan 28 09:55:08 VMML370C1P01 cmaidad[2101]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now OK.
Jan 28 09:55:08 VMML370C1P01 cmaidad[2101]: Spare Drive Status Change: Slot 5 Port 1I Box 1 Bay 16. Status is now Active.
Jan 28 10:01:24 VMML370C1P01 cmaidad[2101]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now Rebuilding.
Jan 28 10:01:24 VMML370C1P01 cmaidad[2101]: Physical Drive Status Change: Slot 5 Port 1I Box 1 Bay 14. Status is now OK.
Jan 28 10:06:28 VMML370C1P01 cmaidad[2101]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now OK.
Jan 28 10:06:28 VMML370C1P01 cmaidad[2101]: Spare Drive Status Change: Slot 5 Port 1I Box 1 Bay 16. Status is now Inactive.
Jan 28 10:13:20 VMML370C1P01 hpasm: Starting Storage Agents (cmastor): cmaeventd cmaidad cmafcad cmaided cmascsid cmasasd
Jan 28 10:13:26 VMML370C1P01 cmaidad[2099]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now Rebuilding.
Jan 28 10:13:26 VMML370C1P01 cmaidad[2099]: Physical Drive Status Change: Slot 5 Port 2I Box 1 Bay 9. Status is now Failed.
Jan 28 10:17:14 VMML370C1P01 cmaidad[2099]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now OK.
Jan 28 10:17:14 VMML370C1P01 cmaidad[2099]: Spare Drive Status Change: Slot 5 Port 1I Box 1 Bay 16. Status is now Active.
Jan 28 10:19:14 VMML370C1P01 cmaidad[2099]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now Rebuilding.
Jan 28 10:19:15 VMML370C1P01 cmaidad[2099]: Physical Drive Status Change: Slot 5 Port 2I Box 1 Bay 9. Status is now OK.
Jan 28 10:24:18 VMML370C1P01 cmaidad[2099]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now OK.
Jan 28 10:24:18 VMML370C1P01 cmaidad[2099]: Spare Drive Status Change: Slot 5 Port 1I Box 1 Bay 16. Status is now Inactive.
[root@VMML370C1P01 log]#

As you can see, the log starts with the rebuild and not with the failure of the disk, because as soon as I unplug a disk from the second cage the server locks up.
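To compare rebuild times between test runs, the cmaidad lines above can be paired up automatically. The following is only a rough sketch in Python: the function name `rebuild_durations` and the `year` parameter are made up for illustration (syslog timestamps carry no year, so 2008 is an assumption), and the regular expression is written against the line format shown in this log only.

```python
import re
from datetime import datetime

# Matches only the cmaidad logical-drive lines in the format shown in this log.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ cmaidad\[\d+\]: "
    r"Logical Drive Status Change: Slot (?P<slot>\d+), Drive: (?P<drive>\d+)\. "
    r"Status is now (?P<status>[\w ]+)\.$"
)


def rebuild_durations(lines, year=2008):
    """Pair each 'Rebuilding' event with the next 'OK' for the same drive.

    Returns a list of (slot, drive, start_time, seconds) tuples.
    The year is an assumption: syslog lines do not include it.
    """
    started = {}   # (slot, drive) -> time the rebuild was first reported
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a logical-drive status line
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        key = (int(m["slot"]), int(m["drive"]))
        if m["status"] == "Rebuilding":
            started[key] = ts
        elif m["status"] == "OK" and key in started:
            start = started.pop(key)
            seconds = int((ts - start).total_seconds())
            results.append((key[0], key[1], start, seconds))
    return results


# Four lines copied from the log above.
SAMPLE = [
    "Jan 28 09:51:20 VMML370C1P01 cmaidad[2101]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now Rebuilding.",
    "Jan 28 09:55:08 VMML370C1P01 cmaidad[2101]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now OK.",
    "Jan 28 10:01:24 VMML370C1P01 cmaidad[2101]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now Rebuilding.",
    "Jan 28 10:06:28 VMML370C1P01 cmaidad[2101]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now OK.",
]

if __name__ == "__main__":
    for slot, drive, start, seconds in rebuild_durations(SAMPLE):
        print(f"slot {slot} drive {drive}: rebuild from {start:%H:%M:%S} took {seconds} s")
```

On the four sample lines this reports 228 s for the rebuild onto the spare and 304 s for the rebuild back onto the reinserted drive. It says nothing about the lockup itself, since no lines are written while the server is hung.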

I'll also try upgrading the server BIOS, which is not the latest version. I'll keep you updated.

Best Regards, JTK.
J. T. Kirk
Advisor

Re: ProLiant ML370G5 array rebuild problem.

Sad update.
Flashed the BIOS to version 11/13/2007.
The first rebuild after reboot runs correctly. After it finishes, unplugging any drive from cage 2 locks up the server and no logs are produced.

Jan 28 19:07:59 VMML370C1P01 cmaidad[2008]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now Rebuilding.
Jan 28 19:08:00 VMML370C1P01 cmaidad[2008]: Physical Drive Status Change: Slot 5 Port 2I Box 1 Bay 9. Status is now Failed.
Jan 28 19:08:00 VMML370C1P01 cmaidad[2008]: Spare Drive Status Change: Slot 5 Port 1I Box 1 Bay 16. Status is now Building.
Jan 28 19:08:09 VMML370C1P01 cmaeventd[1983]: Hot-plug drive removed: Port 2I Box 1 Bay 9 of Array Controller in slot 5.
Jan 28 19:08:09 VMML370C1P01 cmaeventd[1983]: Physical drive failed: Port 2I Box 1 Bay 9 of Array Controller in slot 5.
Jan 28 19:08:12 VMML370C1P01 cmaeventd[1983]: Logical drive 1 of Array Controller in slot 5, has changed from status OK to Interim Recovery
Jan 28 19:08:12 VMML370C1P01 cmaeventd[1983]: Logical drive 1 of Array Controller in slot 5, has changed from status Interim Recovery to Ready For Rebuild
Jan 28 19:08:12 VMML370C1P01 cmaeventd[1983]: Logical drive 1 of Array Controller in slot 5, has changed from status Ready For Rebuild to Rebuilding

Jan 28 19:13:04 VMML370C1P01 cmaidad[2008]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now OK.
Jan 28 19:13:04 VMML370C1P01 cmaidad[2008]: Spare Drive Status Change: Slot 5 Port 1I Box 1 Bay 16. Status is now Active.
Jan 28 19:13:12 VMML370C1P01 cmaeventd[1983]: Logical drive 1 of Array Controller in slot 5, has changed from status Rebuilding to OK
Jan 28 19:19:34 VMML370C1P01 cmaidad[2008]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now Rebuilding.
Jan 28 19:19:34 VMML370C1P01 cmaidad[2008]: Physical Drive Status Change: Slot 5 Port 2I Box 1 Bay 9. Status is now OK.
Jan 28 19:19:42 VMML370C1P01 cmaeventd[1983]: Hot-plug drive inserted: Port 2I Box 1 Bay 9 of Array Controller in slot 5.
Jan 28 19:19:44 VMML370C1P01 cmaeventd[1983]: Logical drive 1 of Array Controller in slot 5, has changed from status OK to Ready For Rebuild
Jan 28 19:19:44 VMML370C1P01 cmaeventd[1983]: Logical drive 1 of Array Controller in slot 5, has changed from status Ready For Rebuild to Rebuilding
Jan 28 19:24:44 VMML370C1P01 cmaeventd[1983]: Logical drive 1 of Array Controller in slot 5, has changed from status Rebuilding to OK
Jan 28 19:24:53 VMML370C1P01 cmaidad[2008]: Logical Drive Status Change: Slot 5, Drive: 1. Status is now OK.
Jan 28 19:24:53 VMML370C1P01 cmaidad[2008]: Spare Drive Status Change: Slot 5 Port 1I Box 1 Bay 16. Status is now Inactive.



JTK.
J. T. Kirk
Advisor

Re: ProLiant ML370G5 array rebuild problem.

Hi all,
unfortunately I haven't been able to solve the problem. I moved the controller from slot 5 to slot 4, but the problem still persists.

When the controller stops working, LED ID 1 (CR14: Controller lockup LED) is on.

Does anyone have further suggestions?

Thanks in advance.

JTK.
KarloChacon
Honored Contributor

Re: ProLiant ML370G5 array rebuild problem.

hi

did you resolve this issue?

I have 2 workarounds, but maybe you have already fixed it.

regards
Didn't your momma teach you to say thanks!
J. T. Kirk
Advisor

Re: ProLiant ML370G5 array rebuild problem.

Hello Karlo,

I haven't fixed the problem yet, but I'm 99% sure that it is a hardware problem on the controller.

I swapped the controller for a P400 with 256 MB of cache and the problem did not show up.

My next step is to ask for a replacement of the SA controller. If that confirms the solution, I'll ask for replacement of the other 3 SA controllers.

However, I'd still like to know about the workarounds you mentioned.

Best regards, JTK.