HPE 3PAR StoreServ Storage

3PAR 8400 :2 disks failed at the same moment (less than one second)

 
ehubpoi
Visitor

Hi,
During an upgrade of a disk array (SP and OS upgrade from 3.2.2 to 3.3.1 MU5), two SSDs (IDs 1 and 2) suddenly became "degraded" within the same second. Everything was working fine before. I even stressed the array the week before by running the "compactcpg" command on all the CPGs five times, which reclaimed roughly 12 TB, and I saw no warning signs.
The array is now trying to relocate the chunklets of the two failed SSDs, but it is taking far too long.
There is still spare space available.

When I run "servicemag status", the figures do not change:

Cage 0, magazine 1:
The magazine is being brought offline due to a servicemag start.
The last status update was at Fri Jan 28 16:56:54 2022.
Chunklets relocated: 826 in 7 hours, 27 minutes and 27 seconds
Chunklets remaining: 698
Chunklets marked for moving: 9
Estimated time for relocation completion based on 32 seconds per chunklet is: 4 minutes and 48 seconds
The cumulative output so far is:
servicemag start -pdid 1
... servicing disks in mag: 0 1
... normal disks:
... not normal disks: WWN [50011731009082E4] Id [ 1] diskpos [0]
... relocating chunklets to spare space...
... 713 chunklets - move_error,disk_relocating, will retry
... 709 chunklets - move_error,disk_relocating, will retry
... 708 chunklets - move_error,disk_relocating, will retry
... 706 chunklets - move_error,disk_relocating, will retry
... 698 chunklets - move_error,disk_relocating, will retry

Cage 0, magazine 2:
The magazine is being brought offline due to a servicemag start.
The last status update was at Fri Jan 28 11:58:18 2022.
Chunklets relocated: 721 in 7 hours, 27 minutes and 22 seconds
Chunklets remaining: 796
Chunklets marked for moving: 592
Estimated time for relocation completion based on 37 seconds per chunklet is: 6 hours, 5 minutes and 4 seconds
The cumulative output so far is:
servicemag start -pdid 2
... servicing disks in mag: 0 2
... normal disks:
... not normal disks: WWN [500117310090745C] Id [ 2] diskpos [0]
... relocating chunklets to spare space...
... 805 chunklets - move_error,disk_relocating, will retry
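
As a rough cross-check of the two status reports above, the short Python sketch below redoes their arithmetic. It is only an illustration: the numbers are copied from the output in this post, the helper names are made up for the example, and the reading that the estimate equals "chunklets marked for moving" multiplied by the seconds-per-chunklet figure is an inference from these two reports, not documented servicemag behaviour.

```python
import re

def duration_to_seconds(text):
    """Convert text such as '7 hours, 27 minutes and 27 seconds' to seconds."""
    units = {"hour": 3600, "minute": 60, "second": 1}
    total = 0
    for value, unit in re.findall(r"(\d+)\s+(hour|minute|second)s?", text):
        total += int(value) * units[unit]
    return total

def format_duration(seconds):
    """Render a number of seconds as 'Hh Mm Ss'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"

# Figures copied from the two "servicemag status" reports in this post.
REPORTS = {
    "cage 0, magazine 1": {
        "relocated": 826,
        "elapsed": "7 hours, 27 minutes and 27 seconds",
        "remaining": 698,
        "marked": 9,
        "sec_per_chunklet": 32,
    },
    "cage 0, magazine 2": {
        "relocated": 721,
        "elapsed": "7 hours, 27 minutes and 22 seconds",
        "remaining": 796,
        "marked": 592,
        "sec_per_chunklet": 37,
    },
}

for name, r in REPORTS.items():
    elapsed = duration_to_seconds(r["elapsed"])
    # Average pace actually achieved so far.
    observed = elapsed / r["relocated"]
    # Assumed reading: estimate = chunklets marked for moving * pace.
    marked_only = r["marked"] * r["sec_per_chunklet"]
    # Time needed if every remaining chunklet moved at the same pace.
    all_remaining = r["remaining"] * r["sec_per_chunklet"]
    print(name)
    print(f"  observed pace:           {observed:.1f} s per chunklet")
    print(f"  marked-for-moving only:  {format_duration(marked_only)}")
    print(f"  all remaining chunklets: {format_duration(all_remaining)}")
```

Under that reading, the reported estimates (4 minutes 48 seconds and 6 hours 5 minutes 4 seconds) cover only the chunklets currently marked for moving, not the 698 and 796 chunklets still remaining, so chunklets stuck in "move_error ... will retry" would not be reflected in them.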

Is there a way to speed this up? All the data seems to be corrupted. This array is dedicated to non-production environments and hosts roughly 300 VMs. Most of them are unusable at the moment and everything is frozen. Roughly 28 virtual volumes are affected.
I can't understand why two degraded disks put the whole array in an unusable state.

Any help would be greatly appreciated

Thanks

 

support_s
System Recommended

Query: 3PAR 8400 :2 disks failed at the same moment (less than one second)

System recommended content:

1. HPE StorageWorks 6400/8400 Enterprise Virtual Array - Fibre Channel Disk Drive Replacement

 


ehubpoi
Visitor

Re: Query: 3PAR 8400 :2 disks failed at the same moment (less than one second)

Hi,

Sorry, I wasn't precise enough. Our disk array is an HPE 3PAR StoreServ 8400, not an Enterprise Virtual Array,
so your document is not really helpful.

Rgds

hubert

veeyarvi
HPE Pro

Re: Query: 3PAR 8400 :2 disks failed at the same moment (less than one second)

Hi Hubert,

 

Unfortunately, there is no clear way to speed up the process (and I assume the servicemag start has finished by now). However, two disks failing at the same time is uncommon and should be looked into. Have you had a chance to log a case with HPE support to investigate this?

Regards,

Veeyaarvi


I am an HPE Employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
ehubpoi
Visitor

Re: Query: 3PAR 8400 :2 disks failed at the same moment (less than one second)

Hi,

Thanks for your reply.
We have already logged a case with support, but we are not convinced that it will restore the array to the state it was in before.
We also don't understand why two disks failed at exactly the same time, or why this kind of problem makes the whole array unavailable. We are asking ourselves whether it would be prudent to replace our disk arrays: we have four arrays of this same model serving different needs and environments, and if the same issue occurred in production, it would freeze our customer's entire telecom business.
The "servicemag" process is very slow: it relocated only 45 chunklets in 24 hours.

Are there any tricks to speed up this process?

We have added new disks to help rebuild the failed disks and, we hope, restore the service.

thanks for your advice.
Rgds//Hubert

veeyarvi
HPE Pro

Re: Query: 3PAR 8400 :2 disks failed at the same moment (less than one second)

Hi Hubert,

 

I think support will definitely be able to answer the question of why two disks failed at the same time. And if the two disks are not in the same RAID set, there should not be any data access issues either.
If the chunklet movement is slow, there could be a media defect, but that cannot be confirmed without checking the logs. Again, support should be able to help with that too.

Regards,

Veeyaarvi


ehubpoi
Visitor

Re: Query: 3PAR 8400 :2 disks failed at the same moment (less than one second)

Hi,

Most of the impacted volumes belong to the same CPG, which uses RAID 5.
Support is handled by an HPE partner company, and I assume they are fully competent on this topic.
At the moment the "servicemag" process has failed, and we have to remove some VVs to try to unblock the situation and avoid restoring all the VMs hosted on this array (around 350).
Moving blocks manually failed as well.
If you have any good ideas, we would appreciate them.

Thanks in advance.
Rgds

hubert

veeyarvi
HPE Pro

Re: Query: 3PAR 8400 :2 disks failed at the same moment (less than one second)

Hi Hubert,

If two disks failed in the same RAID set, we will definitely have issues. The best course of action is to try to get at least one of the failed disks back online. Which one is the better candidate depends on the condition of the drives, and the logs need to be checked to determine that.

Please ask the partner to contact HPE support if they are unsure of the steps and methods.
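
To illustrate why a second failure in the same RAID 5 set is so much worse than a single one, here is a minimal Python sketch of single-parity protection. It is a toy model, not 3PAR's implementation: the 3+1 set size, the two-byte members, and the function names are all assumptions made for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(stripe, lost):
    """Rebuild the one member whose index is in `lost` from the survivors."""
    if len(lost) != 1:
        raise ValueError("single parity can rebuild exactly one lost member")
    survivors = [block for i, block in enumerate(stripe) if i not in lost]
    # XOR of all surviving members (data and parity) equals the missing one.
    return xor_blocks(survivors)

# Hypothetical 3+1 stripe: three data members plus one parity member.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
stripe = data + [xor_blocks(data)]

# One lost member: its contents can be recomputed from the rest.
assert rebuild(stripe, {1}) == data[1]

# Two lost members in the same stripe: nothing to recompute them from.
try:
    rebuild(stripe, {0, 1})
except ValueError as err:
    print("unrecoverable:", err)
```

This is also why getting even one of the two drives readable again matters: it turns a double loss back into a single loss that parity can cover.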

Regards, 
Veeyaarvi


ehubpoi
Visitor

Re: Query: 3PAR 8400 :2 disks failed at the same moment (less than one second)

Hi,

I can extract the logs.
Can you tell me how to extract the log files related to the disk failure?
Thanks in advance
Rgds