ProLiant Servers (ML,DL,SL)

Migration of RAID 5 leads to misidentification and errors

 
krisd443
Advisor

Howdy,

We've been having trouble with a FreeNAS server and decided to swap it for a more powerful machine, moving from a DL380 with a 4-core Xeon Silver processor to one with an 8-core EPYC 7262 processor. In the new machine, the controller detects only 2 of the 3 disks in a RAID 5 array. It doesn't claim that one disk is bad; it claims that the RAID 5 array it found only has 2 disks. It also complained about something wrong with the write cache and apparently disabled some settings. Things went better at first when we swapped in the original P408i-a controller (in that the server booted), but it crashed within a day.

It won't let us add a drive or a spare, saying that the "array is transforming". It also claims that there is a physical drive in a nonsense position, while still detecting all 8 real drives. Everything else works, but this array is what the production server boots from. We have tried lots of things and nothing has worked. We are locked out of almost everything other than building a new array, and I have doubts about being able to migrate the data to begin with.

Anyone run across something like this?

Output from SSACLI commands:

~$ sudo ssacli ctrl slot=0 ld 1 add drives=2I:1:3
Error: This operation is not supported with the current configuration. Use the 
       "show" command on devices to show additional details about the
       configuration.
       Reason: Array is transforming
~$ sudo ssacli ctrl slot=0 ld 1 add spares=2I:1:3
Error: The array status is NOT okay. Cannot perform operation. Spare is not
       allowed.
The problem is that it still thinks there's a disk that doesn't exist:
~$ sudo ssacli ctrl slot=0 pd all show status
   physicaldrive 0:ATTR_VALUE_BAY_UNKNOWN (box 0:bay ATTR_VALUE_BAY_UNKNOWN, 0 GB): Failed
   physicaldrive 2I:1:1 (port 2I:box 1:bay 1, 600 GB): OK
   physicaldrive 2I:1:2 (port 2I:box 1:bay 2, 600 GB): OK
   physicaldrive 1I:2:1 (port 1I:box 2:bay 1, 600 GB): OK
   physicaldrive 1I:2:2 (port 1I:box 2:bay 2, 600 GB): OK
   physicaldrive 1I:2:3 (port 1I:box 2:bay 3, 600 GB): OK
   physicaldrive 1I:2:4 (port 1I:box 2:bay 4, 600 GB): OK
   physicaldrive 2I:1:3 (port 2I:box 1:bay 3, 600 GB): OK
   physicaldrive 2I:1:4 (port 2I:box 1:bay 4, 600 GB): OK
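
For anyone comparing similar output, here is a minimal Python sketch (not an HPE tool) that scans pasted `pd all show status` text and flags entries whose status is not OK or whose ID is not in the usual port:box:bay form. The function name `suspicious_drives` and the regular expressions are assumptions made for illustration, based only on the output format shown above.

```python
import re

# Sample copied from the `pd all show status` output in this post.
SAMPLE = """\
   physicaldrive 0:ATTR_VALUE_BAY_UNKNOWN (box 0:bay ATTR_VALUE_BAY_UNKNOWN, 0 GB): Failed
   physicaldrive 2I:1:1 (port 2I:box 1:bay 1, 600 GB): OK
   physicaldrive 2I:1:2 (port 2I:box 1:bay 2, 600 GB): OK
   physicaldrive 1I:2:1 (port 1I:box 2:bay 1, 600 GB): OK
   physicaldrive 1I:2:2 (port 1I:box 2:bay 2, 600 GB): OK
   physicaldrive 1I:2:3 (port 1I:box 2:bay 3, 600 GB): OK
   physicaldrive 1I:2:4 (port 1I:box 2:bay 4, 600 GB): OK
   physicaldrive 2I:1:3 (port 2I:box 1:bay 3, 600 GB): OK
   physicaldrive 2I:1:4 (port 2I:box 1:bay 4, 600 GB): OK
"""

# Assumed line shape: "physicaldrive <id> (<details>): <status>"
LINE_RE = re.compile(
    r"^\s*physicaldrive\s+(?P<id>\S+)\s+\((?P<detail>.*)\):\s*(?P<status>.+?)\s*$"
)
# A normal ID looks like port:box:bay, for example 2I:1:3.
NORMAL_ID_RE = re.compile(r"^\w+:\d+:\d+$")


def suspicious_drives(text):
    """Return (drive_id, status) pairs that are not OK or lack a port:box:bay ID."""
    flagged = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not a physicaldrive line
        drive_id = match.group("id")
        status = match.group("status")
        if status != "OK" or not NORMAL_ID_RE.match(drive_id):
            flagged.append((drive_id, status))
    return flagged


if __name__ == "__main__":
    for drive_id, status in suspicious_drives(SAMPLE):
        print(f"{drive_id}: {status}")
```

On the sample above, only the phantom `0:ATTR_VALUE_BAY_UNKNOWN` entry is flagged; the 8 real drives pass both checks.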
Suman_1978
HPE Pro

Re: Migration of RAID 5 leads to misidentification and errors

Hi,

For the error message "Array is transforming", here is the reason:
The user requested too many simultaneous changes. For example, the user added new disks to an array (expand array) and changed the size or RAID level of logical volumes on the array. The solution is for the user to wait until the array transformation is complete.

As for the ATTR_VALUE_BAY_UNKNOWN error, it could be caused by a loose connection, cable, or drive cage.
https://community.hpe.com/t5/ProLiant-Servers-ML-DL-SL/SFF-Drive-Cage-installation/td-p/4629637

Thank You!
https://support.hpe.com/hpesc/public/home


I work for HPE


krisd443
Advisor

Re: Migration of RAID 5 leads to misidentification and errors

Hi,

Is there any way I can find out about this transformation? Right now, it looks like it's doing nothing other than not working right.

We haven't requested any changes; we just moved the drives to a new machine with the same model of controller (something that, as I understand it, is supposed to be doable between any variation of Smart Array controllers). If moving the disks counts as requesting a change, then we have requested one.

I don't understand how a controller can detect and operate a RAID 5 array with two disks (having had 3 without reported issues previously), yet not allow any changes at all, nor show any sign of improvement after 4 days.

If there is some scenario in which the controller will suddenly detect the presence of the array on the third disk, or allow us to add one so that we have a proper RAID 5 array, I would like to know about it and make it happen.

In the meantime, we would very much appreciate any information that might help us find out how or why this is happening, or how to check the state of this transformation process.
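
One way to keep an eye on the state of the controller, cache, and drives is to re-run the read-only `show status` queries periodically. Below is a minimal Python sketch under some assumptions: `slot=0` is taken from the commands earlier in this thread, the helper names (`collect_status`, `not_ok_lines`) are made up for illustration, and the "anything after the last colon that is not OK" rule is only a rough heuristic, not something documented by HPE.

```python
import subprocess

# Read-only status queries. slot=0 matches the controller slot used in this thread.
STATUS_COMMANDS = [
    ["ssacli", "ctrl", "slot=0", "show", "status"],
    ["ssacli", "ctrl", "slot=0", "ld", "all", "show", "status"],
    ["ssacli", "ctrl", "slot=0", "pd", "all", "show", "status"],
]


def run_command(argv):
    """Run one command and return stdout and stderr together as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


def collect_status(runner=run_command):
    """Return {command string: output text} for each status query."""
    return {" ".join(argv): runner(argv) for argv in STATUS_COMMANDS}


def not_ok_lines(report):
    """Heuristic: keep lines whose text after the last colon is not 'OK'."""
    flagged = []
    for output in report.values():
        for line in output.splitlines():
            _label, sep, value = line.rpartition(":")
            if sep and value.strip() and value.strip() != "OK":
                flagged.append(line.strip())
    return flagged
```

On the server itself, calling `not_ok_lines(collect_status())` with sufficient privileges (the commands in this thread were run with `sudo`) lists whatever the controller currently reports as something other than OK. The `runner` parameter exists only so the parsing can be tried against saved output without touching the hardware.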

krisd443
Advisor

Re: Migration of RAID 5 leads to misidentification and errors

The result of trying the heal command (we have no idea how to make the cache status okay, and we've tried 2 different controllers, old and new):

~$ sudo ssacli ctrl slot=0 array A heal drives=2I:1:3
Error: This operation is not supported with the current configuration. Use the 
       "show" command on devices to show additional details about the
       configuration.
       Reason: Cache status not ok
krisd443
Advisor

Re: Migration of RAID 5 leads to misidentification and errors

Is there any possibility that there is a bug or bias in the way RAID arrays are detected?

I ask because the array was on the drives in box 2, bays 1 and 2, and box 3, bay 1. I think that is correct, but we moved from an 8LFF to a 12LFF model and the drive layout was different. We tried to match things up by position, and then by putting the disks in box 1, bays 1 and 2, and box 2, bay 1. I'm also not 100% sure that the upper-left regular hot-swap bay on an 8LFF model is box 2. I was thinking that it was, and that the upper space was still box 1, for optical or maybe U.2 NVMe drives or similar. Is that correct? We had 5 unprovisioned disks that moved along with the array disks, and we kept things in the same overall order, but we were under the impression that the Smart Array scanned the disks and could tell the difference.

If that makes a difference, and there is a bug that allows arrays to be built one way and then misdetects them when they are moved in the same configuration, we'd like to know. We haven't seen a recommended way to build arrays, and the array-building setup accepts any disks that can become an array.

hunter86_bg
Frequent Advisor

Re: Migration of RAID 5 leads to misidentification and errors

Can you summarize the situation?

You used an 8-drive box, later moved the disks to a 12-drive box, and also moved to another controller?
Now you have a RAID 5 array that is currently transforming and doesn't let you expand it, but you have vital information on it?

I would start troubleshooting by updating all firmware for that DL, including drives and controller. Of course, a proper backup is highly recommended.

Then I would just wipe and start over, if that is an option.
sudhirsingh
HPE Pro

Re: Migration of RAID 5 leads to misidentification and errors

Hi,

I would like to see the ADU log, if you can share it.

Meanwhile, if the cache is disabled or failed, please try to fix it, as it's important for migration/expansion activities.

It seems like there is some communication issue between the controller and a drive. I will check the log to see if we find anything.

 

Regards,

Sudhir

I work for HPE
krisd443
Advisor

Re: Migration of RAID 5 leads to misidentification and errors

We ended up giving up on saving the data and rebuilt FreeNAS (fortunately we had a configuration backup or it would have been a much longer job). So far, so good after killing the old array and building a new one.