
msa1000 strangeness after upgrading to fw 4.48

 
richard stovall
Advisor

After upgrading from 4.24 to 4.48 I restarted the MSA1000 and, lo and behold, TWO of the drives indicated failures (bright orange LED on the drives), but the MSA1000 LCD did not report any errors. I powered up the servers and they came up just fine. Diagnostics on the drives indicate that they are operating within spec and do not need to be replaced. The single configured array is fine and the 4 configured units are fine.

Tech support has worked with me on the phone and has sent a field engineer to check out the situation, but no one seems to really know what's going on. Here is the output of 'show disks' from the console CLI:

CLI> show disks
          box,bay  bus,ID  Size     Speed     Units
Disk101   1,01     0,00    72.8 GB  160 MB/s  0, 1, 2, 3
Disk102   1,02     0,01    72.8 GB  160 MB/s  0, 1, 2, 3
Disk103   1,03     0,02    72.8 GB  160 MB/s  0, 1, 2, 3
Disk104   1,04     0,03    72.8 GB  160 MB/s  0, 1, 2, 3
Disk105   1,05     0,04    72.8 GB  160 MB/s  0, 1, 2, 3
Disk107   1,07     0,08    72.8 GB  160 MB/s  0, 1, 2, 3
Disk112   1,12     0,13    72.8 GB  160 MB/s  0, 1, 2, 3
Disk108   1,08     1,00    72.8 GB  160 MB/s  0, 1, 2, 3
Disk109   1,09     1,01    72.8 GB  160 MB/s  0, 1, 2, 3
Disk110   1,10     1,02    72.8 GB  160 MB/s  0, 1, 2, 3
Disk111   1,11     1,03    72.8 GB  160 MB/s  0, 1, 2, 3
Disk112   1,12     1,04    72.8 GB  160 MB/s  0, 1, 2, 3
Disk114   1,14     1,08    72.8 GB  160 MB/s  0, 1, 2, 3
Disk1255  1,255    1,13    72.8 GB  160 MB/s  0, 1, 2, 3


Notice that two disks are indicating that they are in physical bay 12. None of them indicate they are in 6 or 13, and what's up with bay 255? The two disks with the failure lights are actually in bays 6 and 13.
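
The same check can be done mechanically. Here is a minimal Python sketch (not an HP tool) that parses the 'show disks' rows above and flags duplicate, missing, and out-of-range bay numbers. The function names `parse_show_disks` and `check_bays` are made up for illustration, and the 14-bay default is an assumption based on the 14 drives in this shelf.

```python
import re
from collections import Counter

# Rows copied from the 'show disks' output in this post.
SHOW_DISKS = """\
Disk101 1,01 0,00 72.8 GB 160 MB/s 0, 1, 2, 3
Disk102 1,02 0,01 72.8 GB 160 MB/s 0, 1, 2, 3
Disk103 1,03 0,02 72.8 GB 160 MB/s 0, 1, 2, 3
Disk104 1,04 0,03 72.8 GB 160 MB/s 0, 1, 2, 3
Disk105 1,05 0,04 72.8 GB 160 MB/s 0, 1, 2, 3
Disk107 1,07 0,08 72.8 GB 160 MB/s 0, 1, 2, 3
Disk112 1,12 0,13 72.8 GB 160 MB/s 0, 1, 2, 3
Disk108 1,08 1,00 72.8 GB 160 MB/s 0, 1, 2, 3
Disk109 1,09 1,01 72.8 GB 160 MB/s 0, 1, 2, 3
Disk110 1,10 1,02 72.8 GB 160 MB/s 0, 1, 2, 3
Disk111 1,11 1,03 72.8 GB 160 MB/s 0, 1, 2, 3
Disk112 1,12 1,04 72.8 GB 160 MB/s 0, 1, 2, 3
Disk114 1,14 1,08 72.8 GB 160 MB/s 0, 1, 2, 3
Disk1255 1,255 1,13 72.8 GB 160 MB/s 0, 1, 2, 3
"""

# Name, then "box,bay", then "bus,ID"; the remaining columns are ignored.
ROW = re.compile(r"^(Disk\d+)\s+(\d+),(\d+)\s+(\d+),(\d+)\s")


def parse_show_disks(text):
    """Return one dict per disk row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            name, box, bay, bus, scsi_id = m.groups()
            rows.append({"name": name, "box": int(box), "bay": int(bay),
                         "bus": int(bus), "id": int(scsi_id)})
    return rows


def check_bays(rows, bay_count=14):
    """Compare reported bays against bays 1..bay_count (14 is an assumption)."""
    counts = Counter(r["bay"] for r in rows)
    expected = set(range(1, bay_count + 1))
    return {
        "duplicate": sorted(b for b, n in counts.items() if n > 1),
        "missing": sorted(expected - set(counts)),
        "out_of_range": sorted(set(counts) - expected),
    }


report = check_bays(parse_show_disks(SHOW_DISKS))
print(report)
# {'duplicate': [12], 'missing': [6, 13], 'out_of_range': [255]}
```

The result matches what I see by eye: bay 12 is claimed twice, bays 6 and 13 are never reported, and bay 255 is not a real bay.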

The firmware was upgraded from 4.24 to 4.48. The EMU is at 1.86.

The field engineer is coming again Tuesday with a replacement backplane, EMU, and controller.

Does any of this make sense? What are the chances of data loss when replacing the above components?

Thanks for any thoughts,

RS
richard stovall
Advisor

Re: msa1000 strangeness after upgrading to fw 4.48

BTW, the following appears next to each drive in the report from the Array Diagnostic Utility.

"Error occurred reading RIS copy"

Is this going to affect what happens when the engineer starts swapping out components? Should I be concerned?

Thanks,

RS
John Kufrovich
Honored Contributor

Re: msa1000 strangeness after upgrading to fw 4.48

Strange, I haven't seen this problem in my lab.

Give this a try: power down the servers, and then the MSA. Reseat drives Disk106 and Disk113. Then power on everything.

jk
richard stovall
Advisor

Re: msa1000 strangeness after upgrading to fw 4.48

I tried this while on the phone with tech support the other day. No luck.

They also had me reseat all the drives and the controller. When I did that two OTHER drives showed up as failed and it really did break everything. (That was a scary few minutes.) Reseating those two again solved that problem, but the original problem remains.

Any thoughts about the RIS message?

Thanks,

RS
John Kufrovich
Honored Contributor

Re: msa1000 strangeness after upgrading to fw 4.48

Even after you reseated the drives the report looks the same?

The RIS area stores your MSA array and LUN configuration and a couple of other items. Each drive has a copy. I believe we read each drive's RIS, just in case someone performed a DTS (Direct to SAN).

When you do a >show unit 0, does the drive show up as failed? Can you upload a >show tech_support?

I suspect the EMU.

richard stovall
Advisor

Re: msa1000 strangeness after upgrading to fw 4.48

Yes, the status has not changed at all since the firmware upgrade. The only time it differed was when the two OTHER drives reported as failed and had to be reseated.

I'm asking about the RIS because I don't know whether or not to be concerned that the information is unavailable to the diagnostic tools.

Here is the show unit 0 result:

Unit 0:
In PDLA mode, Unit 0 is Lun 1; In VSA mode, Unit 0 is Lun 0.
Unit Identifier :
Device Identifier : 600805F3-000D4F00-A91B235C-AC1E0013
Cache Status : Enabled
Max Boot Partition: Disabled
Volume Status : VOLUME OK
Parity Init Status: Complete
14 Data Disk(s) used by lun 0:
Disk101: Box 1, Bay 01, (SCSI bus 0, SCSI id 0)
Disk102: Box 1, Bay 02, (SCSI bus 0, SCSI id 1)
Disk103: Box 1, Bay 03, (SCSI bus 0, SCSI id 2)
Disk104: Box 1, Bay 04, (SCSI bus 0, SCSI id 3)
Disk105: Box 1, Bay 05, (SCSI bus 0, SCSI id 4)
Disk112: Box 1, Bay 12, (SCSI bus 0, SCSI id 13)
Disk107: Box 1, Bay 07, (SCSI bus 0, SCSI id 8)
Disk108: Box 1, Bay 08, (SCSI bus 1, SCSI id 0)
Disk109: Box 1, Bay 09, (SCSI bus 1, SCSI id 1)
Disk110: Box 1, Bay 10, (SCSI bus 1, SCSI id 2)
Disk111: Box 1, Bay 11, (SCSI bus 1, SCSI id 3)
Disk112: Box 1, Bay 12, (SCSI bus 1, SCSI id 4)
Disk1255: Box 1, Bay 255, (SCSI bus 1, SCSI id 13)
Disk114: Box 1, Bay 14, (SCSI bus 1, SCSI id 8)
Spare Disk(s) used by lun 0:
No spare drive is designated.
Logical Volume Raid Level: DISTRIBUTED PARITY FAULT TOLERANCE (Raid 5)
stripe_size=16kB
Logical Volume Capacity : 498MB



The show tech_support result is attached.

What are the implications for data loss of replacing the EMU?
John Kufrovich
Honored Contributor

Re: msa1000 strangeness after upgrading to fw 4.48

Richard,

Replacing the EMU will not be a problem.

Strange no drive failures are being reported.

At the CLI, run:
>locate disk disk105 2
Does the drive LED flash?

Then try the next drive in sequence.

Have you backed up everything?

richard stovall
Advisor

Re: msa1000 strangeness after upgrading to fw 4.48

>Strange no drive failures are being reported.

How so? Do you mean the physical drives themselves or the logical units?

>at the cli locate disk disk105 2
>Does the drive LED flash?

The drive in bay 5 flashes.

Yesterday the field engineer and I identified the drives one by one using the HP Insight Diagnostics tool. The drives are not being reported by that utility as 0-13 or 1-14. They are represented as 4-17. The mapping from this number to the bay # in the shelf is:

4-1
5-2
6-3
7-4
8-5
9-7
10-??? (two drives are solid amber so couldn't tell)
11-8
12-9
13-10
14-11
15-12
16-14
17-??? (two drives are solid amber...)
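
One way to narrow down the two unidentified entries is by elimination. This is a minimal Python sketch, assuming the shelf has 14 bays numbered 1-14; the variable names are illustrative only.

```python
# Diagnostics index -> physical bay, as identified by flashing each drive.
# None marks the two entries that could not be identified (solid amber LEDs).
diag_to_bay = {
    4: 1, 5: 2, 6: 3, 7: 4, 8: 5, 9: 7, 10: None,
    11: 8, 12: 9, 13: 10, 14: 11, 15: 12, 16: 14, 17: None,
}

BAY_COUNT = 14  # assumption: bays are numbered 1..14

# Diagnostics indices that have no known bay.
unidentified = sorted(i for i, bay in diag_to_bay.items() if bay is None)

# Bays that no diagnostics index has been matched to.
identified_bays = {bay for bay in diag_to_bay.values() if bay is not None}
unaccounted_bays = sorted(set(range(1, BAY_COUNT + 1)) - identified_bays)

print(unidentified)      # [10, 17]
print(unaccounted_bays)  # [6, 13]
```

So entries 10 and 17 have to be the drives in bays 6 and 13, the two with the amber LEDs, although this alone does not say which is which.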

>Have you backed up everything?

Yes. I do nightly backups but I'm deathly afraid of having to resurrect MS-SQL and Exchange on the same evening. Any good pointers to quick recovery procedures wouldn't be unwelcome...

---

I just spoke with the field engineer, who said that the people he has spoken to think it is the backplane. Any thoughts on this versus the EMU? Why do you think one over the other?

He also said there is something else he wants to try first: changing the read/write % on the controller, which forces a configuration change on it.

I'm dying here not knowing what, if anything, is actually wrong.

In the meantime, thanks for your help.

RS
Anthony Martin_1
Frequent Advisor

Re: msa1000 strangeness after upgrading to fw 4.48

Hi Richard,
It could be worthwhile to check the firmware versions on these 2 disk drives. It is possible that they are down rev and the new firmware doesn't like them too much.

Cheers
Anthony
richard stovall
Advisor

Re: msa1000 strangeness after upgrading to fw 4.48

4 of them are HPB4, the other 10 are HPB3. The HPB4 drives are in bays 1-4.

Thanks for weighing in.

RS