
MSA 2040 SAN with a LUN failure.

 
LinuxSysAdmin
Occasional Contributor


Hi,

 

I have a twin-controller MSA 2040 SAN with 16 x 900GB disks in it.

 

I have 3 LUNs, each 1.6TB

 

3 weeks ago I lost all access to one of those LUNs.  The other two were fine.

 

At the time, I had other live apps running on the other LUNs, so I decided to restore the affected VMs from backup onto the remaining LUNs rather than risk the 2 good LUNs with attempts to reboot controllers etc.

 

I reported the issue to HP and they eventually told me I was using an unsupported OS (I am running XenServer on 3 DL380s, directly connected to the MSA 2040 with 10Gb SFP+ cables).

 

I sent them the MSA 2040 logs but all they said was "not supported".

 

So, I sacrificed one of my DL380s and built RHEL6 on it - and, just as I expected, I was unable to see the faulty LUN from RHEL either.  The other two were fine.

 

I sent the logs again, along with an sosreport from the RHEL box, and haven't heard a peep out of HP support for 4 days now.

 

I really would like to understand what has failed on my MSA 2040.

 

The only unique thing about the failed LUN is that it is the only one of the 3 in Disk Group B, which I believe uses Controller B as primary. One might therefore think that Controller B has failed, but I thought a dual-controller device would fail over to Controller A in such a case.

 

Also, when I do "show controller-stat" on the MSA 2040 I see 0 IOPS on Controller A and lots of IOPS on Controller B.

 

On the MSA 2040 SMU GUI I also see green icons on the ports on Controller B, as opposed to grey italicised icons on the corresponding A-ports.  This again would suggest a loss of activity on A rather than B, which doesn't tie up with a failure on Controller B.

 

In any case, all the health status on the MSA 2040 is green all the way - no errors at all.

 

The only symptom is that anything that tries to access that LUN (in my case XenServer or RHEL, both via iSCSI) just hangs.
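Since any I/O to the LUN hangs, one way to look at the host side without touching the device is to read the kernel's SCSI state attribute from sysfs. The following is only a minimal sketch: it assumes a Linux host where the two paths appear as the sdd and sdh block devices mentioned in this post, and that each exposes a `device/state` attribute under `/sys/block` (typically values such as `running` or `offline`). The function name `scsi_device_states` is made up for illustration.

```python
from pathlib import Path


def scsi_device_states(devices, sysfs_root="/sys/block"):
    """Map each block device name (e.g. 'sdd') to its SCSI state string.

    Only a sysfs attribute is read, so no I/O is sent to the LUN itself.
    Devices without a readable state attribute map to None.
    """
    states = {}
    for name in devices:
        state_file = Path(sysfs_root) / name / "device" / "state"
        try:
            states[name] = state_file.read_text().strip()
        except OSError:
            # Device not present, or no SCSI state attribute for it.
            states[name] = None
    return states


if __name__ == "__main__":
    # sdd and sdh are the two paths named in the post; adjust as needed.
    for dev, state in scsi_device_states(["sdd", "sdh"]).items():
        print(dev, state or "no state attribute found")
```

A path that the host still reports as `running` while reads hang would at least be consistent with the array accepting the connection but not servicing I/O, though it does not prove where the fault is.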

 

Even the sosreport script I ran kept hanging as soon as it went anywhere near /dev/sdd and /dev/sdh, which in my case are the two paths to the failed LUN.
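To test individual paths without the calling shell or script hanging the way sosreport did, a read can be run in a child process with a deadline. This is a sketch under assumptions: it uses GNU `dd` with `iflag=direct` to read a single block, and the helper names `probe` and `read_probe_cmd` are hypothetical. A child stuck in uninterruptible I/O may not actually die when killed, so the parent deliberately does not wait for it after the timeout.

```python
import subprocess
import sys


def probe(cmd, timeout_s=5.0):
    """Run cmd and return 'ok', 'error', or 'hung'.

    The parent never blocks for longer than timeout_s.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a process blocked in the kernel on a dead
        # path may ignore the signal, so do not wait for it to exit.
        proc.kill()
        return "hung"
    return "ok" if rc == 0 else "error"


def read_probe_cmd(device):
    """Build a dd command that reads one 4 KiB block, bypassing the page cache."""
    return ["dd", f"if={device}", "of=/dev/null",
            "bs=4096", "count=1", "iflag=direct"]


if __name__ == "__main__":
    # Example: python probe.py /dev/sdd /dev/sdh
    for dev in sys.argv[1:]:
        print(dev, probe(read_probe_cmd(dev)))
```

Running it against each path separately shows whether both paths hang or only one, which is useful detail to attach to a support case.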

 

Why are HP taking so long over this issue?

 

Why is the MSA 2040 reporting as healthy when clearly it isn't?

 

Which controller do I have an issue with? A or B?

 

How do I get out of this mess, bearing in mind I have other live VMs on the other LUNs and I don't really want to lose all my data if I can help it?

 

This setup has been running fine since last year.

 

Regards,

Grant

2 REPLIES
WarNox
Advisor

Re: MSA 2040 SAN with a LUN failure.

First of all, I think you need to find out whether any FC ports are down, which it sounds like they are.

Is there an FC switch between the servers and the MSA? And are there 2 ports from each DL380 plugged into the SAN/switch, or just one?

On the MSA run 'show ports' and post the output.

HP should respond within the SLA, depending on your support agreement.
JayGodfrey
Occasional Visitor

Re: MSA 2040 SAN with a LUN failure.

Hey Grant,

 

We have had a very similar issue over the last few months.

 

We have a cluster of 5 XenServer hosts, and 4 times now we have had one of the storage processors fail, though not completely; then, after a while, the second storage processor fails too.

 

After rebooting the storage processors it comes good. I spent a bit of time with HP trying to resolve the issue, but all they said was to update to the latest firmware, and then that XenServer was unsupported.

 

We are currently migrating from XenServer to Hyper-V.

 

My colleagues think it is an issue with LVM, and our manager is blaming XenServer.

 

I find it hard to believe that XenServer could cause the SP to fail.

 

I am halfway through the migration to Hyper-V, and the issue happened again last night; my Hyper-V box also lost access.