Strange MSA1500 Problem (VOLUME STATE FAILED, but no broken drives)

 
Thomas Schrettl
New Member

Hello!

I hope somebody can help me with this strange problem. I am running an MSA1500 with 14 x 300 GB SCSI drives (RAID 5 across 13 disks with two volumes; the 14th disk is a hot spare). The storage is attached to two ESX servers. This afternoon the ESX servers reported that they had lost the connection to the volumes on the MSA.

I restarted the MSA and this is what I got:
101 VOLUME STATE #0 FAILED
101 VOLUME STATE #1 FAILED
80 REPLACEMENT DRIVE FOUND BOX #1 BAY 1
80 REPLACEMENT DRIVE FOUND BOX #1 BAY 2

So it seems that two drives failed, but...
- I did not touch the storage.
- The disks in bays 1 and 2 are green!
- And if a drive had failed recently, why is the drive in bay 14 (the hot spare) still inactive?

So my guess is that the MSA controller only "thinks" these two drives were replaced, but they are still the original ones, hopefully with all their data on them.
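
The reasoning above can be written down as a small check. This is only a sketch in Python: the message patterns are inferred from the four lines quoted in this post, and `summarize` and `looks_like_false_replacement` are hypothetical helper names, not part of any HP tool. It mirrors the poster's guess and is not a diagnostic rule.

```python
import re

# Startup messages copied from the post above.
STARTUP_MESSAGES = """\
101 VOLUME STATE #0 FAILED
101 VOLUME STATE #1 FAILED
80 REPLACEMENT DRIVE FOUND BOX #1 BAY 1
80 REPLACEMENT DRIVE FOUND BOX #1 BAY 2
"""

# Patterns inferred from these four lines only (an assumption, not a documented format).
VOLUME_RE = re.compile(r"^\d+ VOLUME STATE #(\d+) (\w+)$")
REPLACED_RE = re.compile(r"^\d+ REPLACEMENT DRIVE FOUND BOX #(\d+) BAY (\d+)$")


def summarize(text):
    """Return (failed volume numbers, list of (box, bay) flagged as replaced)."""
    failed_volumes = []
    replaced_bays = []
    for raw in text.splitlines():
        line = raw.strip()
        match = VOLUME_RE.match(line)
        if match:
            if match.group(2) == "FAILED":
                failed_volumes.append(int(match.group(1)))
            continue
        match = REPLACED_RE.match(line)
        if match:
            replaced_bays.append((int(match.group(1)), int(match.group(2))))
    return failed_volumes, replaced_bays


def looks_like_false_replacement(failed_volumes, replaced_bays,
                                 drives_touched, spare_active):
    """Hypothetical heuristic: volumes failed and drives are flagged as
    replaced, although nobody swapped a drive and the hot spare never took over."""
    return (bool(failed_volumes) and bool(replaced_bays)
            and not drives_touched and not spare_active)


failed, replaced = summarize(STARTUP_MESSAGES)
suspicious = looks_like_false_replacement(
    failed, replaced, drives_touched=False, spare_active=False)
print(failed, replaced, suspicious)
```

The two boolean inputs come from what the administrator knows (nobody touched the drives, the spare in bay 14 is idle); the controller messages alone do not reveal them.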

Is there any way to "tell" the controller that these are the old disks, or to rescan all disks for valid volumes?

I attached my Windows Server 2003 machine to the MSA and ACU could reactivate the volumes, but it warns that "all data might be lost", so I backed away from it.

Isn't there an "Array Diagnostic Utility" around? Could that help?

Any tips would be great!!!

Thanks a lot and greetings from Tyrol
Tom



9 REPLIES
Craig_83
Frequent Advisor

Hi Tom,

What firmware are you running on the MSA?

You can download the Array Diagnostic Utility (ADU) here:
http://h18023.www1.hp.com/support/files/server/us/download/27531.html?jumpid=reg_R1002_USEN

If you connect a serial cable to the controllers and issue a "show eventlog", what does it return?

Craig
Thomas Schrettl
New Member

Hi!

I'm running:
MSA1500 Firmware Revision: 5.10b414 (SGA065201N)
MSA1500 Hardware Revision: a [AutoRev: 0x030000]

Eventlog says:
CLI> show eventlog
TIME CLASS SUB DETL MESSAGE
00000056 0001 0004 0000 Controller removed: 0
00000056 0005 0000 0001 Media exchanged detected on volume: 0
00000056 0005 0000 0001 Media exchanged detected on volume: 1
00000056 0005 0000 0001 Media exchanged detected on volume: 0
00000056 0005 0000 0001 Media exchanged detected on volume: 1
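
For anyone who captures this output over the serial console and wants to summarize it, here is a minimal Python sketch. It assumes the five-column layout shown in the header line above (TIME, CLASS, SUB, DETL, MESSAGE); `parse_eventlog` is a hypothetical helper, and the meaning of the numeric columns is not interpreted.

```python
from collections import Counter

# Event log copied from the post above.
EVENTLOG = """\
TIME CLASS SUB DETL MESSAGE
00000056 0001 0004 0000 Controller removed: 0
00000056 0005 0000 0001 Media exchanged detected on volume: 0
00000056 0005 0000 0001 Media exchanged detected on volume: 1
00000056 0005 0000 0001 Media exchanged detected on volume: 0
00000056 0005 0000 0001 Media exchanged detected on volume: 1
"""


def parse_eventlog(text):
    """Split each event line into its four numeric columns plus the message."""
    events = []
    for raw in text.splitlines():
        parts = raw.strip().split(None, 4)
        # Skip the header, the prompt line, and anything that is not an event.
        if len(parts) < 5 or not parts[0].isdigit():
            continue
        time, cls, sub, detl, message = parts
        events.append({"time": time, "class": cls, "sub": sub,
                       "detl": detl, "message": message})
    return events


events = parse_eventlog(EVENTLOG)
counts = Counter(event["message"] for event in events)
timestamps = {event["time"] for event in events}

for message, count in counts.items():
    print(count, message)
```

All five events carry the same TIME value, so they were logged together, with the "Controller removed" entry listed first.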

Best Regards
Thomas
Thomas Schrettl
New Member

Hello all!

I'm happy to tell you that everything works fine again. For all of you who are interested in how, here's the answer.

1) Today I had a look at my server log and realized that two (unimportant) servers, which I do not monitor, had a power loss at the same time the ESX servers lost their connection.

2) I checked my support contract and I still had support, so the first thing I did was call HP Support. After I sent them some console output (e.g. show tech_support), they phoned back and told me that error messages like the ones described above can occur after a power loss, and that in this case they can be ignored.

So I tried a "Reactivate" in the ACU and 5 seconds later everything worked again! No rebuild needed, it just worked. After reactivating my ESX-Cluster everything looked good and so I think the problem is solved!

Thanks to everybody for helping!

Best Regards
Thomas

----
btw: How do I set this thread to "solved"?
John Kufrovich
Honored Contributor

Tom,
Did you hot remove the controller?

You must properly shut down the controller before removing it.

jk
Uwe Zessin
Honored Contributor

And how on earth does one do that?

I've checked the latest CLI guide I could find, but it does not seem to contain the string "shutdown".
John Kufrovich
Honored Contributor

I don't have a system in front of me, but you do it through ACU: highlight the MSA controller. The option may be in the upper right pane or under Advanced features.

Or, in the MSA CLI, disable standby.

jk
Koen Dooms
New Member

You can also just power down the MSA1000 with the power button, but in that case make sure all attached storage and servers are shut down first. That way the MSA no longer has any disk access, and it can safely handle a cold power-off without the CLI.

I have already had a similar problem twice, with volumes being disabled in ACU. It happened after a general power failure; it seems the MSAs can't always handle this.
In both cases I re-enabled the volumes in ACU and got the same warning about probable data loss, but I just ignored it like you did, and the volumes came back online properly without data loss. In one case, however, there was data corruption in some files. These were database files that were open when the problem occurred, so always run a check disk after this kind of problem, because some data corruption might only show up after a few weeks!
Ravi kumar raju
Advisor

Hi,

I am facing the same issue, but I don't have ACU in my lab. How do I solve this from the CLI prompt?

Thanks,
Ravi kumar.R
Thomas Schrettl
New Member

Hi Ravi!

Check out the CLI Manual, e.g.:
http://docs.hp.com/en/9320/acu.pdf

Page 53 describes how to "re-enable" a failed logical device.

That should do the same as "Reactivate" in the ACU. But no guarantee ;)

Best Regards
Tom