Re: msa1000 strangeness after upgrading to fw 4.48

richard stovall · 06-27-2006

My understanding is that the field engineer is going to bring the following:

EMU
Backplane
MSA1000 Controller

Can all of these be replaced in one operation? Any thoughts about which is most likely the culprit?

Where is the array information stored? On the drives and on the controller, right? Is it possible to back up this information somehow?

Thanks again for all the assistance.

RS

John Kufrovich · 06-27-2006

The array information is stored on the drives and in controller NVRAM.

The EMU controls environmental status and also handles some drive events, for example hot remove and hot add.

Look at the show tech_support output: your LUNs are not showing a drive fault of any kind. If a drive had faulted, there would be a line added for the faulty disk in your LUN.

Just as a safety precaution, locate cpqacuxe.exe or hpacuxe.exe. At a cmd prompt, in the directory containing the executable, issue the command:
cpqacuxe -c
This will capture your server's array configuration plus the MSA's. The file created will be acucapt.ini.
cpqacuxe -h will pop up a help screen.
If needed, you can edit out the embedded Smart Array configuration information, leaving only the MSA information, and then execute:
cpqacuxe -i acucapt.ini
This will restore your configuration to its original state.

It's difficult to say exactly whether it is the backplane or the EMU. The only thing I can think of on the backplane that could cause something like this is one of the resistors in an RNET used for the SCSI terminators, either failing open or shorting to another resistor in the same RNET.

Tomorrow I'll do a little more digging. I would be interested in capturing this equipment.

richard stovall · 06-28-2006

>Look at the show tech_support, your LUNs are not showing a drive fault of any kind. If you experience a drive fault, the would be line added to the faulty disk in your LUN.

This is how I interpret that information. Several people have suggested pulling one of the affected drives and letting the array rebuild. That doesn't seem likely to be a solution. In fact, it could be counterproductive given the extremely long rebuild time (?? hours) when the overall hardware state is suspect.

I have backed up the array configuration information from all 4 servers accessing this SAN, using a local instance of cpqacuxe.exe on each server. Can you give me the 10-cent rundown on how this might be useful if something goes awry during the equipment swap?

>Tomorrow I'll do a little more digging. I would be interested in capturing this equipment.

Thanks. I would be interested in sending it to you. Do you or I have to do anything to redirect it from the normal process once the faulty hardware has been identified?

RS

PS: Just to make sure I understand: say, for instance, we brought in a completely new MSA1000 and populated it with our current drives. Theoretically that would start up and be usable, right? Sorry for all the questions. I'm new to HP land (though customer service like this is making me glad I came over!)

John Kufrovich · 06-28-2006

I was going to suggest a hot remove and reinsert, but you're reporting two faulty drives in a RAID 5 configuration, and I didn't want to take any chances with your data.
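Why two flagged drives matter: RAID 5 keeps one XOR parity block per stripe, so it can recreate any single missing member from the survivors, but not two. The following is a generic Python illustration of that arithmetic, not MSA1000 firmware; the function names are hypothetical, and it ignores details such as parity rotation across drives.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe(members: list[bytes | None]) -> list[bytes]:
    """Recreate a missing member of a single-parity (RAID 5-style) stripe.

    `members` holds the data blocks plus one XOR parity block; a missing
    member is passed as None. One missing member is rebuilt by XOR-ing the
    survivors. Two or more missing members cannot be recovered.
    """
    missing = [i for i, m in enumerate(members) if m is None]
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} members missing; single parity can rebuild only one")
    survivors = [m for m in members if m is not None]
    rebuilt = list(members)
    if missing:
        rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    stripe = data + [xor_blocks(data)]  # three data blocks plus parity

    # One member lost: the survivors still determine it.
    print(rebuild_stripe([stripe[0], None, stripe[2], stripe[3]])[1] == data[1])

    # Two members lost: single parity is not enough.
    try:
        rebuild_stripe([stripe[0], None, None, stripe[3]])
    except ValueError as err:
        print(err)
```

This is why pulling a drive while another member is already suspect is risky: if the second drive really is bad, the rebuild has two unknowns and only one parity block to solve for them.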

Even though you have two drives with amber lights, your system is still up and functional, yes?

If you do try the hot remove and insert, move your rebuild priority to medium or high.

richard stovall · 06-28-2006

Yes. The system is actually working with no degradation or loss of performance.

I really don't want to pull a drive if we don't have to.

Regarding the theory of how the device operates, what would happen if I took all of those drives and put them into a different MSA1000? Would the array be retained? I'm not asking because I want a new MSA1000. I truly just want to understand how far the system is designed to go before losing the data.

Thanks,

RS

John Kufrovich · 06-28-2006

This is a long shot.
Any chance you pulled the controller and reinserted it? If so, could you have bent a pin?

If possible, power down everything and pull the controller. Check the backplane connector for bent pins and the controller connector for damaged holes.

Still stumped.

Because the RIS data is stored on the drives, you can take a LUN, move it to another Smart Array device, and still preserve everything. There is also some backward compatibility with older Smart Array controllers.

richard stovall · 06-28-2006

The controller was removed and reinserted, but only after this problem arose. I did this at the request of the initial phone support technician.

>By having the RIS data on the drives you can take a LUN and move it to another Smart Array device and still perserve everything.

This is why I asked about the "Error occurred reading RIS copy" messages. Does this mean that the array information on the drives is corrupted? If so, should swapping the controller be done only as a last resort?

Here is a snippet from the ADU report. The entire report is attached.

SLOT 2 (ID 65536) MSA1000 Array Controller ERROR REPORT:

Error occurred reading RIS copy from SCSI Port 1 Drive ID 0
Error occurred reading RIS copy from SCSI Port 1 Drive ID 1
Error occurred reading RIS copy from SCSI Port 1 Drive ID 2
Error occurred reading RIS copy from SCSI Port 1 Drive ID 3
Error occurred reading RIS copy from SCSI Port 1 Drive ID 4
Error occurred reading RIS copy from SCSI Port 1 Drive ID 8
Error occurred reading RIS copy from SCSI Port 1 Drive ID 13
Error occurred reading RIS copy from SCSI Port 2 Drive ID 0
Error occurred reading RIS copy from SCSI Port 2 Drive ID 1
Error occurred reading RIS copy from SCSI Port 2 Drive ID 2
Error occurred reading RIS copy from SCSI Port 2 Drive ID 3
Error occurred reading RIS copy from SCSI Port 2 Drive ID 4
Error occurred reading RIS copy from SCSI Port 2 Drive ID 8
Error occurred reading RIS copy from SCSI Port 2 Drive ID 13
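To see the pattern in a report like this at a glance, a short script can group the RIS read errors by SCSI port. This is a minimal Python sketch that assumes every error line uses exactly the wording quoted above; `ris_errors_by_port` is a hypothetical helper, not part of ADU, and it only summarizes the report rather than diagnosing the cause.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Assumes each line matches the ADU wording quoted in the report above.
RIS_ERROR = re.compile(r"Error occurred reading RIS copy from SCSI Port (\d+) Drive ID (\d+)")


def ris_errors_by_port(report: str) -> dict[int, list[int]]:
    """Return {SCSI port: sorted drive IDs} for every RIS read error in `report`."""
    by_port: dict[int, list[int]] = defaultdict(list)
    for match in RIS_ERROR.finditer(report):
        port, drive = int(match.group(1)), int(match.group(2))
        by_port[port].append(drive)
    return {port: sorted(ids) for port, ids in sorted(by_port.items())}


if __name__ == "__main__":
    report = "\n".join(
        f"Error occurred reading RIS copy from SCSI Port {port} Drive ID {drive}"
        for port in (1, 2)
        for drive in (0, 1, 2, 3, 4, 8, 13)
    )
    summary = ris_errors_by_port(report)
    for port, drives in summary.items():
        print(f"Port {port}: drive IDs {drives}")
    # True when every port reports the same set of drive IDs.
    print(len({tuple(d) for d in summary.values()}) == 1)
```

Run against the snippet above, it returns the same seven drive IDs (0, 1, 2, 3, 4, 8 and 13) for both ports, which makes it easy to see that the errors are not confined to one drive or one port.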

What would make the RIS copy of each drive unreadable? Does this point to a specific piece of hardware such as the EMU?

Thanks for working so hard on this.

RS

richard stovall · 06-28-2006

Any more thoughts? It is 5PM EDT and the field engineer is due to arrive in a few hours with a handful of parts.

Thanks,

RS

John Kufrovich · 06-28-2006

The MSA must be able to read a RIS from somewhere, because your LUNs are still intact.

I've looked over the backplane schematic. If it were a backplane issue, I would expect other problems, especially with accessing your LUNs.

Tell the FE that I work in MSA development. A firmware engineer and I would like to get our hands on the faulty component.

richard stovall · 06-28-2006

OK. We'll start with the EMU, then the controller, then the backplane.

The first thing he wants to try is changing the read/write cache distribution, so we'll do that and then start on the hardware.

Wish us luck...

RS
