
EVA4400 lost comms to storage cell

MattR
Occasional Contributor

Hi all

I've had this issue twice now in the space of 3 weeks. The EVA drops comms to the array and I get this message:

This instance of Command View EVA has lost communication with the storage cell

On both occasions Controller 2 logs errors on one of the drives (always bay 4) and a few seconds later the entire array cannot be contacted on either controller.

Restarting the EVA doesn't restore communication. I need to power off the drive enclosure for it all to start working again.

The failed drive light isn't showing on the physical drive, and Command View shows the drive as healthy.

Could this really be just a bad drive that is somehow causing the entire array to lose communication?

Thanks
Matt
11 REPLIES
Uwe Zessin
Honored Contributor

Re: EVA4400 lost comms to storage cell

No, a bad disk drive _should_ be bypassed by the enclosure's I/O module. Many problems I have seen are in the EVA-4400 I/O module (firmware).

Open a call with HP and let them analyze the system. Maybe your EVA is running a _very_ old firmware and the problem can be fixed / reduced a bit by a software upgrade.
Víctor Cespón
Honored Contributor

Re: EVA4400 lost comms to storage cell

Do you lose access from Command View only, or do the servers also lose access to their vdisks?

If you do a hardware rescan on a server or reboot it, does it still see the disks?

If you have a recent management module firmware, can you manage the array from the internal Command View?
MattR
Occasional Contributor

Re: EVA4400 lost comms to storage cell

The servers cannot access the vdisks (two VMware ESX hosts).

The first time this happened I rebooted both servers and they came back up without the LUNs on the EVA. It wasn't until I had powered off the drive enclosure and CV could see the storage again that I was able to rescan in ESX and see the vdisks.

I've been using the internal CV to manage the EVA. And the firmware is all up to date. When HP shipped it to me just over a month ago the firmware was all mismatched, and thanks to the posts on this forum I knew what to do to fix it.

I've opened a job with HP and sent them all the logs. Fingers crossed, hopefully Uwe's call on the I/O module will be the easy fix!
Víctor Cespón
Honored Contributor

Re: EVA4400 lost comms to storage cell

The I/O modules need new firmware, which hopefully will be out in one or two weeks. Until then, the I/O modules reboot and cut the loops randomly, and nothing can be done about it.

That said, neither this nor a disk failure should make the array inoperative. I suspect this is more likely an "indicated disk has transitioned to stalled too long" situation, and that needs a detailed analysis of the logs.

Uwe Zessin
Honored Contributor

Re: EVA4400 lost comms to storage cell

Well, I've seen an enclosure drop 3 disk drives at once - fortunately, the customer started with a 2C8D from the beginning, so no data was lost.
S. Boetticher
Regular Advisor

Re: EVA4400 lost comms to storage cell

I had the following issue, confirmed by HP to be not a first-time thing on a 4400 with 1TB FATA: when the EVA went from rebuild (broken FATA disk) to relevel, all I/O on the front-end ports suddenly shut down, so the servers and CV lost their connections to the EVA. We had to pull both controllers to bring it back to life. Since then I keep my fingers crossed that we don't end up in the same situation again. We are at CV91, XCS09501100 and HP06.

Maybe it's a similar issue at your site. We were told our EVA shut down the FPs due to too much strange I/O on the back end. IMHO not a good implementation of NSPOF :-(
Rampy(Venu)
Frequent Advisor

Re: EVA4400 lost comms to storage cell

Yes, ask HP for the special build of XCS09501400. For now, until it's available for general release, the upgrade has to be done under the supervision of the level 3 engineers. It has new firmware for the I/O module, with fixes for back-end loop disruptions and also for the "unit stalled too long" issue. As a temporary fix, ungroup the drives that are throwing medium errors.
MattR
Occasional Contributor

Re: EVA4400 lost comms to storage cell

Thanks for the replies everyone!

As suggested above, the HP tech indicated it was a known issue and the new firmware fixes the problem. He came out and grabbed some more logs to confirm, so now I'm expecting him to come back soon and apply the firmware.

I'm just thankful that the EVA isn't really in production just yet so the problem hasn't caused too much of a disaster.
Rampy(Venu)
Frequent Advisor

Re: EVA4400 lost comms to storage cell

Alright then, Matt, please keep us updated. Monitor the EVA for about 2-3 days after the upgrade and confirm the solution.

Thanks
bmadhav
Advisor

Re: EVA4400 lost comms to storage cell

Hi

Was this fixed by the firmware upgrade? We are facing the same issue on an EVA4400 with 1TB FATA, but our firmware is 09006000.

Regards
Bindumadhava.
MattR
Occasional Contributor

Re: EVA4400 lost comms to storage cell

Hi Bindumadhava

Apologies, I should have updated this thread.

I didn't update the firmware in the end. Being a newbie to Fibre Channel, I read that best practice was not to bend the FC cables more than 30 degrees. Looking at the cables linking the controller to the enclosure, they were borderline 30 degrees, so I untied them to give them some slack.

After a month the problem hadn't come back, and I finished virtualising all my servers. To date I haven't had another loss of comms.

Although, I'm beginning to think it was related to a faulty HDD. On both occasions the loss of comms happened immediately after there was a failure in bay 4 of the enclosure. This morning the drive in that bay went into a failed state and I've now replaced it.

Matt