Disk Enclosures

EVA4400 controller reset and host port issues

 
Joseph Salzer_3
Advisor

Hi all, I have an interesting issue.

Here is the scenario:

We have EVA4400s at two different sites. Each EVA4400 has (8) disk enclosures, all fully populated with 450GB 15K FC disks. There are (2) Cisco 9222i switches at each site for redundancy. We are not using CA. All hosts accessing the EVA are ESX hosts running ESX 3.5 Update 4, and all servers use HP-branded Emulex A8002A HBAs (firmware 2.72a2).

Here is the issue (same issue at both sites):

We were advised by HP to upgrade the EVA controllers from XCS 09004000 to 09522000. After the firmware upgrade (at both sites), the host ports on both controllers started going up and down, which eventually causes a controller to reboot (sometimes the master, sometimes the slave). There is really no pattern to it. In the event logs you can see the links on the host ports go up and down until the controller sends a "last gasp" message and reboots. We also occasionally get a "work request resources have run out" message, or something to that effect.

HP has received SSSU collect scripts, show tech-support output from the switches, etc. HP at first said it could be a load-balancing issue. I can see how a load-balancing issue could affect the performance of the ESX hosts, but cause a controller to reboot? That didn't seem right to me. Anyway, everything was load balanced and multipathing was manually configured per HP's recommendations. As expected, this did not solve the problem.

A month later and we continue to have these issues...

Here is how the SAN is configured:

There are two fabrics, Fabric A & Fabric B. Each fabric is made up of a single 9222i switch running SAN-OS 3.3(4). The EVA controllers are cabled as follows:

Controller 1, host port 1 to Fabric A
Controller 2, host port 1 to Fabric A
Controller 1, host port 2 to Fabric B
Controller 2, host port 2 to Fabric B

Sample Server1Zone1 (Fabric A):
Server1_HBA_1
EVA_CNTRL1_FP1
EVA_CNTRL2_FP1

Sample Server1Zone1 (Fabric B):
Server1_HBA_2
EVA_CNTRL1_FP2
EVA_CNTRL2_FP2

Everything resides in VSAN 10 (Fabric A) and VSAN 20 (Fabric B), with the exception of two ESX hosts, which reside in VSAN 30 (Fabric A) and VSAN 40 (Fabric B). The hosts in VSANs 30 and 40 reach the EVA in VSANs 10 and 20 through IVR.
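For context, the IVR access described above is configured roughly along these lines on our SAN-OS 3.3 switches (the zone/zoneset names and pWWNs here are illustrative placeholders, not our actual config):

```
ivr enable
ivr vsan-topology auto                            ! or an explicit topology map
ivr zone name IVR_Host_to_EVA
  member pwwn 10:00:00:00:xx:xx:xx:xx vsan 30     ! ESX host HBA in VSAN 30
  member pwwn 50:00:1f:e1:xx:xx:xx:xx vsan 10     ! EVA host port in VSAN 10
ivr zoneset name IVR_ZS
  member IVR_Host_to_EVA
ivr zoneset activate name IVR_ZS
```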

As far as the EVA is concerned, another change that happened at the same time as the firmware upgrade was that (5) new disk enclosures were added to the EVA4400.

Everything looks good on the switches and everything is properly zoned, etc. One of my thoughts was to add another level of granularity to the zones. So, instead of having one HBA per zone with (2) EVA host ports, I would create two zones per HBA, each with only one EVA host port. Like this:

Proposed Sample Server1Zone1A (Fabric A):
Server1_HBA_1
EVA_CNTRL1_FP1

Proposed Sample Server1Zone1B (Fabric A):
Server1_HBA_1
EVA_CNTRL2_FP1

Proposed Sample Server1Zone1A (Fabric B):
Server1_HBA_2
EVA_CNTRL1_FP2

Proposed Sample Server1Zone1B (Fabric B):
Server1_HBA_2
EVA_CNTRL2_FP2

I have done many EVA installations in the past and I’ve never had to zone them this way, but it can’t hurt, and at least it may help isolate the problem. I seem to remember an HP advisory suggesting this level of granularity for some Command View SBM issues. Any thoughts on this?
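In case it helps, here is roughly what the proposed zoning would look like in SAN-OS CLI on the Fabric A switch (zoneset name and pWWNs are illustrative placeholders):

```
zone name Server1Zone1A vsan 10
  member pwwn 10:00:00:00:xx:xx:xx:x1   ! Server1_HBA_1 (placeholder pWWN)
  member pwwn 50:00:1f:e1:xx:xx:xx:x8   ! EVA_CNTRL1_FP1 (placeholder pWWN)
zone name Server1Zone1B vsan 10
  member pwwn 10:00:00:00:xx:xx:xx:x1   ! Server1_HBA_1 (placeholder pWWN)
  member pwwn 50:00:1f:e1:xx:xx:xx:x9   ! EVA_CNTRL2_FP1 (placeholder pWWN)
zoneset name FabricA_ZS vsan 10
  member Server1Zone1A
  member Server1Zone1B
zoneset activate name FabricA_ZS vsan 10
```

The same pattern would repeat on the Fabric B switch (VSAN 20) with Server1_HBA_2 and the FP2 host ports.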

Any thoughts or suggestions are greatly appreciated; this issue has become quite critical. HP is still involved, but we don't seem to be getting very far. They do say the EVA is not the problem (and I agree). It is very unlikely that (2) EVAs would both have bad controllers.

Thanks,
JS
1 REPLY
IvanForceville
Regular Advisor

Re: EVA4400 controller reset and host port issues

Maybe the following link will explain a lot.

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1391555

We have two 4400s in a CA setup and we're going to implement this type of zoning too.
Since our setup is highly unstable we're forced to... lots of problems with losing CV communication to the EVAs.