Two Fabric redundancy and storage flapping

brumer
Occasional Contributor

Hello!

I have a fairly new Nexus 5548 implementation, using the Nexus strictly for storage. I have two 5548s for two different fabrics, for redundancy. They are two separate fabrics, and the Nexus switches are not stacked, so they are managed individually. When I have both Nexus switches online, my VMware side starts flapping and losing storage, which causes my ESXi hosts to lock up and VMs to go unresponsive. This does not happen 100% of the time, but it happens intermittently and is sometimes catastrophic to datacenter services. When I disconnect one of the SAN fabrics on all the enclosures (or shut down one Nexus switch), storage comes back and everything is healthy.

All hosts are connected via 4Gb FC (HP customer support told us to do this because of known problems with 8Gb).

5/6 of my hosts are on HP c7000 enclosures via 10Gb FlexFabric switches; the rest are on UCS.

A clustered NetApp pair is the target. When the ESXi hosts lose storage, they are still logged in (FLOGI) to the storage and the fabric.

ESXi 5.0 with the newest (and correct) drivers. VMware tech support sees no problems, other than "storage is getting pulled from the host".

Newest firmware on everything HP & UCS

Any ideas? Do I have a design flaw in my fabric? HP, Cisco, NetApp, and VMware all pretty much have no clue, so this forum is a shot in the dark. Thanks for ANY ideas you can provide.

Hongjun Ma
Trusted Contributor

Re: Two Fabric redundancy and storage flapping

When you have this problem, does it happen to both the UCS and HP blades at the same time? If so, the problem is very likely at the N5K layer, since I'm assuming your UCS connects to the same pair of N5Ks.
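The question above boils down to a timing check: do outages on the HP blades line up with outages on the UCS hosts? A minimal Python sketch, under assumptions that are not in the thread: you have already pulled outage timestamps out of each host's logs by hand (extraction not shown), the host clocks are NTP-synced, and the sample timestamps and the 60-second window are invented for illustration only.

```python
from datetime import datetime, timedelta


def shared_fraction(events_a, events_b, window):
    """Return the fraction of events in events_a that have at least one
    event in events_b no more than `window` apart (0.0 if events_a is empty)."""
    if not events_a:
        return 0.0
    matched = sum(
        1 for ta in events_a if any(abs(ta - tb) <= window for tb in events_b)
    )
    return matched / len(events_a)


# Invented sample data; replace with timestamps from your own ESXi host logs.
hp_outages = [
    datetime(2012, 5, 1, 9, 14, 5),
    datetime(2012, 5, 3, 22, 40, 12),
    datetime(2012, 5, 7, 3, 2, 50),
]
ucs_outages = [
    datetime(2012, 5, 1, 9, 14, 20),
    datetime(2012, 5, 7, 3, 3, 10),
]

# Hypothetical tolerance: treat events within 60 seconds as "the same outage".
fraction = shared_fraction(hp_outages, ucs_outages, timedelta(seconds=60))
print(f"{fraction:.0%} of HP blade outages coincide with a UCS outage")
```

With the sample data this prints `67% of HP blade outages coincide with a UCS outage`. A high fraction is consistent with a fault in the shared N5K layer but does not prove it; a low fraction would point instead toward something specific to the c7000/FlexFabric path.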

Just curious: what firmware are you running on the HP side for OA/VC/NIC/CNA, and which NIC/FCoE driver on ESXi?

Don't use 8G FC for the Nexus 5K connections. So far the issue looks like it is on the N5K side.

http://www.linkedin.com/groupItem?view=&gid=2429235&type=member&item=79570817&qid=5f4ad350-1310-4775-b894-0993fd5baa84&trk=group_most_popular-0-b-ttl&goback=%2Egmp_2429235

My VC blog: http://hongjunma.wordpress.com



brumer
Occasional Contributor

Re: Two Fabric redundancy and storage flapping

Thanks for your reply!

I agree that it is probably the Nexus. I posted on the HP forum hoping somebody might have a good idea, like the one you mentioned. There are a lot of smart people here, so I figured I'd try here and on the Cisco forums.

We previously learned the hard way about running 8G FC to the VC. Everything is now hard-set to 4Gb on both the VC and the Nexus.

The requested info is:

VC = 3.51

OA = 3.32

Emulex firmware = 4.0.360.15

ESXi driver = be2net 4.0.355.1

Thanks for your help!

Hongjun Ma
Trusted Contributor

Re: Two Fabric redundancy and storage flapping

All the versions you listed are the latest. You didn't list the ESXi FCoE/FC driver for the HP blades. Is it 8.2.2.105.36 (for ESXi 5)?

If so, you are up to date for firmware/drivers related to this issue.
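The "are you up to date" check above amounts to comparing each reported version against a baseline. A minimal Python sketch, assuming the baseline is simply the set of versions quoted in this thread (not an official HP compatibility matrix) and that every dot-separated field is a plain integer; the component labels are hypothetical names chosen for this example.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string like '4.0.360.15' into (4, 0, 360, 15)."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(reported: str, baseline: str) -> bool:
    """True if the reported version is equal to or newer than the baseline."""
    a, b = parse_version(reported), parse_version(baseline)
    # Pad the shorter tuple with zeros so '3.5' compares equal to '3.5.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Baseline taken from the versions quoted in this thread (assumption).
BASELINE = {
    "VC": "3.51",
    "OA": "3.32",
    "Emulex firmware": "4.0.360.15",
    "be2net driver": "4.0.355.1",
    "FC/FCoE driver": "8.2.2.105.36",
}


def report(installed: dict[str, str]) -> dict[str, str]:
    """Return a status string per baseline component."""
    result = {}
    for component, wanted in BASELINE.items():
        have = installed.get(component)
        if have is None:
            result[component] = "not reported"
        elif is_at_least(have, wanted):
            result[component] = "ok"
        else:
            result[component] = f"outdated ({have} < {wanted})"
    return result


# The versions posted earlier in the thread; the FC/FCoE driver was not listed.
installed = {
    "VC": "3.51",
    "OA": "3.32",
    "Emulex firmware": "4.0.360.15",
    "be2net driver": "4.0.355.1",
}

for component, status in report(installed).items():
    print(f"{component}: {status}")
```

Run against the versions posted earlier, every component comes back `ok` except `FC/FCoE driver: not reported`, which is the same gap pointed out above. Comparing fields as integers avoids the string-comparison trap where "10" would sort before "9".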

You can also check this post for the latest firmware/driver versions:

http://h30499.www3.hp.com/t5/HP-BladeSystem-Virtual-Connect/VC-3-51-experiences-with-VMware/m-p/5547589/highlight/true#M1909

You can also try posting the issue on the Cisco customer forums for Data Center Networking or Unified Computing; they have active forums there for N5K and UCS. I'm assuming you have already worked with HP support and Cisco TAC on this and are somewhat stuck.

My background is not in storage, so I can't go deeper into detailed troubleshooting steps.