BladeSystem - General

FC on BL460c G1 in C7000 Mystery

 
SOLVED
Bob Firek
Regular Advisor

FC on BL460c G1 in C7000 Mystery

I have a mystery and I'm not quite sure how to describe it. We have a C7000 connected to two Cisco MDS9124 switches, which are connected to an EVA6100. The FC hba1 ports on the blades in device bay 2 and device bay 10 do not connect to the storage. I've checked the fabric and the cables, and everything looks good. These two blades are running ESX 3.5. The really strange thing is that if I shut down the ESX server in device bay 10 for maintenance, it reboots our Polyserve blade in device bay 8. This is just a basic overview; I can provide more detail if needed. I'm just beating my head against the server rack now. Any help or suggestions will be greatly appreciated.
16 REPLIES
Adrian Clint
Honored Contributor

Re: FC on BL460c G1 in C7000 Mystery

Sounds like a SAN switch issue to me.

Are your HBA driver, HBA firmware and SAN switch firmware all in the EVA support matrix?

I'd also update the iLO, BIOS and Power controller firmware if you have power action issues.
Bob Firek
Regular Advisor

Re: FC on BL460c G1 in C7000 Mystery

Adrian,

The HBA drivers and firmware on the blades are up to date. The SAN switch firmware is on the EVA support matrix, but just barely. Currently switch 1 is running version 3.1(3a) and switch 2 is running version 4.1(1b). I want to get to 4.1(3a) or whatever the current version is, but before I upgrade the switches I'd like to resolve the pathing issue. I guess I'm in a chicken-and-egg situation: is the older code causing the problem, so that upgrading will fix it, or could it be something else? I have one more blade to upgrade the firmware and iLO on before I can upgrade the enclosure and OA cards, and that last blade is hard to get freed up to run the upgrade. So I'm thinking of upgrading the enclosure, iLO, Onboard Administrator and Virtual Connect first, then moving on to the Fibre Channel switches, and then upgrading the EVA. Any thoughts or suggestions?

thanks for your help
Adrian Clint
Honored Contributor

Re: FC on BL460c G1 in C7000 Mystery

Bob,
Your config is, as far as I know, an invalid, unsupported config. Having two switches with different firmware is probably your biggest issue (especially with the major versions being different).
You need to get someone to confirm that the EVA firmware, switch firmware, HBA firmware and HBA driver are in a supported matrix, not just the latest.
E.g. it could be that the latest switch firmware is not in the supported matrix for your EVA firmware.
Get this confirmed in the SAN forum:
http://forums11.itrc.hp.com/service/forums/categoryhome.do?categoryId=248
TTr
Honored Contributor

Re: FC on BL460c G1 in C7000 Mystery

> The FC hba1 ports on the blades in device bay 2 and device bay 10 do not connect to the storage

I read this as blades 2 and 10 have FC HBAs in them but are not in use.
You mention port1 but what about port2?
What about the zoning in the two MDSes? Anything in there that is related to blade8?

What about the ESX OSes of blade10? Is there any VM that is tied up to the polyserve server at the OS or application level?

What do the reboot logs say on the polyserve?
Bob Firek
Regular Advisor

Re: FC on BL460c G1 in C7000 Mystery

Adrian, good time. I have to check my notes but I think I did my due diligence a couple of months ago and I was in spec but I'll post in the SAN forum.

TTr,
Thanks for your thoughts. Interesting. When I checked in Fabric Manager, blade 8 shares the same connection with blade 10 and blade 4 through the HP 4Gb VC-FC module on both port 1 and port 2. Blade 10 shows as connected under link status, as do all of my other hosts. However, when I go to the ESX host in blade 10 and run esxcfg-mpath -l, I do not see any storage on port 1. If I go to the ESX host on blade 4 and run the same command, I see storage on both port 1 and port 2. When I go to the Polyserve server in bay 8 and look through HP MPIO DSM Manager, I see storage on both port 1 and port 2. Now here is the kicker. I didn't mention this before, but I have another Polyserve server on blade 16 (which shares the fiber connection with blade 2), and when I go into HP MPIO DSM Manager there I only see storage on port 2. Very strange. Do you think the mismatched software versions on the switches could be causing this anomaly?
TTr
Honored Contributor

Re: FC on BL460c G1 in C7000 Mystery

I am not sure about what you mean by "seeing storage on port...". Do you have any zoning defined on the MDSes? Or is it a flat fabric and possibly getting interference between HBAs? A zone should have one initiator (server) port and one target (disk array) port in it. If you have more than one HBA in the same FC zone you will get target traffic interference across the HBAs. I would expect the firmware difference to be the last thing to cause this but it could do it as well.
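To make the one-initiator-per-zone rule above concrete, here is a minimal sketch of how a zone export could be checked for zones holding more than one HBA. The zone names, port names and the dictionary layout are all made up for illustration; they are not taken from this fabric or from any MDS output format.

```python
from typing import Dict, List, Tuple

# (port name, role), where role is "initiator" (server HBA) or "target" (array port).
Member = Tuple[str, str]


def find_multi_initiator_zones(zones: Dict[str, List[Member]]) -> List[str]:
    """Return the names of zones that contain more than one initiator port."""
    flagged = []
    for name, members in zones.items():
        initiators = [port for port, role in members if role == "initiator"]
        if len(initiators) > 1:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical zone set: one clean single-initiator zone, one shared zone.
example_zones = {
    "z_blade4_hba1_eva": [
        ("blade4_hba1", "initiator"),
        ("eva_ctrl_a_p1", "target"),
    ],
    "z_shared": [
        ("blade8_hba1", "initiator"),
        ("blade10_hba1", "initiator"),
        ("eva_ctrl_a_p1", "target"),
    ],
}

print(find_multi_initiator_zones(example_zones))
```

Any zone this flags would be a candidate for splitting into one zone per HBA port.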
CoryB
Advisor

Re: FC on BL460c G1 in C7000 Mystery

What fibre interconnects are you using?

If Virtual Connect modules, log into VC Manager and look under 'Hardware Overview' --> 'Interconnect Bays' --> Bay 3 and Bay 4. Scroll down to the Server Port Information section and verify the individual ports are logging into the VC modules.
TTr
Honored Contributor

Re: FC on BL460c G1 in C7000 Mystery

> What fibre interconnects are you using?

The Cisco MDS9124 are fiber interconnect switches. I think the zoning needs to be verified so that there is no HBA interference across the blade servers.
Bob Firek
Regular Advisor

Re: FC on BL460c G1 in C7000 Mystery

TTr - It appears that the Polyserve servers are seeing the correct LUNs that are presented to them, except that Polyserve02 can only see the LUNs presented on fabric B. It is blind to fabric A, whereas Polyserve01 can see the storage presented on both fabric A and fabric B, which is by design for redundancy. It is the same for the VMware hosts: they see the correct LUNs presented to them, except that blade 2 and blade 10 are blind to fabric A. I've checked the zoning in Fabric Manager and everything appears correct. The blades connect to two HP 4Gb VC-FC modules. Each module has four 4Gb uplink ports; one module goes to FC switch 1 (fabric A) and the other goes to FC switch 2 (fabric B). The Fibre Channel ports in the blades are split between the two modules, and each uplink port on a VC-FC module handles four blades. It seems to me that if it were an issue with a bad cable or dirty connector, then none of the blades connecting through that fiber connection would work. I'm at a loss to determine where the problem lies. I hate to upgrade a system that isn't at 100 percent, because who knows what kind of problems I may create for myself.

CoryB, good suggestion, but another problem I ran into is that a while ago we added four additional blades that happen to be ProLiant BL460c G6 models. We added them to the enclosure that had BL460c G1 blades, and apparently that broke Virtual Connect. When I try to launch VC I just get a gray striped "candy cane" with the message "Loading, please wait..." I'm told upgrading the OA will take care of that, but I can't get Polyserve02 freed up long enough to run the upgrade on the server.
CoryB
Advisor

Re: FC on BL460c G1 in C7000 Mystery

We were having the same issues with two of our BL460c G1s that were running ESX, where certain paths would go blind to the fabric. For us the problem was not SAN zoning but that the HBAs were not logging into the VC-FC switches, which is why I mentioned the VC Manager. The fix for us was a reset of the two VC-FC interconnect modules. Note that we also had to reset the VC-Enet module that was the primary VC Domain Manager after we added some G6 blades to our chassis.
Adrian Clint
Honored Contributor

Re: FC on BL460c G1 in C7000 Mystery

He doesn't have Virtual Connect. He has Cisco 9124 SAN switches.
But I can see why you did it for VC.
Bob Firek
Regular Advisor

Re: FC on BL460c G1 in C7000 Mystery

Adrian, actually we are using Virtual Connect. VC is used to configure the backend blade stuff: FC connections, Ethernet, etc. The enclosure is attached to the MDS9124s through the VC-FC modules, and the EVA6100 is connected to the MDS9124s. So the issue that CoryB talked about is interesting to me. I'm just not sure I'm up to the task of pulling all the cards, but I guess I will if that is what has to happen. CoryB, can you explain this situation a little more? Did this process come from HP support?
CoryB
Advisor

Re: FC on BL460c G1 in C7000 Mystery

When we added the two new blades to the existing enclosure they were having difficulty with iLO connectivity dropping. When I called HP, they did recommend bouncing the OBA modules which resolved the iLO issue.

When we ran into the FC connectivity problem, I bounced the VC-FC interconnects based on the success of the OBA resets and that resolved the FC problem as well, no support call for this part.
Adrian Clint
Honored Contributor
Solution

Re: FC on BL460c G1 in C7000 Mystery

Ah yes silly me I missed the bit where you started talking about VC. I've been quoting a blade solution with 9124 internal switches over the last 2 days and forgot the model no also refers to external ones.

As you mention you have 4,8,10 on the same uplink does that mean you have static distribution set to map uplinks>downlinks?
Uplink #1 - 1, 5, 11, 15
Uplink #2 - 2, 6, 12, (16)
Uplink #3 - 3, 7, 9, 13
Uplink #4 - 4, 8, (10), 14
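As a small sketch, the table above can be written as a lookup to see which device bays share an uplink with a given bay. The mapping is copied straight from the list in this post; whether it matches what is actually configured on this enclosure is an assumption, and the helper names are made up for illustration.

```python
# Static uplink -> device bay mapping, transcribed from the table above.
STATIC_UPLINK_MAP = {
    1: [1, 5, 11, 15],
    2: [2, 6, 12, 16],
    3: [3, 7, 9, 13],
    4: [4, 8, 10, 14],
}


def uplink_for_bay(bay: int) -> int:
    """Return the uplink number that serves the given device bay."""
    for uplink, bays in STATIC_UPLINK_MAP.items():
        if bay in bays:
            return uplink
    raise ValueError(f"bay {bay} is not in the static map")


def bays_sharing_uplink(bay: int) -> list:
    """Return the other device bays on the same uplink as the given bay."""
    return [b for b in STATIC_UPLINK_MAP[uplink_for_bay(bay)] if b != bay]


# The two bays reported in this thread as blind on port 1.
for bay in (2, 10):
    print(f"bay {bay}: uplink {uplink_for_bay(bay)}, shares with {bays_sharing_uplink(bay)}")
```

Under this mapping, bay 10 sits on the same uplink as bays 4 and 8, and bay 2 sits on the same uplink as bay 16, which lines up with the groupings Bob described from Fabric Manager.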

I'm also assuming you've gone through the actions in Appendix B of the VC SAN Cookbook?
http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c01702940/c01702940.pdf

I'd also ask for advice on the Virtual Connect User Group; many of the VC design and support team are on there.
https://h30340.leveragesoftware.com/default.aspx
Jeroen did a good list of VC commands you can type that will help you see more detail on the port status on the VC side
https://h30340.leveragesoftware.com/portfolio_detail.aspx?fileid=3ec7b55a121b45debbe6bd41258eaec2
Bob Firek
Regular Advisor

Re: FC on BL460c G1 in C7000 Mystery

CoryB - logical. I'm curious: did you bring down the blades before reseating the VC-FC modules, or did you just yank them out?

Adrian - You've given me a lot to think about. I'll get back to you when I don't have "people" pulling on me. The nerve.
Bob Firek
Regular Advisor

Re: FC on BL460c G1 in C7000 Mystery

Adrian - I believe we have static distribution set to map uplinks>downlinks. I didn't set up the configuration but was involved in the discussions. I recall that originally the mapping was set up as dynamic, but we ran into the issue of the blade in bay 2 losing the fiber connection on port 1 to the SAN. We switched to static and initially it seemed to solve the problem, but after a time it lost its connection to the SAN and no matter what we did we could not get it to reattach. About a month ago, after applying some VMware updates and rebooting the blades, I noticed that the blade in bay 10 had lost its connection to the SAN on port 1. Also during these VMware updates I noticed that when I brought down blades 4 and 10 I would receive errors on the Polyserve server on blade 8. The errors had to do with fencing, and I believe Polyserve issued a reboot to prevent "split-brain". To me that seems to indicate the connection to the SAN was lost while Polyserve saw that both servers were up and communicating. It didn't want to corrupt the data, so it rebooted the suspect server. So what I'm wondering is what causes the connection to the SAN to be interrupted when some blades in the uplink group are shut down. Any help or suggestions you can provide are greatly appreciated.