
BL460 (Gen 9/10) blades randomly losing NIC port on Cisco B22

 
TechToaster
Occasional Visitor

Hello - I am looking for answers, or for anyone who has experienced what we have been seeing where I work. This has been an ongoing issue, and it only seems to be getting worse. HPE support and Cisco have ZERO answers for us; instead we just keep throwing hardware (replacement Cisco B22 FEX modules) at the problem after getting the typical level 1 to level 2 run-around for a few days. It's beginning to get quite frustrating. Below is the issue we are seeing.

We run C7000 chassis with a combination of BL460 Gen9 and Gen10 servers. These servers are configured to have 4x 10G NICs routed up thru Cisco B22 FEX modules in the chassis I/O slots 1, 2, 5 and 6.

Over the past year, we have had countless instances where one of the 4 NICs disconnects and never reconnects. This happens completely at random; there is no rhyme or reason. The servers are just running ESXi and suddenly lose a port.

When you check UEFI on that specific NIC and port, it shows "Disconnected", as if there is no logical link between the NIC and the FEX/switch.

Here are the steps we have tried to get these working again:

Updating to the latest FW levels

Reseating the blade and the NICs

Putting NICs from a known working blade into the blade - the same port still shows D/C

Replacing the NICs with new parts shipped directly from HPE - nothing

Moving the blade into a DIFFERENT chassis with the same I/O setup - the port shows connected, which tells me that it's a B22 issue. HPE agrees, and has tried multiple times to "pawn" us off to Cisco support. After countless phone calls with Cisco, they repeatedly tell us to work with HPE support. After we get back to HPE support (typically a level 2 rep by now - after DAYS of calling and going through the same crap over and over), they suggest we replace the B22 FEX module.

Replacing the B22 FEX has worked in the past - HOWEVER - we are working in a high availability environment, and shifting workloads away from the chassis to replace the FEX isn't always feasible due to development timelines, current workload, deadlines and so on. This is happening more and more, and we are getting into a position where there are random failed NICs in almost EVERY chassis we have. This creates a special issue where we have blades that are just turned off and not used, because they don't have the 4x NICs we need for our environment and we can't replace the B22 FEX in the chassis to get it fixed.

Like I said, HPE and Cisco don't seem to have a solid answer for me at all. Has anyone else been dealing with this? What are your solutions? Any help is welcome, thanks! 

4 REPLIES
frenchy94
Regular Advisor

Re: BL460 (Gen 9/10) blades randomly losing NIC port on Cisco B22

hi

Should you involve VMware support? They have tools to track this kind of issue.

JY

TechToaster
Occasional Visitor

Re: BL460 (Gen 9/10) blades randomly losing NIC port on Cisco B22

While I would agree with you, this issue is not limited to our ESXi systems - it plagues our Windows blades as well.

frenchy94
Regular Advisor

Re: BL460 (Gen 9/10) blades randomly losing NIC port on Cisco B22

You need to start with one OS first. ESXi has a lot of tools on board, so there is no need to install anything extra as you would on Windows.

So maybe start debugging on an ESXi host first.



TechToaster
Occasional Visitor

Re: BL460 (Gen 9/10) blades randomly losing NIC port on Cisco B22

Not sure you're fully understanding the issue we are having. This seems to be a hardware issue with the B22s. It is not tied to an operating system.
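One way to back up the "hardware, not OS" argument when talking to either vendor is to log every lost port and tally the incidents by I/O slot, OS and blade generation. Below is a minimal Python sketch of that idea. The record fields, the chassis labels and the sample incidents are all made up for illustration; they are not data from this environment.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PortFailure:
    """One lost-port incident. All fields are hypothetical bookkeeping."""
    chassis: str     # site-specific chassis label (assumed naming)
    bay: int         # blade bay in the C7000 (1-16)
    io_slot: int     # I/O slot of the B22 behind the dead port (1, 2, 5 or 6)
    blade_gen: str   # "Gen9" or "Gen10"
    os: str          # "ESXi" or "Windows"


def tally(failures, field):
    """Count incidents grouped by one attribute of PortFailure."""
    return Counter(getattr(f, field) for f in failures)


# Made-up sample incidents, only to show the shape of the data.
failures = [
    PortFailure("chassis-A", 3, 5, "Gen9", "ESXi"),
    PortFailure("chassis-A", 7, 5, "Gen10", "Windows"),
    PortFailure("chassis-B", 2, 1, "Gen10", "ESXi"),
]

for field in ("io_slot", "os", "blade_gen"):
    print(field, tally(failures, field).most_common())
```

If the counts spread across both operating systems and both blade generations but cluster on particular I/O slots or chassis, that is something concrete to hand to support. If they really are spread evenly, that is worth knowing too.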