BladeSystem - General


 
RandomName
Occasional Contributor

Blades never boot if SAN goes offline, even after SAN is available, because HBA BIOS gets disabled.

SAN_Diagram.jpg

 

Hello everybody, thanks for taking the time to read this. I figure this is either really simple or it's a gotcha with boot from SAN.

 

The problem: In our testing, we have documented that our blades only attempt to boot from the FC HBA mezzanine cards at BIOS reset, as in power cycle, reset or Ctrl+Alt+Del. This is because the FC HBA BIOS gets disabled after not seeing the SAN on the first boot attempt.

 

There does not appear to be any option in the BIOS to reset the PCI bus, ROM ICs or anything that will make the blade rescan/reinitialize the HBA.

 

In the HBA config, there is only one relevant option: Enable LIP (Loop Initialization Process) reset. We tried enabling this option with no change; it appears to relate to the OS requesting a reset and login.

 

BIOS.jpg

 

We killed power to the entire system at the PDUs and then restored it. The servers attempt to boot after about a minute, but the SAN is not ready for about 3 or 4 minutes. If the SAN doesn't respond on the very first boot attempt, the servers never boot from it, although they keep trying CD, LAN, etc. every 5 seconds for eternity.

 

We could probably set a boot delay on the blade enclosure, but that is a band-aid. If 1 or 16 servers rebooted, for updates or any other reason, while the SAN wasn't available, they would never boot. If the blades loop through all the other boot options, I want that same ability for boot from SAN. ASR is enabled, but it makes no difference, as it doesn't look for a BIOS boot failure loop.
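
To make the failure mode concrete, here is a rough Python sketch of the behaviour as I understand it. It is only a model, not anything the BIOS exposes: the function name, the `reprobe_hba` switch and the 600 second cutoff are made up for illustration. The 60 second boot time, the roughly 240 second SAN ready time and the 5 second retry come from the test described above.

```python
def first_successful_boot(boot_start_s, san_ready_s, reprobe_hba,
                          retry_s=5, limit_s=600):
    """Return the time (s) at which the blade boots from SAN, or None if never.

    Model only: the HBA BIOS probes the SAN once, at BIOS reset. If the SAN
    is not visible then, the HBA stays disabled until the next reset.
    """
    hba_enabled = boot_start_s >= san_ready_s  # single probe at BIOS init
    t = boot_start_s
    while t <= limit_s:
        if reprobe_hba and not hba_enabled:
            # Hypothetical behaviour being asked for: rescan on every retry.
            hba_enabled = t >= san_ready_s
        if hba_enabled:
            return t
        t += retry_s  # blade cycles through CD, LAN, etc. and tries again
    return None


if __name__ == "__main__":
    print(first_successful_boot(60, 240, reprobe_hba=False))   # what we see today
    print(first_successful_boot(60, 240, reprobe_hba=True))    # what we would like
    print(first_successful_boot(300, 240, reprobe_hba=False))  # with a boot delay
```

In this model the first case never boots, the second boots as soon as the SAN is up, and the third works only because the delay happens to be longer than the SAN start-up time.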

 

The servers go into an endless loop of the following two BIOS screens:

 

BIOS_1.jpg

 

I think an automatic PCI reset after this screen would fix the problem:

 

BIOS_2.jpg

 

We have the following setup:

C7000 with 16x BL260cG5 blades, Dual Cisco MDS, P2000 G3 FC/iSCSI

 

 - Each blade has a QLogic QMH2462 dual-port 4Gb FC HBA.

 - We have 2 Cisco MDS9124e SAN Switches with the port upgrade license for full redundancy.

 - Each FC HBA has a connection to each SAN switch.

 - Each switch has a connection to each controller on the P2000 G3.

 - Each HBA BIOS is enabled and all 4 targets are listed for boot.

 - OS is Windows Server 2008 using native MPIO.
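
The wiring above can be sketched in a few lines of Python to count the paths each blade has to the array. This assumes HBA port 1 goes to one switch and port 2 to the other, with one link from each switch to each controller; the component names are invented for the example.

```python
from itertools import product

# Assumed wiring (illustrative names, not taken from the real config):
# one HBA port per switch, and each switch linked to both controllers.
HBA_PORTS = {"hba_p1": "switch_a", "hba_p2": "switch_b"}
CONTROLLERS = ["ctrl_a", "ctrl_b"]


def surviving_paths(failed=()):
    """Return every (hba_port, switch, controller) path that avoids failed parts."""
    failed = set(failed)
    paths = []
    for (port, switch), ctrl in product(HBA_PORTS.items(), CONTROLLERS):
        if failed.isdisjoint({port, switch, ctrl}):
            paths.append((port, switch, ctrl))
    return paths


if __name__ == "__main__":
    print(len(surviving_paths()))                        # everything up
    print(len(surviving_paths(["switch_a"])))            # one switch ejected
    print(len(surviving_paths(["switch_a", "ctrl_b"])))  # switch and controller ejected
```

Under that assumption there are 4 paths with everything up, which matches the 4 boot targets listed in the HBA BIOS, and at least one path is left after losing one switch and one controller together.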

 

Just to validate the setup, all 16 blades survive all of the following while running IOMeter:

 - I have pulled 3 FC cables during IOMeter load test.

 - I have hot ejected a SAN switch.

 - I have hot ejected a controller.

 - I have done all of these simultaneously about 10 times with different switches, cables, controllers in different orders.

Bandwidth drops, but is restored within 2-6 seconds of restoring connection.

System appears to be rock solid.

 

SAN_Diagram_Detail.jpg

 

Thank you!

Casper42
Respected Contributor


Unfortunately, everything there is working as expected.

What you should look into is an option in the Onboard Administrator that delays the power-on of individual blades when the c7000 receives power after being totally down.
It was really meant to allow Blade 1 to boot before Blade 10 if there are dependencies there, but I don't see why it wouldn't work for your needs.
Set the blades to boot with 300 seconds of delay, and the OA should hold them in a powered-off state for 5 minutes and then start powering them on.

If you need them up in a particular order, then just add to that.
300 for Blades X and Y
330 for Blades A and B
400 for Blades M and N
etc
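
If you want to work the numbers out rather than guess, something like this Python sketch would do. It does not talk to the OA at all; it only calculates the values you would then enter by hand. The function name, the 60 second margin and the fixed 30 second step between groups are my own assumptions, and the 240 seconds is the "3 or 4 minutes" you measured for the SAN.

```python
def power_on_delays(groups, san_ready_s=240, margin_s=60, step_s=30):
    """Map each blade to a power-on delay in seconds.

    The first group waits until the SAN should be up plus a safety margin;
    each later group waits step_s longer than the group before it.
    """
    base_delay_s = san_ready_s + margin_s
    plan = {}
    for index, blades in enumerate(groups):
        for blade in blades:
            plan[blade] = base_delay_s + index * step_s
    return plan


if __name__ == "__main__":
    # Blade labels are placeholders for real bay numbers.
    for blade, delay in power_on_delays([["X", "Y"], ["A", "B"], ["M", "N"]]).items():
        print(blade, delay)
```

With the defaults that gives 300 for the first group, 330 for the second and 360 for the third. Use uneven gaps instead if some blades need longer to come up before their dependents start.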