SAN interrupted by a Fabric member reboot

angyesther · ‎04-12-2021

Hello,
Here is what happened: we had electrical work scheduled on a rack, so we powered off an HPE Synergy chassis and a SAN switch (we’ll refer to it as SW-C). The fabric consists of only three switches: SW-A is the master and is connected to SW-B, and SW-B is connected to SW-C. There is no direct link between SW-A and SW-C.

SW-C was powered off (not gracefully, I’m sorry), all the electrical modifications were made, then it was powered on, and a couple of minutes later we powered on the Synergy chassis.

While we were monitoring the boot, we were told that a server and a 3PAR storage array connected through SW-B had lost connectivity (we didn’t touch SW-B), and they’re blaming us. Fabric is not supposed to be interrupted by a member’s reboot, or is it? The connection was lost for only a couple of seconds, but that was enough for the system (CentOS) to report that the file system was mounted read-only.

All switches are redundant. The other operating systems are VMware and Windows; neither of them reported the interruption, but the two CentOS hosts did.

Please help me. What could have happened?

Thanks and regards,
Angy

parnassus · ‎04-12-2021

Hard to say without a topology and zoning map of your storage, SAN switches, and hosts.

Generally speaking, a SAN should be made of two logically and physically separate SAN fabrics (let me call them SAN Fabric "A" and SAN Fabric "B"). Each SAN fabric can consist of one SAN switch or of several SAN switches interconnected by means of ISLs, and one of these switches will act as "commander" (or "primary").

Hosts and storage arrays should connect symmetrically, with some HBA ports on SAN Fabric A and some others on SAN Fabric B, so that if an entire SAN fabric fails the other one will keep them up (the same can be said about an HBA or its ports). The way you connect the hosts' HBA ports and the storage ports should reflect this redundancy scheme (resiliency), so that a single path/HBA/node can break BUT there is always at least (in the worst case) one path per SAN fabric still up and running.
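
As a rough illustration of the redundancy rule above, this Python sketch checks that every host has at least one path on each fabric. The host names, HBA port labels, and the `Path` record are made up for the example; they do not come from any real inventory or HPE tool.

```python
from collections import defaultdict
from typing import NamedTuple


class Path(NamedTuple):
    host: str      # hypothetical host name
    hba_port: str  # hypothetical HBA port label
    fabric: str    # fabric the port is cabled to, "A" or "B"


def hosts_missing_fabric(paths, fabrics=("A", "B")):
    """Return {host: [fabrics with no path]} for hosts that lack full redundancy."""
    seen = defaultdict(set)
    for p in paths:
        seen[p.host].add(p.fabric)
    return {
        host: [f for f in fabrics if f not in found]
        for host, found in seen.items()
        if any(f not in found for f in fabrics)
    }


# Made-up inventory: centos01 has both HBA ports on fabric A only.
paths = [
    Path("centos01", "hba0", "A"),
    Path("centos01", "hba1", "A"),
    Path("esx01", "hba0", "A"),
    Path("esx01", "hba1", "B"),
]

print(hosts_missing_fabric(paths))
```

A host listed in the result would lose all storage access if the one fabric it uses were disrupted, which is worth ruling out before looking at anything else.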

On top of that, within each SAN fabric there should be zoning, which defines who sees whom (or, better, who is able to communicate with whom). This can make things a little harder to understand at first, but it also makes it easier to know who will be impacted if something happens in a specific SAN fabric. Zoning acts within a single SAN fabric, so the zoning of SAN Fabric "A" should be symmetrical to the zoning of SAN Fabric "B". Together with where you physically connect the devices, zoning also determines where SAN traffic goes (for example, traffic avoids the ISLs if your storage and your hosts are connected to the same SAN switch of that particular fabric).
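
To make the "who can communicate with whom" idea concrete, here is a minimal Python sketch that treats the zoning of one fabric as a dictionary of zone name to member set. The zone and alias names are hypothetical, and this only models zone membership; it does not talk to a real switch.

```python
from itertools import combinations


def allowed_pairs(zones):
    """Return the set of unordered member pairs that share at least one zone."""
    pairs = set()
    for members in zones.values():
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def impacted_by(member, zones):
    """Return every other member that shares a zone with `member`."""
    related = set()
    for members in zones.values():
        if member in members:
            related |= set(members)
    related.discard(member)
    return related


# Made-up zoning for fabric A: one zone per host, each with the same array port.
fabric_a_zones = {
    "z_centos01_3par": {"centos01_hba0", "3par_n0p1"},
    "z_esx01_3par": {"esx01_hba0", "3par_n0p1"},
}

print(impacted_by("3par_n0p1", fabric_a_zones))
print(("centos01_hba0", "esx01_hba0") in allowed_pairs(fabric_a_zones))
```

In this example the two hosts both reach the array port but never see each other, so an event on one host's port has no zoning-level reason to reach the other host.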

I'm not an HPE Employee

Cali · ‎04-14-2021

Hi Angy,

>> Fabric is not supposed to be interrupted by a member’s reboot, or is it?

A device reboot issues a fabric LIP, which causes a short disruption.

https://www.thegeekdiary.com/when-to-use-rescan-scsi-bus-sh-i-lip-flag-in-centos-rhel/

To keep this from affecting the whole fabric, every server should be in a separate zone.

Check this first.
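
If you can export your zone list, the check could be sketched in Python like this. The zone names, aliases, and the set of initiators are assumptions made up for the example, not output from a real switch.

```python
def zones_with_multiple_initiators(zones, initiators):
    """Return {zone: sorted initiators} for zones that hold more than one server port."""
    result = {}
    for name, members in zones.items():
        found = sorted(m for m in members if m in initiators)
        if len(found) > 1:
            result[name] = found
    return result


# Made-up data: which aliases are server HBA ports (initiators).
initiators = {"centos01_hba0", "centos02_hba0", "esx01_hba0"}

zones = {
    "z_linux_3par": {"centos01_hba0", "centos02_hba0", "3par_n0p1"},  # two servers share a zone
    "z_esx01_3par": {"esx01_hba0", "3par_n0p1"},                      # one server per zone
}

print(zones_with_multiple_initiators(zones, initiators))
```

Any zone that shows up in the result groups several servers together and is a candidate for splitting into one zone per server.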

Cali

I'm not an HPE employee, so I can be wrong.
