HPE EVA Storage

Cluster-Node lost Connection to Storage Disk on FC-SAN

 
Oliver Schultz_1
New Member

Hi,

I hope someone has an idea, or can point to something we can check, to resolve this behaviour in our FC SAN.

We run a SAN with an FC 9000 director switch and an ESS F20 storage system. The cluster, which is connected through a QLogic 2200 GF-CK HBA, is the one causing trouble.
At unpredictable times, the cluster node running the resources loses the path to only ONE of its dedicated storage disks.
The affected disk changes: sometimes it is, for example, Harddisk 4, then Harddisk 3, or sometimes even the cluster disk.
If you then look at the cluster console, you can see that access to that disk really has been lost.
We set up zoning for the cluster to make sure no other SAN traffic is causing the problem, but it still occurs.
Everything, from the QLogic drivers to the BIOS settings and so on, has been set up according to the white papers from IBM.
The cluster runs trouble-free for weeks, and then, even on weekends, it loses one disk of the SAN storage and the cluster moves the resources.
I know that we could use two HBAs and multipath software, but I would like to understand, explain, and solve this problem.

So, hopefully someone has an idea or can tell me what to check.

Thanks

Regards,

Oliver
If you think you're good enough, you have stopped getting even better...
5 REPLIES
Eugeny Brychkov
Honored Contributor

Re: Cluster-Node lost Connection to Storage Disk on FC-SAN

Oliver,
some questions for you:
- Are the drivers for the QLA2200 up to date? Dig into the registry or use the qlconfig utility to review the card's settings.
- Was the storage device configured properly? What is it? Does it have the latest firmware?
- Does the switch log any errors on the port(s) at the time of the failure?
- Did you check the FC cable lengths? Do you use patch panels?
Please keep in mind that if the server loses one disk, this does not mean the other disks did not disappear at the same time. This may mean that the server was accessing this disk when it noticed the disk disappeared.
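
To illustrate that last point, here is a minimal Python sketch. It assumes a very simplified model in which a disk is only reported as lost if an I/O is issued to it while the whole path is down. The `Outage` class, the `disks_reporting_loss` function, and all timestamps are hypothetical and are not taken from any real log or driver.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """A short window (in seconds) during which the whole FC path is down."""
    start: float
    end: float


def disks_reporting_loss(outage, io_times):
    """Return the disks that issued at least one I/O while the path was down.

    Simplified model: a disk with no I/O inside the window never notices
    the outage, so it is never reported as lost.
    """
    return sorted(
        disk
        for disk, times in io_times.items()
        if any(outage.start <= t < outage.end for t in times)
    )


# Hypothetical I/O timestamps (seconds) per disk, for illustration only.
io_times = {
    "Harddisk 3": [0.5, 9.0],
    "Harddisk 4": [2.2, 2.4],
    "Cluster disk": [0.1, 6.0],
}

# The path drops for one second; only Harddisk 4 is busy in that window.
print(disks_reporting_loss(Outage(start=2.0, end=3.0), io_times))
```

In this model the path to every disk is interrupted, yet only the disk that happened to be busy shows up as lost. A different outage window would make a different disk appear to fail, which would match the changing disk described in the original post.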
Eugeny
Oliver Schultz_1
New Member

Re: Cluster-Node lost Connection to Storage Disk on FC-SAN

Hi Eugeny,

thanks for your quick reply.
To your questions:

Yes, QLogic/qlconfig shows the right disks.
The storage is configured correctly; it is an IBM ESS F20 with the latest firmware.
As I already wrote, everything was installed and set up as IBM describes in the white papers, and it runs for some weeks without problems.

What exactly do you mean by "This may mean that the server was accessing this disk when it noticed the disk disappeared"? Does that behaviour point to something?

No patch panels. You're right: the Windows cluster sometimes, somehow, loses one of its dedicated storage disks from the SAN. The other node then takes over the resource and keeps it online.
But I would like to know where this problem comes from.
The cable lengths are OK (the cables have already been replaced). The ports on the switch show nothing unusual in the logs.
But something in the communication must be getting lost or interrupted.
If the server lost all the disks, there would have to be a big problem on the FC SAN. But we only lose one SAN disk at a time, over the same path. And the lost disk also changes among the SAN disks configured for the server.

Any more ideas for troubleshooting?

Regards,

Oliver
Eugeny Brychkov
Honored Contributor

Re: Cluster-Node lost Connection to Storage Disk on FC-SAN

Oliver,
an intermittent problem is a tough problem. Could you please move the cluster tasks to the other cluster server and monitor whether that server behaves the same way? If it does, then the servers are not the cause; if it does not, then something is wrong with the server we suspect now.
Eugeny
Vincent Fleming
Honored Contributor

Re: Cluster-Node lost Connection to Storage Disk on FC-SAN

Oliver,

From the information you've provided, it sounds to me like you might be experiencing timeout issues.

You should be able to raise the I/O timeout period for the host HBAs. You don't say what kind of host (Windows, HP-UX, Solaris, etc.) you are running, so I can't give you specifics, but it should be in the documentation for the adapter cards; for example, on Windows, most adapters use a Registry entry to set the I/O timeout period to a higher value.

I would suggest a minimum of 60 seconds for the timeout and a maximum of 90 seconds. Start with 60 and see how it goes.
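
As a rough sketch of what such a Registry change could look like on Windows, the Python snippet below builds the text of a .reg file for a chosen timeout. It assumes the standard Windows disk class timeout, the `TimeOutValue` DWORD (in seconds) under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk`. That value is not specific to the QLogic driver, and the adapter documentation may name a different or additional setting, so treat it as an assumption to verify. The function name `timeout_reg_file` is hypothetical.

```python
# Assumption: the generic Windows disk class timeout, not a QLogic-specific key.
DISK_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk"


def timeout_reg_file(seconds: int) -> str:
    """Build .reg file text that sets the disk I/O timeout in seconds.

    Only values in the 60 to 90 second range suggested above are accepted.
    """
    if not 60 <= seconds <= 90:
        raise ValueError("timeout should be between 60 and 90 seconds")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{DISK_KEY}]\n"
        # .reg files store DWORD values as eight hexadecimal digits.
        f'"TimeOutValue"=dword:{seconds:08x}\n'
    )


# Start with 60 seconds, as suggested above (60 decimal is 3c in hex).
print(timeout_reg_file(60))
```

The script only prints the file contents; it does not touch the Registry. Review the output and check it against the HBA vendor's documentation before importing anything on a cluster node.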

Good luck,

Vince
No matter where you go, there you are.
Oliver Schultz_1
New Member

Re: Cluster-Node lost Connection to Storage Disk on FC-SAN

Hi Vince,

sorry that I forgot to give you those details.
We are running a two-node Windows 2000 Advanced Server cluster with the QLogic 2200 GF-CK HBA.
Can you please let me know exactly which Registry values (for the HBA or for Windows itself) I should try in order to get rid of that timeout problem?

In another post of yours I read that you describe an issue where FC-attached, shared tape drives need to be connected to a separate HBA in a Windows environment.
Can you please tell me more about that in detail?

Thanks for your help.

Regards,

Oliver