Operating System - Tru64 Unix

Re: EVA Blocks I/O To The Outside World.

 
Chris Ellam
Advisor

EVA Blocks I/O To The Outside World.

This entry was first made in the Storage forum and is now cross-posted in the Tru64 forum for additional input.

Our problem is that whenever a disk fails in the EVA, I/O to the outside world is blocked for a number of seconds. The virtual disk concerned serves flight plan data, and when the application sees a delay of more than 3 seconds it fails over to the redundant server and storage.

We have the following very simple configuration: an EVA3000 2C2D, 8 * 72 GB disks, VCS 3.110. There are 2 * ES80 AlphaServers in the SAN, 2 * SAN Appliance III, 2 * EVA3000 and 2 * SAN Switch 2/16, and it's an NSPOF environment. The ES80 AlphaServer OS is Tru64 V5.1B-4. There are 2 * FCA2384 HBAs in each server; the driver is 2.25 and the F/W is 1.91X6 (1.4A0). There are 2 switches in the fabric; they are SAN Switch 2/16 B-Series and the firmware is V3.2.1. There is no zoning activated on the switches. We only use 1 virtual disk and it's 20 GB in size. The disk group has double protection and we use Vraid1.

We can simulate the problem just by pulling a disk. At that moment the EVA stops processing reads and writes, and the delay varies between 1 and 10 seconds. We have yet to determine which factors govern the length of the delay and how we can adjust or configure the system to keep it under 3 seconds.

We believe the same delay occurs when a disk fails, or is predicted to fail and is then de-configured by the EVA, because the disk is taken out of the loop in both cases.
5 REPLIES
Hein van den Heuvel
Honored Contributor

Re: EVA Blocks I/O To The Outside World.

Hello Chris,

Be sure to escalate this to HP Storage support asap. Do not waste time waiting for a forum hint.

And get ready to prepare a statement on how more was promised than can actually be delivered, and on how to mitigate the situation.

The data in your parallel topic ( # 1169494 ) suggests it is strictly a SAN issue, but you never know!

A reader here may have seen this / worked with this. Just don't hold your breath for an answer here.

What was not immediately clear to me is whether you are talking about a (forced) failure in the very (redundant) storage the application itself is using, or whether any failure on the EVA, to any virtual disk, in any zone, built from any group, could cause this.

If the effect is seen when another Vdisk fails, then I would experiment with switch zoning (unlikely to help) and multiple groups (to carve storage from).

But in the end I would not be surprised to learn that the delay depends on which physical bus (fibre loop) the actual spindle is connected to, and you do not really want to be aware of those when setting up storage. You want to think that the back-end connection is all-powerful and transparent! It's complex enough without adding that as a variable!
[Although we did in the HSZ / HSG days, didn't we!? ]

Cheers,
Hein.
Rob Leadbeater
Honored Contributor

Re: EVA Blocks I/O To The Outside World.

Hi Chris,

I agree with Hein, that this is more than likely a storage issue. However it would be useful to know a bit more about how the systems are set up...

Is the single VDisk being used for the operating system, as well as the flight plan data ?

Have you tuned anything within Tru64 to lower the disk I/O timeout ? I thought the default was 60 seconds... I'm just wondering where the 3 second bit fits into things - part of your application maybe ?

Cheers,

Rob
Rob Leadbeater
Honored Contributor

Re: EVA Blocks I/O To The Outside World.

Out of interest, I just set up a similar(ish) test system and couldn't replicate your problem... At least I couldn't see any notable disk problems when I pulled a disk.

How are you measuring the delay that you get when you pull a disk from the EVA ?

Cheers,

Rob
Chris Ellam
Advisor

Re: EVA Blocks I/O To The Outside World.

Hi Hein & Rob,

The OS is now on a local disk, so all we have is 1 raw disk on the EVA. We did have the OS on the EVA previously, but had a bad experience when the EVA suffered a multi-disk failure: the OS continued to run for over 30 minutes before we crashed the system.

The results/delays were verified with collect. I also have a small test program that synchronously reads from different places on the raw disk every 100 milliseconds; I simply compare the time just before each read with the time just after it.
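
For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of such a probe in Python. It is not the actual test program mentioned above, only an illustration of the same idea: time one synchronous read at a random aligned offset every 100 ms and flag anything slower than the 3 second failover limit. The block size, the function names and the device path in the usage note are all assumptions.

```python
import os
import random
import time

BLOCK_SIZE = 8192    # bytes per read (assumed; must suit the device's alignment)
INTERVAL_S = 0.1     # one read every 100 ms, as described in the post
THRESHOLD_S = 3.0    # failover limit quoted in the thread


def probe(path, reads, size_bytes=None, interval_s=INTERVAL_S,
          block_size=BLOCK_SIZE, threshold_s=THRESHOLD_S, seed=None):
    """Time `reads` synchronous reads at random aligned offsets.

    Returns a list of (offset, elapsed_seconds) tuples.
    """
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        if size_bytes is None:
            # Works for regular files; a raw device may report 0 here,
            # in which case pass size_bytes explicitly.
            size_bytes = os.lseek(fd, 0, os.SEEK_END)
        blocks = max(size_bytes // block_size, 1)
        samples = []
        for _ in range(reads):
            offset = rng.randrange(blocks) * block_size
            start = time.monotonic()
            os.pread(fd, block_size, offset)   # blocks until the read returns
            elapsed = time.monotonic() - start
            samples.append((offset, elapsed))
            if elapsed > threshold_s:
                print(f"slow read: {elapsed:.3f}s at offset {offset}")
            # Keep roughly one read per interval, whatever the read took.
            time.sleep(max(interval_s - elapsed, 0.0))
        return samples
    finally:
        os.close(fd)


def slow_reads(samples, threshold_s=THRESHOLD_S):
    """Return only the samples whose latency exceeded the threshold."""
    return [(off, el) for off, el in samples if el > threshold_s]
```

A call such as `probe("/dev/rdisk/dsk10c", reads=600, size_bytes=20 * 2**30)` would run for about a minute; the device name is hypothetical. Run against a regular file instead of a raw device, the reads are likely to be served from the buffer cache and will not show a storage stall.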
Rob Leadbeater
Honored Contributor

Re: EVA Blocks I/O To The Outside World.

Hi Chris,

I tried again yesterday, and still couldn't get things to time out.

This was using an EVA3000 2C4D with 8 disks in one group, and an ES45 running the same OS as yours, with the same HBAs.

I did notice some SCSI events when the disk was pulled, and the EVA was redistributing things, but nothing that appeared to stop access to the disks - I had a small script to write out a file to the filesystem every couple of seconds, and none of those were missed... Whether things happen differently with raw disks I'm not sure.
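
A filesystem check of this kind could be sketched as below. This is not the script used in the test above, just an assumed equivalent in Python: it writes a small timestamped file every couple of seconds and then reports any gap between consecutive writes that is clearly longer than the period. The function names, file naming and tolerance are illustrative choices. The `os.fsync` call matters for the raw-versus-filesystem question: without it a buffered write can return before the data reaches the EVA, which could hide a short stall.

```python
import os
import time


def write_heartbeats(directory, count, period_s=2.0):
    """Write one small timestamped file per period and force it to disk.

    Returns the monotonic time at which each write completed.
    """
    stamps = []
    for i in range(count):
        path = os.path.join(directory, f"heartbeat_{i:06d}.txt")
        with open(path, "w") as fh:
            fh.write(f"{time.time():.3f}\n")
            fh.flush()
            os.fsync(fh.fileno())   # do not let the buffer cache mask a stall
        stamps.append(time.monotonic())
        time.sleep(period_s)
    return stamps


def missed_beats(stamps, period_s=2.0, tolerance_s=1.0):
    """Return (index, gap_seconds) for every gap longer than period + tolerance."""
    limit = period_s + tolerance_s
    gaps = zip(stamps, stamps[1:])
    return [(i, later - earlier)
            for i, (earlier, later) in enumerate(gaps, start=1)
            if later - earlier > limit]
```

Calling `missed_beats(write_heartbeats("/mnt/evatest", 300))` while a disk is pulled would list any write that arrived late; the mount point is hypothetical.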

One thing I did think about is what you've got the Direct Eventing option set to for the hosts on the EVA ? If it's Enabled try turning it off. I'm just wondering whether the extra events that get sent through to the OS with this enabled are affecting things somehow...

Cheers,

Rob