
Re: Disk IO retry - OpenVMS 7.3-2

 
Kevin Raven (UK)
Frequent Advisor

Thanks for the input so far. I will be trying to emulate the IO hang on our development cluster Friday. I will let you know how it goes or what I find out.

Thanks
Kevin
Volker Halle
Honored Contributor

Kevin,

when trying to reproduce the IO hang, consider starting some of the OpenVMS 'built-in' SDA extensions to capture some more detailed data.

You can get some help and examples of using them at:

http://eisner.encompasserve.org/~halle/

The following extensions may be useful:

$ ANAL/SYS
SDA> DKLOG
SDA> IO
SDA> FC

Volker.
Wim Van den Wyngaert
Honored Contributor

Concerning IO blockage:

I did a test on a 4100 with an HSZ70 running 7.3 and found that:

1) splitting a shadow set froze IO for 0.2 seconds
2) reforming a shadow set froze IO for 2 seconds
3) upon shadow copy completion (with bitmaps; I didn't check this in the test without them), IOs were blocked three times, for 0.3 sec, 0.6 sec and 0.51 sec (was a lock taken 3 times?)

Running with or without bitmaps made no big difference. During the shadow copy some IOs took 0.07 sec instead of 0.01 sec.

Fwiw

Wim
Wim Van den Wyngaert
Honored Contributor

Same test on an inter-building cluster of GS160s with dual HSG80s in each building, but only with bitmaps.

1) dismount 2.5 sec
2) mount 1.9 sec
3) on completion copy 8.9 + 0.6 + 1.9 sec

Wim
Kevin Raven (UK)
Frequent Advisor

Did a test on Friday with a test/development EMC storage array. Some disks that were not live were changed on the EMC storage and the config was written.

A looping DCL script wrote time stamps to a flat file. The time stamps were written at a rate of 55 per 1/100th of a second, i.e. about 5,500 per second.

During the EMC config several IO stalls took place that ranged from .03 seconds to a massive 1.8 seconds.
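
For anyone wanting to repeat this kind of measurement, here is a rough Python sketch of the same idea (the test above used a DCL loop, which is not shown in this thread). It appends time stamps to a file, forcing each write out to the device, and then reports any gap between consecutive stamps that exceeds a threshold. The function names, the log file name and the 0.03 second threshold are illustrative assumptions, not the script that was actually run.

```python
import os
import time


def write_timestamps(path, duration_s=1.0):
    """Append wall-clock time stamps to `path` for about `duration_s` seconds.

    Returns the monotonic clock reading taken just before each write,
    so gaps can be measured without being affected by clock adjustments.
    """
    stamps = []
    deadline = time.monotonic() + duration_s
    with open(path, "a") as f:
        while time.monotonic() < deadline:
            stamps.append(time.monotonic())
            f.write(f"{time.time():.6f}\n")
            f.flush()
            # Ask the OS to push the data to the device, so that a stalled
            # disk IO holds up this loop instead of being hidden by caching.
            os.fsync(f.fileno())
    return stamps


def find_stalls(stamps, threshold_s=0.03):
    """Return (start, gap) pairs where consecutive stamps are more than
    `threshold_s` seconds apart."""
    stalls = []
    for prev, cur in zip(stamps, stamps[1:]):
        gap = cur - prev
        if gap > threshold_s:
            stalls.append((prev, gap))
    return stalls


if __name__ == "__main__":
    # Hypothetical log file name; run this while the storage change is made.
    recorded = write_timestamps("io_stall_test.log", duration_s=5.0)
    for start, gap in find_stalls(recorded):
        print(f"stall of {gap:.3f} s starting at t={start:.3f}")
```

The threshold needs to sit well above the normal per-write time on the system under test, otherwise ordinary write latency will be reported as stalls.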

Now I need to run further tests.

This was the first pass.

Regards
Kevin

Rob Young_4
Frequent Advisor

>During the EMC config several IO stalls took
>place that ranged from .03 seconds to a
>massive 1.8 seconds.

I'll bet you are adding storage and pushing
out zoning changes (RSCNs are the gotchas).

You will want to avoid storage changes and
particularly zoning changes during normal
working hours.

In a previous job I heard about how they
used to merrily make storage and zoning changes
during working hours. Guess what? Real-time
instrument acquisitions don't like long
pauses - do they?

So, painfully, all that work was moved to
off hours (2 a.m. on weekends).

Welcome to the real world, Neo...

Rob Young_4
Frequent Advisor

Kevin,

Another thought ... I realize you probably
aren't zoning on this EMC config. But what
may be happening is when a new hyper/meta is
created and presented it may require the
Symm to momentarily place a global lock
on the cache (or section of cache) to set
aside cache lines for the newly created
hyper/meta. With multiple gigabytes of cache
it may take a while to take the lock out,
do the work and release (>1 second being
a "while").

I have a lot of "may" above - I don't know.
The problem is that there are a good many
unknowns (to me) about how the Symm cache
works. I have been digging for a long
time, so it might just be a closely held
piece of engineering knowledge (or I haven't
stumbled upon the right person yet).


You're going to have to open a call with EMC
support and describe your problem; perhaps
they can shed some light.

Commenting on the hang you are experiencing:
it really isn't that long, but in a real-time
data acquisition scenario it could well
be unacceptable. My comment about storage
and zoning changes moving to off hours was
based on my personal history with EMC Symms.

Rob
Wim Van den Wyngaert
Honored Contributor

Typo in my last post :
3) on completion copy 8.9 + 0.6 + 1.9 sec
must be
3) on completion copy 0.9 + 0.6 + 1.9 sec
Wim
Kevin Raven (UK)
Frequent Advisor

The config changes were as below (extract from an e-mail from our EMC chaps):


However, the change we did make was to the SCSI3 bit on 3 devices. This array would still go through the same process to prepare and commit the change, therefore forcing an IML of the directors. It is at this point where we are seeing a delay. It's normal for the array to behave in this fashion and at this stage it looks like any config change we make is going to affect your servers...
Rob Young_4
Frequent Advisor

> SCSI3 bit set, IML the directors

Well you can close the loop on this one.
Out of curiosity, why would 3 devices out of
dozens not have that bit set?

When they went to change control and got
approval for doing this work, did they inform
change control that the directors on the Symm
would be rebooting, etc.?