Disk IO retry - OpenVMS 7.3-2
05-08-2007 08:53 PM
We have an EMC storage array that does RAID for us and presents OpenVMS with 10 disks.
We are running OpenVMS 7.3-2 with Update V8. Yes, I know we are a little behind with the updates. We access the disks via all four nodes in the cluster, with 2 HBA cards per server. Both cards are single-port with fibre cables connected to them. We do not MSCP-serve disks between nodes.
A few weeks ago someone did a reconfiguration of some type on the EMC storage. As a result, I/O to the disks on all 4 nodes stalled for 4.9 seconds. Then things resumed.
The EMC support team claim that no other connected systems were affected (the other systems being Windows and Solaris). They also claim that the stall in I/O would have been only approx. 1 second.
My point is the following:
- The other systems might not measure anything more than a second of stall as an outage.
- If the outage was only approx. 1 second, then I/O should have stalled for only 1 second, not 4.9.
Would this be the case?
Could a 1-second stall in I/O cause VMS to stall I/O for approx. 4.9 seconds?
I checked the operator logs and other logs; no multipath switching took place during the I/O stall.
We consider a 0.5-second outage as application unavailability. Our cluster is about as close to real-time as you can get: the cluster reconnection interval (RECNXINTERVAL) is set as low as 4 seconds, along with the associated parameters.
Comments?
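A simple way to put numbers on stalls like this (and the approach a later post in this thread describes) is a tight DCL loop that writes timestamps to a file on the suspect disk; any gap between consecutive records is an I/O stall. A minimal sketch - the device and file names are examples, not from the original config:

```
$! Hypothetical stall probe: loop writing timestamps to a file on the
$! disk under test. A gap between consecutive records shows a stalled
$! write. F$TIME() resolves to hundredths of a second.
$ OPEN/WRITE probe $1$DGA100:[TEST]STALL.LOG
$ LOOP:
$     WRITE probe F$TIME()
$     GOTO LOOP
$! Interrupt with Ctrl/Y when done, then:  $ CLOSE probe
```

Run one copy per node during the next EMC change window and compare the gaps each node sees.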
05-08-2007 10:16 PM
Re: Disk IO retry - OpenVMS 7.3-2
My response to this situation...
The word "glib" comes to mind.
ANY I/O delay is unacceptable!
I would tell the people managing the EMC box to fix it
(i.e., if the EMC box was working before, then it can work again).
So what did they change?
Have these delays appeared on ALL systems since the EMC revision?
Does the change to the EMC imply revising the FABRIC configuration?
You say there's no path switching going on, which could otherwise account for a delay, but there's obviously a problem with the new configuration.
To confirm this, do a:
$ SHOW DEVICE
and check whether all the "operations completed" counts
are where you expect them to be.
Regards
Steven
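To make that check concrete (the device name below is illustrative, not from the original config):

```
$! Snapshot the counters on each cluster node before and after the
$! stall window. A node whose "Operations completed" count stopped
$! moving, or whose "Error count" jumped, shows where I/O was held up.
$ SHOW DEVICE/FULL $1$DGA100:
```

Capturing the output to a file on each node (`$ SHOW DEVICE/FULL ... /OUTPUT=DEV.LIS`) makes the before/after comparison easier.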
05-08-2007 10:27 PM
Re: Disk IO retry - OpenVMS 7.3-2
05-08-2007 10:47 PM
Re: Disk IO retry - OpenVMS 7.3-2
Could an I/O error have triggered mount verification on the disks?
Mount-verification messages might not be logged to OPCOM; see the MVSUPMSG_INTVL and MVSUPMSG_NUM SYSGEN parameters.
Volker.
$ mc sysgen show MVSUPMSG
Parameter Name Current Default Min. Max. Unit Dynamic
-------------- ------- ------- ------- ------- ---- -------
MVSUPMSG_INTVL 3600 3600 0 -1 Seconds D
MVSUPMSG_NUM 5 5 0 -1 Pure-numbe D
$
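Since both parameters are dynamic ("D" in the output above), suppression can be relaxed on the fly while reproducing the problem - for example by raising MVSUPMSG_NUM so that more mount-verification messages reach OPCOM per interval (the value 100 here is an arbitrary example; check the SYSGEN help text for the exact semantics):

```
$ MCR SYSGEN
SYSGEN> SET MVSUPMSG_NUM 100   ! allow far more messages per interval
SYSGEN> WRITE ACTIVE           ! dynamic: takes effect immediately,
SYSGEN> EXIT                   ! not preserved across a reboot
```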
05-08-2007 10:59 PM
Re: Disk IO retry - OpenVMS 7.3-2
Purely Personal Opinion
05-09-2007 02:13 AM
Re: Disk IO retry - OpenVMS 7.3-2
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1066685
Maybe the minimum recovery time is about 5 seconds?
Wim
05-09-2007 02:19 AM
Re: Disk IO retry - OpenVMS 7.3-2
While deploying additional storage (HDS) we had a cluster hang. Any attempt at I/O to the EMC would "hang" that server.
No mount verification ever came back from the frame. So apparently not all the communication you would expect to see is available on the EMC paths.
We backed out the HDS changes, crashed/rebooted, and all the I/O was restored.
05-09-2007 07:06 AM
Re: Disk IO retry - OpenVMS 7.3-2
Bottom line: I think the controller returned an error, and recovery from such a serious event may take a couple of seconds.
Jur.
05-09-2007 08:02 AM
Re: Disk IO retry - OpenVMS 7.3-2
Multipath can add some additional time, but since multipath does its work in the context of mount verification, you'd expect to see the OPCOM messages. However, if mount-verification message suppression is enabled (as it is by default), then it's difficult to figure out what's going on.
Attempting to troubleshoot this after the fact is nearly impossible. A tool to use *while this problem is happening* would be the DKLOG SDA extension, which will log all the SCSI commands and the SCSI statuses coming back from the controller.
-- Rob
05-10-2007 02:46 AM
Re: Disk IO retry - OpenVMS 7.3-2
Thanks
Kevin
05-10-2007 06:49 PM
Re: Disk IO retry - OpenVMS 7.3-2
When trying to reproduce the I/O hang, consider starting some of the OpenVMS 'built-in' SDA extensions to capture more detailed data.
You can find help and examples of using them at:
http://eisner.encompasserve.org/~halle/
The following extensions may be useful:
$ ANAL/SYS
SDA> DKLOG
SDA> IO
SDA> FC
Volker.
05-13-2007 10:00 PM
Re: Disk IO retry - OpenVMS 7.3-2
I did a test on an AlphaServer 4100 with an HSZ70 running 7.3 and found that:
1) splitting a shadow set froze I/O for 0.2 seconds
2) re-forming a shadow set froze I/O for 2 seconds
3) upon shadow-copy completion (with bitmaps; I didn't check this in the test without them), I/Os were blocked for 0.3 sec, 0.6 sec and 0.51 sec (a lock taken 3 times?)
With or without bitmaps made no big difference. During the shadow copy some I/Os took 0.07 sec instead of 0.01.
Fwiw
Wim
05-13-2007 11:35 PM
Re: Disk IO retry - OpenVMS 7.3-2
1) dismount 2.5 sec
2) mount 1.9 sec
3) on completion copy 8.9 + 0.6 + 1.9 sec
Wim
05-13-2007 11:37 PM
Re: Disk IO retry - OpenVMS 7.3-2
A looping DCL script wrote timestamps to a flat file. The timestamps were written at a rate of 55 per 1/100th of a second, i.e. 55 * 100 per second.
During the EMC config several I/O stalls took place, ranging from 0.03 seconds to a massive 1.8 seconds.
I now need to run further tests; this was the first pass.
Regards
Kevin
05-14-2007 05:48 AM
Re: Disk IO retry - OpenVMS 7.3-2
>During the EMC config several IO stalls took
>place that ranged from .03 seconds to a
>massive 1.8 seconds.
I'll bet you are adding storage and pushing out zoning changes (RSCNs are the gotchas). You will want to avoid storage changes, and particularly zoning changes, during normal working hours.
In a previous job I heard about how they used to merrily make storage and zoning changes during working hours. Guess what? Real-time instrument acquisitions don't like long pauses - do they?
So, painfully, all that work was moved to off hours (2 a.m. on weekends).
Welcome to the real world, Neo...
05-14-2007 12:34 PM
Re: Disk IO retry - OpenVMS 7.3-2
Kevin,
Another thought ... I realize you probably aren't zoning on this EMC config. But what may be happening is that when a new hyper/meta is created and presented, the Symm may momentarily place a global lock on the cache (or a section of cache) to set aside cache lines for the newly created hyper/meta. With multiple gigabytes of cache it may take a while to take the lock out, do the work and release it (>1 second being a "while").
There are a lot of "may"s above - I don't know. The problem is that there are a good many unknowns (to me) about how the Symm cache works, and I have been digging for a long time, so it might just be a closely held piece of engineering knowledge (or I haven't stumbled upon the right person - yet). You're going to have to open a call with EMC support and describe your problem; perhaps they can shed some light.
Commenting on the hang you are experiencing: it really isn't that long, but in a real-time data-acquisition scenario it could well be unacceptable. My comment about moving storage and zoning changes to off hours was based on my personal history with EMC Symms.
Rob
05-14-2007 06:34 PM
Re: Disk IO retry - OpenVMS 7.3-2
3) on completion copy 8.9 + 0.6 + 1.9 sec
must be
3) on completion copy 0.9 + 0.6 + 1.9 sec
05-14-2007 09:31 PM
Re: Disk IO retry - OpenVMS 7.3-2
However, the change we did make was to the SCSI-3 bit on 3 devices. The array still goes through the same process to prepare and commit the change, forcing an IML of the directors. It is at this point that we are seeing a delay. It's normal for the array to behave in this fashion, and at this stage it looks like any config change we make is going to affect your servers...
05-15-2007 02:48 AM
Re: Disk IO retry - OpenVMS 7.3-2
> SCSI-3 bit set, IML the directors
Well, you can close the loop on this one.
Curiously, why would 4 drives out of dozens not have that bit set?
When they went to change control and got approval for doing this work, did they inform change control that the directors on the Symm would be rebooting? etc.
05-15-2007 09:46 PM
Re: Disk IO retry - OpenVMS 7.3-2
Today a disk was replaced in a RAID set on the GS160/HSG80 (same config as above).
During this operation I monitored the drive; I/O was stalled 3 times: for 0.1 sec, then 0.4 sec, and finally for 1.6 sec.
Fwiw
Wim
05-29-2007 07:31 PM
Re: Disk IO retry - OpenVMS 7.3-2
1 out of 5 writes was delayed by 0.6 sec on average !!!
During a backup of the file to another disk, 1 out of 2 writes to the FROM disk was delayed by 0.3 sec on average.
During a backup of a file from another disk to this disk, the writes were not delayed at all.
During normal operation of the disk (or even a copy of a large file) I saw only delays of a few 1/100ths of a second.
Fwiw
Wim
05-29-2007 08:07 PM
Re: Disk IO retry - OpenVMS 7.3-2
Jur.
05-29-2007 08:16 PM
Re: Disk IO retry - OpenVMS 7.3-2
I know. I just had no idea of the size of the delay.
I just saw a delay of 0.8 seconds during normal Sybase operations! So now I run the test at increased priority, to be sure I'm first in getting the CPU - but with the same results.
Wim
05-29-2007 08:57 PM
Re: Disk IO retry - OpenVMS 7.3-2
It's not 1 out of 5 I/Os that is delayed, but 10 I/Os per minute when doing CONTINUOUS I/O.
Likewise for "1 out of 2": it's 30 per minute.
Sorry
Wim