Operating System - OpenVMS

Logical name timeout on OVMS8.3?

Milton Baar
Frequent Advisor


I have created a logical name in the form of a search list: define/sys/trans=conc logname dev1:,dev2:

I can, for example, do dir dev1 or dir dev2, also dir logname and they all work. If the actual physical connection to dev1 or dev2 goes away, I can still do a successful dir logname and see the contents of the remaining device. But, it takes about 30s for the logical name to drop through the list and hit the "online" device.

Is there any way (SYSGEN parameter?) to make this faster - much faster?
23 REPLIES
Volker Halle
Honored Contributor

Re: Logical name timeout on OVMS8.3?

Milton,

what happens to DIR dev1: if the 'actual physical connection' to that device goes away? What kind of disks are these? What do you do with them?

Is there an error message regarding dev1: in DIR logname, if that device is 'offline'?

Volker.
Milton Baar
Frequent Advisor

Re: Logical name timeout on OVMS8.3?

OK, they are not "real" disks; they are NFS-served disks from another set of servers. I can do a $dir on the disk names nfs1:[000000] and nfs2:[000000] when they are both attached. If I pull a cable, the $dir on nfs1:[000000] fails, but I can still do it on nfs2:[000000]. I can also do a $dir logname:[000000] on the logical search list when the cable is pulled, and I get data returned from nfs2:[000000] with no error messages, just a 20 second or so delay. I need to minimise this delay - sub-second would be good :)
Volker Halle
Honored Contributor

Re: Logical name timeout on OVMS8.3?

Milton,

does the DIR nfs1:[000000] fail immediately or after around 20 seconds? What kind of error is reported?

This has nothing to do with logical names. This may involve an NFS-related timeout mechanism.

Volker.

Milton Baar
Frequent Advisor

Re: Logical name timeout on OVMS8.3?

Hi Volker

It is, from memory, an NFS-related timeout - perhaps I need to find an NFS parameter so that it "fails" faster?
Volker Halle
Honored Contributor

Re: Logical name timeout on OVMS8.3?

Milton,

read $ TCPIP HELP MOUNT

This may give parameters, which could be 'tweaked'...

Volker.
Milton Baar
Frequent Advisor

Re: Logical name timeout on OVMS8.3?

The default timeout is 1s, but there is a /cache parameter, which has a default of 30s; this may be the one to try. Will have a play and report back!
Wim Van den Wyngaert
Honored Contributor

Re: Logical name timeout on OVMS8.3?

Try mount/retry.

At my site it took about 15 seconds when retry was 4. When retry was 1 it took 4 seconds.

Strange but tcptrace shows 5 retrans when retry is 1.

Wim
Milton Baar
Frequent Advisor

Re: Logical name timeout on OVMS8.3?

Well, getting very frustrated....did a mount/cache=(dir:::01) which should, in theory, time out after 1 second....no luck there, still 30s wait.

There appear to be no other parameters I can set, OVMS is a client for NFS services, so the server parameters seem unlikely. Looked at ACP and RMS parameters in SYSGEN to see if something there was set to 30s, couldn't find anything. Certainly, when a $dir fails, I get an RMS and ACP error which leads me down that path, but it seems to lead nowhere....sigh.
Volker Halle
Honored Contributor

Re: Logical name timeout on OVMS8.3?

Milton,

again, this timeout has nothing to do with logical names, RMS, F11BXQP etc., so there are no knobs in OpenVMS itself to be turned. If the device were a failed SCSI disk, the DIR command would actually hang until the disk aborts mount verification (after MVTIMEOUT seconds).

You would need to research and study the NFS timeout mechanisms, e.g. using tcpdump and some tests, to figure out which timers may influence this behaviour.

Volker.
Milton Baar
Frequent Advisor

Re: Logical name timeout on OVMS8.3?

Hi Volker

I will use WireShark and see where I get....but how can I tell what timers to tweak? As far as I can tell from the OVMS NFS client documentation, I have already experimented with all of them. Could you give me some idea of how to tell, from a TCP trace, what timers may be involved other than those I already know?

Cheers
Volker Halle
Honored Contributor

Re: Logical name timeout on OVMS8.3?

Milton,

by watching the packets on the wire and looking at the timestamps, one may find certain packets repeating at certain intervals. Then it's time to look for timer settings in the various protocol levels.

OpenVMS does a DIR, which translates to a QIO to the mounted NFS disk. This will need to cause some TCPIP (RPC ?) traffic to the NFS server, probably repeating due to no responses received.
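As a rough illustration of the timestamp approach described above, the following Python sketch takes the capture times of one repeated request and reports the gaps between sightings, then guesses whether they look constant or doubling. The function names and the sample timestamps are hypothetical and chosen only for illustration; nothing here reflects documented behaviour of the OpenVMS NFS client.

```python
def retransmit_gaps(timestamps):
    """Return the gaps (seconds) between consecutive sightings of one request."""
    ordered = sorted(timestamps)
    return [round(later - earlier, 3) for earlier, later in zip(ordered, ordered[1:])]


def classify_gaps(gaps, tolerance=0.2):
    """Guess whether the gaps look constant, doubling, or irregular."""
    if len(gaps) < 2:
        return "too few packets"
    # Ratio of each gap to the previous one; zero-length gaps are skipped.
    ratios = [b / a for a, b in zip(gaps, gaps[1:]) if a > 0]
    if not ratios:
        return "irregular"
    if all(abs(r - 1.0) <= tolerance for r in ratios):
        return "constant interval"
    if all(abs(r - 2.0) <= tolerance for r in ratios):
        return "doubling interval"
    return "irregular"


if __name__ == "__main__":
    # Hypothetical capture times (seconds) of the same unanswered request.
    sample = [0.0, 1.0, 3.0, 7.0, 15.0]
    gaps = retransmit_gaps(sample)
    print(gaps, classify_gaps(gaps))
```

A constant gap would point towards a fixed per-request timer, while a doubling gap would suggest a backoff scheme in which the retry count matters more than the initial timeout. Either way, the total of the gaps is the figure to compare against the observed delay.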

Volker.
John Gillings
Honored Contributor

Re: Logical name timeout on OVMS8.3?

Hi Milton,

Not a solution to the timeout setting, but something that might help.

When using systems that depend on network connections, you need to think in terms of network breakages being EXPECTED conditions, rather than rare exceptions. Your systems need to be able to detect and adapt to any connectivity issues.

Your issue is that after detection, your processes don't remember that a breakage has occurred, so have to rediscover it multiple times (which costs significant time). You need a way to flip a switch to place the system into a "degraded" state until the issue is resolved. You may then need a mechanism to recover any necessary state and reintroduce the lost resource.

Assuming this is a system wide logical name, you could have a process monitor the connections to the search list entries. If it finds one getting slow, or lost, redefine the logical name, either dropping the dead entry, or changing the order (putting it on the end of the list). Also send an alert to someone to fix whatever's wrong with it.

Getting back to the timeout... be careful about reducing it too far. You risk introducing the issue of false triggers. A one second delay to a network device is hopefully rare, but it's entirely possible it's a transient. If it were generally "a good plan"(tm) to set defaults that low, that's what they would be!
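The monitoring idea above, combined with the caution about false triggers, can be sketched as a small piece of decision logic. This is Python rather than DCL and shows only the bookkeeping: the class name, the threshold and the device names are assumptions for illustration, and the actual probing and the redefinition of the system logical name are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class SearchListMonitor:
    """Tracks probe results and proposes an order for the search list."""
    entries: list                 # preferred order, e.g. ["nfs1:", "nfs2:"]
    fail_threshold: int = 3       # consecutive failures before demotion
    failures: dict = field(default_factory=dict)

    def record_probe(self, entry, ok):
        # A success resets the count, so a single transient is forgotten.
        self.failures[entry] = 0 if ok else self.failures.get(entry, 0) + 1

    def is_demoted(self, entry):
        return self.failures.get(entry, 0) >= self.fail_threshold

    def current_order(self):
        # Healthy entries keep their original order; demoted ones go last.
        healthy = [e for e in self.entries if not self.is_demoted(e)]
        demoted = [e for e in self.entries if self.is_demoted(e)]
        return healthy + demoted


if __name__ == "__main__":
    monitor = SearchListMonitor(["nfs1:", "nfs2:"], fail_threshold=2)
    monitor.record_probe("nfs1:", False)
    monitor.record_probe("nfs1:", False)
    print(monitor.current_order())
```

Requiring several consecutive failures before demoting an entry trades a slightly slower reaction for fewer spurious redefinitions. Moving a dead entry to the end rather than dropping it keeps the list usable if the probe itself is wrong.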
A crucible of informative mistakes
Milton Baar
Frequent Advisor

Re: Logical name timeout on OVMS8.3?

Hi John

Nice to hear from you after such a long time :)

The particular issue here is that I have a clustered system of DS10Ls serving a web site running WASD, but there is no redundancy in the "common" file system because there isn't one - the data source lives on an NFS server and each cluster node gets data from it.

Although I have front-end redundancy (multiple DS10Ls), I want to get back-end redundancy and the simplest (?) way seems to be to have multiple NFS servers. The NFS servers are synchronised in real-time (sub-second anyway) and the site never changes between updates, so there is no issue with one WASD server showing different pages to the other.

Anyhow, I need some mechanism so that if one of the NFS servers dies, OVMS will default to the other without manual interaction. This is not possible within the WASD functionality and the best solution appeared to be a logical search list - sigh.

Of course, I could have a batch job running that, every second, checks that both NFS servers exist and, if one doesn't, alters the system-wide logical name for the data source. But I was hoping for something more elegant within OVMS!!
Hoff
Honored Contributor

Re: Logical name timeout on OVMS8.3?

This is a re-post. The previous attempt to submit this (as has happened with ITRC) failed. Apologies for any duplicate postings.

The classic OpenVMS approach here would be a cluster, with round-robin DNS or load-balancing web appliances out front or other such distribution. You'd bring the contents of your storage on-line as cluster host(s) or storage. Not as NFS. Perhaps via FC SAN with some MSA widgets, or via HBVS with client-local spindles.

Or you'd see some form of replication; how that might work depends on what sort of data synchronization requirements are here. There are various replication offerings and options here, though comparatively few are seen on OpenVMS Alpha.

More recently, you might see hosting on EC2 with EBS, or with another (distributed) content provider. But that's the entire retirement of these WASD boxes.

I'd not expect to find searchlists of NFS devices used for this sort of thing. Which leads me to guess you have some particular reason here for using NFS and not clustering.

Ping HP customer support directly and see if they have suggestions beyond the NFS MOUNT /TIMEOUT and /RETRIES stuff. (The /CACHE stuff is around how long existing data within the client is stashed away for another I/O before a fresh fetch and a fresh copy of the data is needed; this is not the server timeouts, BTW. If the stuff is not changing often, then keeping the data bits in the cache is no big deal.)
Milton Baar
Frequent Advisor

Re: Logical name timeout on OVMS8.3?

Hi

Um, Hobbyist licenses on a small cluster, serving web pages for a dancing group. I do it because I have used OVMS since 1978 and still like to keep my skills up - or not, in this case :) So, no ability to use any of the "correct" technology, just some robust DS10Ls that run WASD and just don't stop, relatively immune from nasty people trying to do things, static web pages that only change after a dancing competition, but over 200GB of data (mainly photos, stretches back to 1997).

So, I realise this is not the way to do it, but I just don't have the ability to use the better/correct tools and technologies. Hence, in the spirit of all hackers/DECUS members (since 1979), I am trying to kludge/force something to work with what I have or what I can get :)
Hoff
Honored Contributor

Re: Logical name timeout on OVMS8.3?

Scrounge up a multi-host SCSI controller for these old AlphaServer DS10L boxes (for the open PCI slot) and some disk and a shelf? The hobbyist license has the license PAK for clustering and the other licenses you need here.

Or (if you're within shipping range, or have a good friend that is) one of the c. US$200 HP rx2600 boxes that have been available around, and retire the existing Alpha boxes.

Alpha prices are in free-fall (AlphaServer ES45 at US$500), and the Integrity prices are often low, so spending much on parts for old boxes is something you'll want to carefully consider.

Milton Baar
Frequent Advisor

Re: Logical name timeout on OVMS8.3?

Sigh - here in the unfashionable part of the universe - that is Australia, prices are high, the dollar is low, and shipping and import tax from the US is approximately the price of a MacPro!

I actually have 1.8TB of shelf, but the FC cards, switches etc. just make it too expensive, and there are no cheap Integrity units here...haven't seen a multi-host SCSI card, I will do some checking.
Milton Baar
Frequent Advisor

Re: Logical name timeout on OVMS8.3?

So, I have now done a lot of reading in the last few days and also spent a couple of hours on Hoff's site, HP, OpenVMS.org and, of course, Google. It seems that:
1. OpenVMS has poor support for NFS clients and uses outdated versions (no surprise there) that perform poorly.
2. Ditto SMB client.
3. Ditto iSCSI, even if I was using Itanic (which I'm not and can't really afford to).

I just need 3 DS10Ls to access a 250GB data source that will be replicated - for redundancy. That data source will be populated by OSX and Windows (whatever I am using at the time), so SMB/AFP/FTP are all methods to get data onto it. It currently sits on replicating FreeNAS servers because a) they work, b) they are cheap, c) it is a single manageable data repository that all cooperating OSs can share (Windows via SMB, OSX via SMB or AFP, OVMS via NFS). Easy to manage, maintain and backup.

Apart from migrating servers to Itanic (not happening - far too expensive), installing new SCSI cards and replacing the shelf with an MA500 (same problems as former option), or dumping OVMS/WASD and moving to the Dark Side, can anyone offer other suggestions? Again, this is a self-funded, non-income community service and I really want to try to do it within the technologies at hand, and I *really* want OVMS to be in there!
Hoff
Honored Contributor

Re: Logical name timeout on OVMS8.3?

Given there's apparently no SCSI gear here and given the ATA disk addressing limits (c. 137.4 gigabytes) within SYS$DQDRIVER, you're rather limited on what storage you can share in a cluster (and with what you can shadow) among your OpenVMS Alpha and AlphaServer DS10L boxes.

I went through a fairly similar decision process a while back, and posted up some notes from that. Though you are fairly well stuck here, given the available hardware gear and the budget and the application requirements and the current implementation. Or you can continue to operate with NFS and such.

The (used) HP rx2600 series boxes available in the US are running c. $200, plus the cost of a DVD and the requisite SCSI disks, and scrounging up an OpenVMS I64 DVD kit. The cost here is going to be the shipping. And if the aggregate costs including the shipping costs and the import fees are up in the Mac Pro range, well, then the various alternatives here probably aren't really fodder for this forum.

Milton Baar
Frequent Advisor

Re: Logical name timeout on OVMS8.3?

I think you are correct about the discussion on this thread. Certainly the reading and experimentation is always interesting, but there comes a point when, due to the South Pacific Peso (Australian Dollar to everyone else) and shipping, the best solution is one that is unimplementable. So, thanks all for your help - I am off to play with DCL to see if it can identify the non-existence of an NFS share and adjust the logical names accordingly. f$getdvi doesn't help here as, strangely, a non-readable NFS share is still shown as EXISTS and AVL, so I will play with other lexicals or system calls.
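Since device status alone reports a dead share as present, one alternative is to attempt a real access and give up after a short, self-imposed deadline. The sketch below shows that pattern in Python rather than DCL; `probe_with_timeout` and the `check` callable are hypothetical names, and what `check` does (for example, listing a directory on the share) is left to the caller.

```python
import threading


def probe_with_timeout(check, timeout_s):
    """Return True only if check() returns a truthy value within timeout_s seconds."""
    result = {}

    def worker():
        try:
            result["ok"] = bool(check())
        except Exception:
            # Any error from the access attempt counts as "not reachable".
            result["ok"] = False

    # Daemon thread: a probe stuck on a dead share will not block program exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    # Still running after the deadline means the access is hung.
    return (not thread.is_alive()) and result.get("ok", False)


if __name__ == "__main__":
    print(probe_with_timeout(lambda: True, 0.5))
```

A hung probe leaves its thread behind until the underlying call finally returns, so a periodic monitor built this way should avoid starting a new probe for a share while the previous one is still outstanding.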
Jeremy Begg
Trusted Contributor

Re: Logical name timeout on OVMS8.3?

Milton, if this is a hobbyist-licence system perhaps you should consider using MultiNet instead of TCP/IP Services. In my experience, albeit somewhat biased (part of my business is providing technical support for Process Software here in Oz), MultiNet tries to do better than TCP/IP Services in most areas. (I know that's a pretty ambitious claim and I'm not going to debate it here!)

Of course changing TCP/IP stacks is not a trivial exercise and I can't make any guarantee that MultiNet will give you a better result in this situation.

Regards,
Jeremy Begg
Milton Baar
Frequent Advisor

Re: Logical name timeout on OVMS8.3?

Hi Jeremy

Yes, this is the Irish Dancing site I host/run etc and this thread is the result of the email you probably received from Mark last night. I can always build a new PersonalAlpha test environment and put MultiNet on that, but then the migration on the remaining real cluster will be ......fun? Will keep it in mind as I play with DCL.
Milton Baar
Frequent Advisor

Re: Logical name timeout on OVMS8.3?

It appears that I am stretching the capabilities of something inside VMS, so I will leave well enough alone this time!