Operating System - OpenVMS
copy/ftp not responding

 
SOLVED
tim lloyd_1
Frequent Advisor

copy/ftp not responding

Hi All, I maintain a system which consists of 4 Itanium servers running OpenVMS V8.2-1.
The application is restarted every morning. One of the servers is classed as the “master” server and the other 3 are real time backups.

Typically the master would start up at approx 6:30am with the other servers following. The location of the servers is split between a master site (where the master and another server live) and a recovery site where the other 2 servers are situated.

So, if the master dies, its partner should take over immediately. If the master site dies the servers at the recovery site should be able to take over immediately.

There are a couple of mechanisms for transferring data between servers: messages from external sources go to the master and are then distributed to the other servers over Ethernet. My concern, however, is user data, which is passed with the “copy/ftp” command.

FTP just works! Or at least, that is how the system is designed. At several points in the day, “copy/ftp” sends information from one server (not necessarily the master) to all the other servers.

One day the customer had huge network problems. The link between master and recovery site was down for several hours. The link was restored eventually and the system continued with the disaster recovery capability enabled and FTP working.

But at the end of the day I was alerted that the system had “locked up”: the operators tried to initiate the end-of-day activity and their interface did not respond; the screen hung.

After a bit of digging around I realised that an FTP job was sitting in the batch queue. The operators initiate end of day activity on the master server. Before this server can begin its own processing it uses FTP to alert all other servers to begin their activity.

I tried to FTP from DCL on the master to another server and that hung as well: no error was returned, it simply sat there. I can’t remember exactly how long I waited, but I would say 5-10 minutes before I used Ctrl-C to abort it.

This is the first and only time I have ever seen this problem. I am convinced that the customer had started to investigate their earlier problem and that the knock-on effect of that hit me. The silence on their part tends to confirm this.

BUT, my experience with FTP is that it either works or it doesn’t work. The command issued is:
COPY/FTP filename node"username password"::dest_file_name
This is part of a command file that is submitted to SYS$BATCH.
I would be grateful if anyone can advise:
• Why would FTP hang?
• I don’t see a timeout qualifier on COPY/FTP. Is there a more prudent way to use FTP?
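COPY/FTP itself offers no timeout qualifier. If a transfer were scripted instead, a client-side timeout can bound every socket operation (connect, server replies, data blocks), so a stalled peer raises an error rather than hanging indefinitely. A minimal sketch using Python's standard ftplib, assuming Python is available on the host; put_with_timeout and its parameters are hypothetical names, not part of COPY/FTP:

```python
import ftplib


def put_with_timeout(host, user, password, local_path, remote_name,
                     port=21, timeout=60.0):
    """Upload local_path to host as remote_name (hypothetical helper).

    ftplib applies `timeout` to the control connection and to each data
    connection, so no single blocking read or write waits longer than that.
    Returns True on success, False on any network or FTP failure.
    """
    ftp = ftplib.FTP(timeout=timeout)
    try:
        ftp.connect(host, port)
        ftp.login(user, password)
        with open(local_path, "rb") as f:
            ftp.storbinary(f"STOR {remote_name}", f)
        ftp.quit()
        return True
    except ftplib.all_errors as exc:
        # all_errors covers ftplib.Error, OSError (incl. timeouts, refusals) and EOFError
        print(f"transfer to {host} failed: {exc!r}")
        return False
    finally:
        ftp.close()  # safe to call even after quit() or a failed connect
```

A False return lets the batch job log the failure and carry on (or retry) instead of sitting in the queue.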

Thanks
10 replies
Steven Schweda
Honored Contributor

Re: copy/ftp not responding

> FTP just works! [...]

Sure it does. But, for a VMS-to-VMS COPY
operation, I'd tend to use DECnet rather than
FTP. (Or, in a cluster, plain-old COPY
to/from a served/shared disk.)

> [...] Open VMS8.2-1.

TCPIP SHOW VERSION

> [...] Why would FTP hang?

DNS problems? Network problems? Many things
are possible.

HELP COPY /FTP /VERBOSE

Hard to say what happens without some info on
what it's doing when it happens.
tim lloyd_1
Frequent Advisor

Re: copy/ftp not responding

Hi, apparently DECnet was used before, but FTP was adopted when the system migrated from VAX to Itanium. Unfortunately, as this is largely seen as a legacy system, a fundamental change such as this is unlikely to be made.

Your thoughts on why FTP might fail match mine: it could be any one of a number of reasons.

My remit is to support the host system and let the network team field network problems. I would, though, like to make sure the host is as informative as it can be. I will investigate the /VERBOSE qualifier.


I did check the TCPIP information:

$ tcpip show version

HP TCP/IP Services for OpenVMS Industry Standard 64 Version V5.5
on an HP rx2600 (1.40GHz/1.5MB) running OpenVMS V8.2-1
Steven Schweda
Honored Contributor

Re: copy/ftp not responding

> HP TCP/IP Services for OpenVMS Industry Standard 64 Version V5.5

I see ECO 3 for that one:

ftp://ftp.itrc.hp.com/openvms_patches/layered_products/i64/

Or, perhaps V5.6 ECO 5, if it's suitable.
(That's what's on my V8.3-1H1 system.)

No bets that either would matter, but a look
might pay off. The release notes seem to be
hidden in the kits, not available separately
from the FTP server. (Which, of course, is
itself set to disappear soon, so get those
patches while you can.)
John Gillings
Honored Contributor

Re: copy/ftp not responding

Tim,

Check the time period of the "hang". No disrespect intended, but in my experience, when a user says "I would say 5-10 minutes", it's usually significantly less. A DNS timeout can be several minutes, especially if several name servers are defined (each one times out in turn). To prove the point, start the COPY and go to lunch. If it's still hanging when you get back, I'll believe it really is hung!
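A rough model of why a DNS stall can stretch to minutes: if the resolver tries each configured name server in turn, the worst-case wait before a lookup fails grows with every server and retry. Here N is the number of name servers, R the attempts per server and t the per-attempt timeout; these are generic symbols, and the values below are hypothetical, not TCP/IP Services defaults (check the resolver configuration on the system for the real ones):

```latex
T_{\text{worst}} \approx N \cdot R \cdot t
  = 3 \cdot 2 \cdot 30\ \text{s} = 180\ \text{s} = 3\ \text{min}
```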

You need to check that both nodes can see ALL their name servers, and that they can successfully translate each other's node names in both directions (name->number and number->name).
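The forward/reverse check can be scripted so it is run the same way on every node. A minimal sketch in Python, assuming Python is available; check_round_trip is a hypothetical name, and the resolver functions are parameters only so the check can be exercised with stand-in lookups:

```python
import socket


def check_round_trip(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name -> address -> name and report whether both directions agree.

    Returns (ok, address, reverse_name, message).
    """
    try:
        address = forward(name)
    except OSError as exc:  # socket.gaierror is an OSError subclass
        return False, None, None, f"forward lookup of {name!r} failed: {exc}"
    try:
        reverse_name, aliases, _ = reverse(address)
    except OSError as exc:  # socket.herror is an OSError subclass
        return False, address, None, f"reverse lookup of {address} failed: {exc}"

    wanted = name.lower().rstrip(".")

    def matches(candidate):
        candidate = candidate.lower().rstrip(".")
        # Exact match, or a short node name matching the first label of an FQDN
        return candidate == wanted or ("." not in wanted and candidate.split(".")[0] == wanted)

    ok = any(matches(n) for n in [reverse_name, *aliases])
    message = "" if ok else f"{address} maps back to {reverse_name!r}, not {name!r}"
    return ok, address, reverse_name, message
```

Run it on each node for its partners' names (and its own); a lookup that fails, or one that takes noticeably long, points at an unreachable name server.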

> I don't see a timeout qualifier on
> COPY/FTP. Is there a more prudent way
> to use FTP?
There are several possible timeouts, but they're defined system-wide in TCPIP. As a hack (and assuming ICMP messages are allowed), you could check PING connectivity before starting the COPY, but if it's a DNS issue you may suffer the same delay.
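A pre-flight check along these lines can be sketched in Python. Note the swapped technique: instead of an ICMP PING (which usually needs raw-socket privileges), this attempts a TCP connection to the FTP port, which also confirms that something is listening there. ftp_port_reachable is a hypothetical name; port 21 is assumed as the FTP control port:

```python
import socket


def ftp_port_reachable(host, port=21, timeout=10.0):
    """Return True if a TCP connection to host:port completes within `timeout` seconds.

    Caveat: if `host` is a name, the DNS lookup happens first and is NOT
    bounded by `timeout`; pass an IP address to sidestep a slow resolver.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

A batch job could call this first and report "partner unreachable" instead of starting a transfer that may hang.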

A crucible of informative mistakes
Jeremy Begg
Trusted Contributor
Solution

Re: copy/ftp not responding

Hi Tim,

I've seen similar problems on systems using COPY/FTP with MultiNet. Usually it was a problem on the FTP server but that didn't explain all occurrences.

In the end I wrote a program which would issue a $FORCEX on the COPY/FTP. You can pick up a copy of this from
ftp://ftp.vsm.com.au/kits/timeout.zip

Here's how I used it:

$ timeout := $ute:timeout.exe
$ timeout 00:02:00 /forcex
$ copy/ftp/log 'sourcefile' remote.node::'destfile'
$

If the COPY/FTP takes more than two minutes it will be forced to exit. Depending on the job, you could replace /FORCEX by /DELPRC or /KILLJOB. See TIMEOUT.PAS in the zipfile for details.
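The same watchdog idea, in a portable form: this is not TIMEOUT.PAS, just a minimal Python sketch of a deadline on a child process with escalation, where terminate() plays a role loosely like /FORCEX and kill() loosely like /DELPRC. run_with_deadline and its parameters are hypothetical:

```python
import subprocess


def run_with_deadline(cmd, deadline, grace=5.0):
    """Run cmd (a list of arguments); stop it if it runs longer than `deadline` seconds.

    Escalation: terminate() first, then kill() if the child is still alive
    after `grace` more seconds. Returns (returncode, timed_out).
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=deadline), False
    except subprocess.TimeoutExpired:
        proc.terminate()  # polite stop request
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()   # forceful stop
            proc.wait()
        return proc.returncode, True
```

For example, a deadline of 120 seconds mirrors the two-minute limit in the DCL example above.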

Regards,
Jeremy Begg
Jim_McKinney
Honored Contributor

Re: copy/ftp not responding

I share Jeremy's experience (also with MultiNet). It seems that FTP client software can occasionally get into a state where it is waiting for some (unknown to me) event that (apparently) won't occur and thus never issues a subsequent socket IO that could tell it that the connection is no longer viable and that the stack has timed the connection out. So, it just sits. My solution was to embed the FTP within a wrapper program where one thread performed the FTP and the other monitored the FTP for resource consumption (CPU/DIO/BIO) - if the progress stalled for some period of time the monitor thread aborted the FTP thread. This condition was rare but did occur periodically (100+ sites doing hundreds of FTPs daily), typically after some sort of network disruption along the FTP path.
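The monitor-thread pattern can be sketched in Python. One substitution to note: Jim's monitor watched the FTP thread's CPU/DIO/BIO consumption, whereas this sketch watches an explicit progress signal. Python threads cannot be killed from outside, so on_stall must abort the transfer some other way, for example by shutting down the FTP connection's socket or killing a child process; that choice is left to the caller. StallWatchdog is a hypothetical name:

```python
import threading
import time


class StallWatchdog:
    """Call on_stall() once if no progress is reported for `stall_seconds`."""

    def __init__(self, stall_seconds, on_stall, poll_interval=0.1):
        self.stall_seconds = stall_seconds
        self.on_stall = on_stall
        self.poll_interval = poll_interval
        self.fired = False
        self._last = time.monotonic()
        self._lock = threading.Lock()
        self._done = threading.Event()
        self._thread = threading.Thread(target=self._monitor, daemon=True)

    def start(self):
        self.progress()  # the stall clock starts now
        self._thread.start()
        return self

    def progress(self):
        """Called by the transfer code whenever it makes headway."""
        with self._lock:
            self._last = time.monotonic()

    def stop(self):
        self._done.set()
        self._thread.join()

    def _monitor(self):
        # Wake every poll_interval until stop() is called
        while not self._done.wait(self.poll_interval):
            with self._lock:
                idle = time.monotonic() - self._last
            if idle >= self.stall_seconds:
                self.fired = True
                self.on_stall()
                return
```

With ftplib, progress can be reported from the per-block callback, e.g. ftp.storbinary("STOR " + name, f, callback=lambda block: wd.progress()).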
labadie_1
Honored Contributor

Re: copy/ftp not responding

On the remote node, FTP should create a process named TCPIP$FTPxxxxx, where xxxxx is something like C0000E.

Try ACCOUNTING/FULL/SINCE=<time just before the COPY/FTP> to see if you have a weird …

On the remote node, repeatedly do

$ sh sys/net

during the COPY/FTP, and check the final status code.
Check the file tcpip$ftp.log in the account used on the remote node.

By the way, you should move off VMS V8.2-1 to V8.3 (or V8.3-1H1) with the latest patches, although this may be completely unrelated to your problem.
Hoff
Honored Contributor

Re: copy/ftp not responding

The first time you've seen it, maybe.

Far from the first case here.

That stinking and festering and absurd and hideous pile of galactic-scale stupid known as ftp isn't the first suspect here. Your network hardware is. Then your IP. Then your routers. Your switches are absolutely suspect. VLANs and firewalls, too. As is your DNS. As are your network duplex settings.

Trust your network staff, but remember to verify what you're told.

Here's the most recent discussion of ftp as a "canary" for a lower-level issue:

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1439155

And no, I'm not a fan of (gag) ftp.
Robert Gezelter
Honored Contributor

Re: copy/ftp not responding

Tim,

As has been noted, there are a variety of possible causes. If the cause is not obvious, please consider using a LAN monitor such as Wireshark to see what is actually happening on the network.

I have seen these sort of problems caused by DNS, switches, firewalls, routers, LOGIN.COM, SYLOGIN.COM, disk problems, and numerous other causes. In the end, the LAN trace is often a useful step (to rule in/out network issues). I would also check the logs on the remote system (both FTP and accounting) to see if there is any information there.

- Bob Gezelter, http://www.rlgsc.com
tim lloyd_1
Frequent Advisor

Re: copy/ftp not responding

The comments provided here gave invaluable information for closing this issue with the customer.