
Network file copy hangs

SOLVED
smsc_1
Regular Advisor


Good morning all,
I'm going crazy with an OpenVMS network copy problem.

I need to copy files over the network between an Alpha Server ES40 and an Itanium RX3600. Both machines run OpenVMS as their operating system (ES40 = version 7, Itanium = version 8).

The systems are properly connected through a Catalyst 3750, and there is no problem with the network settings.


The facts:
The ES40 gets the file from the RX3600. If I MANUALLY copy/get one file using FTP or DECnet, I have no problem. I'm using the following command on the ES40 (BIWIXX):

copy SMWIXX::SYS$LOGIN:FILE.DAT SYS$LOGIN:FILE.DAT /LOG

IT WORKS!!!!

So, since I need to copy these files every 5 minutes, I created a script with the above command. The script is very simple: it resubmits itself every 5 minutes and copies the file.

For the first 3 or 4 times the copy is OK, THEN COPY HANGS!!!!! The script also remains in EXECUTING, and the only way out is to kill it manually.

I also tried an FTP copy like "COPY /FTP.....", same problem...

Please help!
Thanks

./ Lucas
11 REPLIES
marsh_1
Honored Contributor
Solution

Re: Network file copy hangs

hi,

welcome to the itrc openvms forum.

you say the copy hangs , have you done a show process/cont/id=nnnn what does it show ?
a simple script - could you post this up
when this happens are network communications between the systems normal - try continuous ping or dtsend in decnet from both ends.
what else uses 'sys$login:file.dat' that you have to copy this file every 5 minutes is that process accessing/updating the file ?
have you tried a set proc/dump=now/id=nnnn
and done an anal/proc on it ?
what exact versions of vms and tcpip services are you using and patch levels ?




Volker Halle
Honored Contributor

Re: Network file copy hangs

smsc,

look at the 'hung' batch job with SDA:

$ ANAL/SYS
SDA> SET PROC/ID=
SDA> SHOW PROC/CHAN
... any 'busy' devices ?
SDA> SHOW PROC/LOCK
... any 'waiting' lock ?
SDA> EXIT

What is the state of the process ?

$ SHOW SYS/PROC=

Does $ MC LANCP SHOW DEV/INT show any problems on the LAN interfaces on each node ? Are speed/duplex settings o.k. ?

From your description, the problem affects both DECnet and TCPIP, so it must either be a problem in the lower layers (LAN) or higher up (XQP, disk, quotas ?).

Volker.
Robert Gezelter
Honored Contributor

Re: Network file copy hangs

smsc,

Welcome to the ITRC OpenVMS Forum!

It may be counter-intuitive, but this is likely an example of "Just because it works, does not mean that it is not broken".

As has been noted by Mark and Volker, many software settings can cause "hangs". In my personal experience, I have often seen this caused by a mis-configured network, particularly issues with mis-matches between half/full duplex settings. These cause packets to be unseen and thus dropped, resulting in long timeouts.

This problem can be very variable depending upon load. It is not uncommon for some tests to work, and others to fail, which is quite confusing.

For a start, check each piece of equipment in the path and ensure that it is set to FULL duplex. Other than sheer analysis, WireShark or another LAN analyzer is often my tool of choice to understand where/when the packet is being dropped.

On the software side, start by checking the SYLOGIN.COM and LOGIN.COM files. Login processing that is inappropriate for network file copies can also cause problems.

As usual, more information than "hung" is needed to diagnose.

- Bob Gezelter, http://www.rlgsc.com
smsc_1
Regular Advisor

Re: Network file copy hangs


Thanks to all for the very valuable tips.
Now I can analyze the problem better, and I will report the results and more information here as soon as possible.

Again Thanks!
Gianluca Rossi
Italy/Milan
./ Lucas
smsc_1
Regular Advisor

Re: Network file copy hangs

Q: you say the copy hangs , have you done a show process/cont/id=nnnn what does it show ?

A: I don't have a PID (nnnn), since the copy command hangs while the script is running in the queue. This is the result:
Entry  Jobname         Username  Status
-----  -------         --------  ------
  311  GET_LOGSRV_NEW  SMSC      Executing
  314  GET_LOGSRV_NEW  SMSC      Executing
  317  GET_LOGSRV_NEW  SMSC      Executing

The scripts in the queue remain in Executing, and if I check the related log file, the last command I see is COPY.....
The script is very simple: it just resubmits itself every 5 minutes and performs a copy operation.

When the copy hangs, the remote file is not a concern: it will be copied by the next run of the script. THE ONLY PROBLEM is the multiple instances left executing.
Is there a way to kill them automatically?? Otherwise I need to kill them manually every time!!!




Q: when this happens are network communications between the systems normal - try continuous ping or dtsend in decnet from both ends.
A: Yes, network communication is ok. I'm currently connected remotely to both machines.





Q: have you tried a set proc/dump=now/id=nnnn and done an anal/proc on it ?
A: Again, I don't have a pid to analyze. Is there a way to find it?



Q: what exact versions of vms and tcpip services are you using and patch levels ?

OpenVMS V7.3-1 = ES40
TCP VER: Compaq TCP/IP Services for OpenVMS Alpha Version V5.3 - ECO 2

OpenVMS V8.3-1H1 = RX3600
TCP VER: HP TCP/IP Services for OpenVMS Industry Standard 64 Version V5.6 - ECO 2





Q: Does $ MC LANCP SHOW DEV/INT show any problems on the LAN interfaces on each node ? Are speed/duplex settings o.k. ?
A: Yes, speed is ok.
The only possible issue would be:
ES40 = SPEED100 FULL DUPLEX <-> SPEED100 FULL DUPLEX Catalyst3750 SPEED1000 FULL DUPLEX <-> SPEED1000 FULL DUPLEX RX3600

Is it clear????




$ ANAL/SYS
SDA> SET PROC/ID=
SDA> SHOW PROC/CHAN
... any 'busy' devices ?
SDA> SHOW PROC/LOCK
... any 'waiting' lock ?
SDA> EXIT

Can you please explain me how find ?? With "show queue/all" I don't see any pid, only entry nr!!!
./ Lucas
Volker Halle
Honored Contributor

Re: Network file copy hangs

Gianluca,

use $ SHOW SYS/BATCH to find the currently executing batch jobs in your system. The process name may contain the batch entry number, e.g. BATCH_311. This will give you the Process-ID of your batch-jobs and also show you the state.

With a little bit of DCL programming (F$GETQUI), you could find the 'other' GET_LOGSRV_NEW batch jobs in the queue and DELETE them.
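
A minimal sketch of that F$GETQUI approach, meant to run from inside the batch job itself (the THIS_JOB lookup fails in an interactive session). The symbol names are illustrative, the wildcard loop follows the usual DISPLAY_QUEUE / DISPLAY_JOB pattern, and the DELETE is left commented out so that the procedure only lists the other executing GET_LOGSRV_NEW entries:

$ this_entry = F$GETQUI("DISPLAY_JOB","ENTRY_NUMBER",,"THIS_JOB")  ! our own entry, never touched
$ temp = F$GETQUI("CANCEL_OPERATION")                              ! reset any earlier wildcard context
$QLOOP:
$ qname = F$GETQUI("DISPLAY_QUEUE","QUEUE_NAME","*","BATCH")       ! next batch queue
$ IF qname .EQS. "" THEN EXIT
$JLOOP:
$ noaccess = F$GETQUI("DISPLAY_JOB","JOB_INACCESSIBLE",,"ALL_JOBS") ! next job in that queue
$ IF noaccess .EQS. "TRUE" THEN GOTO JLOOP
$ IF noaccess .EQS. "" THEN GOTO QLOOP
$ jname = F$GETQUI("DISPLAY_JOB","JOB_NAME",,"FREEZE_CONTEXT")
$ entry = F$GETQUI("DISPLAY_JOB","ENTRY_NUMBER",,"FREEZE_CONTEXT")
$ running = F$GETQUI("DISPLAY_JOB","JOB_EXECUTING",,"FREEZE_CONTEXT")
$ IF jname .NES. "GET_LOGSRV_NEW" .OR. entry .EQ. this_entry .OR. running .NES. "TRUE" THEN GOTO JLOOP
$ pid = F$GETQUI("DISPLAY_JOB","JOB_PID",,"FREEZE_CONTEXT")
$ WRITE SYS$OUTPUT "Stale entry ", entry, " on ", qname, " (PID ", pid, ")"
$! DELETE /ENTRY='entry'    ! uncomment to remove the stale job
$ GOTO JLOOP

The PID printed for each entry is the same one that SHOW SYSTEM/BATCH reports, so it can be fed straight into SHOW PROCESS/CONTINUOUS or SDA before the job is deleted.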

$ MC LANCP SHOW DEV/INT will show the recent messages issued by the LAN driver at the bottom of the display. If there are no 'possible duplex mismatch' errors, the speed settings should be o.k.

Now that you know how to obtain the process-ids, you can answer the other questions as well...

Volker.
Hein van den Heuvel
Honored Contributor

Re: Network file copy hangs

>> Can you please explain me how find ?? With "show queue/all" I don't see any pid, only entry nr!!!

Use $ SHOW SYSTEM /BATCH
That will allow you to see all batch processes with their PID and Process Names.

If you did not do anything special, then the batch jobs process names will be BATCH_.

This can be used in ANAL/SYS... SET PROC BATCH_12345

But before you go digging that deep, and look for something that perhaps is broken, why not assume you are simply doing something wrong.

Let's see that SCRIPT!
Is there a WAIT or SYNC command in there?


>> For the first 3 o 4 times copy is ok, THEN COPY HANGS

Since the jobs remain executing, you are likely to run into the JOB_LIMIT for the batch queue being used.
Check with $SHOW QUE /FULL
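
The same two numbers can also be read from DCL. A small sketch, assuming the jobs run on a queue called SYS$BATCH; if that turns out to be a generic queue, substitute the name of the execution queue it feeds:

$ queue = "SYS$BATCH"     ! assumption: replace with the queue actually used
$ limit = F$GETQUI("DISPLAY_QUEUE","JOB_LIMIT",queue)
$ busy = F$GETQUI("DISPLAY_QUEUE","EXECUTING_JOB_COUNT",queue)
$ WRITE SYS$OUTPUT queue, ": ", busy, " executing, job limit ", limit

Once the executing count reaches the job limit, every further submission just sits pending behind the hung jobs.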

hth,
Hein.

marsh_1
Honored Contributor

Re: Network file copy hangs

smsc,

the log file will only show more when the job has filled up its buffer and flushed it down to the log file or it completes.
do need to see the script though, it looks like you do the resubmit before the copy has completed ?
you haven't said what creates/writes to this file.dat

marsh_1
Honored Contributor

Re: Network file copy hangs

smsc,

also on the system you are getting the file from could you do a
$ sh dev/files 'disk_where_file_is'
to see whether anything is accessing that file ?

Hein van den Heuvel
Honored Contributor

Re: Network file copy hangs

smsc,

How about you replace your copy to another node with a copy to the NULL device (NL:) and see whether the script itself works?

Hein.
smsc_1
Regular Advisor

Re: Network file copy hangs


OK, the issue is definitely solved; let me explain what the problem was. The ES40 is a two-node cluster (NODE1 + NODE2). The COPY command hangs only when issued on NODE2, and the script hangs only when it starts on NODE2_BATCH.

Since the script was submitted to SYS$BATCH, the system started it alternately on NODE1_BATCH (OK) and NODE2_BATCH (NOT OK).
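
One way to confirm this kind of node dependence is to bypass the generic queue and submit a test run to each execution queue in turn. A sketch, assuming the procedure file is SYS$LOGIN:GET_LOGSRV_NEW.COM (only the job name appears above; the file name and the log file names are guesses):

$ SHOW QUEUE /FULL SYS$BATCH     ! shows which execution queues the generic queue feeds
$ SUBMIT /QUEUE=NODE1_BATCH /NOPRINT /LOG_FILE=SYS$LOGIN:NODE1_TEST.LOG SYS$LOGIN:GET_LOGSRV_NEW.COM
$ SUBMIT /QUEUE=NODE2_BATCH /NOPRINT /LOG_FILE=SYS$LOGIN:NODE2_TEST.LOG SYS$LOGIN:GET_LOGSRV_NEW.COM

Only the first run of each test is pinned this way: because the script resubmits itself, the following runs go wherever the script's own SUBMIT command sends them.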

So I checked the configuration of NODE2's Ethernet interface and.... bang! The interface was set to SPEED 100 HALF DUPLEX!

That's why the copy sometimes worked and sometimes hung!!!
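
For anyone finding this later, a sketch of how the setting can be checked and corrected with LANCP. EWA0 is an assumed device name (use whatever SHOW DEVICE lists on the node), and the values must match what the switch port is set to:

$ MC LANCP SHOW DEVICE EWA0 /CHARACTERISTICS             ! current speed and duplex
$ MC LANCP SET DEVICE EWA0 /SPEED=100 /FULL_DUPLEX       ! change the running system
$ MC LANCP DEFINE DEVICE EWA0 /SPEED=100 /FULL_DUPLEX    ! store it in the permanent device database

On Alpha systems the console may also hold a mode setting for the adapter, so it is worth checking there as well if the value comes back after a reboot.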


*** LET ME THANK ALL THE PEOPLE WHO LED ME TO THE SOLUTION AND TAUGHT ME A LOT OF COMMANDS ***
10 points to everyone who replied and explained how to solve the issue!
./ Lucas