Operating System - OpenVMS

VMS processes left hanging

 
SOLVED
The Brit
Honored Contributor

VMS processes left hanging

A user on a docked laptop opens a session to my Itanium system (bl860c, OpenVMS 8.3-1H1, TCP/IP Services 5.6 ECO 3) through a terminal emulator such as Reflection, and logs in.

Subsequently, they need to undock, which automatically switches them to wireless and drops their telnet sessions as a new IP address is acquired.

When they attempt to reconnect, the login is refused:

Username: baxterd
Password:
You are at maximum allowed processes for your user name

(This is a company-wide restriction, two sessions per node, to stop licenses being locked up in idle sessions.)

The salient point, however, is that although undocking caused the telnet sessions to terminate, the VMS processes did not (and do not) hang up, so an admin has to intervene to remove them.

Is there a system parameter or TCPIP sysconfig parameter that can be set to make these sessions hang up when the Telnet session terminates?

I know that virtual terminals are generally an option; however, there have been bad experiences with them here (before my time), which takes them off the table, at least for the time being. I will be more than happy to listen to conversations about virtual terminals, but I would really like to concentrate on alternative methods of fixing the problem, if there are any.

Thanks

Dave.
14 replies
Jim_McKinney
Honored Contributor

Re: VMS processes left hanging

> cause these sessions to hang up when the Telnet session terminates???

Unless the VMS TCP stack is notified that the other end has disconnected, it can't do anything, and evidently this is not occurring. Keepalives are a common method for controlling this, but unfortunately Reflection (like many other emulators) does not support them directly, though I think you can modify something in the Windows registry to enable/control them.

> I would really like to concentrate on alternative methods of fixing the problem

Perhaps a workaround rather than a cure?

Have you considered possibly writing a bit of code - could even be DCL - that would run detached and perhaps PING your terminals' source addresses on some regular cycle and if not responsive take whatever remedial steps your admins currently do to terminate them?

Or perhaps you could have your SYLOGIN determine the maximum number of terminal sessions the user is permitted and, if they are exceeding it, offer them the opportunity to terminate one (or more)?

I suspect that there are numerous other ways to deal with this as well. You didn't say what state you find the processes in when the terminal has been disconnected, or how you terminate them. I presume that VMS is fat, dumb, and happy with a process in LEF or HIB, and you need only STOP/ID or perhaps issue some database termination command?
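The detached PING watcher suggested above carries an obvious risk: stopping a process whose user is still connected. One way to reduce that risk, borrowed from the way TCP keepalives count unanswered probes, is to flag a session only after several consecutive failed probes. The Python sketch below models just that decision under that assumption; it does not send pings or stop processes, and the `SessionWatch` class, its `record_probe` method and the `TNA12:` session name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SessionWatch:
    """Counts consecutive failed probes per session (hypothetical helper).

    A session is only reported as dead after `max_misses` failures in a row,
    so one lost ping during a brief network hiccup is not enough.
    """
    max_misses: int = 3
    misses: dict = field(default_factory=dict)

    def record_probe(self, session_id: str, reachable: bool) -> bool:
        """Record one probe result; return True if the session now looks dead."""
        if reachable:
            # Any reply resets the counter for this session.
            self.misses[session_id] = 0
            return False
        self.misses[session_id] = self.misses.get(session_id, 0) + 1
        return self.misses[session_id] >= self.max_misses


if __name__ == "__main__":
    watch = SessionWatch(max_misses=3)
    # Simulated results for one session: a miss, a reply, then three misses.
    results = [False, True, False, False, False]
    print([watch.record_probe("TNA12:", r) for r in results])
    # [False, False, False, False, True]
```

Any probe mechanism (a PING, or a write to the terminal channel) could feed `record_probe`; a larger `max_misses` trades slower detection for fewer false positives.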
The Brit
Honored Contributor

Re: VMS processes left hanging

Usually the sessions end up in LEF state, and in general the response is STOP/ID.

What I don't understand is that we just migrated from Alpha to Itanium and changed stack from TCPWare to TCPIP Services.

Since the migration, this issue has arisen, along with a related issue where Telnet sessions appear to time out after ~2 hours. This normally happens when the sessions are idle, but they don't have to be (although I have not seen one drop while actually entering commands). It is possible that the "timeout" issue is firewall-related.

To return to the initial problem: I will look into your suggestion of setting up a detached process to watch for dropped sessions; however, this seems a very risky approach, i.e. it could kill an important process.

Dave.
Jim_McKinney
Honored Contributor

Re: VMS processes left hanging

> It is possible that the "timeout" issue is firewall-related.


I'd bet on it. Leave a telnet session logged in and idle on a connection that does not pass through the firewall, and see what happens.
Jim_McKinney
Honored Contributor

Re: VMS processes left hanging

> I will look into your suggestion of setting up a detached process to watch for dropped sessions

It is possible that the act of PINGing the source of a disconnected telnet session will clue the TCP stack in to the fact that the remote end is gone and drop the telnet session. You might just experiment with PING and wait and see if after a short time - or maybe even immediately - the telnet session drops. You might also try broadcasting something to the session that you suspect is gone. Or maybe use the SHARE privilege to open a channel to it and write something, anything, like a nul character perhaps and that might cause the session to drop.
Hoff
Honored Contributor

Re: VMS processes left hanging

Hi Dave.

Is this related to your earlier ....

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1317706

The use of telnet-based virtual terminals and keepalive settings does look applicable here.
Jim_McKinney
Honored Contributor

Re: VMS processes left hanging

I agree with Hoff that keepalives are the typical solution in cases like yours. (Ignore what I wrote previously about setting them on the client side: you really only care whether the VMS end can see the PC, and dropping the connection when it can't.) So you might set the keepalives on the VMS telnet service.

However, keepalives might not be timely enough if your users expect to disconnect from the docking station and immediately log back in. Decreasing the timeout period to a really small value carries the same risk you're concerned about with pinging a source address and terminating the associated session if it doesn't answer: a momentary network outage can look as if the remote host has disconnected.

The stack I'm most familiar with is MultiNet, where the default keepalive idle time is 2 hours. That means the keepalive probes don't begin until a connection has been idle for 2 hours; then up to 8 probes are sent, 75 seconds apart, trying to get a response before the connection is terminated.

Anyway, maybe you can have your SYLOGIN attempt to contact any connections already owned by someone logging in interactively and if they appear not viable let the user decide whether or not to terminate them. Or maybe just probing them will result in the stack terminating them for you.
Jim_McKinney
Honored Contributor

Re: VMS processes left hanging

I probably also should have said that use of virtual terminals here seems to be appropriate if you can find a way to get the local stack to recognize that the remote host has been disconnected.
The Brit
Honored Contributor

Re: VMS processes left hanging

Thanks Jim, Hoff.

While I agree that VTs might be the most appropriate way to go, I am struggling with the problem "This didn't happen before the cutover, and we weren't using VTs then".

I guess my attitude is that we shouldn't have to go that route to stop something that wasn't happening before.

At the moment, I am looking at the settings of

tcp_keepalive_default = 0
tcp_keepcnt = 8
tcp_keepidle = 14400
tcp_keepinit = 150
tcp_keepintvl = 150

in sysconfig. These values are the defaults. In particular, tcp_keepalive_default = 0 (turned off?).

Would you have any recommendations for appropriate values?

Dave
Jim_McKinney
Honored Contributor

Re: VMS processes left hanging

> would you have any recommendations for appropriate values??

When turning this knob, one has to weigh the possibility of terminating a valid process against discovering no-longer-viable connections in a timely manner. How quickly do you need to discover that a process has been disconnected? Keepalives are generally not used for rapid-response situations. I would expect your current values, if enabled, to be interpreted as follows:

tcp_keepidle = 14400
tcp_keepintvl = 150
tcp_keepcnt = 8

After 14400 half seconds (2 hours) of idleness, begin sending keepalive polls every 150 half seconds (75 seconds), as many as 8 times, hoping to get a response (times are in 500-millisecond, or half-second, units). If this total time passes without a response, terminate the connection. With this configuration it would take 2 hours 10 minutes of idle time before a session would drop.
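The arithmetic above can be checked with a few lines of Python. This sketch assumes, as described, that the sysconfig timers are in half-second units and that the connection is dropped after the idle time plus `tcp_keepcnt` unanswered probes spaced `tcp_keepintvl` apart; it approximates the stack's behaviour and is not its actual timer code. The function names are hypothetical.

```python
HALF_SECOND = 0.5  # keepalive timers are in 500 ms units, per the post above


def keepalive_detection_seconds(keepidle: int, keepintvl: int, keepcnt: int) -> float:
    """Approximate worst-case seconds from the last traffic until TCP drops
    a connection whose peer has silently vanished."""
    idle = keepidle * HALF_SECOND                  # quiet time before the first probe
    probing = keepcnt * keepintvl * HALF_SECOND    # keepcnt unanswered probes, keepintvl apart
    return idle + probing


def as_hms(seconds: float) -> str:
    """Format seconds as H:MM:SS."""
    s = int(seconds)
    return f"{s // 3600}:{(s % 3600) // 60:02d}:{s % 60:02d}"


if __name__ == "__main__":
    # Default values quoted above: 2 h idle, then 8 probes 75 s apart.
    print(as_hms(keepalive_detection_seconds(14400, 150, 8)))  # 2:10:00
    # tcp_keepidle=150 (75 s), other values unchanged.
    print(as_hms(keepalive_detection_seconds(150, 150, 8)))    # 0:11:15
```

With a short idle time the probe phase (8 x 75 s = 10 minutes) dominates, so `tcp_keepintvl` and `tcp_keepcnt` matter as much as `tcp_keepidle` when tuning for faster detection.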

I suspect that tcp_keepalive_default = 0 does indicate that keepalives are "off". That is the default behaviour for most stacks; keepalives are usually enabled on an application-by-application basis, in this instance "telnet".

I do think that enabling keepalives for telnet will help you catch and terminate disconnected sessions. But you need to consider the reliability of your network when lowering these values, or you risk terminating viable telnet connections. You don't want a short network hiccup to result in all telnet sessions being terminated because they couldn't respond within a too-short keepalive idle+intervals duration. If you're trying to catch a disconnect immediately, so that a user can undock and immediately reconnect, then this probably isn't the right tool. You want to be fairly conservative in configuring it and considerate of the health of the networks that your telnet users traverse.
Hoff
Honored Contributor

Re: VMS processes left hanging

Given there was an IP stack swap, it seems reasonable to assume that a keepalive was previously implemented and that the setting was lost when the stack switch occurred.

And I'd tend to enable the virtual terminals in any case.
Matt Muggeridge
Occasional Advisor
Solution

Re: VMS processes left hanging

The following text is planned for the next release of TCP/IP Services. It may help clarify a few details. (This is a draft only.)

Virtual Terminals
=================

Overview
--------
Virtual terminals allow a user to seamlessly continue an interactive
terminal session across a network disconnect. This is achieved by the
system saving the virtual terminal's process context at the time of the
disconnect. When the user establishes a new login session, they will
be prompted to connect to any pre-existing virtual terminals, and so
seamlessly continue their session.

Both TELNET and RLOGIN may be configured to use virtual terminals.

When virtual terminals have been enabled, the user's terminal will now
appear as a VTA device, rather than a TNA device. This can be observed
using the SHOW TERMINAL command.

NOTE: Virtual terminals will not be created for users with
communication proxies. The terminal type will continue to be TNA.

Managing Virtual Terminals and Logical Names
--------------------------------------------
Follow the steps below to enable and manage virtual terminals:

1) Create the VTA device (if it is not already created)

First, check whether the VTA0: device already exists:

$ SHOW DEV VTA0:
%SYSTEM-W-NOSUCHDEV, no such device available

If the VTA0 device does not exist, load the TTDRIVER dynamically and
create the device using SYSMAN:

$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> IO CONNECT VTA0 /NOADAPTER /DRIVER=SYS$TTDRIVER

The SYS$STARTUP:SYSTARTUP.COM procedure contains the template for
loading the TTDRIVER during system startup.


2) Enable virtual terminals for the service
For TELNET, use:

$ DEFINE/SYSTEM/EXEC TCPIP$TELNET_VTA "TRUE"

For RLOGIN, use:

$ DEFINE/SYSTEM/EXEC TCPIP$RLOGIN_VTA "TRUE"

It is recommended this be placed in one of the system startup
command procedures, e.g. SYS$STARTUP:TCPIP$SYSTARTUP.COM.


3) Allow Terminal Disconnect when a Hangup Occurs

This is achieved by modifying the SYSGEN parameter, TTY_DEFCHAR2 to
enable the TT2$M_DISCONNECT feature on the terminal.

Edit MODPARAMS.DAT to set the TT2$M_DISCONNECT bit. For example, in
MODPARAMS.DAT add a line similar to:

! Set AUTOBAUD + EDITING + DISCONNECT.
MIN_TTY_DEFCHAR2 = 2 + %X1000 + %X20000

To take effect, this requires an AUTOGEN and a reboot of
the system.


4) Disconnected terminals are deleted after TTY_TIMEOUT

Virtual terminals in the disconnected state will be automatically
deleted after the sysgen TTY_TIMEOUT interval has expired.

By default, the TTY_TIMEOUT value is 900 seconds (15 minutes).
This means that after the TCP keepalive has expired and the TCP
connection closes, the user has an additional 15 minutes to
reestablish the virtual terminal before it is deleted.
If this is inadequate, it may be modified by editing MODPARAMS.DAT
and using AUTOGEN. Note that this parameter can also be modified
dynamically. For example, edit MODPARAMS.DAT with a line similar to:

MIN_TTY_TIMEOUT = 60 * 60 * 24 * 14 ! 14 days
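Combining the two timers described above gives the reconnect window for a virtual terminal. The sketch below assumes the worst case, in which TCP needs its full keepalive detection time before the VTA device is marked disconnected, after which TTY_TIMEOUT runs. The 675-second detection time in the example assumes a 75-second idle time plus 8 probes at 75-second intervals. `reattach_window` is a hypothetical helper, not part of TCP/IP Services.

```python
def reattach_window(keepalive_seconds: float, tty_timeout: int = 900) -> tuple:
    """Return (earliest, latest) seconds after a network drop at which the user
    can reattach to a disconnected virtual terminal (worst-case assumption)."""
    # The VTA device is only marked disconnected once TCP gives up on the connection.
    earliest = keepalive_seconds
    # TTY_TIMEOUT (seconds, default 900) then runs before the virtual terminal is deleted.
    latest = keepalive_seconds + tty_timeout
    return earliest, latest


if __name__ == "__main__":
    keepalive = 150 * 0.5 + 8 * 150 * 0.5   # 75 s idle + 8 probes x 75 s = 675 s
    print(reattach_window(keepalive))       # (675.0, 1575.0)
    print(60 * 60 * 24 * 14)                # 1209600 seconds for the 14-day MIN_TTY_TIMEOUT
```

Lengthening TTY_TIMEOUT only moves the `latest` bound; reattaching sooner after a drop depends on shortening the keepalive timers.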


Modifying Disconnect Time
-------------------------
When a communication path is broken, the time it takes for TCP connections
to be closed is affected by several factors. The network administrator
can modify the time it takes for TCP to detect a network outage by
adjusting the keepalive attributes.

Faster detection of network outages may be desirable when using virtual
terminals. For instance, after the keepalive timeout, the user can telnet
back into the system (probably via another path) to continue working in
the previously disconnected session.

For more information, refer to the tuning and troubleshooting guide where it
discusses the keepalive attributes, tcp_keepidle, tcp_keepintvl, tcp_keepcnt.
Note that these attributes will affect all subsequent connections on the
system.

To dynamically modify the sysconfig attributes, use:

$ @sys$manager:tcpip$define_commands
$ sysconfig -r inet tcp_keepalive_default=1 ! Enable keepalives
$ sysconfig -r inet tcp_keepidle=150 ! 75 seconds

The services must be restarted to make use of these dynamically modified
attributes. E.g. to restart TELNET:

$ @SYS$STARTUP:TCPIP$TELNET_SHUTDOWN
$ @SYS$STARTUP:TCPIP$TELNET_STARTUP

For permanent changes, it is recommended that the sysconfig attributes be
modified in TCPIP$ETC:SYSCONFIGTAB.DAT. E.g. add a stanza similar to:

inet:
tcp_keepalive_default=1
tcp_keepidle=150
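Because a mistyped attribute name in SYSCONFIGTAB.DAT is easy to overlook (dropping a single letter from `tcp_keepalive_default`, for example), it can be worth checking the stanza before restarting the services. This Python sketch assumes a simplified stanza layout (a `subsystem:` header followed by `name=value` lines) and only checks names against the five keepalive attributes mentioned in this thread, using the standard library's `difflib.get_close_matches` to suggest the likely intended name. It is not a full SYSCONFIGTAB parser, and `parse_stanza` and `suspicious_names` are hypothetical helpers.

```python
import difflib

# Keepalive attributes named in this thread; other inet attributes are
# legitimate but are simply not checked here.
KEEPALIVE_ATTRS = {
    "tcp_keepalive_default", "tcp_keepcnt", "tcp_keepidle",
    "tcp_keepinit", "tcp_keepintvl",
}


def parse_stanza(text: str, subsystem: str = "inet") -> dict:
    """Collect name=value pairs under 'subsystem:' (simplified, assumed format)."""
    attrs, inside = {}, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(":") and "=" not in line:
            inside = line[:-1].strip() == subsystem  # a new stanza header
            continue
        if inside and "=" in line:
            name, value = (part.strip() for part in line.split("=", 1))
            attrs[name] = value
    return attrs


def suspicious_names(attrs: dict) -> dict:
    """Map names that look like misspelled keepalive attributes to the likely intent."""
    hints = {}
    for name in attrs:
        if name not in KEEPALIVE_ATTRS:
            close = difflib.get_close_matches(name, KEEPALIVE_ATTRS, n=1, cutoff=0.8)
            if close:
                hints[name] = close[0]
    return hints


if __name__ == "__main__":
    sample = "inet:\n\ttcp_keealive_default=1\n\ttcp_keepidle=150\n"
    attrs = parse_stanza(sample)
    print(suspicious_names(attrs))  # {'tcp_keealive_default': 'tcp_keepalive_default'}
```

The high `cutoff` keeps unrelated inet attributes from being reported as near-misses.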


Example of Virtual Terminals
----------------------------
With virtual terminals enabled, a user's interactive login session
will display a VTA terminal type.

$ telnet hang10
Username: rider
Password: xxxx

HANG10> write sys$output f$getdvi("TT", "DEVNAM")
_VTA1:
HANG10> disconnect

By disconnecting the session without logging out (e.g. closing the
window or issuing DISCONNECT), VTA1: will persist. When a subsequent
login occurs, the user is prompted to reestablish their connection to
the virtual terminal, e.g.:

$ telnet hang10
Username: rider
Password: xxxx

You have the following disconnected process:
Terminal   Process name    Image name
VTA1:      _VTA1:          (none)
Connect to above listed process [YES]:

In addition, you may use the DCL commands DISCONNECT and CONNECT. For
example:

$ show device vt

Device                  Device           Error
 Name                   Status           Count
VTA0:                   Offline              0
VTA2:                   Online               0
VTA3:                   Disconnected         0

$ connect vta3
HANG10> write sys$output f$getdvi("TT", "DEVNAM")
_VTA3:

From this point onward, you are now in the VTA3: process context.

Refer to the DCL help for more information on the CONNECT and
DISCONNECT commands.
Hein van den Heuvel
Honored Contributor

Re: VMS processes left hanging

Hi Matt,

Thanks for the heads up in the ITRC Forum!
A few quick questions (sorry, Dave, for hijacking the topic a little):

1) You mention TELNET and RLOGIN, but not SSH.
Maybe you want to be explicit about SSH in the article?

2) I often see TT2$M_HANGUP recommended along with TT2$M_DISCONNECT.
That may be misinformed, and it may be a cause/effect confusion.
It may be good to indicate whether it is relevant or not for a TCP/IP connection.

3) I much like the self-documenting definition: 2 + %X1000 + %X20000
But why not show the decimal result (135170) as well? That helps the hardcore users who set it directly with SYSGEN, but more importantly folks who want to check whether the current definition matches the recommended one.

Thanks!
Hein.
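Hein's decimal value can be confirmed from the bit values in the draft's MODPARAMS.DAT comment (AUTOBAUD = 2, EDITING = %X1000, DISCONNECT = %X20000). The sketch below combines them with a bitwise OR rather than `+`, which gives the same result here but would not double-count a bit that is already set, and shows how to find which recommended bits are missing from a current TTY_DEFCHAR2 value. The constant names mirror the TT2$M_ names but are defined locally; the `0x1002` sample value is hypothetical.

```python
# Bit values taken from the MODPARAMS.DAT comment in the draft above.
TT2_AUTOBAUD = 0x2
TT2_EDITING = 0x1000
TT2_DISCONNECT = 0x20000

RECOMMENDED = TT2_AUTOBAUD | TT2_EDITING | TT2_DISCONNECT


def missing_bits(current: int, wanted: int = RECOMMENDED) -> int:
    """Return the recommended bits that are not yet set in `current`."""
    return wanted & ~current


if __name__ == "__main__":
    print(RECOMMENDED)                # 135170
    print(hex(RECOMMENDED))           # 0x21002
    # Hypothetical current value with AUTOBAUD and EDITING but no DISCONNECT.
    print(hex(missing_bits(0x1002)))  # 0x20000
```

A non-zero result from `missing_bits` means the current setting does not yet include every bit in the recommended definition.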

Matt Muggeridge
Occasional Advisor

Re: VMS processes left hanging

>The SYS$STARTUP:SYSTARTUP.COM procedure contains the template for
>loading the TTDRIVER during system startup.

Of course, that should have been SYS$STARTUP:SYSTARTUP_VMS.COM.

Matt.
PS: Thanks for Hein's feedback. We have taken that up in email, and I've incorporated it where applicable.
Kelly Stewart_1
Frequent Advisor

Re: VMS processes left hanging

We had a similar problem (users walking out of wireless range while logged in) and fixed it by setting the Telnet keepalive timer to 10 minutes. We use MultiNet, which allows a different keepalive setting for each TCP/IP service (that's probably true of all IP stacks), and Extra! Personal Client on the PCs. Seems to work fine for us.

Kelly