Operating System - OpenVMS

The Brit
Honored Contributor

VMS processes left hanging

A user with a docked laptop opens a session to my Itanium system (bl860c, OpenVMS 8.3-1H1, TCP/IP Services 5.6 ECO 3) through a terminal emulator such as Reflection, and logs in.

Later they need to undock, which automatically switches them to wireless and drops their telnet sessions as a new IP address is acquired.

When they attempt to reconnect, the login fails:

Username: baxterd
Password:
You are at maximum allowed processes for your user name

(This is a company-wide restriction (to 2 sessions per node) to stop licenses being locked up in idle sessions)

The salient point, however, is that although undocking causes the telnet sessions to terminate, the VMS processes do not hang up, so an admin has to intervene to remove them.

Is there a System Parameter or TCPIP Sysconfig parameter that can be set to cause these sessions to hang up when the Telnet session terminates???

I know that Virtual Terminals are generally an option, however there have been bad experiences (before my time) here, which takes them off the table, at least for the time being. I will be more than happy to listen to conversations about Virtual Terminals, however I would really like to concentrate on alternative methods of fixing the problem, if there are any.

Thanks

Dave.
Jim_McKinney
Honored Contributor

Re: VMS processes left hanging

> cause these sessions to hang up when the Telnet session terminates???

Unless VMS' TCP stack is notified that the other end has disconnected it can't do anything, and evidently this is not occurring. Keepalives are a common method for controlling this, but unfortunately Reflection (and many other emulators as well) does not support them directly, though I think that you can modify something in the Windows registry to enable/control them.

> I would really like to concentrate on alternative methods of fixing the problem

Perhaps a workaround rather than a cure?

Have you considered possibly writing a bit of code - could even be DCL - that would run detached and perhaps PING your terminals' source addresses on some regular cycle and if not responsive take whatever remedial steps your admins currently do to terminate them?

Or perhaps you could have your SYLOGIN determine the maximum number of terminal sessions the user is permitted and, if they are exceeding that, offer them the opportunity to terminate one (or more)?

I suspect that there are numerous other ways to deal with this as well. You didn't say what state you find the processes in when the terminal has been disconnected nor how you terminate them. I presume that VMS is fat, dumb, and happy with a process in LEF or HIB and you need only STOP/ID or perhaps issue some database termination command?
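
To make the watchdog idea a little more concrete, here is a minimal sketch of the decision logic only, written in Python for readability rather than DCL. Everything in it is hypothetical: the `DeadSessionTracker` class, the `probe` callable (which on VMS might wrap a PING of the session's source address), and the session list are all assumptions, and collecting the sessions or issuing the STOP/ID is left out. It only flags a process after several consecutive missed probes, so that one lost reply does not condemn a live session.

```python
from collections import defaultdict


class DeadSessionTracker:
    """Flags sessions whose source address misses several probes in a row.

    `probe` is a caller-supplied function (hypothetical) that takes an
    address and returns True if the remote end answered.
    """

    def __init__(self, probe, failures_needed=3):
        self.probe = probe
        self.failures_needed = failures_needed
        self.misses = defaultdict(int)  # pid -> consecutive failed probes

    def sweep(self, sessions):
        """sessions: iterable of (pid, remote_address) pairs.

        Returns the pids that have now failed `failures_needed`
        consecutive probes; the caller decides what to do with them.
        """
        stale = []
        seen = set()
        for pid, address in sessions:
            seen.add(pid)
            if self.probe(address):
                self.misses[pid] = 0  # any answer resets the count
            else:
                self.misses[pid] += 1
                if self.misses[pid] >= self.failures_needed:
                    stale.append(pid)
        # Forget processes that have gone away since the last sweep.
        for pid in list(self.misses):
            if pid not in seen:
                del self.misses[pid]
        return stale


# Usage with a fake probe: only 10.0.0.5 answers (addresses are made up).
reachable = {"10.0.0.5"}
tracker = DeadSessionTracker(lambda addr: addr in reachable, failures_needed=2)
sessions = [("2040011A", "10.0.0.5"), ("2040011B", "10.0.0.9")]
first = tracker.sweep(sessions)
second = tracker.sweep(sessions)
print(first, second)
```

One caveat with this approach: when a laptop moves from the dock to wireless it gets a new address, and the old address could later be handed to a different machine that answers the probe, so an answered probe is not proof that the session itself is alive.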
The Brit
Honored Contributor

Re: VMS processes left hanging

Usually the sessions end up in LEF state, and in general the response is STOP/ID.

What I don't understand is that we just migrated from Alpha to Itanium and changed stack from TCPWare to TCPIP Services.

Since the migration, this issue has arisen, along with a related issue where Telnet sessions appear to time out after ~2 hours. This normally happens when the sessions are idle, although they don't have to be (I have not, however, seen one drop while actually entering commands). It is possible that the "timeout" issue is firewall-related.

To return to the initial problem. I will look into your suggestion of setting up a detached process to watch for dropped sessions, however this seems to be a very risky approach, i.e. could kill an important process.

Dave.
Jim_McKinney
Honored Contributor

Re: VMS processes left hanging

> It is possible that the "timeout" issue is firewall-related.


I'd bet on it. Leave a telnet session logged in and idle that does not have its connection pass through the firewall and see what happens.
Jim_McKinney
Honored Contributor

Re: VMS processes left hanging

> I will look into your suggestion of setting up a detached process to watch for dropped sessions

It is possible that the act of PINGing the source of a disconnected telnet session will clue the TCP stack in to the fact that the remote end is gone and drop the telnet session. You might just experiment with PING and wait and see if after a short time - or maybe even immediately - the telnet session drops. You might also try broadcasting something to the session that you suspect is gone. Or maybe use the SHARE privilege to open a channel to it and write something, anything, like a nul character perhaps and that might cause the session to drop.
Hoff
Honored Contributor

Re: VMS processes left hanging

Hi Dave.

Is this related to your earlier ....

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1317706

The use of telnet-based virtual terminals and keepalive settings does look applicable here.
Jim_McKinney
Honored Contributor

Re: VMS processes left hanging

I agree with Hoff regarding keepalives being the typical solution for instances like yours (and you should ignore what I wrote previously regarding setting this on the client side - you really only care about whether or not the VMS end can see the PC - and when it doesn't, drop the connection). So you might set the keepalives on the VMS telnet service. But, keepalives might not be timely enough for you if your users expect to disconnect from the docking station and immediately log back in - and decreasing the timeout period to some really small value has the same risks you're concerned with relative to pinging a source port and terminating the associated session if it doesn't answer - momentary network outages might seem like the remote host has disconnected. The stack that I'm most familiar with is MultiNet and by default the keepalive idle time is 2 hours - this means that the keepalive probes don't begin until a connection has been idle for 2 hours and then up to 8 probes are sent every 75 seconds trying to get a response before terminating the connection.

Anyway, maybe you can have your SYLOGIN attempt to contact any connections already owned by someone logging in interactively and if they appear not viable let the user decide whether or not to terminate them. Or maybe just probing them will result in the stack terminating them for you.
Jim_McKinney
Honored Contributor

Re: VMS processes left hanging

I probably also should have said that use of virtual terminals here seems to be appropriate if you can find a way to get the local stack to recognize that the remote host has been disconnected.
The Brit
Honored Contributor

Re: VMS processes left hanging

Thanks Jim, Hoff.

While I agree that VT might be the most appropriate way to go, I am struggling with the problem "This didn't happen before the cutover, and we weren't using VT's then"

I guess my attitude is that we shouldn't have to go that route to stop something that wasn't happening before.

At the moment, I am looking at the settings of

tcp_keepalive_default = 0
tcp_keepcnt = 8
tcp_keepidle = 14400
tcp_keepinit = 150
tcp_keepintvl = 150

in sysconfig. These values are the defaults. In particular, "tcp_keepalive_default = 0" (turned off ???)

would you have any recommendations for appropriate values??

Dave
Jim_McKinney
Honored Contributor

Re: VMS processes left hanging

> would you have any recommendations for appropriate values??

When turning this knob one has to weigh the possibility of terminating a valid process against discovery of no longer viable connections in a timely manner. How quickly do you need to discover that a process has been disconnected? Keepalives are generally not used for rapid response situations. I would expect that your current values, if enabled, are interpreted thusly

tcp_keepidle = 14400
tcp_keepintvl = 150
tcp_keepcnt = 8

After 14400 half seconds (2 hours) of idleness, begin sending keepalive polls every 150 half seconds (75 seconds), as many as 8 times, hoping to get a response (times are in 500-millisecond, or half-second, units). If this total time passes without a response, terminate the connection. With this config it would take 2 hours 10 minutes of idle time before a session would drop.
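
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation as I described it (idle time plus interval times probe count, all in half-second units); the function names are made up, and the second set of values is purely an illustration of how the total shrinks, not a recommendation.

```python
HALF_SECOND = 0.5  # keepalive timers are assumed to be in half-second units


def keepalive_detection_seconds(keepidle, keepintvl, keepcnt):
    """Seconds from the last traffic until an unanswered connection drops."""
    idle = keepidle * HALF_SECOND
    probing = keepintvl * HALF_SECOND * keepcnt
    return idle + probing


def as_h_m_s(seconds):
    """Format a number of seconds as H:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Defaults quoted in the thread: 14400 / 150 / 8
default_total = keepalive_detection_seconds(14400, 150, 8)
print(as_h_m_s(default_total))  # 2:10:00

# Illustrative tighter values (hypothetical): 5 min idle, 30 s interval, 4 probes
tighter_total = keepalive_detection_seconds(600, 60, 4)
print(as_h_m_s(tighter_total))  # 0:07:00
```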

I suspect that tcp_keepalive_default = 0 does indicate that keepalives are "off". That is the default behaviour for most stacks. It is usually enabled on an application-by-application basis, in this instance "telnet".
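
For anyone wondering what "per application" means in practice: in the BSD sockets API an application asks for keepalives on its own sockets with the SO_KEEPALIVE option. The sketch below shows that generic mechanism in Python. It is not how the TCP/IP Services telnet server is configured (that is done through the service's own settings), and the helper name is made up for illustration.

```python
import socket


def make_keepalive_socket():
    """Create a TCP socket with keepalive probing requested on it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the stack to probe this connection when it goes idle; the timing
    # comes from the system-wide keepalive parameters unless overridden.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


sock = make_keepalive_socket()
enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
print("keepalive enabled:", enabled)
sock.close()
```

A system-wide default such as tcp_keepalive_default presumably just changes what a socket gets when the application does not ask either way.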

I do think that enabling keepalives for telnet will help you with catching and terminating disconnected sessions. But you need to consider the reliability of your network when lowering these values, or risk terminating viable telnet connections. You don't want a short network hiccup to result in all telnet sessions being terminated because they couldn't respond during a too-short keepalive idle+intervals duration. If you're trying to catch a disconnect immediately in order to permit a user to disconnect from their docking station and immediately re-connect then this probably isn't the right tool. You want to be fairly conservative in configuring it and be considerate of the health of the networks that your telnet users traverse.