
SSH Users Dropped

 
Steve Calnek
Advisor

SSH Users Dropped

Hi,

I have an Alpha Server DS10 running OVMS 8.3 with TCPIP Services 5.6-ECO2.

We use SSH to allow PC terminal connections to the server in a secure LAN.

On a typical day, 3 to 8 users (different ones on different days) tell me that they were dropped in the middle of data entry. They can log back in right away.

I've verified that it happens during data entry and is not an idle timeout.

I've tried several different emulator programs and the results are the same. The network is nowhere near capacity, and all the PCs are new(ish). There are maybe 35 users logged in at any one time.

The SSH documentation really offers nothing and what log files there are don't show anything.

Any ideas? All are welcome.

Thanks.

Steve.
19 REPLIES
Steven Schweda
Honored Contributor

Re: SSH Users Dropped

Are users of other services (rsh, Telnet)
similarly affected? Any reason not to
suspect some generic network (hardware,
firewall, ...) problem?

Run a continuous "ping" to a problem client
system, and record the results to compare
with failure complaints?
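
A minimal Python sketch of that comparison, assuming the ping results have already been collected as timestamped success/failure samples. The function names, the two-failure threshold, and the 30-second slack are illustrative choices, not part of any TCPIP Services tool.

```python
from datetime import datetime, timedelta

def find_outages(samples, min_failures=2):
    """Group consecutive failed pings into (start, end) outage windows.

    samples: list of (datetime, bool) pairs in time order;
             True means a reply arrived, False means the ping was lost.
    """
    outages, run = [], []
    for when, ok in samples:
        if not ok:
            run.append(when)
            continue
        if len(run) >= min_failures:
            outages.append((run[0], run[-1]))
        run = []
    # A run of failures may still be open at the end of the log.
    if len(run) >= min_failures:
        outages.append((run[0], run[-1]))
    return outages

def complaints_explained(outages, complaints, slack=timedelta(seconds=30)):
    """Return the complaint times that fall within `slack` of an outage window."""
    return [c for c in complaints
            if any(start - slack <= c <= end + slack for start, end in outages)]
```

If the reported drops never line up with lost pings, a generic network fault becomes a less likely explanation.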
Steve Calnek
Advisor

Re: SSH Users Dropped

My first thought had been a firewall problem, but I'm assured by the network manager that that's not it.

There are no Telnet sessions on this LAN, but prior to upgrading to OVMS 8.3 we used an open source port of an SSH1 server that I think was done by David Jones. We never had any problems with random drops then. This is new since we upgraded a month ago.

Many other network services would be affected if it were server hardware, and I've not seen anything in their logs to indicate a problem.

I will try the continuous ping. Thanks for the idea.

Steve/
Volker Halle
Honored Contributor

Re: SSH Users Dropped

Steve,

does the OpenVMS accounting file (process or image accounting records) show any specific error status if the user is dropped ? If so, you could obtain exact date and time data about the dropped connections.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: SSH Users Dropped

There is a server config item called IdleTimeOut. Is it active ?

Wim
Volker Halle
Honored Contributor

Re: SSH Users Dropped

Steve,

after logging into our Alpha via TELNET, then using SSH to login to our rx2600 and then killing the TELNET-connection, the interactive process on the rx2600 is (according to accounting) terminated with:

Final status code: 0001C0F4
Final status text: %RMS-F-RER, file read error

If I log out normally on the rx2600 and thus normally terminate the SSH connection, the final status code is 10000001 %SYSTEM-S-NORMAL, normal successful completion

So you may indeed try to check your accounting file with:

$ ACC/SINCE=/STATUS=0001C0F4/PROCESS=INTERACTIVE

to see if this shows any pattern or shows more SSH users getting dropped without complaining.

Volker.
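
The final status codes quoted in accounting can be taken apart by hand. The Python sketch below assumes the standard OpenVMS condition-value layout (severity in bits 0-2, message number in bits 3-15, facility number in bits 16-27, inhibit-message flag in bit 28); `decode_status` is a hypothetical helper, not a system routine.

```python
# Severity codes as used in message prefixes such as %RMS-F-RER.
SEVERITY = {0: "W", 1: "S", 2: "E", 3: "I", 4: "F"}

def decode_status(hex_text):
    """Split a hexadecimal condition value into its fields."""
    value = int(hex_text, 16)
    return {
        "severity": SEVERITY.get(value & 0x7, "?"),
        "success": bool(value & 1),            # low bit set means success
        "message_number": (value >> 3) & 0x1FFF,
        "facility_number": (value >> 16) & 0xFFF,
        "inhibit_message": bool(value & 0x10000000),
    }

for code in ("0001C0F4", "10000001"):
    print(code, decode_status(code))
```

Under this layout, 0001C0F4 decodes to severity F with a nonzero facility number, while 10000001 decodes to severity S, which agrees with the status texts shown above.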
Steve Calnek
Advisor

Re: SSH Users Dropped

Gentlemen!

Thanks for the great suggestions. I'll look into this today and report back shortly.

Steve/
Kees L.
Advisor

Re: SSH Users Dropped

Hello Steve,

On our system, SYS$SYSDEVICE:[TCPIP$SSH.SSH2]SSHD2_CONFIG.; has the entry IdleTimeOut 8h.
It should only log a user out after idle time, but there have been times when I was logged off after typing in the window only minutes earlier. So I fear IdleTimeOut does not always wait for the session to be idle. Try changing the value to see if it makes any difference.
Steve Calnek
Advisor

Re: SSH Users Dropped

Using the suggestions to date, I've tried setting the idle timeout, without effect, and I've used Volker's command to review the logs in detail. The error you found is not what is happening here, but I have found a consistent error.
NorthVancouver >acc/since=today/type=process/process=interactive/user=dsutherland/full

INTERACTIVE Process Termination
-------------------------------
Username:          DSUTHERLAND       UIC:               [GRUSSELL]
Account:                             Finish time:       11-APR-2008 10:04:28.28
Process ID:        0000145C          Start time:        11-APR-2008 09:04:27.25
Owner ID:                            Elapsed time:      0 01:00:01.03
Terminal name:     FTA26:            Processor time:    0 00:00:00.36
Remote node addr:                    Priority:          4
Remote node name:                    Privilege <31-00>: 00108000
Remote ID:         DSUTHERLAND(LOC   Privilege <63-32>: 00000000
Remote full name:  192.168.0.63
Posix UID:         -2                Posix GID:         -2 (%XFFFFFFFE)
Queue entry:                         Final status code: 00002BD4
Queue name:
Job name:
Final status text: %SYSTEM-F-EXITFORCED, forced exit of image or process by SYS$DELPRC
Page faults:       1053              Direct IO:         249
Page fault reads:  106               Buffered IO:       1027
Peak working set:  7504              Volumes mounted:   0
Peak page file:    176064            Images executed:   7


NorthVancouver >help/message "%SYSTEM-F-EXITFORCED"


EXITFORCED, forced exit of image or process by SYS$DELPRC

Facility: SYSTEM, System Services

Explanation: Another process caused the image to exit using the SYS$DELPRC
system service. In this use of SYS$DELPRC, an image exit
occurs (rather than a process deletion).

User Action: None. This is a fatal error message.

Any suggestions as to what could cause this?

Thanks

Steve/
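
The session lifetime in the accounting record above is worth checking, because a recurring lifetime points to a timer rather than a random fault. A small Python sketch that recomputes it from the start and finish times; `parse_vms_time` is a hypothetical helper that handles only the absolute time format shown in the record.

```python
from datetime import datetime

MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

def parse_vms_time(text):
    """Parse an absolute time such as '11-APR-2008 10:04:28.28'."""
    date_part, time_part = text.split()
    day, mon, year = date_part.split("-")
    hh, mm, ss = time_part.split(":")
    whole, _, frac = ss.partition(".")
    # Accounting prints hundredths of a second; pad to microseconds.
    micro = int(frac.ljust(6, "0")[:6]) if frac else 0
    return datetime(int(year), MONTHS[mon.upper()], int(day),
                    int(hh), int(mm), int(whole), micro)

start = parse_vms_time("11-APR-2008 09:04:27.25")
finish = parse_vms_time("11-APR-2008 10:04:28.28")
elapsed = (finish - start).total_seconds()
print(f"session lasted {elapsed:.2f} seconds")
```

The result agrees with the "Elapsed time: 0 01:00:01.03" field: the process was forced to exit about one second after it had been running for an hour.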
Richard Whalen
Honored Contributor

Re: SSH Users Dropped

does the system have an idle process killer?
Steve Calnek
Advisor

Re: SSH Users Dropped

Hi,

No, there are no programs like hitman running on this system.

Steve/
Hoff
Honored Contributor

Re: SSH Users Dropped

Turn on process control auditing via SET AUDIT /ENABLE [...more /stuff...]. Calls to various of the core process control services can be selectively audited.

Also start swapping some of the component pieces involved in the connection, and try and see where the problem lurks. I might load in a Mac OS X or Linux box here, and turn somebody loose with that to see if that connection dropped. Also establish "dummy" connections from the failing Windows boxes to another target, and see if those are (also) dropped.
Steve Calnek
Advisor

Re: SSH Users Dropped

Thanks Hoff, saw this very suggestion on your blog and did this about an hour ago,

$set audit/audit/enable=(process=(creprc,delprc,forcex))

We'll see what the results are on Monday.

Steve/
Steve Calnek
Advisor

Re: SSH Users Dropped

I had an SSH session dropping situation happen on another server at another site this weekend.

The user had been logged in all day, working without dropping out once, but then within less than 10 minutes he was dropped twice.

When I reviewed the audit log, the error was the same as on the server I've been working on for the last week. This user is a bit more technical and experimented a bit before reporting it to me. He said the dropout occurs whenever he is doing high-speed data entry. He managed to replicate the situation consistently.

Could this be the alternate type-ahead buffer being set too low?

If it makes sense that this is it, what should it be set to?

Any comments?

Steve/
Volker Halle
Honored Contributor

Re: SSH Users Dropped

Steve,

if you're looking at the audit log entry, which process issues the $DELPRC (or $FORCEX) ? Is it the TCPIP$S_BGxxxx process handling the IP SSH communication ? Could you post the audit log entry ?

If the user can 'reproduce' the problem, can you have him log in directly (e.g. on the console) on one VMS system, SSH to another VMS system (or even the same one) and reproduce it ?

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: SSH Users Dropped

I'm using multinet so I can't test it.

An SSH session has 2 processes: 1 interactive session and 1 process doing the SSH stuff. You looked at the interactive process (in accounting).

Could it be that the interactive one is killed by the network process meaning SSH took the initiative to kill the session ?

Wim
Volker Halle
Honored Contributor

Re: SSH Users Dropped

Steve,

you can enable debugging for the SSH server process, the one with the name TCPIP$S_BGxxxx, by setting the system-wide logical

$ define/sys tcpip$ssh_server_debug yes

(see the code in TCPIP$SYSTEM:TCPIP$SSH_RUN.COM). It will log an enormous amount of information into the file TCPIP$SSH_DEVICE:[TCPIP$SSH]TCPIP$SSH_RUN.LOG, but maybe this can help. Be warned, the logfile grows pretty big...

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: SSH Users Dropped

Specifies the level of debug information. By default, this logical name
is set to 99.

Default is maximum debug ????

Wim
Volker Halle
Honored Contributor

Re: SSH Users Dropped

Wim,

the DCL code in TCPIP$SSH_RUN.COM (at least with TCPIP V5.5) looks for the existence of the TCPIP$SSH_SERVER_DEBUG logical and then just invokes the SSH server with params = "-i -d 99", but you could certainly edit this procedure temporarily to reduce the debug level once you've found what you're looking for.

Volker.
Steve Calnek
Advisor

Re: SSH Users Dropped

Hi Gentlemen,

Sorry for the delay getting back to this, but I turned it over to HP. The issue is now resolved, and I wanted to share what it was.

An SSH client and server will exchange new keys every hour. A failed attempt at a rekey will result in the connection being dropped.

What made this hard to spot in my situation was that the audit log showed the drops occurring after one or more whole hours, rather than always after exactly one hour. What we ultimately surmised was that the drops happened when a rekey attempt was made during a period of heavy data input by the user at their workstation.

To resolve the issue I modified the SSH server config file to allow a rekey attempt every 10 hours.

RekeyIntervalSeconds 36000

This resolved the issue for me. In our situation this is not a problem but I'm certain there are higher security operations people not impressed by this.

It's not clear if this is a server or client issue. I adjusted the setting on the server because none of the clients we tried allow adjustment. We tried 5 different client emulators, but never changed the host OS, which was MS Windows 2000 Pro (I know it's old, I don't control this). It could be a threading issue with that OS.

The HP help desk was not aware of this issue, but they found some notes on it from OVMS engineering saying that they had seen this occur with certain client emulators. They certainly know about it now.

I appreciated the efforts by all of you to help me. I hope this information is helpful to you or others in the future.

Regards.

Steve Calnek
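
The rekey explanation can be checked against accounting data with a short Python sketch: a drop caused by a failed rekey should end a session close to a whole multiple of the rekey interval. The one-hour default comes from the description above; the function names and the 30-second tolerance are assumptions chosen for illustration.

```python
def seconds_from_rekey(elapsed_seconds, rekey_interval=3600):
    """Distance in seconds from the nearest multiple of the rekey interval."""
    remainder = elapsed_seconds % rekey_interval
    return min(remainder, rekey_interval - remainder)

def looks_like_rekey_drop(elapsed_seconds, rekey_interval=3600, tolerance=30):
    """True if a session ended near a rekey boundary."""
    # Sessions much shorter than one interval never reached a rekey.
    if elapsed_seconds < rekey_interval - tolerance:
        return False
    return seconds_from_rekey(elapsed_seconds, rekey_interval) <= tolerance

# Elapsed time from the accounting record earlier in the thread: 0 01:00:01.03
print(looks_like_rekey_drop(3601.03))

# RekeyIntervalSeconds 36000 expressed in hours
print(36000 / 3600)
```

Raising the interval does not remove rekeying; it only makes the boundary rarer, so a session that stays up for 10 hours could still hit the same failure.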