ssh and sftp stop working after 3-4 days

John Butler_4 · ‎03-23-2006

Hi

We have a situation on a customer site where ssh and sftp stop working after 3-4 days. The only way to fix the problem is to kill and restart the sshd daemon. I've even fixed it by killing some of the spawned sshd processes. The frequency is variable, mostly 3-4 days but we've seen it go longer between "hangs".

Some background:
We have two Itanium servers running HP-UX B.11.23. One server takes data feeds and processes the data. The processed data is then transferred to the 2nd server. The processing and transfer batch runs from 22:00 until 07:00. We do not find out that the xfers have failed until the following morning.

The error that we see in the xfer logs is:
scp [] to [] failed with [1]
ssh_exchange_identification: Connection closed by remote host
lost connection

We've looked through the syslogs but there are no sshd messages for the time around the failures. We've tried running sshd with debug on, but because the failures are so infrequent this was impractical: the logs got too big.
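A lighter-weight alternative to full debug logging might be a small probe that records when the daemon stops answering. The "ssh_exchange_identification: Connection closed by remote host" error means the TCP connection was accepted but closed before the server sent its identification line, so checking for that line is enough to spot the hang. The sketch below is only an illustration under assumptions: the function name probe_ssh_banner is made up, and it reads only the first line the server sends, which is a simplification.

```python
import socket


def probe_ssh_banner(host, port=22, timeout=5.0):
    """Return the server's SSH identification line, or None if none arrives.

    Simplification: only the first line sent by the server is examined.
    """
    data = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # Read until the end of the first line, the peer closes, or
            # 255 bytes have arrived (the longest identification line allowed).
            while b"\n" not in data and len(data) < 255:
                chunk = sock.recv(255 - len(data))
                if not chunk:
                    break  # peer closed the connection
                data += chunk
    except OSError:
        # Refused, reset, or timed out: treat as "no banner".
        return None
    line = data.split(b"\n", 1)[0].strip().decode("ascii", "replace")
    return line if line.startswith("SSH-") else None
```

Run from cron every few minutes against the 2nd server and appended to a log with a timestamp, a None result would show when the hang started rather than leaving it to be discovered the next morning.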

telnet and rsh are not allowed on this network, so all connections use ssh. When this happens we cannot connect and have to find someone with a session already logged in to fix the problem. (We don't have access to the console.)

I've looked through the forums but cannot find a similar case.

To work around the problem we've added a cron job to do an sshd stop and start daily. This is not an ideal solution so we need to fix it.

I don't think it's the o/s: maxuprc is quite high, 3780, and the number of sshd user processes never seems to get to that limit.

Is there a limit to the number of sshd processes? I cannot find an sshd_config parameter for this.

Any help would be greatly appreciated

Rgds
John

Steven E. Protter · ‎03-23-2006

Shalom,

Upgrade secure shell to the latest version.

Check that network connectivity is correct with lanadmin -x.

Run cstm, mstm, or xstm and check the LAN hardware.

Check the ping rates. If they increase, you may have a bad NIC.

Check the switch port settings for inconsistencies.

Check /var/adm/syslog/syslog.log for clues before connectivity is lost.

Check the switch logs.

Any of the above steps could lead to a solution.
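For the syslog check above, a small filter can pull out only the sshd entries so they are easier to scan around the time of a failure. This is a minimal sketch, not a definitive tool: the name sshd_entries is made up, and it assumes the usual syslog layout in which each line carries a "sshd[pid]:" tag after the hostname.

```python
import re

# Assumed tag format: "... hostname sshd[1234]: message"
SSHD_TAG = re.compile(r"\ssshd\[\d+\]:")


def sshd_entries(lines):
    """Return only the syslog lines that were written by sshd."""
    return [line.rstrip("\n") for line in lines if SSHD_TAG.search(line)]
```

It could be fed the open file object for /var/adm/syslog/syslog.log and the result printed line by line.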

I've had a similar problem with a RH AS 2.1 box. We have cron flipping the daemon once a day and now the issue is gone. It's an old server though and will be replaced shortly. I hope.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

John Butler_4 · ‎05-17-2006

Thanks Steve

Your first pointer fixed the problem. I upgraded SSH to the latest version from the HP Software Depot, disabled the cron job, and monitored for a week or so with no hangs.
We were able to hand the system over to the customer without this problem and without the "hack" to fix it.

Thank you.

John
