Operating System - HP-UX


 
Matt Harrell
Advisor

Unexpected disconnects in Progress on HP-UX 11.00

The system is an HP 9000/879 with 768 MB of RAM and 4 processors running HP-UX 11.00 (September 2002 patch bundles). There is a Progress database which is accessed via our custom Progress client software from Windows PCs (various versions of Windows, some across a T1 line and others in the same subnet).

It was running 100% fine until last Friday morning. Now users in the plant across the T1 are being disconnected, and everyone is seeing severe slowness. Users also keep hitting the -n limit, apparently because Progress is not properly removing the connections of the disconnected clients.

The Progress error is:

Error writing msg, socket=##, errno=32, usernum=## disconnected. (796)

When I researched this error on Progress's KB and on the Progress DBA mailing list, it said to check HP-UX error number 32, which is "broken pipe". All indications in my search said it was probably caused by a network problem. A tech support case with Progress resulted in the same information.

It turns out the smart jack on the DB server network side was causing lots of errors on the T1. I thought for sure this was it. I rebooted the DB server after it was fixed today around 11:00 Eastern Time. However, the errors came back quickly.

Nothing in HP-UX (glance, top, syslog.log, etc.) appears to show anything wrong with the DB server, but the indications from Progress and my searches still point to a networking problem, or possibly an HP-UX problem.

Any ideas?
cxtwo
Frequent Advisor

Re: Unexpected disconnects in Progress on HP-UX 11.00

A couple of things to check if it's a networking problem:

Are you losing local network connections at all?

See if you get packet loss with ping across the T1.
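One way to track this over time is to pull just the loss percentage out of each ping summary and log it. The helper below is only a sketch: the name packet_loss is made up for illustration, and it assumes the ping summary contains a phrase of the form "N% packet loss", which may not hold for every ping implementation.

```shell
#!/bin/sh
# packet_loss (hypothetical helper): read ping output on stdin and print
# the loss percentage from the summary line.
# Assumption: the summary contains "<number>% packet loss".
packet_loss() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}
```

Pipe the output of a fixed-count ping to a host on the far side of the T1 into packet_loss, and repeat it every few minutes while users are being dropped. How the count is passed differs between systems (HP-UX ping usually takes it after the host name), so check the ping man page on the server first.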

Check the output of "netstat -p tcp -I lanX" (run as root) several times while the problem is occurring and see which error statistics increase.
lanX is the LAN interface used for these connections. You might see retransmit timeouts, for example. Are there any dropped connections or full queues?
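To make it easier to spot which statistics are growing, the netstat output can be saved before and after a problem period and the two snapshots compared. This is a rough sketch, not a tested HP-UX tool: the function name increased_counters is made up, and it assumes each statistic line starts with a number followed by a description, and that both snapshots contain the same lines in the same order.

```shell
#!/bin/sh
# increased_counters (hypothetical helper): compare two saved snapshots of
# netstat statistics and print the counters that grew, as
# "<increase><TAB><description>".
# Assumptions: statistic lines look like "   12 retransmit timeouts", and
# line N in both files refers to the same statistic.
increased_counters() {
    awk '
        # First file: remember the leading counter on each numbered line.
        NR == FNR {
            if ($1 ~ /^[0-9]+$/) before[FNR] = $1 + 0
            next
        }
        # Second file: report lines whose counter is now larger.
        (FNR in before) && $1 ~ /^[0-9]+$/ && $1 + 0 > before[FNR] {
            desc = $0
            sub(/^[ \t]*[0-9]+[ \t]*/, "", desc)
            printf "%d\t%s\n", $1 + 0 - before[FNR], desc
        }
    ' "$1" "$2"
}
```

For example, redirect the netstat output to one file when the disconnects start and to a second file a few minutes later, then run increased_counters on the two files. Lines are matched by position rather than by wording, so a description that changes between singular and plural is still compared correctly.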

Check the nettl log for any driver-level problems:

netfmt -f /var/adm/nettl.LOG00

If users on your local subnet are OK, I would suspect that something is wrong on the T1 link, perhaps congestion or just general slowness. Have more users been added recently? Perhaps the app can be tuned to handle network delays better...

Hopefully these things will be a start...