Operating System - OpenVMS

Re: TCPIP services do not always react

 
labadie_1
Honored Contributor

Re: TCPIP services do not always react

I do not know if you can afford to do that, but can you simply take a crash dump when you have the problem?

Then you will have plenty of time to analyse the hang.

Regards

Gerard
Willem Grooters
Honored Contributor

Re: TCPIP services do not always react

Gerard,

I would suggest this if it were just a test system, but this is a production system running a database. A last resort, perhaps, and only if really unavoidable.
Willem Grooters
OpenVMS Developer & System Manager
Antoniov.
Honored Contributor

Re: TCPIP services do not always react

Hello Willem,
here are some clues to analyze.
a) Is TCP/IP installed correctly?
TCPIP>sysconfig -s
You should see inet, socket and arp loaded and configured (on all hosts).
b) Do you have sufficient sockets?
TCPIP>sysconfig -q socket
somaxconn must be at least 1024.
HP recommends a high value (even 65536) on servers (here NodeA and NodeB). HP also recommends setting pmtu_enabled=0 on servers. You can read more details here: http://h71000.www7.hp.com/doc/73final/6631/6631pro_contents.html

I reread your attachment: on NodeB out-of-order packets are 0.27%, while on NodeC the rate is 2.16%. Maybe the trouble is on NodeC?
On NodeC:
TCPIP>SH DEV
Look for the device used for the service request, then
TCPIP>SH DEV /FUL
Here you may find some insufficient value.
Can you repeat this on server NodeB, too?

Bye
Antoniov
Antonio Maria Vigliotti
Willem Grooters
Honored Contributor

Re: TCPIP services do not always react

Antonio,
My guess is indeed that NodeC causes the problems. However, it's not the services that go wrong. NodeC issues the request so outgoing traffic seems to be the problem. It may depend on other TCPIP traffic (Telnet sessions...), so I've asked for some more details - when the application seems to hang.
(Alas, I have no direct access to that machine, I have to rely on others....)
Willem Grooters
OpenVMS Developer & System Manager
Willem Grooters
Honored Contributor

Re: TCPIP services do not always react

Found on NodeB that one counter is larger - see attachment.
What is "sobacklogdrops" - connections dropped due to time-out?
Willem Grooters
OpenVMS Developer & System Manager
Antoniov.
Honored Contributor

Re: TCPIP services do not always react

Hello Willem,
in the link I posted above, you can read:
[...]
Network performance can degrade if a client overfills a socket listen queue
with TCP SYN packets, thereby blocking other users from the queue. To
eliminate this problem, increase the value of the sominconn attribute to its
maximum value. If the system continues to drop SYN packets, decrease the
value of the tcp_keepinit attribute to 30 (15 seconds). Monitor the values of
the sobacklog_drops and somaxconn_drops attributes to determine whether the
system is dropping packets. (See Section 2.3.2 for more information about event
counters.)
You can modify the tcp_keepinit attribute without rebooting the system.
[...]
2.3.2
The socket subsystem has three attributes that monitor socket listen queue
events:
• The sobacklog_hiwat attribute counts the maximum number of pending
requests to any server socket.
• The sobacklog_drops attribute counts the number of times the system
dropped a received SYN packet because the number of queued SYN_RCVD
connections for a socket equaled the socket's backlog limit.
• The somaxconn_drops attribute counts the number of times the system
dropped a received SYN packet because the number of queued SYN_RCVD
connections for the socket equaled the upper limit on the backlog length
(somaxconn attribute).
The initial value of these attributes is 0. Use the sysconfig -q socket command
to display the current attribute values. If the values show that the queues are
overflowing, you may need to increase the socket listen queue limit.
The value of the sominconn attribute should equal the value of the somaxconn
attribute. When these two attributes are equal, the value of somaxconn_drops
will have the same value as sobacklog_drops.
However, if the value of the sominconn attribute is 0 (the default), and if one
or more server applications uses an inadequate value for the backlog argument
to its listen system call, the value of sobacklog_drops may increase at a rate
that is faster than the rate at which the somaxconn_drops counter increases. If
this occurs, you may want to increase the value of the sominconn attribute.
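
To make the relationship between these counters concrete, here is a minimal Python sketch. It assumes that the effective listen queue limit is the application's backlog raised to sominconn and capped at somaxconn; that is consistent with the excerpt above but is not taken from the TCP/IP Services code. The function names and all numbers are hypothetical.

```python
def effective_backlog(requested: int, sominconn: int, somaxconn: int) -> int:
    """Assumed rule: raise the listen() backlog to sominconn, cap it at somaxconn."""
    return min(max(requested, sominconn), somaxconn)


def drop_counters(queued: int, requested: int, sominconn: int, somaxconn: int) -> list:
    """Return the counters a new SYN would increment, or [] if it can be queued."""
    limit = effective_backlog(requested, sominconn, somaxconn)
    if queued < limit:
        return []  # still room in the listen queue
    counters = ["sobacklog_drops"]
    if limit == somaxconn:
        # The queue is as long as the system-wide upper limit allows.
        counters.append("somaxconn_drops")
    return counters


# Hypothetical case: the server calls listen() with a backlog of 5
# and 5 connections are already waiting in SYN_RCVD state.
print(drop_counters(5, requested=5, sominconn=0, somaxconn=1024))
# -> ['sobacklog_drops']
print(drop_counters(5, requested=5, sominconn=1024, somaxconn=1024))
# -> []
```

Under this assumption, a small application backlog with sominconn at 0 increments only sobacklog_drops, which matches the case described in the last paragraph of the excerpt.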

H.T
Antonio Maria Vigliotti
Willem Grooters
Honored Contributor

Re: TCPIP services do not always react

I've asked for more details:
NodeB is a 4100 with 2 GB memory. I counted over 400 IP sessions.
NodeC is an ES40 with 2 GB memory and 124 IP sessions.
The test machine - functionally equal to NodeC - is some small, old Alpha system.
Whenever NodeC cannot connect (hangs), the very same request is repeatedly sent from this (relatively slow) test machine, and it succeeds time after time. This more or less proves there is something wrong on NodeC.

A sudden thought: could it be that the ES40 is far too fast compared to the 4100?

I have asked for tracing (TCPTRACE) on both nodes to see what traffic occurs. I will come back to this later.
Willem Grooters
OpenVMS Developer & System Manager
Willem Grooters
Honored Contributor

Re: TCPIP services do not always react

Now we tried TCPTRACE with default settings on NodeB, on NodeC, and on the test machine.
NodeC had a problem with TCPTRACE: it couldn't lock the pages in the working set. After specifying /BUFFERS=50 (half the default), no data could be written.

Could it be a memory assignment problem - too many connections, perhaps? That could explain why one request succeeds one time and fails another....
Willem Grooters
OpenVMS Developer & System Manager
Ian Miller.
Honored Contributor

Re: TCPIP services do not always react

Re the problem with TRACE: this suggests that your process quotas in SYSUAF are insufficient to run TRACE with the requested number of buffers, but that on one system the PQL_M system parameters are raising the quotas to a level sufficient to allow TRACE to run. Perhaps a similar problem exists with the original application. Compare the PQL_M and PQL_D parameters on the systems. Check the actual quotas that the relevant processes are getting (not necessarily what you specify, due to the PQL_ parameters).
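
A small Python sketch of that comparison, assuming that a process ends up with the larger of its SYSUAF quota and the matching PQL_M minimum. The node labels and numbers below are hypothetical, and WSQUOTA with PQL_MWSQUOTA is used only as an example quota.

```python
def effective_quota(uaf_value: int, pql_minimum: int) -> int:
    """Assumed rule: the system raises a quota to at least its PQL_M minimum."""
    return max(uaf_value, pql_minimum)


# Hypothetical values: the same SYSUAF WSQUOTA on two nodes,
# but different PQL_MWSQUOTA settings.
nodes = {
    "working node": {"uaf_wsquota": 2000, "pql_mwsquota": 4096},
    "failing node": {"uaf_wsquota": 2000, "pql_mwsquota": 1024},
}

for name, values in nodes.items():
    quota = effective_quota(values["uaf_wsquota"], values["pql_mwsquota"])
    print(f"{name}: effective WSQUOTA = {quota}")
# working node: effective WSQUOTA = 4096
# failing node: effective WSQUOTA = 2000
```

Identical SYSUAF records can therefore still give different effective quotas on two nodes, which is why the actual values of the running processes are worth checking.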
____________________
Purely Personal Opinion
Willem Grooters
Honored Contributor

Re: TCPIP services do not always react

A bit of an update
After consulting HP we found this:
The application on NodeC starts the communication with the right IP address: 10.21.0.12 (we can prove that!). However, the BG device that is then allocated says the remote system is 108.21.0.12. It won't find that machine, so the connection times out.
If we specify the node name (NodeB) instead, everything runs fine, without a problem!
So my first idea was to suspect routing tables that contain the wrong information, but on second thought that can't be true: if it were, specifying the node name would then show the same problem. So it's not the routing tables....

Final possibility: the module that initiates the connection is erring. It uses the socket interface. Still I don't get it. This module is used so often, in so many applications, that I would expect it to cause problems elsewhere too. But this is the first (and so far only) place where we've had trouble with it.
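
For reference, a minimal Python sketch comparing the two addresses byte by byte. It only illustrates the difference between the intended and the observed address; it says nothing about where the address is changed. The helper name is hypothetical.

```python
import socket


def differing_octets(intended: str, observed: str) -> list:
    """Return (position, intended octet, observed octet) for each octet that differs."""
    a = socket.inet_aton(intended)  # 4 bytes in network order
    b = socket.inet_aton(observed)
    return [(i, a[i], b[i]) for i in range(4) if a[i] != b[i]]


print(differing_octets("10.21.0.12", "108.21.0.12"))
# -> [(0, 10, 108)]
```

Only the first octet differs (10 versus 108), so the rest of the address reaches the BG device intact.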

Willem Grooters
OpenVMS Developer & System Manager