TCP stack problem

hpuxsa · ‎03-17-2005

Hi,

We have an Oracle CRM application. Users are getting a blank screen while accessing the application. Most of the time it happens when the user switches from a Forms screen to a JSP screen. The vendor says it is possibly a TCP stack problem in the HP servers.

We have two HP rp5470 servers running the apps, and the DB is on an rp7410 server. They are all connected to a Cisco Catalyst 6509 and are in the same VLAN. The servers have 1 Gbps fibre interfaces. User connections to the apps nodes are load balanced using Cisco content switches.

Please find below the netstat output from the webnode and the DB node. Could someone please tell me how we can find out whether the TCP stack has a problem?

webnode:
tcp:
73545107 packets sent
67150130 data packets (3647195122 bytes)
79016 data packets (58766373 bytes) retransmitted
6382987 ack-only packets (1575077 delayed)
679 URG only packets
7107 window probe packets
7738 window update packets
5535450 control packets
63028297 packets received
48393416 acks (for 3565628400 bytes)
103862 duplicate acks
0 acks for unsent data
42254696 packets (3464861751 bytes) received in-sequence
0 completely duplicate packets (0 bytes)
139 packets with some dup, data (142361 bytes duped)
3601 out of order packets (3471362 bytes)
0 packets (0 bytes) of data after window
14972 window probes
849661 window update packets
24255 packets received after close
7 segments discarded for bad checksum
0 bad TCP segments dropped due to state change
1045111 connection requests
1326133 connection accepts
2371244 connections established (including accepts)
2529095 connections closed (including 158112 drops)
577 embryonic connections dropped
45430245 segments updated rtt (of 45430245 attempts)
35827 retransmit timeouts
49 connections dropped by rexmit timeout
7107 persist timeouts
725049 keepalive timeouts
724861 keepalive probes sent
1919 connections dropped by keepalive
2 connect requests dropped due to full queue
170759 connect requests dropped due to no listener

DB Node:
tcp:
158211252 packets sent
145724434 data packets (1861681907 bytes)
2165 data packets (451890 bytes) retransmitted
12488728 ack-only packets (689314 delayed)
0 URG only packets
235 window probe packets
1922 window update packets
96454 control packets
106306530 packets received
68458120 acks (for 1861727692 bytes)
11700 duplicate acks
0 acks for unsent data
60937710 packets (2060673752 bytes) received in-sequence
0 completely duplicate packets (0 bytes)
0 packets with some dup, data (0 bytes duped)
522 out of order packets (1032308 bytes)
0 packets (0 bytes) of data after window
3 window probes
80035 window update packets
13 packets received after close
0 segments discarded for bad checksum
0 bad TCP segments dropped due to state change
15351 connection requests
32251 connection accepts
47602 connections established (including accepts)
49616 connections closed (including 2275 drops)
2269 embryonic connections dropped
68410053 segments updated rtt (of 68410053 attempts)
1170 retransmit timeouts
6 connections dropped by rexmit timeout
235 persist timeouts
2730 keepalive timeouts
2695 keepalive probes sent
2 connections dropped by keepalive
0 connect requests dropped due to full queue
2546 connect requests dropped due to no listener

Patch details are as follows:
PHNE_25083 - Streams cumulative
PHNE_25388 - Lan product cumulative
PHNE_26939 - 1000BaseSX cumulative
PHNE_28089 - ARPA cumulative.

Regards,
Franklin.

The Real MD · ‎03-17-2005

I would remove the networking complications, plug everything into a single switch, and get the basic functionality working first. Remove the VLAN and remove the load balancing. If it still doesn't work, it's an app issue.

If you can log what the app is doing while it's being used, even better.

Hope this helps.

Martin.

hpuxsa · ‎03-17-2005

One more thing I forgot to mention: while analysing the traffic with Ethereal on the apps server (webnode), we found that the RTT for an ACK between the apps server and the DB server was sometimes around 60 ms. With ping, the response time was always 0 ms.

The Real MD · ‎03-17-2005

Unplug one of the load-balanced machines and see what happens then. Do you still experience the same results?

Regards

Martin.

harry d brown jr · ‎03-17-2005

Using a vlan is perfectly fine.

How do you resolve DNS?

Have you considered applying more current patches?

This is your current patch level and what you should be at:

s700_800 11.11 LAN product cumulative patch:
PHNE_25388 created: 2002/01/29
PHNE_28923 created: 2003/06/12

s700_800 11.11 Streams Pty cumulative patch:
PHNE_25083 created: 2001/08/24
PHNE_30103 created: 2003/12/11

s700_800 11.11 cumulative ARPA Transport patch:
PHNE_28089 (patch has warnings) created: 2002/10/23
PHNE_31247 created: 2004/08/07

s700_800 11.11 GELAN 1000Base-SX/T B.11.11.14-19 cum. patch:
PHNE_26939 created: 2002/06/03
PHNE_28883 created: 2003/12/29
PHNE_32491 (passed functional tests) created: 2005/01/31; notes: critical fix, reboot required, special instructions

live free or die
harry d brown jr

doug mielke · ‎03-18-2005

If you are using DNS to resolve the database server name, I'd consider resolving it locally in the apps servers' /etc/hosts file.

Running a few "nslookup dbase-server-name" queries from the app servers might show a slow response or no response at all.
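A quick way to put numbers on this is to time a handful of lookups from the apps server. The following is only a sketch: the function name time_lookup is made up for illustration, and "localhost" is a placeholder to be replaced with the database server's name. It goes through the system resolver, so it measures whatever lookup path an application on that host would normally take.

```python
import socket
import time


def time_lookup(hostname, attempts=5):
    """Resolve hostname several times; return durations in ms (None on failure)."""
    timings = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            socket.gethostbyname(hostname)
        except OSError:
            # Resolution failed; record the failure instead of a time.
            timings.append(None)
            continue
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


if __name__ == "__main__":
    # "localhost" is a placeholder; use the DB server's name here.
    for ms in time_lookup("localhost"):
        print("lookup failed" if ms is None else f"{ms:.1f} ms")
```

Lookups that regularly take tens or hundreds of milliseconds, or that fail intermittently, would point at name resolution rather than at TCP.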

hpuxsa · ‎03-18-2005

We use DNS, and the lookup order is the hosts file first and then DNS. The DB and apps servers all have entries in the /etc/hosts file.

Thanks for the information on patches.
We will try installing the latest patches.

rick jones · ‎03-19-2005

_Which_ vendor is casting aspersions against our stack, and just exactly what do they think might be wrong with it?-) Oracle, the LB vendor, or someone else?

The 60 ms ACK time is normal. In broad handwaving terms it means that the application took longer than the standalone ack interval to send a response back on the connection, so the TCP standalone ACK timer fired and sent a standalone ACK. The default setting for the standalone ACK timer (tcp_deferred_ack_interval) is 50 ms, and it is based on a timer with 10ms granularity, so seeing anything between 40 and 60 is more or less normal.
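The arithmetic behind that range can be written down as a small check. This is a simplified model, assuming the timer can fire up to one tick early or late; the function names are hypothetical, and the 50 ms and 10 ms defaults are the values quoted above.

```python
def ack_delay_range_ms(interval_ms=50, tick_ms=10):
    """Expected standalone-ACK delay window: interval plus or minus one tick."""
    return interval_ms - tick_ms, interval_ms + tick_ms


def looks_like_deferred_ack(observed_ms, interval_ms=50, tick_ms=10):
    """True if an observed ACK delay falls inside the expected window."""
    low, high = ack_delay_range_ms(interval_ms, tick_ms)
    return low <= observed_ms <= high
```

With the defaults the window is 40 to 60 ms, so the 60 ms ACK seen in the trace fits a deferred ACK, while the near-zero ping time does not involve this timer at all.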

If you can take a packet trace from a user's system doing the switch, that might be useful. As suggested, going through the LBs and so on adds complication, so checking that things work well when bypassing the LBs would indeed be a very good thing.

Your netstat statistics look reasonable. It might be better to take a few snapshots over an interval and run them through beforeafter (ftp://ftp.cup.hp.com/dist/networking/tools/), since the stats are cumulative since boot and can wrap (being only 32 bits until HP-UX 11.notsurewhat).
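For anyone without the beforeafter tool to hand, the same idea can be sketched in a few lines. This is not the beforeafter implementation, only an illustration with made-up function names; it assumes each counter line starts with its value, as in the output posted above, and that a counter wraps at most once between snapshots.

```python
import re

COUNTER_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def parse_tcp_stats(text):
    """Map each counter label to its value; digits in labels become 'N'."""
    stats, seen = {}, {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if not match:
            continue  # e.g. the "tcp:" header line
        value = int(match.group(1))
        # Mask embedded numbers so labels are stable between snapshots.
        key = re.sub(r"\d+", "N", match.group(2)).strip()
        # Some labels occur twice (sent and received); number the repeats.
        seen[key] = seen.get(key, 0) + 1
        if seen[key] > 1:
            key = f"{key} #{seen[key]}"
        stats[key] = value
    return stats


def delta(before, after, wrap=2**32):
    """Per-counter increase between snapshots, allowing one 32-bit wrap."""
    return {key: (new - before.get(key, 0)) % wrap for key, new in after.items()}


def retransmit_rate(stats):
    """Retransmitted data packets as a fraction of data packets sent."""
    sent = stats["data packets (N bytes)"]
    resent = stats["data packets (N bytes) retransmitted"]
    return resent / sent if sent else 0.0
```

Feeding two snapshots through parse_tcp_stats and delta, then calling retransmit_rate on the result, gives the retransmission rate for just that interval. For comparison, the cumulative figures posted above work out to roughly 0.12% on the webnode (79016 of 67150130 data packets) and roughly 0.0015% on the DB node (2165 of 145724434).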

there is no rest for the wicked yet the virtuous have no pillows

hpuxsa · ‎03-20-2005

Hi Rick,

We have increased the number of Java processes in the system, and the frequency of the problem has reduced. Now the vendor is not complaining about the TCP stack. They were initially highlighting the 60 ms and blaming the stack, and I could not explain why that was happening. Thanks a lot for the help.

Regards,
Franklin.
