
TCP RST problem

 
Arnaud Veron
Valued Contributor


Hello,

I hope I am posting in the appropriate forum.

I have a strange problem:

An HP DL380 G3 running RedHat ES 2.1 hosts a WebLogic application that replies to a POST request with a 302 redirect. As soon as the "302"
fragments are sent, the server emits a TCP RST. The client is then unable to issue
the follow-up GET requested by the redirect: it shows a "404".

Facts:

This behaviour is reproducible on a vanilla 2.4.26 Linux kernel, the latest Red Hat EL kernel
and various Linux distros.

This behaviour is not reproducible with Windows NT/XP WebLogic servers or
Tru64 servers.

This behaviour shows up only when Internet Explorer is used on the client side:
reliably with IE5, IE5.1 and IE5.5, and less reliably with IE6. Mozilla browsers
never trigger the RST. The latest IE service packs seem to solve the problem
too, but I don't have the leeway to force the upgrade on a *really* big client
park (4000+ PCs).

Putting the WebLogic server behind a proxy or an LVS load balancer mostly solves
the issue: the RST is then triggered on fewer than 1 hit in 50.

Dropping the outgoing RST packets on the WebLogic server fixes the problem 100%,
but may introduce other problems.

Questions:

Does anyone have insights on how to solve the problem "cleanly" on
the server side, or simply an explanation of the phenomenon?

What side effects can I expect from filtering the outgoing RSTs on these
servers, given that they take hits from slow WAN clients?

Regards.
rick jones
Honored Contributor

Re: TCP RST problem

It might help if you could post an example packet trace.

If you filter RSTs, then the remote systems will never know that the server has toasted the TCP connection. They will likely keep retransmitting their next request, the server TCP will keep emitting RSTs that you filter, and eventually the client TCP will hit its retransmission timeout and raise an error to the client application.
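To put a rough number on "eventually": a TCP sender that never hears back retransmits with an exponentially backed-off timer until it reaches its retry limit. The sketch below simply adds up those waits. The parameters are assumptions for illustration only: 200 ms and 120 s are Linux's minimum and maximum retransmission timeout (RTO), and 15 is Linux's default `tcp_retries2`. The NT/IE clients in this thread use Windows' own retransmission settings, so their total will differ.

```python
def give_up_time(initial_rto: float, max_rto: float, retransmissions: int) -> float:
    """Seconds from the first transmission until the sender gives up.

    Assumes the retransmission timeout (RTO) doubles after every expiry and is
    capped at max_rto, and that the sender abandons the connection when the
    timer expires after the last allowed retransmission.
    """
    total = 0.0
    rto = initial_rto
    # One wait for the original segment plus one per retransmission.
    for _ in range(retransmissions + 1):
        total += min(rto, max_rto)
        rto *= 2
    return total


# Linux-style parameters (an assumption, see above): about 924.6 s, i.e. ~15 minutes.
print(round(give_up_time(0.2, 120.0, 15), 1))
```

With the RSTs filtered, every one of those retransmissions is answered by a RST that never leaves the server, so a client application may sit for that long before it sees an error.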

In a phrase, don't do that.

Perhaps there is some bogus use of SO_LINGER, or the server side thinks the connections are not persistent, calls close(), and then another request arrives from the client. That would trigger a RST because the server application, by calling close(), has "told" TCP it expects no more data. A system call trace could show that, and/or show SO_LINGER being set badly.
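Both mechanisms can be reproduced on loopback. The sketch below is an illustration, not a diagnosis of the WebLogic case: it opens a connection, then has the "server" side either close normally, close with SO_LINGER set to {on, 0 seconds} (an abortive close), or close while a request it never read is still sitting in its receive buffer (a close() racing with the client's next request). It reports whether the client saw a FIN (recv() returns b"") or a RST (ConnectionResetError). The function name and request bytes are hypothetical, and the RST-on-unread-data result assumes Linux; other TCP stacks may behave differently.

```python
import select
import socket
import struct


def observe_close(abortive_linger: bool, unread_data: bool, timeout: float = 2.0) -> str:
    """Return "FIN", "RST" (or "DATA") for what the client sees when the server closes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=timeout)
    server, _ = listener.accept()
    listener.close()
    try:
        if abortive_linger:
            # struct linger {int l_onoff; int l_linger;} = {1, 0}: close() aborts with RST.
            server.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
        if unread_data:
            # The client sends its next request (hypothetical bytes) ...
            client.sendall(b"GET /next HTTP/1.1\r\nHost: example\r\n\r\n")
            # ... and the server waits until it is queued, but never reads it.
            select.select([server], [], [], timeout)
        server.close()
        try:
            data = client.recv(1024)
        except ConnectionResetError:
            return "RST"
        return "FIN" if data == b"" else "DATA"
    finally:
        server.close()  # no-op if already closed
        client.close()


for linger, unread in [(False, False), (True, False), (False, True)]:
    print("linger=0s:", linger, "unread request:", unread, "->", observe_close(linger, unread))
```

Running the same close() path under strace on the real server is what would show whether either pattern is actually happening there.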
Arnaud Veron
Valued Contributor

Re: TCP RST problem

Hi,

First of all, thanks for your answer to this tricky problem.

I uploaded a packet trace here: http://dsit.free.fr/sniff.eth.bz2
(Linux RHES server: 10.40.89.27 / NT4 client: 10.9.238.105)
The problem occurs twice at the end of the log.
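To find the RSTs in a capture like this without a GUI analyzer, a small script can scan the file. This is a sketch under loud assumptions: the thread does not say what format "sniff.eth" uses, so it assumes the decompressed trace is classic libpcap with an Ethernet link layer, IPv4 and no VLAN tags. The function name `tcp_resets` is hypothetical.

```python
import socket
import struct
from typing import List, Tuple


def tcp_resets(pcap: bytes) -> List[Tuple[int, str, str]]:
    """Return (packet_number, src_ip, dst_ip) for every TCP segment with RST set.

    Assumes classic libpcap (microsecond magic), Ethernet link type, IPv4, no VLANs.
    """
    if len(pcap) < 24:
        raise ValueError("truncated pcap header")
    if pcap[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # little-endian file
    elif pcap[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file")
    (linktype,) = struct.unpack(endian + "I", pcap[20:24])
    if linktype != 1:  # 1 = Ethernet
        raise ValueError("only Ethernet captures are handled")

    resets = []
    offset, number = 24, 0
    while offset + 16 <= len(pcap):
        # Record header: ts_sec, ts_usec, captured length, original length.
        _, _, incl_len, _ = struct.unpack(endian + "IIII", pcap[offset:offset + 16])
        frame = pcap[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        number += 1
        # Skip anything that is not an IPv4 frame (EtherType 0x0800).
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4
        if ip[0] >> 4 != 4 or ip[9] != 6 or len(ip) < ihl + 14:
            continue  # not IPv4/TCP, or TCP header truncated
        flags = ip[ihl + 13]  # TCP flags byte
        if flags & 0x04:  # RST bit
            resets.append((number, socket.inet_ntoa(ip[12:16]), socket.inet_ntoa(ip[16:20])))
    return resets
```

For example, `tcp_resets(bz2.decompress(open("sniff.eth.bz2", "rb").read()))` would list the packet numbers of the RSTs, which can then be compared with the segments just before them.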

I will focus on tracing the WebLogic system calls this week.
Thomas Bianco
Honored Contributor

Re: TCP RST problem

I took a look at the packet trace. It looks like before RSTing the connection, .27 sends out 4 ACKs for 7440236. It does this again with 7426897, then RSTs the same way.

Additionally, 7428806 is ACKed 18 times, and most of these are ACK,PSH, meaning "this is the next packet I want, and I'm out of things to say, so dump everything out of your buffer into the stream".
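For anyone decoding the trace by hand: labels like "ack,psh" are just bits of the TCP flags byte (byte 13 of the TCP header). A small decoder using the standard bit values from the TCP specification; the function name is hypothetical:

```python
# Standard TCP control bits, lowest bit first.
TCP_FLAGS = [
    (0x01, "FIN"),
    (0x02, "SYN"),
    (0x04, "RST"),
    (0x08, "PSH"),
    (0x10, "ACK"),
    (0x20, "URG"),
]


def flag_names(flags: int) -> list:
    """Return the names of the TCP control bits set in `flags`, lowest bit first."""
    return [name for bit, name in TCP_FLAGS if flags & bit]


print(flag_names(0x18))  # ['PSH', 'ACK']
print(flag_names(0x14))  # ['RST', 'ACK']
```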

I'm looking for duplicate ACKs elsewhere in the trace, but packet loss is the theory du jour.
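Hunting for repeated ACKs across the rest of the trace is easy to automate once the segments are exported as (source address, ack number) pairs, for instance from a text dump. This sketch flags an ACK number that one host sends at least three extra times in a row, the duplicate-ACK threshold TCP uses to trigger fast retransmit (RFC 5681). It deliberately ignores RFC 5681's other conditions (no payload, unchanged window), so treat it as a rough filter; the function name and the sample data are made up.

```python
from typing import Dict, Iterable, Optional, Tuple


def repeated_acks(segments: Iterable[Tuple[str, int]], sender: str, threshold: int = 3) -> Dict[int, int]:
    """Map ack number -> longest run of repeats sent by `sender`, for runs >= threshold.

    A "repeat" is an ACK carrying the same ack number as the previous segment
    from the same sender; the first ACK of a run is not counted.
    """
    found: Dict[int, int] = {}
    prev_ack: Optional[int] = None
    repeats = 0
    for src, ack in segments:
        if src != sender:
            continue  # only look at one direction of the connection
        if ack == prev_ack:
            repeats += 1
        else:
            prev_ack, repeats = ack, 0
        if repeats >= threshold:
            found[ack] = max(found.get(ack, 0), repeats)
    return found


# Hypothetical sample: four ACKs for 200 from the server = three duplicates.
sample = [
    ("10.40.89.27", 100),
    ("10.40.89.27", 200),
    ("10.40.89.27", 200),
    ("10.9.238.105", 5000),
    ("10.40.89.27", 200),
    ("10.40.89.27", 200),
    ("10.40.89.27", 300),
]
print(repeated_acks(sample, "10.40.89.27"))  # {200: 3}
```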
