Trying to understand zero window size behaviour

MohitAnchlia · 02-04-2010

We are trying to configure replication across a 38 ms latency network over OC-48 between two Oracle hosts. It's a big pipe, and we are currently the only ones using it, at just 30 Mbit/s.

I have read through a lot of posts, but they have not answered some (or more :)) of my questions:

Problem Description:
1. Seeing lots and lots of retransmits, duplicate ACKs and lost segments
2. Seeing a zero window size from the receiver

We use Linux, and our send and receive buffers are set to 16M. Window scaling is enabled, and the other recommended TCP tuning parameters are set too. We are using a Broadcom NIC with OEL 5.2.
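As a rough sanity check on the 16M buffers, the bandwidth-delay product (BDP) gives the amount of data that must be in flight to keep the path full. The sketch below assumes that the quoted 38 ms is the round-trip time and uses the nominal OC-48 line rate of 2488.32 Mbit/s; both are assumptions, and the usable rate will be somewhat lower.

```python
OC48_BPS = 2_488_320_000          # nominal OC-48 line rate in bit/s (assumption: ignores SONET overhead)
RTT_MS = 38                       # assumption: the quoted 38 ms is the round-trip time
BUFFER_BYTES = 16 * 1024 * 1024   # the 16M socket buffer from the post


def bdp_bytes(rate_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes that must be in flight to keep the pipe full."""
    # Integer arithmetic avoids float rounding: bits/s * ms / (8 bits * 1000 ms/s).
    return rate_bps * rtt_ms // 8000


if __name__ == "__main__":
    for label, rate in (("OC-48 line rate", OC48_BPS), ("observed 30 Mbit/s", 30_000_000)):
        bdp = bdp_bytes(rate, RTT_MS)
        print(f"{label}: BDP = {bdp:,} bytes, fits in 16 MiB buffer: {bdp <= BUFFER_BYTES}")
```

Under these assumptions the full OC-48 BDP is about 11.8 MB, which fits in a 16 MiB buffer, and at the observed 30 Mbit/s only about 142 KB needs to be in flight.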

I have done some detailed analysis, and we are currently looking at upgrading the OS, but I have some questions that Oracle Linux support has not been able to answer.

1. During connection setup I see WS=9 in the SYN and also in the SYN+ACK. I am assuming this means window scaling will be used.

2. Packets going from source to destination consistently have WIN=47. I am assuming this is the sender's window. But why is it only WIN=47? Does it mean that the sender can send only 47 bytes at a time?

3. On the other hand, the receiver's window gradually increases, while the sender's window stays at WIN=47. I am not sure which one is used, or how much data is being sent, if WIN=47 on the sender and WIN=80000 on the receiver.

4. Slowly, even under light load and with the application keeping up, the receiver's window shrinks to zero.

Now this is the biggest mystery I am trying to resolve. Why is the receiver's window shrinking so much even though there is not much to do? The application is idle during this time, and the receiver has 16GB of free memory with only 7% CPU in use.

Could the retransmissions and lost segments have something to do with it? For example: segments 1 through 10 were sent, but the receiver received only 2 and is still expecting 3-10. In the meantime, segments 20-30 were sent out and some more data was lost. So even though the receiver doesn't have the data, does it reduce the window while expecting that data to arrive? I am just trying to understand. (This is all on one socket connection.)

5. I am also seeing this in netstat -s:

92664771 packets collapsed in receive queue due to low socket buffer

This counter increases even when there is no load.
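One way to see how fast that counter grows is to take two `netstat -s` snapshots some seconds apart and compare them. The sketch below only parses text passed to it; the helper names are hypothetical, and the exact wording of the line may differ between kernel and net-tools versions.

```python
import re
from typing import Optional

# Matches e.g. "92664771 packets collapsed in receive queue due to low socket buffer"
_COLLAPSED_RE = re.compile(r"^\s*(\d+)\s+packets collapsed in receive queue", re.MULTILINE)


def collapsed_count(netstat_output: str) -> Optional[int]:
    """Return the 'packets collapsed in receive queue' counter, or None if the line is absent."""
    m = _COLLAPSED_RE.search(netstat_output)
    return int(m.group(1)) if m else None


def collapsed_rate(before: str, after: str, interval_s: float) -> float:
    """Collapses per second between two `netstat -s` snapshots taken interval_s seconds apart."""
    a, b = collapsed_count(before), collapsed_count(after)
    if a is None or b is None:
        raise ValueError("collapse counter not found in one of the snapshots")
    return (b - a) / interval_s
```

The snapshots could come from, for example, `subprocess.run(["netstat", "-s"], capture_output=True, text=True).stdout`, taken once under load and once while idle.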

Could someone please help me with my analysis? Thanks.

Michael Steele_2 · 02-04-2010

Hi

Too much information. Please paraphrase it into specific questions, list the commands being executed, and paste in the errors or results.

Thanks!

Matti_Kurkela · 02-05-2010

1.) Yes, WS=9 means the WIN values will be bit-shifted by 9 bits, so that WIN=1 actually means the window size is 512 (= 0x200 in hex, or 10 0000 0000 in binary).

2+3.) When WS=9 is in effect, WIN=47 means the window size is 24064 bytes, and WIN=80000 means the window size is 40 960 000 bytes.
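The arithmetic above is a left shift of the raw 16-bit WIN field by the WS shift count. A minimal sketch (the function name is illustrative):

```python
MAX_RAW_WIN = 0xFFFF  # the WIN field in the TCP header is 16 bits wide
MAX_SHIFT = 14        # RFC 1323/7323 limit the window-scale shift count to 14


def scaled_window(raw_win: int, ws: int) -> int:
    """True window in bytes = raw WIN field shifted left by WS (i.e. WIN * 2**WS)."""
    if not 0 <= raw_win <= MAX_RAW_WIN:
        raise ValueError(f"raw WIN {raw_win} does not fit in 16 bits")
    if not 0 <= ws <= MAX_SHIFT:
        raise ValueError(f"shift count {ws} outside 0..{MAX_SHIFT}")
    return raw_win << ws
```

Here `scaled_window(47, 9)` gives 24064. Note that the sketch rejects 80000: since the raw field cannot exceed 65535, a displayed value like WIN=80000 is probably already scaled. Wireshark, for instance, typically shows the calculated (scaled) window when it has seen the SYN/SYN+ACK, so it is worth checking which value a tool prints before multiplying by 512 again.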

3.) The gradual increase of the receive window is probably the effect of the TCP slow-start algorithm.

http://en.wikipedia.org/wiki/Slow-start

If WIN=47 on the sender and 80000 on the receiver, that means data is primarily flowing from the sender to the receiver: the receiver is mostly sending acknowledgement packets back to the sender and not much more.

The WIN value is not an agreement between the sender and the receiver: it is a report of the current state of each party.

The receiver has seen that the sender has a lot of data to send, has already completed the slow-start algorithm and indicates it's prepared to receive large volumes of data.
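For reference, the textbook slow start linked above grows the sender's congestion window: roughly doubling each round trip until it reaches the slow-start threshold (ssthresh), then growing by about one segment per round trip (congestion avoidance). The toy model below assumes no losses, an initial window of 1 segment and a caller-chosen ssthresh; real stacks use larger initial windows and other congestion-control algorithms, so it only illustrates the shape of the curve.

```python
from typing import List


def cwnd_per_rtt(rounds: int, ssthresh: int, initial: int = 1) -> List[int]:
    """Congestion window (in segments) at the start of each RTT in a loss-free toy model."""
    cwnd = initial
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double per RTT, capped at ssthresh
        else:
            cwnd += 1                       # congestion avoidance: +1 segment per RTT
    return history
```

With `cwnd_per_rtt(8, 16)` the window goes 1, 2, 4, 8, 16 and then grows linearly: 17, 18, 19.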

4.) TCP is designed to present an error-free connection to the application layer. This means that if packets 1-10 are sent and the receiver receives all except packet 2, the receiver OS cannot release packets 3...10 to the application while packet 2 is still missing.

The OS needs to hold all packets from packet 3 onward until packet 2 has been successfully retransmitted. Once that happens, all the packets 2...10 (or more, if more than 10 packets have been received) are released to the application and acknowledged to the sender.

While waiting for packet 2, the receiver may receive still more packets.

The receive window indicates how much *new* data the receiver can accept at the moment. The buffer space for the earlier missing packet(s) has already been allocated. When the window drops to zero and there are still earlier packets left unacknowledged, it means the receiver is saying to the sender: "I cannot accept any new data until you resend me those packets that failed before."

Of course the application is idle: the application cannot even see the data until all missing packets have been successfully re-transmitted and the stream is re-assembled by the receiver OS.
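The hold-back behaviour can be illustrated with a deliberately simplified receiver model. It counts whole segments instead of bytes, gives each segment one buffer slot, keeps the slots for missing segments reserved, and assumes the application reads delivered data immediately. It is a sketch of the idea described above, not how the Linux stack actually accounts for memory.

```python
from typing import List, Set


class ToyReceiver:
    """Toy TCP receiver: in-order delivery, cumulative ACK, window = free buffer slots."""

    def __init__(self, buffer_slots: int) -> None:
        self.buffer_slots = buffer_slots
        self.next_expected = 1          # cumulative ACK point: first segment still missing
        self.held: Set[int] = set()     # out-of-order segments held back from the application
        self.delivered: List[int] = []  # segments released to the application, in order

    def window(self) -> int:
        """Slots free for *new* data; slots reserved for holes count as used."""
        if not self.held:
            return self.buffer_slots
        span = max(self.held) - self.next_expected + 1  # holes plus held segments
        return self.buffer_slots - span

    def receive(self, seg: int) -> int:
        """Process one arriving segment and return the cumulative ACK (next segment expected)."""
        in_window = self.next_expected <= seg < self.next_expected + self.buffer_slots
        if in_window:
            self.held.add(seg)
            # If the hole at the ACK point is now filled, release the contiguous run.
            while self.next_expected in self.held:
                self.held.remove(self.next_expected)
                self.delivered.append(self.next_expected)
                self.next_expected += 1
        return self.next_expected
```

With `ToyReceiver(8)`, delivering segment 1 and then 3 through 9 makes every ACK repeat "2" (the duplicate ACKs) while the window falls to 0 and the application sees only segment 1. Segment 10 is refused. When segment 2 finally arrives, the ACK jumps to 10, segments 2...9 are delivered at once, and the window reopens to 8.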

Another reason for the zero window condition would be that the application simply won't read its receive buffer, because it's overloaded or waiting for something else to happen first. These situations can be identified by looking at which packets have not been acknowledged.
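The check described in the last paragraph can be written as a rough rule of thumb for a capture taken on the receiver: when the zero window is advertised, compare the receiver's ACK number with the end of the highest data segment seen arriving. The function and its inputs are hypothetical, meant to be filled in by hand from a trace; it ignores sequence-number wraparound and packets dropped after the capture point.

```python
def zero_window_cause(ack: int, highest_seq_received_end: int, window: int) -> str:
    """Rough classification of a zero-window advertisement seen in a receiver-side capture.

    ack: ACK number in the receiver's zero-window segment.
    highest_seq_received_end: sequence number just past the last byte seen arriving.
    """
    if window != 0:
        return "window open"
    if ack < highest_seq_received_end:
        # Data beyond the ACK point has arrived but cannot be acknowledged:
        # the receiver is holding it behind a missing segment.
        return "hole: receiver holds data beyond a missing segment"
    # Everything that arrived has been acknowledged, yet there is no room:
    # the data is sitting in the socket buffer, unread by the application.
    return "no hole: application is not reading its receive buffer"
```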

5.) This is a consequence of the above.

MK

MohitAnchlia · 02-05-2010

Thanks, this helps a lot! Some more questions I have:

1. How is WS=9 calculated? And is it the right value for us to use?

2. Why does the sender always stay at WIN=47, even though there is so much data to be sent and the receiver has already hinted that it can take more data? Also, according to slow start, the window should increase.

3. I forgot to mention that in Wireshark I am seeing the following:
- DCERPC malformed payload
- SMPP malformed payload

Please help me understand what these malformed packets are. Could they have any impact on the huge number of retransmits we are seeing?

I am also attaching some Wireshark analysis from the packet capture that I did on the receiver side.

Matti_Kurkela · 02-06-2010

1.)
The window size is decided by the TCP driver in the operating system. The decision takes into account many things, such as:
- available space in the buffer of this particular socket (this is the maximum value the window size can be; however, the OS may automatically allocate a larger buffer if it detects the socket has been receiving a lot of data at high speed)
- the transmission error rate on this connection (if there are a lot of errors, the window is reduced)
- TCP slow-start algorithm
- any other network-congestion-management algorithms the OS may have

With a modern OS, there is a lot of auto-tuning going on here, so it's complex.

The WIN field in the TCP header is only 16 bits wide, so the maximum value without WS is 65535 bytes. The formula for the true window size when the WS option is in effect is:

window size = WIN * (2^WS)
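As for where the 9 itself comes from: on Linux of that era, the shift count offered in the SYN or SYN+ACK appears to be derived from the largest receive buffer the socket could grow to (the larger of the `tcp_rmem` maximum and `rmem_max`, limited by any window clamp). The kernel halves that value until it fits in 16 bits, counting the halvings, with 14 as the ceiling. The sketch below mirrors that loop under those assumptions (modeled on `tcp_select_initial_window()`); other kernels and operating systems may choose differently.

```python
MAX_SHIFT = 14  # window-scale shift counts above 14 are not allowed (RFC 1323)


def linux_style_wscale(max_buffer_bytes: int) -> int:
    """Smallest shift that lets the 16-bit WIN field describe max_buffer_bytes.

    Assumption: mirrors the halving loop Linux uses when choosing its offered shift count.
    """
    space = max_buffer_bytes
    wscale = 0
    while space > 0xFFFF and wscale < MAX_SHIFT:
        space >>= 1
        wscale += 1
    return wscale
```

With a 16 MiB maximum buffer this yields 9, matching the WS=9 in the trace: a shift of 8 would top out at 65535 × 256 = 16,776,960 bytes, just 256 bytes short of 16 MiB. A shift of 9 only sets the granularity (512 bytes per WIN unit) and the ceiling (65535 × 512 bytes, just under 32 MiB).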

2.)
How much data is flowing from the receiver to the sender?

For the purposes of window size optimization, the TCP protocol considers sender -> receiver and receiver -> sender as two separate flows. This is because some connections may be asymmetric: for example, an ADSL connection often has a downlink speed much higher than the uplink speed.

If there is not much data going in the receiver -> sender direction, the sender side may still be in the slow-start phase and therefore announce that "his" receive window is small. It is perfectly legitimate for data flowing in the sender -> receiver direction to see WIN=80000 while data flowing in the receiver -> sender direction sees WIN=47.

Alternatively, if there have been a lot of transmission errors in the receiver -> sender direction and the "selective acknowledgement" TCP option (SACK) is not available, that is another reason to keep the window small.

3.) Something is using the TCP/UDP ports that are normally used by DCERPC and SMPP (typo?), but the data inside the packets does not look like valid DCERPC/SMPP data.

If those ports are being used by a custom application, you may have to tell Wireshark what it is so that it can use a correct analysis pattern on those packets.

Even if those packets were errors, those errors happen at a higher protocol layer (at the application protocol layer, in a completely different TCP/UDP connection), and so they should not affect any other connections.

I assume your .docx file was supposed to contain some graphs? My OpenOffice could not display them: I could only see the headings for the graphs. One of them says:

> Bad checksum indicates bad NIC or problem with segment offloading. We are aware of this issue and working on upgrading the Kernel. Also looking at getting new box

So you *know* you have hardware or NIC driver problems? It's very much possible that this is the root cause for all your troubles.
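One caveat on reading "bad checksum" reports: when checksum offload (and TSO) is enabled, a capture taken on the sending host usually shows wrong TCP checksums on outgoing packets, because the capture sees the packet before the NIC fills in the checksum. Bad checksums on outgoing packets are therefore expected in that setup; bad checksums on packets received from the wire are the more telling sign. For checking individual values by hand, the Internet checksum (RFC 1071) used by IPv4, TCP and UDP can be computed as below; for TCP the sum also covers a pseudo-header, which this sketch leaves to the caller.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over data (IPv4 header, or TCP/UDP with pseudo-header)."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add 16-bit big-endian words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into the low 16 bits
    return ~total & 0xFFFF
```

Computing the checksum over a header whose checksum field is already filled in gives 0 when that field is correct, which is a quick way to verify a captured value.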

MK

MohitAnchlia · 02-07-2010

I am attaching a PDF for your analysis. Please let me know if my assumptions about the checksum and other things are not right.

1. Is there a better way to see packet latency in tcpdump? I am using a filter provided by Wireshark, but I'm not sure if that's correct. It's in the PDF.

2. Could you please advise whether my assumption regarding the checksum is correct? TSO is on.

3. The network engineers here are all focused on the fact that we are seeing malformed DCERPC packets, and they assume it's a host issue. But it looks like Wireshark may be reporting them without this really demonstrating any problem.

Is there a definite way to tell whether it's a NIC issue or a network issue? I am not seeing any packet drops on the WAN router, and the engineers say the firewall looks clean.