so_sndlowat

SOLVED

We are using a legacy application based on Providex.

When Providex opens the TCP channel, it calls setsockopt() to set SO_SNDLOWAT.

I am unable to find any HP-UX documentation for this option except this page from the Linux STK: http://devrsrc1.external.hp.com/STKL/impacts/i61.html

Does this mean that SO_SNDLOWAT cannot be used on HP-UX?

Thanks,

Richard


Hi Richard:

It would appear not. From the "BSD Sockets Interface Programmer's Guide":

http://docs.hp.com/hpux/onlinedocs/B2355-90136/B2355-90136.html

/begin_quote/

This option allows the user to set or fetch the low water mark for the socket's send socket buffer. At present, this option is not used. It is supported in anticipation of future use.

/end_quote/

Regards!

...JRF...

The HP-UX docs basically mean that while you can issue a setsockopt() to set SO_SNDLOWAT, it will be a no-op. You can use it, but it will not do anything.

The application (if well-written) should still "work" - it should never rely entirely on SO_SNDLOWAT et al. for things like pseudo-framing and such. The app _may_ make more small send calls if it manages to fill the socket buffer.
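A minimal sketch in C of treating SO_SNDLOWAT as a hint only, assuming a POSIX sockets environment; the helper name try_set_sndlowat() is hypothetical. It attempts the setsockopt(), carries on if that fails, and reads the value back with getsockopt() so the application can see what it actually got. On HP-UX the call is accepted but, per the guide quoted above, not used; on Linux, the socket(7) man page documents that SO_SNDLOWAT cannot be changed and setsockopt() fails with ENOPROTOOPT.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: offer SO_SNDLOWAT as a hint, never as a requirement.
 * Returns the value the kernel reports afterwards, or -1 if it cannot be
 * read. The caller must behave correctly whatever value comes back. */
int try_set_sndlowat(int fd, int lowat)
{
    if (setsockopt(fd, SOL_SOCKET, SO_SNDLOWAT, &lowat, sizeof lowat) != 0) {
        /* Not fatal: note it and continue without the hint. */
        fprintf(stderr, "SO_SNDLOWAT not settable: %s\n", strerror(errno));
    }

    int current = 0;
    socklen_t len = sizeof current;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDLOWAT, &current, &len) != 0)
        return -1;
    return current;   /* may still be the default, e.g. 1 */
}
```

Whatever this returns, the send path still has to cope with short writes, which is the point made above.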

there is no rest for the wicked yet the virtuous have no pillows

First of all, sorry for not replying and assigning points until now; I never got notified that there were answers.

Rick,

If a well-written application should not rely on SO_SNDLOWAT and poll() for flow control, what should it rely on?

We really just want to check if there is room in the buffer for a send() before we send it otherwise the application waits until there is room in the buffer.

Richard

An application can rely on poll to tell it when _some_ bytes can be written to a socket. Since SO_SNDLOWAT is not universal, the application cannot rely on being able to write a specific number of bytes to the socket. So, the "best" thing for a portable application to do here is to set the socket non-blocking and be prepared to deal with a partial write. It can use SO_SNDLOWAT as a hint, and on those systems where SO_SNDLOWAT is actually implemented, the application's partial-write logic will be exercised less often (perhaps not at all), and the app _may_ perform better.
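A minimal C sketch of that approach, assuming POSIX fcntl() and send(); the helper names set_nonblocking() and send_some() are hypothetical. The socket is made non-blocking, and each send attempt reports how many bytes the kernel accepted, so a short write (even a single byte) is handled rather than assumed away. It also assumes SIGPIPE is ignored elsewhere (for example with signal(SIGPIPE, SIG_IGN)), so that writing to a closed peer shows up as EPIPE instead of terminating the process.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put the socket into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Write as much of buf[0..len) as the socket will take right now.
 * Returns the number of bytes accepted (possibly fewer than len, possibly 0
 * if the send buffer is full), or -1 on a real error. The caller keeps the
 * unsent remainder and retries when poll() reports POLLOUT. */
ssize_t send_some(int fd, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = send(fd, buf + done, len - done, 0);
        if (n > 0) {
            done += (size_t)n;          /* partial write: loop for the rest */
        } else if (n == -1 && errno == EINTR) {
            continue;                    /* interrupted by a signal: retry */
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;                       /* buffer full: report progress so far */
        } else {
            return -1;                   /* genuine error, e.g. EPIPE */
        }
    }
    return (ssize_t)done;
}
```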

Depending on the timing, the send sizes involved and such, it is unlikely that even without SO_SNDLOWAT the application would be given only one byte at a time to put into the socket: TCP's window update heuristics tend to mean that an application will be able to put at least one MSS worth of data into the socket at one time.

Also, if the application has some idea of how much data it will ever have outstanding at one time, say because it only ever puts N requests or responses on the connection at once and knows their maximum sizes in advance, it can simply set the SO_SNDBUF size to that many bytes and know that whenever it checks, the socket will be writable. Naturally, this only works if the app has advance knowledge of the traffic patterns. Some apps do, some apps do not.


Rick,

Are you saying that there is no way to determine if a socket is writable or how much space is available in the buffer?

We know that we will never send more than 3 Mb. Is there a drawback to setting the buffer size that big with SO_SNDBUF? Is there an ndd parameter that will do the same?

This is actually a webserver which just sends data to clients browsers, would this be a good application for "non-blocking" sockets?

The send is "always" 4096 bytes, and that is what we are trying to set SO_SNDLOWAT to.

I appreciate your input,

Richard

I'm saying that there is no way to know how many bytes are "free" in the send socket buffer. Poll/select will tell you that there is at least one byte free.

If you know that you never provide more than 3 MB (B bytes, b bits :) you could indeed set SO_SNDBUF to 3MB. Socket buffer settings are limits, not preallocations. However, I would suggest that it be done on an application by application basis rather than changing the system-wide default with ndd. There could in theory be other applications that send large quantities of data, but don't really need/want a large SO_SNDBUF size.
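A minimal sketch of the per-application approach in C, assuming a POSIX sockets API; set_sndbuf() is a hypothetical helper name. It sets SO_SNDBUF on one socket only, leaving the system-wide default alone, and reads back what was granted, because the kernel may round or cap the request (Linux, for instance, doubles the value and clamps it to a system maximum).

```c
#include <sys/socket.h>

/* Ask for a per-socket send buffer of `bytes`, e.g. 3 MB when the
 * application knows it never has more than that outstanding. Affects only
 * this socket. Returns the size the kernel reports afterwards (which may
 * differ from the request), or -1 on error. */
int set_sndbuf(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes) != 0)
        return -1;

    int granted = 0;
    socklen_t len = sizeof granted;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &granted, &len) != 0)
        return -1;
    return granted;   /* compare with `bytes`: it may be smaller */
}
```

Calling it right after accept() (or before connect()) keeps the sketch simple; if the granted size is smaller than the traffic the application plans to queue, the "always writable" assumption no longer holds.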

As for the send sizes, you mention sending 4096 bytes at a time. That is fine, but if you already have all the data present, go ahead and try to write as much of it as you can. If there are 8192 bytes free in the socket buffer, you might as well get one 8192-byte write in there rather than two 4096-byte writes. This will minimize the number of system calls, and may also allow TCP to better package the data it sends onto the wire.
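The two points above (POLLOUT only promises that some space is free, and it is better to offer all pending data than a fixed 4096-byte chunk) can be combined in a short C sketch, assuming POSIX poll(); wait_writable() and send_all() are hypothetical helper names. Each time the socket becomes writable, the whole unsent remainder is handed to send(), and whatever it accepts, large or small, is accounted for.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Wait until poll() reports the socket writable. POLLOUT only says that
 * *some* space is free, not how much. Returns 1 if writable, 0 on timeout,
 * -1 on error or hangup. */
int wait_writable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLOUT };
    for (;;) {
        int r = poll(&p, 1, timeout_ms);
        if (r == -1 && errno == EINTR)
            continue;                         /* interrupted: wait again */
        if (r <= 0)
            return r;                         /* 0 = timeout, -1 = error */
        if (p.revents & (POLLERR | POLLHUP | POLLNVAL))
            return -1;
        return (p.revents & POLLOUT) ? 1 : 0;
    }
}

/* Send all of buf: wait for POLLOUT, then offer the entire remainder to
 * send() rather than a fixed-size chunk. Returns 0 when everything has been
 * accepted, -1 on timeout or error. */
int send_all(int fd, const char *buf, size_t len, int timeout_ms)
{
    size_t done = 0;
    while (done < len) {
        if (wait_writable(fd, timeout_ms) != 1)
            return -1;
        ssize_t n = send(fd, buf + done, len - done, 0);
        if (n > 0)
            done += (size_t)n;                /* may be 1 byte or many KB */
        else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
            continue;                         /* spurious wakeup: poll again */
        else
            return -1;
    }
    return 0;
}
```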

If a webserver is handling more than one connection at a time, non-blocking is pretty much _THE_ way to go. I personally like it much better than process-per-connection or even thread-per-connection models.
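To make the single-threaded, non-blocking model concrete, here is a C sketch of one pass of such a loop, assuming POSIX poll(). The struct conn type, the MAX_CONNS limit and pump_once() are all hypothetical names chosen for illustration; a real server would also accept new connections and read requests in the same loop.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_CONNS 64     /* hypothetical limit to keep the sketch simple */

/* Hypothetical per-connection state for a response being sent. */
struct conn {
    int fd;              /* connected socket, normally set O_NONBLOCK */
    const char *buf;     /* response bytes */
    size_t len;          /* total length of the response */
    size_t off;          /* bytes sent so far */
    int failed;          /* set on error; the caller closes fd */
};

static int conn_pending(const struct conn *c)
{
    return !c->failed && c->off < c->len;
}

/* One pass of the event loop: wait for POLLOUT on every connection with
 * unsent data, then give each writable socket as much as it will take.
 * Returns how many connections still have data pending, or -1 on error
 * (including n > MAX_CONNS). */
int pump_once(struct conn *c, size_t n, int timeout_ms)
{
    struct pollfd pfds[MAX_CONNS];
    size_t map[MAX_CONNS];               /* pfds index -> conn index */
    nfds_t np = 0;

    if (n > MAX_CONNS)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (conn_pending(&c[i])) {
            pfds[np].fd = c[i].fd;
            pfds[np].events = POLLOUT;
            pfds[np].revents = 0;
            map[np++] = i;
        }
    }
    if (np == 0)
        return 0;
    if (poll(pfds, np, timeout_ms) == -1)
        return errno == EINTR ? (int)np : -1;

    for (nfds_t k = 0; k < np; k++) {
        struct conn *cc = &c[map[k]];
        if (pfds[k].revents & (POLLERR | POLLHUP | POLLNVAL)) {
            cc->failed = 1;
        } else if (pfds[k].revents & POLLOUT) {
            ssize_t w = send(cc->fd, cc->buf + cc->off, cc->len - cc->off, 0);
            if (w > 0)
                cc->off += (size_t)w;    /* partial writes are expected */
            else if (w == -1 && errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
                cc->failed = 1;
        }
    }

    int pending = 0;
    for (size_t i = 0; i < n; i++)
        pending += conn_pending(&c[i]);
    return pending;
}
```

The caller simply repeats pump_once() until it returns 0; no connection ever blocks the others, which is the main attraction over process-per-connection or thread-per-connection designs.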


Rick, once again I did not get notified that there was a reply; I guess I can't rely on that anymore.

Both you, the manuals and HP tech support are telling me that poll() will return 1 if 1 byte is available in the buffer.

When I test it here, this is not the case: poll() only returns 1 when there are 4096 bytes available for a local-to-local connection and 1460 for a LAN connection, i.e. it behaves exactly like the other Unix flavors.

I have gone through all my ndd settings and I am unable to change these settings.

Would you happen to know what setting might affect this?

It would be awesome if I could change this dynamically but I will even settle for a static change to get up and running.

We have started to look at doing "non-blocking" ports instead since it seems like it will reduce a lot of the I/O involved but this would not be a quick fix.

Thanks again,

Richard

That you see 4096 bytes being free on a local connection and 1460 on a "remote" connection is consistent with what I was saying earlier about window updates, and thus socket buffer frees, being related to the MSS (Maximum Segment Size) of the connection. TCP is a maze of twisty heuristics, all intertwined.

From a theoretical standpoint, a window update/ACK, and thus the freeing of data in the socket buffer, could be one byte at a time. TCP is, after all, a byte-stream protocol. However, giving just a one-byte window update leads to something called "silly window syndrome", whereby TCP ends up sending nothing but tiny segments. That is very inefficient.

So, the implementation heuristics state that window updates should (should != must) be for at least an MSS or some reasonable fraction of the window size. There is more to Silly Window Syndrome Avoidance, but it doesn't directly apply here.
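As an illustration of that heuristic, here is a simplified C model of the receiver-side rule suggested in RFC 1122 (section 4.2.3.3): hold back a window update until the window can open by at least min(Fr * receive buffer, MSS), with the recommended Fr of 1/2. This is a sketch of the published guideline, not HP-UX's actual implementation, and should_send_window_update() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified receiver-side silly window syndrome avoidance:
 * advertise more window only once it can grow by at least
 * min(rcv_buf / 2, mss) bytes, so tiny one-byte updates are never sent. */
bool should_send_window_update(size_t window_growth, size_t rcv_buf, size_t mss)
{
    size_t half_buf = rcv_buf / 2;                  /* Fr = 1/2 */
    size_t threshold = half_buf < mss ? half_buf : mss;
    return window_growth >= threshold;
}
```

With a 64 KB receive buffer and an MSS of 1460, a 1-byte opening is held back and a 1460-byte opening is advertised; with a very small 2048-byte buffer, half the buffer (1024 bytes) becomes the threshold instead.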

The MSS of a connection over an Ethernet link is typically 1460 bytes (1500 byte MTU less a 20 byte IPv4 header and a 20 byte TCP header).

The loopback interface and all traffic "local" to the machine will have an MSS of 4096 bytes. You can see this in the output of netstat -rn on your system(s).

1500 bytes is the maximum MTU for Ethernet links. You can have an MTU smaller than that, but not larger.

The one exception to that would be Gigabit Ethernet links that support "jumbo frames", where the MTU is made 9000 bytes. However, that is not a de jure standard, only a de facto one, so not all gigabit equipment supports it (GbE NICs on HP-UX support it, but not all switches do).
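The MSS arithmetic above, written out as a tiny C function: MTU minus a 20-byte IPv4 header and a 20-byte TCP header, assuming no IP or TCP options. The function name ipv4_tcp_mss() is hypothetical. Options carried in each segment (TCP timestamps, for example) reduce the payload per segment further.

```c
/* MSS for TCP over IPv4 without options: MTU - 20 (IPv4) - 20 (TCP). */
int ipv4_tcp_mss(int mtu)
{
    const int ipv4_header = 20;
    const int tcp_header = 20;
    return mtu - ipv4_header - tcp_header;
}
```

For standard Ethernet, ipv4_tcp_mss(1500) gives 1460, the figure seen on the LAN connection; for a 9000-byte jumbo-frame MTU it gives 8960.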

However, those MTUs are not necessarily the only MTUs an application might experience. There are some links with larger MTUs (GbE jumbo frames being only one example), and some links with smaller MTUs. Window updates are often for multiples of the MSS, but they are not necessarily _always_ multiples of the MSS, etc.

So, what the docs say is what an application should be prepared to deal with: while it is unlikely that a poll call would come back with only one byte writable, it is still _possible_, and a well-written application should be able to deal with it.


Thanks for the explanation Rick,

Sincerely,

Richard.

Glad to be of help.

Upon further reflection, I've realized that it would be the TCP ACKs that "control" how much data is freed from the socket buffer, not the window updates. The TCP ACKs mean that the remote TCP now has the data, so the local TCP no longer needs the reference to the data for retransmission and can free the space in the socket buffer. At that point the application can put more data into the socket buffer. When that data is then sent onto the network depends on the window updates (among other things). Window updates and ACKs are logically distinct, but many of us often mix them together since they often travel in the same TCP segment... It all interacts; Möbius would be quite at home in TCP...
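A toy C model of that correction, offered as an illustration only and not as HP-UX kernel code: bytes stay in the send socket buffer from the moment the application writes them until the peer's cumulative ACK covers them, so free space is SO_SNDBUF minus (bytes written minus bytes acknowledged). The struct snd_model type and the snd_free()/snd_on_ack() helpers are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical sender-side accounting. Window updates decide when queued
 * data may be transmitted; only cumulative ACKs release buffer space. */
struct snd_model {
    uint32_t sndbuf;     /* SO_SNDBUF limit, in bytes */
    uint64_t written;    /* total bytes the application has written */
    uint64_t acked;      /* total bytes cumulatively acknowledged by the peer */
};

/* Space the application could write into right now. */
uint32_t snd_free(const struct snd_model *m)
{
    uint64_t held = m->written - m->acked;   /* unacked data kept for retransmission */
    return held >= m->sndbuf ? 0 : (uint32_t)(m->sndbuf - held);
}

/* A cumulative ACK releases everything up to ack_total; stale or
 * out-of-range ACKs are ignored. */
void snd_on_ack(struct snd_model *m, uint64_t ack_total)
{
    if (ack_total > m->acked && ack_total <= m->written)
        m->acked = ack_total;
}
```

With an 8192-byte buffer that is completely full, an ACK covering one 1460-byte segment frees exactly 1460 bytes, which matches the LAN behavior Richard observed with poll().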
