Operating System - HP-UX

question about send(2) fail

 
askforsocket
Occasional Advisor


We have run into the following problem.
A client and a server both run on HP-UX 11.23 IPF.
The client reads some files and sends them to the server.
The details of the application are as follows:
The client has two threads: a main thread and a second thread.
The main thread directs the second thread. The second thread then connects to the server over a socket set to non-blocking mode with fcntl(O_NONBLOCK) and sends the files using our own protocol, described below:
Step 1: send the properties of the file: file name, size, etc.
Step 2: read from the file and send its contents to the server with a function called comm_send().
The server first reads the file properties to learn the file size, then receives the file data with a function called comm_receive().
Function comm_send():
select(2) on the socket until it is writable
send(2) the data
Function comm_receive():
select(2) on the socket until it is readable
recv(2) the data

Sometimes the file properties are sent successfully, but sending the file data fails, even though the server has not closed the socket.
So why did send(2) fail?
Unfortunately we cannot get the errno of send(2), but we guess it may be EPIPE. Also, whenever send(2) fails, /var is always full.
Is there any relationship between the send(2) failure and /var being full?
Or, because Step 2 needs to read the file, does recv(2) time out on the server? In that case we would expect a select(2) timeout error in the server's trace log, but we did not find one.
Additionally, the problem does not happen on all machines, only on one. So could it be a problem with patches or with that machine's LAN card?
rick jones
Honored Contributor

Re: question about send(2) fail

Get the errno. Either correct the software to report errno when the send fails, or kludge it by using tusc to trace the application.
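
As an illustration of reporting errno when the send fails, here is a minimal sketch. The wrapper name send_logged() is hypothetical (it is not part of the poster's application), and writing to stderr is an assumption; a real program would use its own logging facility.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical wrapper: call send() and, if it fails, log errno right away,
 * before any other library call has a chance to overwrite it.
 * Returns whatever send() returned and leaves errno as send() set it. */
static ssize_t send_logged(int fd, const void *buf, size_t len, int flags)
{
    ssize_t n;

    assert(buf != NULL);
    n = send(fd, buf, len, flags);
    if (n < 0) {
        int saved = errno;              /* fprintf() may change errno */
        fprintf(stderr, "send(fd=%d, len=%lu) failed: errno=%d (%s)\n",
                fd, (unsigned long)len, saved, strerror(saved));
        errno = saved;                  /* let the caller inspect it too */
    }
    return n;
}
```

Logging in the failure path only keeps the normal-mode timing unchanged, which matters when a problem disappears as soon as extra logging slows the program down.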

There is no direct relationship between the send() system call and the /var file system. Whether or not there is a relationship between the /var filesystem and your file transfer program is left as an exercise to the reader :)

There is no timeout on a recv() call under HP-UX.

Select can indeed have a timeout, but that is not an "error"; it is just a timeout.

WRT selecting for write, a better practice would be to just go ahead and try to write, and only fall back to select() when the send() call returns less than you wanted to write, or fails with EWOULDBLOCK or some such, since you are using non-blocking sockets.
there is no rest for the wicked yet the virtuous have no pillows
askforsocket
Occasional Advisor

Re: question about send(2) fail

Thanks for your answer, but I am sorry that I cannot get the errno from the user yet.
But I found something strange.

1) There are 6 machines, but it happens on only one of them.
2) When I switch the program to debug mode (which produces much more log output, including errno)
or trace the program with tusc, it does not happen. It only happens in normal mode!!
3) The outline of the program is as follows:

/* SEND FILE PROCESS begin */
open socket
for file in (file1, file2, file3 ...) {
    while (size of file) {
        read(file, BUF, 4K);               /* read 4K each time */
        log(DEBUG, "Begin comm_send()");   /* not logged in normal mode */
        comm_send(BUF);
        log(DEBUG, "End comm_send()");     /* not logged in normal mode */
    }
}
close socket
/* SEND FILE PROCESS end */

4) When we call SEND FILE PROCESS:
the first time, OK
the second time, OK
the third or fourth time, FAIL
then it keeps failing until we restart the program, after which we get into this failure cycle again.

5) So the difference between DEBUG mode and NORMAL mode is the time spent in
each comm_send loop. DEBUG mode is slower than NORMAL mode,
and I think the program also runs slowly under tusc.

6) So, is the following the reason?
In normal mode the program sends too rapidly and the receiver cannot
finish receiving before the next send, so the socket send buffer becomes
full and the send fails??
But I did use fcntl(O_NONBLOCK) and select on the socket before sending. So it
should stop because of a select timeout, not fail in send(2). Or does the send fail with EAGAIN?? I don't really understand EAGAIN.
And why does it happen on only one machine?
Is there some problem with that machine? Is the NIC not in a correct state??
shiwudao
Occasional Advisor

Re: question about send(2) fail

I got the errno.
strerror(errno)= Resource temporarily unavailable.

The errno is EAGAIN, but why?
shiwudao
Occasional Advisor

Re: question about send(2) fail

I get the same result on my machine!
# ndd -set /dev/tcp tcp_xmit_hiwater_def 4096
# ndd -set /dev/tcp tcp_xmit_lowater_lnp 4096
the first time, SEND file OK
the second time, SEND file OK
the third time, SEND file FAILED with EAGAIN.

Does select(2) return only when the free send buffer space is >= tcp_xmit_lowater_lnp, as on Linux??
But what is the real reason??


rick jones
Honored Contributor

Re: question about send(2) fail

No, I believe the lowater marks are no-ops under HP-UX. Select() will return when there is at least one byte free in the socket. Upon receipt of an EAGAIN, the application must go back and go through select()/poll() again.

Unless there is something else for the second thread to do while it is sending data, that is, it does something periodically in between the send() calls, it might just as well be a blocking socket and just call send() with the whole thing. Even better still would be a sendmsg() or writev() with the "header" info of file properties followed by the file data.
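
A rough sketch of the writev() idea on a blocking socket, sending the file-property "header" and the file data together. The function name send_header_and_data() and its parameters are hypothetical; the sketch assumes the header and data are already in memory and handles short writes by advancing the iovec array.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: write a header followed by a payload with writev()
 * on a blocking descriptor, retrying after short writes.
 * Returns 0 when everything was written, -1 on error (errno set). */
static int send_header_and_data(int fd, const void *hdr, size_t hdr_len,
                                const void *data, size_t data_len)
{
    struct iovec iov[2];
    int idx = 0;                        /* first iovec not yet fully sent */

    assert(hdr != NULL && data != NULL);
    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = (void *)data;
    iov[1].iov_len = data_len;

    while (idx < 2) {
        ssize_t n;

        if (iov[idx].iov_len == 0) {    /* nothing left in this piece */
            idx++;
            continue;
        }
        n = writev(fd, &iov[idx], 2 - idx);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* Skip over every piece that was written completely. */
        while (idx < 2 && (size_t)n >= iov[idx].iov_len) {
            n -= (ssize_t)iov[idx].iov_len;
            idx++;
        }
        /* Advance into a piece that was written only partially. */
        if (idx < 2 && n > 0) {
            iov[idx].iov_base = (char *)iov[idx].iov_base + n;
            iov[idx].iov_len -= (size_t)n;
        }
    }
    return 0;
}
```

With a blocking socket there is no EAGAIN to handle, and the header and the first chunk of data go to the stack in a single call.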

Also, there is the sendfile() call one can make.
there is no rest for the wicked yet the virtuous have no pillows
rick jones
Honored Contributor

Re: question about send(2) fail

btw, you should probably alter those ndd settings back to defaults.
there is no rest for the wicked yet the virtuous have no pillows
askforsocket
Occasional Advisor

Re: question about send(2) fail

Thanks for your answer. But I still have two questions:
1) select(2) returns when at least 1 byte is free in the send buffer, so why does send(2) fail with EAGAIN? Shouldn't it return 1?
2) Can I use setsockopt(SO_SNDBUF) to avoid this problem? Which parameter does setsockopt(SO_SNDBUF) modify: tcp_xmit_hiwater_def,
tcp_xmit_lowater_lnp, or both?

shiwudao
Occasional Advisor

Re: question about send(2) fail

I read the FreeBSD source. select() returns when at least 1 byte is available, but in send() the kernel first tries to lock the send buffer; if that fails and the socket is non-blocking, send() returns EAGAIN or EWOULDBLOCK. Does HP-UX do the same? And apart from this case, are there any other reasons??
shiwudao
Occasional Advisor

Re: question about send(2) fail

Another possible reason is:
send() breaks the data into mbufs of some length (256? 4096?), so when the free buffer space is less than the mbuf size, send() returns EAGAIN in non-blocking mode. So does select() returning only mean that at least one byte is free, not that there is enough room to send an mbuf?
rick jones
Honored Contributor

Re: question about send(2) fail

If you "know" that the application will never have more than N bytes outstanding at a time, you could use setsockopt() to set SO_SNDBUF to N. You would not modify any ndd parameters.
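
For illustration, setting SO_SNDBUF on one socket and reading back what the stack actually granted might look like the sketch below. The helper name set_send_buffer() is hypothetical. The value read back can differ from the value requested, and on some platforms the last argument of getsockopt() is an int * rather than a socklen_t *, depending on the compile environment.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: request a send buffer of `bytes` for this socket
 * only, then ask the stack what size it is really using.
 * Returns the reported size, or -1 on error (errno set). */
static int set_send_buffer(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof actual;

    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

This changes only the one socket, which is why no system-wide ndd parameter needs to be touched.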

Don't confuse "hiwat" and "lowat" ndd parms with recv or send watermarks in the setsockopt() sense. They may behave somewhat like it, but for an application to depend on stack settings like that is at best "brittle."

From HP-UX 11 and later, HP-UX networking internals are nothing like BSD networking internals.

How is the return from select() being examined? Is the code just looking for a value >= 1 from select and then assuming that it means the socket is writable? What does the code calling select() look like?

It could be that there was some other sort of event on the socket that caused the exit from select(). The correct thing to do is examine the bitmasks. Similarly, if one were using poll(), one has to check the revents field in the pollfd structure(s).
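
A small sketch of the poll() variant, checking revents instead of assuming that any return value >= 1 means "writable". The helper name wait_writable() and its return convention are hypothetical; fetching the pending error with SO_ERROR when POLLERR is reported is a common technique, shown here as one possible way to find out what happened.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: wait until fd is writable.
 * Returns 1 if writable, 0 on timeout, -1 on error or hangup. */
static int wait_writable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;                       /* timeout, not an error */

    if (pfd.revents & POLLERR) {
        /* Some other event woke us up: fetch the pending socket error. */
        int soerr = 0;
        socklen_t len = sizeof soerr;
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) == 0 &&
            soerr != 0)
            errno = soerr;
        return -1;
    }
    if (pfd.revents & (POLLHUP | POLLNVAL))
        return -1;

    return (pfd.revents & POLLOUT) ? 1 : 0;
}
```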
there is no rest for the wicked yet the virtuous have no pillows
askforsocket
Occasional Advisor

Re: question about send(2) fail

Thanks for your kind answer. My code looks like this:

set SIGRESTART
set SIGPIPE handler to ignore
fcntl(socket, O_NONBLOCK)
connect socket   /* select is also used here to handle the EINPROGRESS errno */
send(xxxx);      /* this works well */
recv(xxxx);      /* this works well */
/* then begin sending in a loop */
while (size < Totalsize) {
    FD_ZERO(&wrmask);
    FD_SET(socket, &wrmask);
    num = socket + 1;
    result1 = select(num, null, &wrmask, null, &timeout);
    if (result1 <= 0) { error or timeout; break; }
    if (!FD_ISSET(socket, &wrmask)) { error; break; }

    result2 = send(socket, BUF, len);
    if (result2 < 0) { error; break; }   /* get EAGAIN here!!!! */
    size += result2;
}

restore SIGPIPE handler

and I found the following description at this URL: http://docs.hp.com/en/B2355-90136/ch03s02.html
A select of a socket descriptor for writing is useful on:
A connected socket, because it determines when more data can be sent without blocking.
This implies that at least one byte can be sent; there is no way, however, to determine exactly how many bytes can be sent.

I checked /usr/include/socketvar.h and I think select(2) uses the macro sowriteable to judge whether there is
any buffer space left (> 0).
rick jones
Honored Contributor

Re: question about send(2) fail

Actual code would be better, but the pseudo code you provided seems OK. It might be time to consider perusing the patch catalog to make sure you have the latest sockets/transport patches installed.
there is no rest for the wicked yet the virtuous have no pillows