Operating System - HP-UX

hari jayaram_1
Frequent Advisor

Enterprise replication

I have two Informix database servers running Enterprise Replication (ER) between them. Currently my transaction queues are building up. Informix ran some diagnostics and here is what they got:

XTF_SOCKETS XTF_SYSCALLS imcsoc_be.c 2906 2041 22:57:12 300858111 0 232 7406e308 7402fac8 61 sendsocket():send() rc:-1 erno:246 localfd:10 netappen:c0000000762b87d8

Has anybody seen this error? errno.h indicates 246 is EWOULDBLOCK. Any replies are appreciated, and thanks in advance.
Sridhar Bhaskarla
Honored Contributor

Re: Enterprise replication

Hi Hari,

I have never seen a problem like this, but there are two interesting things here: sendsocket() and send(). sendsocket() may be a private function, while send() is a socket system call. This may indicate a problem in the network subsystem.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Sanjay_6
Honored Contributor

Re: Enterprise replication

Hi Hari,

Check the patch level on your system. It looks like you need an ARPA/networking patch and a DCE/9000 patch on your system. You will have to search for the appropriate patches for the version of the OS you are running.

Hope this helps.

Regds


hari jayaram_1
Frequent Advisor

Re: Enterprise replication

Sanjay and Sri, thanks to you both for your input. I checked all the ARPA patches, found only one to be out of date, and had that rectified. I will now check the DCE patches and get back.
Steven Gillard_2
Honored Contributor

Re: Enterprise replication

The EWOULDBLOCK error on a send() indicates that the local send socket buffer is full. This usually indicates that you have a network bottleneck - ie the link between the two systems is not fast enough to keep up with the volume of data you are trying to send through it.

The Informix software should handle this error, as it is a very normal error when you have sockets set up in non-blocking mode. See the send(2) man page for details.

Definitely ensure you have the latest patches, but keep pushing Informix because this appears to be an application problem (caused by a possible network performance problem).

Regards,
Steve
hari jayaram_1
Frequent Advisor

Re: Enterprise replication

Steve, thanks. We have been pushing both Informix and HP. A network problem on the WAN can be ruled out: the network has been sniffed and analyzed more than a dozen times, and for once I can confidently say that the network is clean. Can you give me more details on how it could be an application problem? Thanks in advance for your input.
Steven Gillard_2
Honored Contributor

Re: Enterprise replication

Sure... according to the send(2) man page:

If the available buffer space is not large enough for the entire message
and the socket is in nonblocking mode, errno is set to [EWOULDBLOCK].

This indicates that the informix application has put the socket in non-blocking mode. When in this mode, it is perfectly reasonable for a send() call to return EWOULDBLOCK. The application should be coded to handle this condition and re-try the send() after ensuring that the socket is ready to send more data with the select() or poll() system calls (ie after the send buffer has emptied a little). This subsequent send() would succeed. If the application is simply giving up when it gets EWOULDBLOCK then this is a bug as far as I'm concerned.

Have a close look at the send(2) man page for full details on how it works in non-blocking mode.

Of course, I'm not saying for sure that there is a bug in the Informix software. Perhaps the application is simply logging the -1 return code from send(), then going on to correctly handle the condition. Still, it is a question for Informix to answer.
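
Just to illustrate what I mean by handling the condition, here is a rough sketch of my own (the function and variable names are made up, and this is not Informix's actual code): a non-blocking sender should wait with select() when it hits EWOULDBLOCK, rather than calling send() again blindly.

#include <errno.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send len bytes on a non-blocking socket, using select() to wait
 * whenever the send buffer is full instead of retrying send() blindly. */
ssize_t send_all(int sockfd, const char *buf, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = send(sockfd, buf + sent, len - sent, 0);

        if (n > 0) {
            sent += (size_t)n;
        } else if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN)) {
            /* Send buffer full: block here until the socket is writable again. */
            fd_set wfds;
            FD_ZERO(&wfds);
            FD_SET(sockfd, &wfds);
            if (select(sockfd + 1, NULL, &wfds, NULL, NULL) < 0 && errno != EINTR)
                return -1;
        } else if (n < 0 && errno == EINTR) {
            continue;            /* interrupted by a signal, just retry */
        } else {
            return -1;           /* any other error is fatal */
        }
    }
    return (ssize_t)sent;
}

Whether the Informix engine does something along these lines internally is, again, a question for Informix.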

Regards,
Steve
hari jayaram_1
Frequent Advisor

Re: Enterprise replication

Steven, thanks. I will ask the DBA guys to speak to Informix and will update.
hari jayaram_1
Frequent Advisor

Re: Enterprise replication

steven,

Here is the reply I got back from Informix:
> This is exactly as we thought. We are handling the 246 error by trying to
> resend upon receiving the 246 error, so the application is definitely
> handling the situation correctly as HP points out. We get the -1 return
> code with errno 246 and then try to resend. In the one instance we spoke
> about, there were more than 5,000 resends before send() finally succeeded.
>
> The comment below "it is perfectly reasonable for a send() call to
> return EWOULDBLOCK" is exactly what's happening. Is it reasonable for this
> to happen 5,000 times for a single buffer? If we get 5,000 EWOULDBLOCKS for
> each buffer and need to send 20 million buffers, we're in the mode that your
> system is experiencing. I wouldn't think this would be considered
> reasonable.
>
> If there's anything further you need from me, please let me know. I can
> talk to HP directly if they have any questions on how we're handling the 246
> error
Steven Gillard_2
Honored Contributor

Re: Enterprise replication

As I mentioned earlier, they should be using select() or poll() to determine when the socket is able to send data again. If they are simply calling send() in a loop until it succeeds then 5000 in a row is not unexpected - send() will return very quickly with the EWOULDBLOCK error.

One other thing you might want to look at is the performance of the system on the receiving end of the data. If it is not keeping up that will also cause the send buffer to fill.

Regards,
Steve
hari jayaram_1
Frequent Advisor

Re: Enterprise replication

Here is the latest after speaking to HP. It looks like setting the parameter described below to 4096K buffers resolved the problem. I will come back in case the problem is not resolved.

tcp_xmit_hiwater_def
> >
> > For every TCP connection a buffer is allocated. The application writes
> > into this buffer and TCP is responsible for sending it to the distant
> > host. Sometimes it happens that the other host is not able to receive
> > further data, so TCP cannot send more data out on the interface. In this
> > case the allocated buffer fills up, and at one point we reach a limit
> > where we must stop the application from sending more data to the buffer.
> > This upper limit is called the high-water mark. We prevent the
> > application from sending any further data until TCP has had the chance
> > to send enough packets out that we reach another, lower limit,
> > tcp_xmit_lowater_def, which again allows the application to write data
> > into the buffer. Since different connections can differ in how quickly
> > they fill up to this limit, there are two other high-water marks:
> > tcp_xmit_hiwater_lfp is for fast connections, whereas
> > tcp_xmit_hiwater_lnp is for slow connections. The low-water mark also
> > has two equivalents: for fast connections it is tcp_xmit_lowater_lfp and
> > for slow connections it is tcp_xmit_lowater_lnp.
> >
> > This value is given in bytes. The minimum is 4096; there is no defined
> > maximum. The default is 32K (32768).
> >
> > Why should this be changed?
> >
> > Normally it is not necessary to change this value, but if the connections
> > are faster than expected you could increase it, or if they are slower,
> > decrease it.
> >
> > Usable commands:
> >
> > Check the current value:
> >
> > ndd -get /dev/tcp tcp_xmit_hiwater_def
> >
> > Set the high-water mark to 64K:
> >
> > ndd -set /dev/tcp tcp_xmit_hiwater_def 65536
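
For what it's worth, here is a small illustrative sketch of my own (not anything HP or Informix supplied): an application can check its own per-socket send buffer with getsockopt(SO_SNDBUF) to see what limit it is actually running with after the ndd change. My assumption is that the tcp_xmit_hiwater_* tunables are what show up here as the per-socket default on HP-UX.

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Print the send buffer size the kernel has given this TCP socket.
 * Assumption: the system-wide default for this comes from the
 * tcp_xmit_hiwater_def tunable discussed above. */
void show_sndbuf(int sockfd)
{
    int sndbuf = 0;
    socklen_t len = sizeof(sndbuf);

    if (getsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) == 0)
        printf("SO_SNDBUF = %d bytes\n", sndbuf);
    else
        perror("getsockopt(SO_SNDBUF)");
}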