
Re: select(2) timeout

 
Steve Zwach
Occasional Advisor

select(2) timeout

I've got a fairly serious issue on a customer's production system. We have a process that coordinates communications between a couple of hundred processes. As is usually the case, its main loop is driven by select. It uses the timeval parameter to control the processing of events that we know need to occur at some time.
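For readers unfamiliar with this pattern, here is a minimal sketch of how a select timeout might be derived from the next scheduled event. The helper name timeout_until is hypothetical and not taken from the application described here, and the sketch assumes both inputs are normalized (0 <= tv_usec < 1000000).

```c
#include <sys/time.h>

/*
 * Hypothetical helper (not from the original application): fill *out with
 * the time remaining until *deadline, measured from *now. If the deadline
 * has already passed, *out is set to zero so that select() returns at once.
 * Assumes both inputs are normalized: 0 <= tv_usec < 1000000.
 */
static void timeout_until(const struct timeval *now,
                          const struct timeval *deadline,
                          struct timeval *out)
{
    long sec  = (long)(deadline->tv_sec - now->tv_sec);
    long usec = (long)(deadline->tv_usec - now->tv_usec);

    /* Borrow one second if the microsecond difference went negative. */
    if (usec < 0) {
        sec  -= 1;
        usec += 1000000L;
    }

    /* Deadline already in the past: clamp to a zero timeout. */
    if (sec < 0) {
        sec  = 0;
        usec = 0;
    }

    out->tv_sec  = sec;
    out->tv_usec = usec;
}
```

The main loop would call this with the current time and the earliest pending event time just before each select call, so a 0.25 second gap produces tv_sec = 0 and tv_usec = 250000.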

The problem is that occasionally we call select(2) with a timeval of something like 0.25 seconds (tp->tv_sec = 0 and tp->tv_usec = 250000) and select takes much longer than this to return. In these cases, it seems to completely ignore the timeout and block until input is received on a socket. This causes us to process some time-based events much later than we should, which in our protocols can cause the process to be declared nonresponsive by one of its peers.

What kinds of things can cause select() to take much longer than the specified timeout to return? I'm trying to get the customer to run GlancePlus or anything else that will collect performance data so I can see what was going on while select was taking its time.
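One way to capture evidence of the overrun from inside the process is to time each select call. The following is only a sketch: the wrapper name timed_select and the slack_usec threshold are hypothetical, and it uses gettimeofday, which is wall-clock time and can be skewed if the system clock is adjusted during the call.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical diagnostic wrapper (an assumption, not the application's code):
 * calls select() on a read set, measures how long the call took, and logs to
 * stderr when it exceeded the requested timeout by more than slack_usec.
 * The elapsed time in microseconds is stored in *took_usec when non-NULL.
 */
static int timed_select(int nfds, fd_set *rfds, const struct timeval *timeout,
                        long slack_usec, long *took_usec)
{
    struct timeval before, after, tv;
    long requested = -1;   /* -1 means "no timeout, block indefinitely" */
    long took;
    int rc, saved_errno;

    if (timeout != NULL) {
        tv = *timeout;     /* copy: some systems modify the timeval */
        requested = (long)tv.tv_sec * 1000000L + (long)tv.tv_usec;
    }

    gettimeofday(&before, NULL);
    rc = select(nfds, rfds, NULL, NULL, timeout != NULL ? &tv : NULL);
    saved_errno = errno;   /* preserve select's errno across the calls below */
    gettimeofday(&after, NULL);

    took = (long)(after.tv_sec - before.tv_sec) * 1000000L
         + (long)(after.tv_usec - before.tv_usec);
    if (took_usec != NULL)
        *took_usec = took;

    if (requested >= 0 && took > requested + slack_usec)
        fprintf(stderr,
                "select overran: asked %ld us, took %ld us, rc=%d\n",
                requested, took, rc);

    errno = saved_errno;
    return rc;
}
```

Logging the return value alongside the elapsed time shows whether a late return was a timeout (rc == 0) or was triggered by ready descriptors (rc > 0), which helps separate a kernel timer problem from the process simply not being scheduled.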
6 REPLIES
A. Clay Stephenson
Acclaimed Contributor

Re: select(2) timeout

When you don't bother to identify your OS version, it's difficult to be specific, but there have been a number of patches related to select() and timeouts failing to wake up. My psychic, Miss Cleo, tells me that you are on HP-UX, so she suggests that you look at PHKL_34173.
If it ain't broke, I can fix that.
Steve Zwach
Occasional Advisor

Re: select(2) timeout

My bad! I just realized I spaced on some pertinent details. This is on HP-UX 11.23 PA-RISC, 32-bit.

Customer already has PHKL_33374 installed.
A. Clay Stephenson
Acclaimed Contributor

Re: select(2) timeout

Now I am confused. If this is 32-bit code then it can't be 11.23. Do you mean that this is 32-bit 11.x PA-RISC code executing under 11.23? Since it's 11.23, that could mean Itanium (under Aries) as well as PA-RISC. Is this a threaded application? If so, are you approaching any of the thread-related kernel tunables?
If it ain't broke, I can fix that.
Kent Ostby
Honored Contributor

Re: select(2) timeout

The only other patch that seems to apply for 11.23 is the Mega Patch September 04 base patch, PHKL_31500, which fixes:

select(2) takes a longer delay if a timeout of 10ms is specified.
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
Steve Zwach
Occasional Advisor

Re: select(2) timeout

Sorry for the confusion. This is on HP-UX 11.23 PA-RISC. The code is 32-bit and was compiled and linked on HP-UX 11.23, i.e., we compile with +DA2.0 and not +DA2.0W.

The process is multithreaded, but only one thread actually runs at a time -- threads have preemption points that allow them to give up the CPU and let another thread run.

When we're in the select call, we know that we just went through a cycle of giving time slices to threads and they are now all waiting on a condition to be given another time slice.

Threads are created to handle messages received from other processes. If we ever had more than 250 concurrent threads running in this process, it would display a warning which I'm not seeing.

After running threads, we check for input messages by calling select(). In the cases where select is delaying, it appears that select is returning multiple fds ready -- say 10 or 16. There are about 250 fds being passed into the select call.
rick jones
Honored Contributor

Re: select(2) timeout

While it may or may not address your issue with select() taking longer than the timeout to return, passing 250 FDs to select(), particularly if those FDs are not coming and going all the time via accept/connect, seems like a very good candidate for eventports:

ftp://ftp.cup.hp.com/dist/networking/briefs/use_eventports.txt

If you are on 11.23, eventports are already there and you can look at the section 7 manpage for poll(). Eventports can be significantly faster than select/poll when large numbers of file descriptors are involved.
there is no rest for the wicked yet the virtuous have no pillows