Operating System - HP-UX

Need help finding source of unhandled SIGALRM that terminates process

 
SOLVED
Max_Belanger
Visitor


Hello all,

 

I have been pulling my hair out for a while trying to find the source of a problem that causes one process in a multi-process application to exit without a core dump after a few days under load. Any advice or tips to help pinpoint the source would be greatly appreciated.

The OS is HP-UX 11.11 and the application is written in C.

The problematic process is multithreaded. It receives messages on an IPC queue and spawns connection threads that use libcurl to send HTTP requests to external servers and handle the asynchronous responses that update the application. The application uses semaphores to synchronize between processes and threads.

 

What I have found so far is that a SIGALRM seems to be reaching the process without being handled. I have tried installing a dummy signal handler (one that just reinstalls itself for SIGALRM whenever a SIGALRM is received), to no avail. I have also tried ignoring the signal right at the start of main, like so:

  new_action.sa_handler = SIG_IGN;
  sigemptyset (&new_action.sa_mask);
  new_action.sa_flags = 0;
  sigaction (SIGALRM, &new_action, NULL);

I know that the library handling the semaphores used by the process replaces the current SIGALRM handler with its own while it waits on a semaphore (to avoid deadlocks), but it restores the old handler when it is done, like so:

    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESETHAND;
    act.sa_sigaction = NULL;

    act.sa_handler = was_alarm;
    sigaction(SIGALRM, &act, NULL);
    alarm(time_left);

 

 

If I run the process from GDB and set breakpoints on _exit, when the process exits after a few days all I get is:

warning: Temporarily disabling or deleting shared library breakpoints:
warning: Disabling breakpoint #2

Program terminated with signal SIGALRM, Alarm clock.
The program no longer exists.
Stopped due to shared library event
(gdb) bt
No stack.


There are two things I need to find out:

1- What is sending the alarm. I went through the code and do not understand why an alarm signal would arrive unhandled. If I tell GDB to stop on SIGALRM, I still can't find where an alarm expired by going through the stack and the threads. Is there any way in GDB to find the source of a signal?

2- Why the SIGALRM is not caught by either the handler or SIG_IGN. Any ideas would be appreciated.

 

 

Thanks for your time,

Max

 
Dennis Handly
Acclaimed Contributor
Solution

Re: Need help finding source of unhandled SIGALRM that terminates process

You might try using tusc to see what's going on.

 

sigemptyset(&act.sa_mask);
act.sa_flags = SA_RESETHAND;
act.sa_sigaction = NULL;
act.sa_handler = was_alarm;
sigaction(SIGALRM, &act, NULL);
alarm(time_left);

Why are you calling alarm(2) here?  It seems you should check was_alarm for SIG_IGN and not call alarm.

 

Are you blocking SIGALRM while you are in your library signal handler?

 

Max_Belanger
Visitor

Re: Need help finding source of unhandled SIGALRM that terminates process

Thanks Dennis, using tusc did help me narrow it down to a race condition between threads calling alarm().

 

There was a problem with our signal stacking function, which saved and restored signals whenever a semaphore lock was needed. This code was not thread-safe, since the alarm timer is global to the whole process.
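
For anyone hitting the same problem, one possible way to close this kind of race is to let only one thread at a time own the process-wide alarm. The following is a sketch of that idea and not the fix that was actually applied; alarm_section_enter and alarm_section_leave are hypothetical names, and it assumes every alarm() user in the process goes through them and that neither function is called from a signal handler.

```c
#include <pthread.h>
#include <unistd.h>

/* Serializes every use of the single per-process alarm timer. */
static pthread_mutex_t alarm_owner = PTHREAD_MUTEX_INITIALIZER;

/* Takes ownership of the alarm and arms it for `seconds`.
   Returns the seconds that were left on the previous alarm (0 if none). */
unsigned int alarm_section_enter(unsigned int seconds)
{
    pthread_mutex_lock(&alarm_owner);
    return alarm(seconds);
}

/* Re-arms the previous alarm (0 cancels) and releases ownership. */
void alarm_section_leave(unsigned int previous)
{
    alarm(previous);
    pthread_mutex_unlock(&alarm_owner);
}
```

A thread would call alarm_section_enter() before the semaphore wait and alarm_section_leave() with the returned value afterwards. The cost is that timed semaphore waits become serialized across threads, so this trades concurrency for correctness. The SIGALRM handler itself also needs to be installed once at startup, not swapped per call.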