Re: SIGBUS catchable?

Ralph Grothe
Honored Contributor

SIGBUS catchable?

This may be a dumb question.

I've got a buggy DB application which seems to cause a lot of SIGBUS signals.

Apart from the core dumps this produces (which could perhaps be suppressed with "ulimit -c 0", or by symlinking the core files to /dev/null, I guess), I wonder whether a better solution would be to write one's own signal handler and wrap it around the application.
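
For reference, the "ulimit -c 0" idea can also be done from a small C wrapper. This is only a sketch, assuming the platform provides setrlimit() with RLIMIT_CORE; the names disable_core_dumps() and run_without_core() are made up for illustration and are not part of any library.

```c
#include <sys/resource.h>
#include <unistd.h>

/* Set the soft and hard core-file size limits of the calling process
   to 0, so no core file is written when it dies from a signal.
   Returns 0 on success, -1 on failure (errno is set by setrlimit). */
int disable_core_dumps(void)
{
    struct rlimit rl;

    rl.rlim_cur = 0;
    rl.rlim_max = 0;
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Hypothetical wrapper: disable core dumps, then replace this process
   with the real application. The limit is inherited across exec.
   Returns -1 only if something failed; on success it never returns. */
int run_without_core(char *const argv[])
{
    if (disable_core_dumps() != 0)
        return -1;
    execvp(argv[0], argv);
    return -1;
}
```

A main() that calls run_without_core(argv + 1) would then start the application with core dumps switched off. Note that this only hides the symptom; the process still dies from the SIGBUS.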

Unfortunately neither the customer nor the developers of the application are willing (or able) to disclose any details that could help in tracing the reasons for the core dumps.
Judging by the core files' access times, nobody seems to be examining them (e.g. with a debugger).

In the manpage of signal.h it only says that SIGKILL and SIGSTOP are not catchable.

Madness, thy name is system administration
Peter Kloetgen
Esteemed Contributor

Re: SIGBUS catchable?

Hi Ralph,

yes, the signals *are* catchable:

trap 'action_which_you_want_for_signal' signal_number

trap catches signals sent to the shell process running the script. In the action field you can put one or more commands, separated by ";", which are executed when the signal arrives.

You get the signal number with kill -l, which lists all signals with their numbers.

Always stay on the bright side of life!

Peter
I'm learning here as well as helping
Steve Steel
Honored Contributor

Re: SIGBUS catchable?

Hi

SIGBUS: An attempt was made to access an unaligned address or a memory
location for which the process does not have the required access rights.
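
To see where a SIGBUS comes from, a handler installed with SA_SIGINFO receives a reason code in si_code. The sketch below provokes one on purpose and returns that code. It assumes that touching a page of a file mapping that lies beyond the end of the file raises SIGBUS (this is the case on Linux; other systems may behave differently), and the name provoke_sigbus() is made up for illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_env;
static volatile sig_atomic_t bus_code;

/* Record why the signal was raised, then jump out of the fault. */
static void on_bus(int signo, siginfo_t *info, void *ctx)
{
    (void)signo;
    (void)ctx;
    bus_code = info->si_code;
    siglongjmp(bus_env, 1);
}

/* Map one page of an empty temporary file and read from it.
   Returns the si_code of the resulting SIGBUS, 0 if no SIGBUS
   was raised, or -1 if the setup failed. */
int provoke_sigbus(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    struct sigaction sa, old;
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    volatile char *p;
    int fd, result = 0;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                      /* file stays empty: size 0 */

    p = mmap(NULL, pagesize, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_bus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0) {
        munmap((void *)p, pagesize);
        return -1;
    }

    if (sigsetjmp(bus_env, 1) == 0) {
        char c = p[0];                 /* no file data behind this page */
        (void)c;
    } else {
        result = bus_code;             /* we got here via siglongjmp */
    }

    sigaction(SIGBUS, &old, NULL);     /* restore previous disposition */
    munmap((void *)p, pagesize);
    return result;
}
```

The returned value can be compared against the BUS_* constants from signal.h (for example BUS_ADRALN for an alignment fault or BUS_ADRERR for a nonexistent physical address) to tell the causes apart.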


Trapping it will certainly help, but debugging must be the final solution.

You should access
www.docs.hp.com

Select
public domain software

Search on tusc.

Make a tusc wrapper and see whether the same condition always kills you.


Steve Steel
If you want truly to understand something, try to change it. (Kurt Lewin)
Ralph Grothe
Honored Contributor

Re: SIGBUS catchable?

Hi Peter,

I haven't done much system programming yet.
That's why I was not sure whether SIGBUS is catchable.
Of course I knew the shell built-in trap and its usage.
But I would like to have more control, which is why I would like to implement a signal handler.
While direct usage of the C library (viz. signal()) is a bit too arcane for me, I get almost the same features through Perl's %SIG hash and a callback subref.
Madness, thy name is system administration
David Johns
Advisor
Solution

Re: SIGBUS catchable?

Hello Ralph:

Try compiling the program below. Open another terminal to get the process id, and send the signal.

terminal1$ ./sigbus
Entering wait loop...
SIGBUS: trying to recover...
Entering wait loop...

terminal2$ ps -ef | grep sig
dj 25834 25776 0 12:12:51 pts/0 0:00 ./sigbus
terminal2$ kill -BUS 25834

Cheers,
Dave


/*
sigbus.c: test SIGBUS.

To compile:
cc -g +w1 -Ae -DPOSIX_SOURCE -DCATCH_SIGBUS sigbus.c -o sigbus
or,
gcc -Wall -DPOSIX_SOURCE -DCATCH_SIGBUS sigbus.c -o sigbus
*/
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

typedef void Sigfunc(int); /* APUE p. 271 */

Sigfunc *Signal(int, Sigfunc *);

#ifdef CATCH_SIGBUS
static void sig_bus(int);
static sigjmp_buf jmpbuf;
static volatile sig_atomic_t canjump; /* nonzero once jmpbuf is valid */
#endif

int
main(int argc, char *argv[])
{
    (void)argc;

#ifdef CATCH_SIGBUS
    if (Signal(SIGBUS, sig_bus) == SIG_ERR) {
        fprintf(stderr, "%s: can't install SIGBUS handler\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    /* sigsetjmp returns 0 when called directly and nonzero when we
       arrive here through siglongjmp from the handler. */
    if (sigsetjmp(jmpbuf, 1)) { /* initialize jump buffer */
        fprintf(stderr, "SIGBUS: trying to recover...\n");
    }
    canjump = 1; /* okay to jump */
#endif
    fputs("Entering wait loop...\n", stdout);
    for ( ; ; ) {
        sleep(1);
    }
    return EXIT_SUCCESS;
}

/* Signal: Reliable version of signal(), using POSIX sigaction.
   APUE p. 298 */
Sigfunc *
Signal(int signo, Sigfunc *func)
{
    struct sigaction act, oact;

    act.sa_handler = func;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    if (signo == SIGALRM) {
#ifdef SA_INTERRUPT
        act.sa_flags |= SA_INTERRUPT; /* SunOS */
#endif
    } else {
#ifdef SA_RESTART
        act.sa_flags |= SA_RESTART; /* SVR4, 4.3+ BSD */
#endif
    }
    if (sigaction(signo, &act, &oact) < 0)
        return(SIG_ERR);
    return(oact.sa_handler);
}


#ifdef CATCH_SIGBUS
/* sig_bus: siglongjmp to try to recover from SIGBUS. */
static void
sig_bus(int signo)
{
    (void)signo;
    if (canjump == 0)
        return; /* unexpected signal, ignore */
    canjump = 0;
    siglongjmp(jmpbuf, 1); /* jump back to main, don't return */
}
#endif

Peter Kloetgen
Esteemed Contributor

Re: SIGBUS catchable?

Hi Ralph,

sorry, I am not as fit in system programming as I should be to help you here. What about the program posted by David? If you try it, please post here whether and how it works, OK?

Always stay on the bright side of life!

Peter
I'm learning here as well as helping
Ralph Grothe
Honored Contributor

Re: SIGBUS catchable?

Hi David,

many thanks for taking the trouble to give me a lesson in writing C signal handlers.
Unfortunately I don't have time right now to give your code a try (besides, I will have to use gcc, because they wouldn't buy me an HP-UX C license here).
I have some urgent work to do on another host now.
I will print a hard copy of your code to peruse on the tube or at home later.
But I think I will stick to Perl, because I feel more comfortable with it.
Maybe I will find time for some C hacking at home on my Linux box (there no one will care if I screw up the system ;-).
Madness, thy name is system administration
A. Clay Stephenson
Acclaimed Contributor

Re: SIGBUS catchable?

Hi Ralph:

While it is possible to catch this signal, SIGBUS indicates a very serious error and usually points to a programming design flaw. It's certainly not a signal that one would wish to ignore or otherwise render innocuous, because in most cases the damage is not localized. Basically the program is trying to access data at a bad address.

The only person who can fix this is the application's developer. I'm amazed that the customer is unconcerned about this. I don't understand how the developers are not responsible for a solution.

If it ain't broke, I can fix that.
Ralph Grothe
Honored Contributor

Re: SIGBUS catchable?

Hi Clay,

it stuns me too that the application's developers don't feel responsible for their buggy code and don't seem to have any urge to fix it.
Unfortunately this seems to be the rule (blame it on our sales dept.).
We have a few applications that contain bugs which are at least annoying and put an extra burden on our system administration (e.g. silly port resets of a buggy MVS <-> Unix connectivity tool).
Because the contracts with the companies that develop or sell the applications expired long ago (some of the companies are already defunct or have been re-formed under new names), we have no recourse.
Because the code, unlike open source, is closed, there isn't even a remote chance of finding out where things go wrong (apart from that, we are not developers and only do some part-time hacking as a hobby).
Strangely, we sysadmins are held responsible by the customer for erratic behavior of applications that we have no influence on.
Sorry, for deviating.

Yes, I know that one should be extremely cautious when supplying one's own signal handlers to handle (or, worse, ignore) signals sent by the OS because an application went berserk.
I'm fully aware that this is a loathsome, dirty, evil hack.
It's only because the customer wants us to do something about the coredumps.

Maybe I should implement a signal handler that initiates a mail flood to the developers whenever a SIGBUS is caught ;-)
Madness, thy name is system administration
Mike Stroyan
Honored Contributor

Re: SIGBUS catchable?

I can think of three common causes for SIGBUS.

1. The application is using junk data for addresses. It is doomed. Let the SIGBUS put it out of its misery.

2. The application ran out of a resource and did not check a returned value for errors before using the resulting address. Look for the failed system call with the tusc utility and try to increase the necessary resource limit.

3. The application has a bad habit of trying to allocate small data types and access them as larger data types that require specific address alignment. It happens to work until its luck runs out and it dies on an unaligned access. This could be worked around by creating a shared library that uses a signal handler to fix up the result register or memory with small reads and writes and then resumes after the bad instruction. There is such a signal handler, installed by the allow_unaligned_data_access() function from libhppa.a on PA-RISC and from libunalign.so on IPF. Getting that signal handler into an already linked program would require heroic efforts, especially because there is no shared library version of the function on PA-RISC.
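
For comparison, this is roughly what the fix for cause 3 looks like when the source is available: instead of casting a byte pointer to a wider type, copy the bytes with memcpy(), which has no alignment requirement. A minimal sketch; the helper names load_u32() and store_u32() are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address that may be unaligned.
   Dereferencing (uint32_t *)p directly could raise SIGBUS on
   strict-alignment CPUs; memcpy copies byte by byte if needed. */
uint32_t load_u32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a 32-bit value to an address that may be unaligned. */
void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

For example, store_u32(buf + 1, x) followed by load_u32(buf + 1) works on any byte buffer, regardless of how buf itself is aligned.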

It may be worth your time to consider that your SIGBUS problem may come from a resource limit and explore eliminating the crash by tuning resource limits such as swap space or maxdsiz.
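
A quick way to see which per-process limits are currently in effect is getrlimit(). The sketch below assumes that a kernel tunable such as maxdsiz shows up to the process as RLIMIT_DATA; that mapping is an assumption on my part and worth checking against the system documentation. The helper name get_limit() is made up for illustration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Fetch the soft and hard values of one resource limit.
   Returns 0 on success, -1 on failure (errno is set by getrlimit). */
int get_limit(int resource, rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Print one limit, showing "unlimited" for RLIM_INFINITY. */
void print_limit(const char *name, int resource)
{
    rlim_t soft, hard;

    if (get_limit(resource, &soft, &hard) != 0) {
        perror(name);
        return;
    }
    if (soft == RLIM_INFINITY)
        printf("%s: soft limit unlimited\n", name);
    else
        printf("%s: soft limit %llu bytes\n", name,
               (unsigned long long)soft);
}
```

Calling print_limit("data segment", RLIMIT_DATA) and print_limit("stack", RLIMIT_STACK) from a small main(), started the same way as the application, shows the limits the application actually runs under.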