Operating System - HP-UX

Deadlock with FIFO, fwrite() and abort()

Krishna R
Advisor

Please find attached a small multi-threaded C program that reads from and writes to a FIFO:
1. main launches the reader and writer threads, then crashes deliberately after a few seconds.
2. The reader opens the FIFO for reading but never reads (the idea is to fill the pipe until the writer blocks).
3. The writer opens the FIFO with fopen() and writes "hello" continuously using fwrite().

There is a signal handler which calls abort().

When this program is run, the process hangs as follows:
1. The writer thread is already blocked in fwrite() because the pipe is full of unread data.
2. The main thread blocks inside abort(), along this call path:
sig handler -> abort -> cleanup -> acquire mutex

If I use open()/write() on the FIFO instead, the deadlock doesn't happen.

The hang is observed on HP-UX, Linux and AIX, but not on Solaris.

I would like to understand which piece of code is unsafe and should be changed.
1. using fwrite on FIFO?
2. calling abort in signal handler?
3. something else.


Any help/pointer is appreciated.

Thanks,
Krishna


Stack trace below:

(gdb)
Thread 3 (system thread 4100701):
#0 0x4001820:0 in readPipe+0xe0 ()
#1 0x60000000c00ab4a0:0 in __pthread_bound_body+0x170 ()
from /usr/lib/hpux32/libpthread.so.1

Thread 2 (system thread 4100700):
#0 0x60000000c0345cb0:0 in _write_sys+0x30 () from /usr/lib/hpux32/libc.so.1
#1 0x60000000c035b170:0 in write+0xb0 () from /usr/lib/hpux32/libc.so.1
#2 0x60000000c03387a0:0 in _xflsbuf+0x1d0 () from /usr/lib/hpux32/libc.so.1
#3 0x60000000c03071f0:0 in fwrite+0x4d0 () from /usr/lib/hpux32/libc.so.1
#4 0x4001690:0 in writePipe+0x150 ()
#5 0x60000000c00ab4a0:0 in __pthread_bound_body+0x170 () from /usr/lib/hpux32/libpthread.so.1

Thread 1 (system thread 4100699):
#0 0x60000000c0341310:0 in __ksleep+0x30 () from /usr/lib/hpux32/libc.so.1
#1 0x60000000c0104dc0:0 in __mxn_sleep+0xab0 () from /usr/lib/hpux32/libpthread.so.1
#2 0x60000000c00c3390:0 in + 0x4a0 () from /usr/lib/hpux32/libpthread.so.1
#3 0x60000000c00c7590:0 in pthread_mutex_lock+0x170 () from /usr/lib/hpux32/libpthread.so.1
#4 0x60000000c03616d0:0 in __thread_mutex_lock+0xb0 () from /usr/lib/hpux32/libc.so.1
#5 0x60000000c0332270:0 in get_iop_lock+0x230 () from /usr/lib/hpux32/libc.so.1
#6 0x60000000c03324f0:0 in ___stdio_unsup_1+0x150 () from /usr/lib/hpux32/libc.so.1
#7 0x60000000c0320890:0 in _cleanup+0x50 () from /usr/lib/hpux32/libc.so.1
#8 0x60000000c02fc540:0 in abort+0xe0 () from /usr/lib/hpux32/libc.so.1
#9 0x40010e0:0 in fatalSignalHandler+0xa0 ()
#10 <signal handler called>
#11 0x40014c0:0 in crash+0x40 ()
#12 0x4001af0:0 in main+0x220 ()
Dennis Handly
Acclaimed Contributor

Re: Deadlock with FIFO, fwrite() and abort()

>I would like to understand which piece of code is unsafe and should be changed.
>1. using fwrite on FIFO?
>2. calling abort in signal handler?

It appears you have already explained everything yourself. You should be calling _exit(2) from your signal handler.
And as you found, stdio locks each stream in a threaded program.
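
A minimal sketch of the _exit() suggestion, assuming a POSIX system. The names fatal_signal_handler and run_crash_in_child are made up for illustration and are not taken from the attached program. The crash is simulated with raise(SIGSEGV) in a forked child so that the parent can observe the exit status. The handler uses only write() and _exit(), both async-signal-safe, so it never touches the stdio locks.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler: report with write() and leave with _exit().
 * No stdio call is made, so no FILE lock is ever requested. */
static void fatal_signal_handler(int sig)
{
    static const char msg[] = "fatal signal, exiting\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
    _exit(128 + sig);   /* exit code convention chosen for this sketch */
}

/* Simulates the crash in a child process.
 * Returns the child's exit status, or -1 on error. */
int run_crash_in_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = fatal_signal_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        raise(SIGSEGV);   /* stand-in for the deliberate crash */
        _exit(0);         /* not reached if the handler ran */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_crash_in_child() should return 128 + SIGSEGV. The trade-off is that _exit() produces no core file, unlike abort().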
Dennis Handly
Acclaimed Contributor
Solution

Re: Deadlock with FIFO, fwrite() and abort()

>I would like to understand which piece of code is unsafe and should be changed.
>2. calling abort in signal handler?

We looked closely at the C99 Standard and you are allowed to call abort or _Exit from a signal handler.
This means libc is broken.
Please contact the Response Center and file a bug report.
Krishna R
Advisor

Re: Deadlock with FIFO, fwrite() and abort()

Hi Dennis,

Thanks for your quick response. I had also been thinking that if abort() is safe to call from a signal handler according to the standards, then libc should handle all possible cases properly.

But the fact that, apart from Solaris, the other major Unixes also end up in a deadlock made me think twice.

We will raise a bug report.

Best Regards,
Krishna
Dennis Handly
Acclaimed Contributor

Re: Deadlock with FIFO, fwrite() and abort()

>libc should rather handle all possible cases properly.

libc should either not flush at all, or check each stream for a lock and skip any stream that is locked.
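
The "check for a lock and skip" idea can be sketched at user level with POSIX ftrylockfile(). This is only an illustration of the logic, not how any libc implements abort(). The names flush_if_unlocked, hold_lock and try_flush_while_held are hypothetical. ftrylockfile() is not async-signal-safe, so this is not something to call from a signal handler as-is, and a flush that does get the lock can still block on a full pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>

/* Flush fp only if its stdio lock is free.
 * Returns 1 if flushed, 0 if the stream was locked and skipped. */
int flush_if_unlocked(FILE *fp)
{
    if (ftrylockfile(fp) != 0)
        return 0;              /* another thread owns the lock: skip */
    fflush(fp);
    funlockfile(fp);
    return 1;
}

/* Helper state for the demonstration below. */
struct holder {
    FILE *fp;
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int locked;
    int release;
};

/* Thread body: take the stream lock and keep it until told to release. */
static void *hold_lock(void *arg)
{
    struct holder *h = arg;
    flockfile(h->fp);
    pthread_mutex_lock(&h->mu);
    h->locked = 1;
    pthread_cond_broadcast(&h->cv);
    while (!h->release)
        pthread_cond_wait(&h->cv, &h->mu);
    pthread_mutex_unlock(&h->mu);
    funlockfile(h->fp);
    return NULL;
}

/* Tries to flush fp while another thread holds its lock.
 * Returns the result of flush_if_unlocked(), or -1 on setup failure. */
int try_flush_while_held(FILE *fp)
{
    struct holder h;
    pthread_t t;

    h.fp = fp;
    h.locked = 0;
    h.release = 0;
    pthread_mutex_init(&h.mu, NULL);
    pthread_cond_init(&h.cv, NULL);
    if (pthread_create(&t, NULL, hold_lock, &h) != 0)
        return -1;

    pthread_mutex_lock(&h.mu);
    while (!h.locked)
        pthread_cond_wait(&h.cv, &h.mu);
    pthread_mutex_unlock(&h.mu);

    int flushed = flush_if_unlocked(fp);   /* lock is held elsewhere */

    pthread_mutex_lock(&h.mu);
    h.release = 1;
    pthread_cond_broadcast(&h.cv);
    pthread_mutex_unlock(&h.mu);
    pthread_join(t, NULL);

    pthread_cond_destroy(&h.cv);
    pthread_mutex_destroy(&h.mu);
    return flushed;
}
```

While the other thread holds the lock, the stream is skipped instead of waited on. Once the lock is released, the same call flushes normally.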
Krishna R
Advisor

Re: Deadlock with FIFO, fwrite() and abort()

I guess not flushing named pipes at all would be safest, since a flush can also block (even if the writer has not locked the stream) when the reader is not reading from the pipe.

The stack trace for the same program on Linux:

Linux xxx 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

Thread 3 (Thread 1084229984 (LWP 3778)):
#0 0x00000038e6cb933f in __write_nocancel () from /lib64/tls/libc.so.6
#1 0x00000038e6c65328 in _IO_new_file_write () from /lib64/tls/libc.so.6
#2 0x00000038e6c64364 in _IO_new_do_write () from /lib64/tls/libc.so.6
#3 0x00000038e6c65492 in _IO_new_file_xsputn () from /lib64/tls/libc.so.6
#4 0x00000038e6c5b758 in fwrite () from /lib64/tls/libc.so.6
#5 0x000000000040099a in writePipe ()
#6 0x00000038e770610a in start_thread () from /lib64/tls/libpthread.so.0
#7 0x00000038e6cc68c3 in clone () from /lib64/tls/libc.so.6
#8 0x0000000000000000 in ?? ()

Thread 2 (Thread 1094719840 (LWP 3779)):
#0 0x00000000004009d3 in readPipe ()
#1 0x00000038e770610a in start_thread () from /lib64/tls/libpthread.so.0
#2 0x00000038e6cc68c3 in clone () from /lib64/tls/libc.so.6
#3 0x0000000000000000 in ?? ()

Thread 1 (Thread 182894173856 (LWP 3777)):
#0 0x00000038e6cb933f in __write_nocancel () from /lib64/tls/libc.so.6
#1 0x00000038e6c65328 in _IO_new_file_write () from /lib64/tls/libc.so.6
#2 0x00000038e6c64364 in _IO_new_do_write () from /lib64/tls/libc.so.6
#3 0x00000038e6c66a21 in _IO_flush_all_lockp () from /lib64/tls/libc.so.6
#4 0x00000038e6c2fae6 in abort () from /lib64/tls/libc.so.6
#5 0x0000000000400862 in fatalSignalHandler ()
#6 <signal handler called>
#7 0x0000000000400931 in crash ()
#8 0x0000000000400a4e in main ()


It looks like there is no locking, but still flush blocks since reader is not reading (and the pipe is currently full)
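
The "pipe is full" condition can be observed without stdio or threads. The sketch below is an illustration under assumptions: it uses an anonymous pipe() rather than the FIFO from the program, and the function name fill_until_full is hypothetical. With O_NONBLOCK set, the write() that would otherwise block, as in frame #0 of thread 1, fails with EAGAIN instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Writes "hello" to wfd until the kernel pipe buffer cannot take more.
 * Returns the number of bytes accepted, or -1 on an unexpected error.
 * Leaves wfd in non-blocking mode. */
long fill_until_full(int wfd)
{
    static const char chunk[] = "hello";
    long total = 0;

    int flags = fcntl(wfd, F_GETFL);
    if (flags < 0 || fcntl(wfd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    for (;;) {
        ssize_t n = write(wfd, chunk, sizeof chunk - 1);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;      /* a blocking write would sleep here */
        return -1;
    }
}
```

Nobody reads from the other end, so the buffer fills after a bounded number of writes. A blocking descriptor would make the caller sleep at exactly that point, whether the write comes from fwrite() or from a flush inside abort().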



Of course, I would leave it to the right people to decide the best way to handle it.

Dennis Handly
Acclaimed Contributor

Re: Deadlock with FIFO, fwrite() and abort()

>It looks like there is no locking, but still flush blocks since reader is not reading (and the pipe is currently full)

Hmm, it seems there is no good way except to just not flush when there is an abort.
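
Until libc behaves that way, an application can get a similar "abort without flushing" effect itself. This is a sketch under assumptions and not a fix for libc. The handler resets SIGABRT to its default action and raises it directly, so the process still dies from SIGABRT but no stdio cleanup runs. The names fatal_signal_handler and crash_without_flush are hypothetical, and the crash is simulated in a forked child.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler: terminate via SIGABRT without calling abort(). */
static void fatal_signal_handler(int sig)
{
    (void)sig;
    signal(SIGABRT, SIG_DFL);   /* make sure the default action applies */
    raise(SIGABRT);             /* no stdio flush, no stream locks */
    _exit(127);                 /* fallback if SIGABRT was blocked */
}

/* Child buffers "hello" in a stdio stream on a pipe, then crashes.
 * Reports the terminating signal and how many bytes reached the pipe.
 * Returns 0 on success, -1 on error. */
int crash_without_flush(int *term_sig, long *bytes_seen)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);   /* keep the demo from dumping core */
        close(fds[0]);

        FILE *out = fdopen(fds[1], "w");    /* fully buffered: not a terminal */
        if (out == NULL)
            _exit(126);
        fputs("hello", out);                /* stays in the stdio buffer */

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = fatal_signal_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        raise(SIGSEGV);                     /* stand-in for the crash */
        _exit(0);                           /* not reached */
    }

    close(fds[1]);
    char buf[16];
    long total = 0;
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    *term_sig = WIFSIGNALED(status) ? WTERMSIG(status) : 0;
    *bytes_seen = total;
    return 0;
}
```

The child is reported as killed by SIGABRT and the buffered "hello" never reaches the pipe. That is the cost of this approach: any unflushed stdio output is lost.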

>I would leave it to the right people to decide the best way to handle it.

Now I'm not sure there is a right way. :-(