
Unexplained SIGXFSZ

 
Bruce Bye
New Member

Unexplained SIGXFSZ

I have an application that extends its database files by making an lseek64() call past the current end of file, then writing a block of data to the desired length. Eventually this fails, with a SIGXFSZ during the write() call, but I cannot work out why this is happening when it does.
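For readers who want to reproduce the pattern, here is a minimal sketch of the seek-then-write extension described above. It is not the application's real code: the names `extend_file` and `file_size` are made up for illustration, the block is filled with zeros as a placeholder, and it uses `lseek()` with `_FILE_OFFSET_BITS` set to 64 as a portable stand-in for `lseek64()`.

```c
#define _XOPEN_SOURCE 700
#define _FILE_OFFSET_BITS 64   /* assumption: 64-bit off_t instead of calling lseek64() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: grow the file behind fd to exactly new_size bytes by
 * seeking to (new_size - block) and writing one zero-filled block there.
 * Returns 0 on success, -1 on failure with errno set by the failing call. */
int extend_file(int fd, off_t new_size, size_t block)
{
    char *zeros;
    size_t done = 0;

    assert(block > 0);
    if (new_size < (off_t)block) {
        errno = EINVAL;
        return -1;
    }
    zeros = calloc(1, block);
    if (zeros == NULL)
        return -1;

    /* Seek past the current end of file; the gap becomes a hole. */
    if (lseek(fd, new_size - (off_t)block, SEEK_SET) == (off_t)-1) {
        free(zeros);
        return -1;
    }

    /* write() may be short or interrupted, so loop until the block is out. */
    while (done < block) {
        ssize_t n = write(fd, zeros + done, block - done);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {           /* EFBIG shows up here if a size limit is hit */
            free(zeros);
            return -1;
        }
        done += (size_t)n;
    }
    free(zeros);
    return 0;
}

/* Hypothetical helper: current size of the file, or -1 on error. */
off_t file_size(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 ? st.st_size : (off_t)-1;
}
```

Calling `extend_file(fd, 1048576, 4096)` on an empty file should leave it 1 MiB long, with only the final 4 KiB actually written.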

There is plenty of free space on the volume. The file is only being extended to about 820MB (so no obvious limit there, even if the volume didn't have large files enabled).

Most confusing is that all attempts to reproduce this behaviour in a small test program have failed. The only thing I can think of that I haven't tried is making the test program multi-threaded in the same way as the real application. Is there some cumulative effect that might mean the multi-threaded app is hitting a (hopefully configurable!) limit, which the single-threaded app would not? I've tried linking the test app with the pthreads library in case that was doing something bizarre in initialisation. Any other reasons I might be missing as to why this signal would occur?

I'm stumped :(

-------

The machine in question is 2xPA8500, running HP-UX B.11.00 (64-bit) with 8GB RAM.
10 REPLIES
U.SivaKumar_2
Honored Contributor

Re: Unexplained SIGXFSZ

Hi,
The SIGXFSZ signal is raised when a file's size exceeds the maximum file size that a process is allowed to create. Have you enabled large file support in your file system?

regards,
U.SivaKumar
Innovations are made when conventions are broken
Dietmar Konermann
Honored Contributor

Re: Unexplained SIGXFSZ

What about your configured resource limits... see setrlimit(2)? What is the result of ulimit -a? Does the application call setrlimit()?

BTW, if you want to duplicate the problem... the signal is only delivered to UNIX95-conforming applications. These need to be linked with unix95.o, see setrlimit(2).

Regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Bruce Bye
New Member

Re: Unexplained SIGXFSZ

ulimit -a gives:
core file size (blocks) 2097151
data seg size (kbytes) 585936
file size (blocks) unlimited
max memory size (kbytes) unlimited
open files 20000
pipe size (512 bytes) 16
stack size (kbytes) 8192
cpu time (seconds) unlimited
max user processes 176
virtual memory (kbytes) unlimited

The application doesn't call setrlimit. Also, if it is getting linked with unix95.o (which from what you say and the setrlimit man page it must be) it's not intentional. Would it be enough if a shared library that it links with is UNIX 95?

For an application that is not UNIX 95, would you expect a write() call to return less than the requested length in every situation where a SIGXFSZ would be sent to a UNIX 95 application? That was certainly my understanding, and all the write calls in the test app succeed. In any case, I tried explicitly linking my test program with unix95.o, with the same result.

BTW: Yes, large files are enabled on the volume with the data files.
Dietmar Konermann
Honored Contributor

Re: Unexplained SIGXFSZ

Hi, Bruce!

From my understanding a non-UNIX95 application should not receive SIGXFSZ... I may be wrong, of course.

Did you check if UNIX95 is set in your environment? It has the same effect as linking with unix95.o.

Regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Bruce Bye
New Member

Re: Unexplained SIGXFSZ

I did check, and it's not set.
Dietmar Konermann
Honored Contributor

Re: Unexplained SIGXFSZ

Hi, Bruce!

Really weird... my next troubleshooting step would be a system call trace with tusc.

Use the options -e and -f.

Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Dietmar Konermann
Honored Contributor

Re: Unexplained SIGXFSZ

I dug around a little... and found that it matters whether UNIX95 is set/exported at link/compile time, which has the same effect as using the unix95.o object explicitly.


These are the cases to distinguish...

1) If the application is compiled and linked with the UNIX95 environment enabled, the application is marked, the kernel is apprised of this compile-time condition when the application runs, and it will deliver the signal whenever the limit is exceeded. This signal is delivered regardless of the state of the UNIX95 environment flag at the time the application is run.

2) If an application is compiled and linked without the UNIX95 environment enabled (the legacy/default case), the application is not marked and the kernel will not deliver the signal when the limit is exceeded. This preserves the classic behavior.

3) Regardless of whether the application was compiled with the UNIX95 environment set, if the application explicitly installs a signal handler, or explicitly sets the ignore or default disposition for the signal, the signal will be delivered and appropriately processed whenever the limit is exceeded. We assume that an application that is aware of a signal need not be protected from that signal.

4) If the application is specifically sent a signal by another application via the kill() or raise() functions, the signal will be delivered. We assume that a user who explicitly sends a signal could just as easily have sent a SIGKILL (-9), which could not be avoided by the application anyway.


So, in summary, if UNIX95 was not set during compile/link time AND unix95.o was not used AND no handler, default or ignore signal action was enabled or specified at runtime, then no SIGXFSZ should be sent with failed write(2)s.
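To see case 3 in action, something like the following sketch may help. It is written against generic POSIX calls and does not model the HP-UX UNIX95 marking at all; it only shows that once a handler is installed and the soft RLIMIT_FSIZE is exceeded, the handler runs and write() fails with EFBIG. The names `demo_sigxfsz` and `on_sigxfsz`, the 4096-byte limit and the /tmp path are all made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigxfsz = 0;

/* Handler only records that the signal arrived. */
static void on_sigxfsz(int signo)
{
    (void)signo;
    got_sigxfsz = 1;
}

/* Returns 1 if SIGXFSZ was caught and write() failed with EFBIG,
 * 0 if the write behaved differently, -1 if the setup itself failed. */
int demo_sigxfsz(void)
{
    struct sigaction sa, old_sa;
    struct rlimit old_lim, new_lim;
    char path[] = "/tmp/sigxfsz_demoXXXXXX";   /* illustrative location */
    int fd, result = -1;

    sa.sa_handler = on_sigxfsz;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGXFSZ, &sa, &old_sa) != 0)
        return -1;

    if (getrlimit(RLIMIT_FSIZE, &old_lim) == 0) {
        new_lim = old_lim;
        new_lim.rlim_cur = 4096;               /* lower the soft limit only */
        if (setrlimit(RLIMIT_FSIZE, &new_lim) == 0) {
            fd = mkstemp(path);
            if (fd >= 0) {
                got_sigxfsz = 0;
                /* Seek beyond the limit, then try to write one byte. */
                if (lseek(fd, 8192, SEEK_SET) == 8192) {
                    ssize_t n = write(fd, "x", 1);
                    int err = errno;
                    result = (n == -1 && err == EFBIG && got_sigxfsz) ? 1 : 0;
                }
                close(fd);
                unlink(path);
            }
            setrlimit(RLIMIT_FSIZE, &old_lim); /* restore the soft limit */
        }
    }
    sigaction(SIGXFSZ, &old_sa, NULL);         /* restore the old disposition */
    return result;
}
```

Without the sigaction() call, the default action for SIGXFSZ on a typical POSIX system terminates the process, so the handler is what lets the failed write() be observed at all.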

Regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Bruce Bye
New Member

Re: Unexplained SIGXFSZ

Dietmar, you've solved one mystery at least! We handle all signals (except for some specific signals that we set ignore or default handling). So when the error occurs, it's no wonder that we're getting the signal.

So the question is just why there is an error at all. The tusc snippet attached shows the last few calls made to extend the offending file (the select calls are from a different thread). I'll try tweaking my test app to more accurately recreate the same system calls in case that makes a difference, and also make sure it sets a signal handler in the ongoing search for a simple repro!
Dietmar Konermann
Honored Contributor
Solution

Re: Unexplained SIGXFSZ

Bruce,

Is it possible to place a getrlimit() call immediately before the write? Just to be 100% sure that no limit is actually in effect?
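A check along these lines could be dropped in just ahead of the write. This is only a sketch: `write_fits_fsize_limit` is a hypothetical helper name, and logging to stderr is an assumption about what is convenient in the application.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical fencepost: log the current soft RLIMIT_FSIZE and report
 * whether a write ending at end_offset (in bytes) stays within it.
 * Returns 1 if it fits, 0 if it would exceed the limit, -1 on error. */
int write_fits_fsize_limit(long long end_offset)
{
    struct rlimit lim;

    assert(end_offset >= 0);
    if (getrlimit(RLIMIT_FSIZE, &lim) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (lim.rlim_cur == RLIM_INFINITY) {
        fprintf(stderr, "RLIMIT_FSIZE soft limit: unlimited\n");
        return 1;
    }
    fprintf(stderr, "RLIMIT_FSIZE soft limit: %llu bytes\n",
            (unsigned long long)lim.rlim_cur);
    /* A file may grow up to, but not beyond, the limit. */
    return (unsigned long long)end_offset <= (unsigned long long)lim.rlim_cur;
}
```

Calling it at several points between startup and the failing write would also show where the limit changes, if it does.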

Did you already check if growing the file to that size is generally possible, e.g. by appending data from command line?

Regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Bruce Bye
New Member

Re: Unexplained SIGXFSZ

Well, I bit the bullet today and rebuilt the failing application with a few choice getrlimit calls and discovered that somewhere between the application starting and the failing write() the limit changes to 819200000.

In putting some further fencepost calls to getrlimit I found... a call to ulimit(). No wonder a search for get/setrlimit turned up nothing! So how embarrassed am I? :o
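For anyone else who trips over this: ulimit() takes its file size limit in 512-byte blocks, and the result shows up in getrlimit() as bytes. The sketch below shows the round trip. The block count 1600000 is my own back-calculation from 819200000 / 512 and not necessarily the value the application really passes; `fsize_bytes_after_ulimit` is just an illustrative name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/resource.h>
#include <ulimit.h>

/* Set the file size limit through the legacy ulimit() call (units of
 * 512-byte blocks) and return the resulting soft RLIMIT_FSIZE in bytes
 * as reported by getrlimit(), or -1 on failure or if still unlimited. */
long long fsize_bytes_after_ulimit(long blocks)
{
    struct rlimit lim;

    assert(blocks >= 0);
    if (ulimit(UL_SETFSIZE, blocks) == -1)
        return -1;
    if (getrlimit(RLIMIT_FSIZE, &lim) != 0)
        return -1;
    if (lim.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long long)lim.rlim_cur;
}
```

With 1600000 blocks this should report 819200000 bytes, the same figure getrlimit showed before the failing write. Note that an unprivileged process can lower its limit this way but may not be able to raise it again, so it is best tried in a throwaway process.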

Thanks for all the help.