SIGSEGV error

 
Michael Elleby III_1
Trusted Contributor

I have a background process that terminates with this error:

Pid 15811 received a SIGSEGV for stack growth failure.
Possible causes: insufficient memory or swap space, or stack size exceeded maxssiz.
Memory fault(coredump)

I looked at vmstat, and it shows 'avm' and 're' being pretty high (avm is 94471 and re is 102), which would suggest that the system is close to paging, but when I run swapinfo, it tells me that only 10% of swap is being used.

The server is an L2000 with two 440 MHz processors and a gig of memory. The maxssiz parameter is currently set to 8388608.

Any suggestions are appreciated.
Knowledge Is Power
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: SIGSEGV error

Hi Michael:

This one is almost certainly stack size; 8MB is rather small, but don't go nuts and set it to outrageous values so that maxssiz can't do its job: preventing a rogue process from grabbing all the resources on the box. Typically, increasing maxssiz to 32MB (or possibly 64MB) will be a good compromise.
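
For anyone matching these sizes against the raw tunable value, here is a small sketch of the arithmetic. It assumes maxssiz is expressed in bytes (which agrees with 8388608 being described as 8MB) and that "MB" means 2**20 bytes; the variable names are only illustrative.

```python
# Convert the maxssiz values discussed in this thread between bytes and MB.
MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes

current_maxssiz = 8388608  # value reported in the original post
print("current:", current_maxssiz // MB, "MB")

# Byte values for the sizes suggested in this reply.
for target_mb in (32, 64):
    print(target_mb, "MB =", target_mb * MB, "bytes")
```

The byte figures are what you would compare against the value the kernel configuration tool reports.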
If it ain't broke, I can fix that.
Michael Elleby III_1
Trusted Contributor

Re: SIGSEGV error

Thanx Clay, I had my suspicions about maxssiz, but wasn't sure, which is why I posted the question.

One last note: I should have mentioned that the server is running the 11.00 64-bit kernel. Should I modify the maxssiz_64bit parameter instead, or modify both? I'm willing to raise both of these conservatively to see if this cures the ill.

Mike-
Knowledge Is Power
Steven Gillard_2
Honored Contributor

Re: SIGSEGV error

You may also want to have a closer look at the application that is crashing with this error. More often than not it's a bug where the program has entered infinite recursion. If this is the case, increasing maxssiz will not help (although I agree 8MB can be a little low; I usually set it to 32MB).

Do you have access to the source code of this application? Can you get a stack trace from the core file with gdb?

Regards,
Steve
Michael Elleby III_1
Trusted Contributor

Re: SIGSEGV error

Steve-

I do not have access to the source code of the app, nor do I have gdb readily available...

However, I do have the core file..

Mike-
Knowledge Is Power
Sridhar Bhaskarla
Honored Contributor

Re: SIGSEGV error

Michael,

No need to increase maxssiz_64bit if your application is not 64-bit. Even if you increase it, it won't make any difference.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
A. Clay Stephenson
Acclaimed Contributor

Re: SIGSEGV error

Hi again:

In this case, I would go ahead and set maxssiz to 32MB. If it still crashes, then you have problems.

The stack (governed by maxssiz or maxssiz_64bit) is used for automatic local variables (i.e. those local to a function). 8MB is just on the verge of what good programming practice would accept for a locally declared fixed-size array. Good programmers will create dynamic data structures for these things, and those are controlled by maxdsiz. The other thing that can cause stack overflow is (as mentioned) runaway recursion, in which each invocation of a given function allocates only a small amount of memory but the function (or group of mutually recursive functions) is called many, many times.
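
To put rough numbers on the recursion case, here is a back-of-the-envelope sketch. The per-call frame size of 128 bytes is a hypothetical figure chosen only for illustration; real frame sizes depend on the function and the compiler.

```python
MB = 1024 * 1024

def max_depth(stack_limit_bytes: int, frame_bytes: int) -> int:
    """Approximate number of nested calls that fit under a stack limit."""
    return stack_limit_bytes // frame_bytes

FRAME_BYTES = 128  # hypothetical stack usage per call, for illustration only

# Compare the current 8MB limit with the suggested 32MB limit.
for limit_mb in (8, 32):
    depth = max_depth(limit_mb * MB, FRAME_BYTES)
    print(limit_mb, "MB allows about", depth, "nested calls")
```

Quadrupling the limit only quadruples the reachable depth, so a genuinely runaway recursion will still hit the fence; it just takes a little longer.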

Again, bump the puppy up to 32MB or so, and if it still crashes, it's time to call the software vendor. In a very few cases I have been asked to bump maxssiz up to very large values; in that case, I always ask to speak to the Project Leader and I very nicely ask why his programmers are a bunch of idiots and why they haven't heard of dynamic memory allocation.




If it ain't broke, I can fix that.
Michael Elleby III_1
Trusted Contributor

Re: SIGSEGV error

Thanx again Clay. I appreciate your input as well as the humor that you put into it... I laughed real loudly about your programmer's comment...

Mike-
Knowledge Is Power
Michael Elleby III_1
Trusted Contributor

Re: SIGSEGV error

Clay- I am awaiting a response from the programmers on a convenient window to bump up maxssiz, so I'll be looking to see the results...

Steven- Got my hands on gdb and ran it against the core, here are the results if you'd like to interpret:

# ./gdb -core=/tmp/core
GNU gdb 5.1.1
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "hppa2.0n-hp-hpux11.00".
Core was generated by `mmsmain'.
Program terminated with signal 10, Bus error.
#0 0xc36aacac in ?? ()
(gdb)
Knowledge Is Power
Steven Gillard_2
Honored Contributor

Re: SIGSEGV error

What you've got there is a 'bus error' which is slightly different to the original error message. This is almost always caused by a bug in the code - it basically means the program tried to use a mis-aligned pointer. It *could* in theory also be caused by the stack size limitation, so increasing maxssiz may still help but there's no guarantee.

At the gdb prompt type 'bt' to get a stack trace and forward it back to the developers along with the core file. There won't be much, if any information in it that I'll be able to interpret for you because you really need access to the source code. All you can do from a sys admin point of view is hassle those developers about their buggy code!

Regards,
Steve
Michael Elleby III_1
Trusted Contributor

Re: SIGSEGV error

Steven-

As I am still a little rough around the edges with using gdb, how would I perform a stack trace?

Thanx.
Knowledge Is Power
Steven Gillard_2
Honored Contributor

Re: SIGSEGV error

Hi Michael,

To get a stack trace the command you want is 'backtrace' or 'bt' for short.

Have a look at the HP wdb / gdb documentation online at:

http://h21007.www2.hp.com/dspp/tech/tech_TechSoftwareDetailPage_IDX/1,1703,1664,00.html

Regards,
Steve
H.Merijn Brand (procura
Honored Contributor

Re: SIGSEGV error

'where' is almost the same, and gives you (basically) all the info you need
Enjoy, Have FUN! H.Merijn
Michael Elleby III_1
Trusted Contributor

Re: SIGSEGV error

Steve, here is rerun of gdb with bt issued:

# ./gdb /tmp/mmsmain /tmp/core
GNU gdb 5.1.1
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "hppa2.0n-hp-hpux11.00"...

warning: exec file is newer than core file.
Core was generated by `mmsmain'.
Program terminated with signal 10, Bus error.

warning: The shared libraries were not privately mapped; setting a
breakpoint in a shared library will not work until you rerun the program.


warning: Can't find file mmsmain referenced in dld_list.

warning: Can't find file /opt/app/oracle/OraHome1/lib//libclntsh.sl.8.0 referenced in dld_list.

warning: Can't find file /opt/app/oracle/OraHome1/lib/libwtc8.sl referenced in dld_list.
Reading symbols from /usr/lib/libcl.2...done.
Reading symbols from /usr/lib/libisamstub.1...done.
Reading symbols from /usr/lib/librt.2...done.
Reading symbols from /usr/lib/libpthread.1...done.
Reading symbols from /usr/lib/libnss_dns.1...done.
Reading symbols from /usr/lib/libm.2...done.
Reading symbols from /usr/lib/libxcurses.1...done.
Reading symbols from /usr/lib/libc.2...done.
Reading symbols from /usr/lib/libdld.2...done.
#0 0xc36aacac in ?? ()
(gdb) bt
#0 0xc36aacac in ?? ()
Cannot access memory at address 0x294
(gdb)

Please Note that I copied executable and core to a test box so I could test..

Mike-
Knowledge Is Power
Steven Gillard_2
Honored Contributor

Re: SIGSEGV error

Michael,

Obviously this hasn't produced a valid stack trace, the most likely reason being that you're not running gdb on the same system where the crash occurred. To analyse a core file on another system you need an identical environment: all shared libraries *must* be exactly the same. In your case I can see that, at the very least, the Oracle libraries are nonexistent on this other system. On top of this you should also make sure that all other libraries involved are the same, i.e. you must be at the same libc / libpthread patch level etc. It's usually easier to run gdb on the system where the core was generated.

Also, if the executable has been 'stripped' of symbol information you'll have a very difficult time getting a useful stack trace. This is common practice for many 3rd party developers because it makes reverse engineering the code more difficult, and also reduces the executable size. The 'file' command will tell you if an executable is stripped or not.
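
If you want to script that check, a minimal sketch follows. It assumes the 'file' output contains the phrase "not stripped" when symbols are present, which is how the implementations I have seen report it; the helper names and the sample strings are made up for illustration and are not real output from this system.

```python
import subprocess

def has_symbols(file_output: str) -> bool:
    """True if the text from 'file' says the binary is not stripped."""
    # Assumption: unstripped binaries are reported with "not stripped".
    return "not stripped" in file_output

def check_binary(path: str) -> bool:
    """Run 'file' on path and report whether symbols appear to be present."""
    result = subprocess.run(["file", path], capture_output=True, text=True)
    return has_symbols(result.stdout)

# Hypothetical sample strings, not captured from a real system.
print(has_symbols("a.out: executable, not stripped"))
print(has_symbols("a.out: executable, stripped"))
```

A call such as check_binary("/tmp/mmsmain") would run the same test against the real executable.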

And as I've previously mentioned, even if you do manage to get a stack trace, it's highly unlikely that we'll be able to interpret it because that requires knowledge of the source code. It's best to report the problem to the developers or application vendor.

Regards,
Steve
Michael Elleby III_1
Trusted Contributor

Re: SIGSEGV error

Steve-

One more question. I just spoke to one of the DBAs, and they questioned whether the change to maxssiz will cause any issues with the SGA in Oracle.

Being of limited knowledge in Oracle, I wanted to run this by you before I made the change..

Thanx.

Mike
Knowledge Is Power
Steven Gillard_2
Honored Contributor

Re: SIGSEGV error

No problem at all - maxssiz is just a 'fence', increasing it will not result in any extra resource usage. All it does is allow processes to grow their stack segments larger.
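
As a rough way to see the 'fence' idea from inside a process, here is a sketch using the standard POSIX resource limit for stack size. Treating that per-process limit as the counterpart of the maxssiz ceiling is my assumption, and the describe() helper is only illustrative.

```python
import resource

def describe(limit: int) -> str:
    """Render a resource limit in MB, or 'unlimited' for RLIM_INFINITY."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // (1024 * 1024)} MB"

# The soft and hard stack limits are ceilings only; reading them (or raising
# them) does not allocate any memory by itself.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print("soft stack limit:", describe(soft))
print("hard stack limit:", describe(hard))
```

A process only consumes stack as it actually grows into it, so a higher ceiling costs nothing for programs that never need it.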

Regards,
Steve