Re: received a SIGSEGV for stack growth failure

Nick Raceu · ‎03-12-2007

Hello,

After installing the DST patch (PHCO_35991) and rebooting the machine, I cannot use the AUTOSYS software. I get the following error: "Pid 8429 received a SIGSEGV for stack growth failure. Possible causes: insufficient memory or swap space, or stack size exceeded maxssiz."
I'm new to HP-UX, so please let me know what info I should provide.
I'm running HP-UX B.11.00 and Oracle 8.0.6.
Thanks

A. Clay Stephenson · ‎03-12-2007

I very much doubt that the tztab patch had anything to do with your problem. It is probable that you either exceeded a kernel limit (maxdsiz or maxssiz) or more swap is needed.

Please post the output of these commands:

kmtune
swapinfo -tam

If it ain't broke, I can fix that.

Pete Randall · ‎03-12-2007

Nick,

I have to wonder if these two aren't just coincidence. The DST patch really only affects the /usr/lib/tztab file, which shouldn't have any effect on maxssiz. Check the date/time stamps on /stand/system, /stand/vmunix and /stand/vmunix.prev, compared against /etc/shutdownlog. I'm wondering if someone played with maxssiz, then built a new kernel but never rebooted with that new kernel.

Pete

Nick Raceu · ‎03-12-2007

Hi Clay,
here is the info:
/home/root# swapinfo -tam
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev        1024       0    1024    0%       0       -    1  /dev/vg00/lvol2
reserve       -     226    -226
memory      998     102     896   10%
total      2022     328    1694   16%       -       0    -

Nick Raceu · ‎03-12-2007

Hi Pete,

Looks like the system has been rebooted several times since the kernel was last rebuilt:
root@chapters:/stand# ls -ltr
total 70166
drwxr-xr-x 2 root root 8192 Dec 20 2000 lost+found
-rw-r--r-- 1 root sys 19 Dec 20 2000 bootconf
-r--r--r-- 1 root sys 82 Dec 20 2000 kernrel
drwxr-xr-x 2 root sys 1024 Dec 21 2000 system.d
-r--r--r-- 1 root sys 1103 Dec 22 2000 system.prev2
-rw-rw-rw- 1 root sys 980 Feb 1 2001 system_chapter1
-rwxr-xr-x 1 root sys 11622256 Feb 23 2001 vmunix.prev2
-rw-rw-rw- 1 root sys 1038 Jan 19 2002 system.prev
-rwxr-xr-x 1 root sys 12117872 Jan 19 2002 vmunix.prev
drwxr-xr-x 5 root sys 1024 Jan 19 2002 dlkm.vmunix.prev
-rw-rw-rw- 1 root sys 1045 Jan 19 2002 system
-rwxr-xr-x 1 root sys 12117872 Jan 19 2002 vmunix
drwxr-xr-x 3 root sys 2048 Jan 19 2002 build

Here are some reboots from the last 3 years:
16:38 Thu Feb 19, 2004. Reboot: (by chapters!root)
16:22 Wed Jul 21, 2004. Reboot: (by chapters!root)
11:37 Thu Sep 16, 2004. Reboot: (by chapters!root)
14:57 Tue Mar 1, 2005. Reboot: (by chapters!root)
17:40 Thu Dec 8, 2005. Reboot: (by chapters!root)
14:40 Fri Mar 9, 2007. Reboot: (by chapters!root)
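The comparison Pete asked for (kernel build date against the reboot history) can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the log lines are the ones quoted above, and the kernel build time is taken as midnight on Jan 19 2002 because the ls listing does not show a time of day.

```python
import re
from datetime import datetime

# Month abbreviations as they appear in the quoted shutdownlog lines.
MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches lines such as "16:38 Thu Feb 19, 2004. Reboot: (by chapters!root)".
REBOOT_RE = re.compile(
    r"^(\d{1,2}):(\d{2})\s+\w{3}\s+(\w{3})\s+(\d{1,2}),\s+(\d{4})\.\s+Reboot")


def parse_reboots(log_text):
    """Return the reboot times found in shutdownlog-style text."""
    reboots = []
    for line in log_text.splitlines():
        m = REBOOT_RE.match(line.strip())
        if m and m.group(3) in MONTHS:
            hour, minute, mon, day, year = m.groups()
            reboots.append(datetime(int(year), MONTHS[mon], int(day),
                                    int(hour), int(minute)))
    return reboots


def reboots_since(kernel_built, log_text):
    """Reboots that happened after the kernel file was last written."""
    return [t for t in parse_reboots(log_text) if t > kernel_built]


SAMPLE_LOG = """\
16:38 Thu Feb 19, 2004. Reboot: (by chapters!root)
16:22 Wed Jul 21, 2004. Reboot: (by chapters!root)
11:37 Thu Sep 16, 2004. Reboot: (by chapters!root)
14:57 Tue Mar 1, 2005. Reboot: (by chapters!root)
17:40 Thu Dec 8, 2005. Reboot: (by chapters!root)
14:40 Fri Mar 9, 2007. Reboot: (by chapters!root)
"""

# Assumption: time of day is unknown, so midnight is used.
KERNEL_BUILT = datetime(2002, 1, 19)

later = reboots_since(KERNEL_BUILT, SAMPLE_LOG)
print(len(later), "reboots after the kernel was built")
```

If every logged reboot is later than the kernel build, as here, the running kernel should be the one in /stand/vmunix, which rules out the "built but never booted" scenario. On a live system the build time could come from os.path.getmtime("/stand/vmunix") instead of a hard-coded date.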

A. Clay Stephenson · ‎03-12-2007

Your box is running with the out-of-the-box settings for maxdsiz, maxtsiz, and maxssiz.

Your current settings have maxssiz set to 8MiB and maxdsiz and maxtsiz both set to 64MiB.

I would increase maxssiz to 32MiB, maxdsiz to 512MiB, and maxtsiz to 128MiB and see if your problems go away.
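For reference, the sizes above can be converted to raw values with a small Python sketch. It assumes the tunables are expressed in bytes and are usually shown in hexadecimal; check that against the kmtune output on your own system before relying on it. The dictionary names are invented for this example.

```python
MIB = 1024 * 1024


def mib_to_bytes(mib):
    """Convert a size in MiB to bytes."""
    return mib * MIB


# Current values as described above, and the suggested new ones.
current = {"maxssiz": 8, "maxdsiz": 64, "maxtsiz": 64}
suggested = {"maxssiz": 32, "maxdsiz": 512, "maxtsiz": 128}

for name in ("maxssiz", "maxdsiz", "maxtsiz"):
    old = hex(mib_to_bytes(current[name]))
    new = hex(mib_to_bytes(suggested[name]))
    print(f"{name}: {old} -> {new}")
```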

If it ain't broke, I can fix that.

Nick Raceu · ‎03-13-2007

Clay, I have increased maxssiz to 32MiB, maxdsiz to 512MiB, and maxtsiz to 128MiB; however, I still get the same error when trying to start the AutoSys event server.
Thanks
Nick

Don Morris_1 · ‎03-13-2007

That message is only generated when there really *is* a stack growth failure.

You raised maxssiz -- which is the most obvious thing to try [though I agree with you that the kernel settings should have been the same].

Next most obvious is available swap, since growth of any virtual object requires swap reservation. Your swapinfo output shows plenty of space available... I presume you looped swapinfo or used Glance to make sure that AUTOSYS doesn't consume most of the swap just starting up (for Data, etc.), then get the SIGSEGV for lack of swap... and tear itself down so that you see plenty of resources again.

Third in the list is lockable memory (mlock/plock interfaces) if this application is allowed to lock pages. Again, with so much available -- this shouldn't be coming into play unless the Text/Data/non-stack allocations of the application consume all/most of the memory first.

To address the points above, if you haven't already, monitor the system state using Glance or other tools while you try to run AUTOSYS -- just to ensure it doesn't spike up and then return the system to its previous state.
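Looping swapinfo while the application starts could look roughly like the Python sketch below. It is an illustration, not a tested HP-UX tool: the helper names are invented, the sample text is the swapinfo -tam output posted earlier in this thread, and it assumes a Python 3 interpreter is available on the box, which may well not be the case on 11.00.

```python
import subprocess
import time

# "total" row and friends, copied from the swapinfo -tam output in this thread.
SAMPLE = """\
TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
dev 1024 0 1024 0% 0 - 1 /dev/vg00/lvol2
reserve - 226 -226
memory 998 102 896 10%
total 2022 328 1694 16% - 0 -
"""


def parse_total(text):
    """Pull the Mb figures out of the 'total' row, or return None."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "total":
            return {
                "avail": int(fields[1]),
                "used": int(fields[2]),
                "free": int(fields[3]),
                "pct_used": int(fields[4].rstrip("%")),
            }
    return None


def read_swapinfo():
    """Run the real command; only meaningful on an HP-UX host."""
    result = subprocess.run(["swapinfo", "-tam"], capture_output=True,
                            text=True, check=True)
    return result.stdout


def watch(samples, interval, read=read_swapinfo, sleep=time.sleep):
    """Sample swapinfo repeatedly and return the reading with the least free swap."""
    lowest = None
    for _ in range(samples):
        total = parse_total(read())
        if total and (lowest is None or total["free"] < lowest["free"]):
            lowest = total
        sleep(interval)
    return lowest


print(parse_total(SAMPLE))
```

Starting something like watch(120, 0.5) in one window and then launching AUTOSYS in another would show whether free swap dips sharply before the SIGSEGV. The read and sleep parameters exist only so the loop can be exercised without the real command.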

Assuming that it isn't doing this (and my gut is that it isn't), I would expect that AUTOSYS is actually having a recursion loop and truly exhausting the stack. If it appears to run longer (if the wall clock time is measurable) after raising maxssiz, that would support this theory. Additionally, you could/should check for a core file and see if the stack object within the core file is sized at or near maxssiz and what the stack trace was for the application at the time.

A. Clay Stephenson · ‎03-13-2007

>Don: That message is only generated when there really *is* a stack growth failure.

It would be more accurate to say that that message was probably generated on a stack growth failure but the actual message is entirely dependent upon what signal handler was in place for SIGSEGV for the process at the time.

Autosys is extremely easy to misconfigure; I would suggest that you contact Computer Associates for support.

If it ain't broke, I can fix that.

Nick Raceu · ‎03-13-2007

Unfortunately I don't have support from CA, and nothing has been changed in the AUTOSYS app. The only changes were the DST patch for HP-UX and the reboot.
Nick

Dennis Handly · ‎03-13-2007

This error is typically a coding error caused by infinite recursion. Setting maxssiz larger will only make your core files bigger.
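The point about raising the limit can be illustrated with a rough analogy in Python. It is only an analogy: Python enforces its own recursion limit and raises RecursionError instead of taking a SIGSEGV, and the function name here is invented for the example. The behaviour is the same in spirit, though: a function with no base case runs deeper under a bigger limit and then fails anyway.

```python
import sys


def depth_reached(limit):
    """Run a recursion with no base case under the given limit; return how deep it got."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    deepest = 0

    def runaway(depth):
        nonlocal deepest
        deepest = depth
        runaway(depth + 1)  # bug: nothing ever stops the recursion

    try:
        runaway(1)
    except RecursionError:
        pass  # the larger limit only delayed this
    finally:
        sys.setrecursionlimit(old_limit)
    return deepest


small = depth_reached(1000)
large = depth_reached(2000)
print(small, large)
```

Both runs end in the same failure; the second one just gets further first, which is the analogue of a bigger core file.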

>Don: I would expect that AUTOSYS is actually having a recursion loop and truly exhausting the stack.

Exactly.

>Don: you could/should check for a core file and see if the stack object within the core file is sized at or near maxssiz and what the stack trace was for the application at the time.

Seeing the same set of functions repeated in the stack trace will show this. You need to use gdb's "bt" command:
$ gdb autosys-exec core
(gdb) bt
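If the backtrace is long, counting how often each function appears makes a recursion loop easy to spot. The Python sketch below assumes the common gdb frame layout ("#N  0xADDR in func (...)"); the sample backtrace, including the name walk_jobs and the file sched.c, is entirely hypothetical and is not output from AutoSys.

```python
import re
from collections import Counter

# Frame number, optional "0xADDR in", then the function name.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)")


def frame_counts(bt_text):
    """Count how many frames each function occupies in a saved 'bt' listing."""
    counts = Counter()
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            # Drop a trailing "+0x1c" style offset if the debugger printed one.
            name = re.sub(r"\+0x[0-9a-fA-F]+$", "", m.group(1))
            counts[name] += 1
    return counts


def likely_recursion(bt_text, threshold=3):
    """Functions that appear at least `threshold` times, most frequent first."""
    return [name for name, n in frame_counts(bt_text).most_common()
            if n >= threshold]


# Hypothetical backtrace, for illustration only.
SAMPLE_BT = """\
#0  0x00002a34 in walk_jobs (node=0x4000a0) at sched.c:88
#1  0x00002a58 in walk_jobs (node=0x4000a0) at sched.c:91
#2  0x00002a58 in walk_jobs (node=0x4000a0) at sched.c:91
#3  0x00002a58 in walk_jobs (node=0x4000a0) at sched.c:91
#4  0x00002b10 in main () at sched.c:120
"""

print(likely_recursion(SAMPLE_BT))
```

A real runaway recursion usually shows thousands of identical frames, so the threshold of 3 is only there to keep the example small.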

>Clay: but the actual message is entirely dependent upon what signal handler was in place for SIGSEGV for the process at the time.

You have to go out of your way to provide a handler for stack overflow. You have to call sigstack(2), sigaltstack(2) or sigspace(2).

So if you get the message, you have a stack overflow.

Don Morris_1 · ‎03-14-2007

No, I'm serious. That message is generated when a virtual fault is taken, we determine that it is a stack fault -- but the stack fault can not be satisfied. It is a kernel generated message -- it won't matter what the signal handlers do with the SIGSEGV, that message is generated first. Only an unsatisfied stack fault will generate it.

Nick Raceu · ‎03-14-2007

Hi All,

As I said initially, I don't have much experience with HP-UX (or Unix in general). However, I have now got an error from OMNIBACKUP on the same box. No changes were made to omnibackup or autosys, and they have been working for the last 5 years, so it looks to me more as a problem on the server instead of a bad application.

"operator@chapters:/home/operator> xomni
Starting GUI...
Please wait, this may take some time...

Pid 4150 received a SIGSEGV for stack growth failure.
Possible causes: insufficient memory or swap space,
or stack size exceeded maxssiz.
/opt/omni/bin/xomni[122]: 4150 Memory fault(coredump)"

Thanks and I appreciate all help received.

Dennis Handly · ‎03-14-2007

>Don: No, I'm serious.

I'm not sure who you were responding to here?

>but the stack fault can not be satisfied. It is a kernel generated message -- it won't matter what the signal handlers do with the SIGSEGV, that message is generated first.

If by this you mean that "satisfied" can be done with sigstack(2), sigaltstack(2) or sigspace(2), then there won't be that message?

>so it looks to me more as a problem on the server instead of a bad application.

As long as maxssiz is reasonable and you have enough swap space, you should be OK.

>Pid 4150 received a SIGSEGV for stack growth failure. ...
>/opt/omni/bin/xomni[122]: 4150 Memory fault(coredump)

As suggested, you should use gdb to get a stack trace, that may suggest other solutions.

You need to look at line 122 of xomni to see what executable was being run.

Don Morris_1 · ‎03-15-2007

Dennis> "I'm not sure who you were responding to here?"

I was responding to Mr. Stephenson -- manual quoting is the bane of my existence.

Dennis> "If by this you mean that "satisfied" can be done with sigstack(2), sigaltstack(2) or sigspace(2), then there won't be that message?"

Nope -- I mean satisfied from VM's/the fault handler's point of view (the virtual address is legal for the process, we can grow the stack to cover it, all physical/lockable/swap reservations are satisfied, etc.). If VM can't satisfy the fault -- you get the message, and the process gets a SIGSEGV.
The sig calls you mention will relate to how the process handles the SIGSEGV (it may well handle it), but the kernel will still log that we sent it for this case.

Dennis> "As suggested, you should use gdb to get a stack trace, that may suggest other solutions."

Seconded.

Nick Raceu · ‎03-15-2007

After all that, it looks like corruption in the AutoSys database caused the problem.

Dennis Handly · ‎03-15-2007

>Don: If VM can't satisfy the fault -- you get the message, and the process gets a SIGSEGV. The sig calls you mention will relate to how the process handles the SIGSEGV (it may well handle it), but the kernel will still log that we sent it for this case.

That has not been my experience. (I just tried it.) If you call one of the sig routines (sigstack(2), sigaltstack(2) or sigspace(2)), you don't get the message, because the stack overflow can be handled.
