process dying (captured by tusc)

Ciaran Byrne · ‎09-06-2002

Hi,
I am having an issue with an application whose processes crash at certain points. I pointed tusc at the pid and was able to trace the process before it failed. Below is what is in the log (tusc capturing the pid and the failed system calls).
It looks as though the application cannot handle some system call, but I am having trouble working out which one. Can anybody help interpret what this means?

[26033] ioctl(52, TCGETA, 0x7a6fb5b8) .................... ERR#25 ENOTTY
[26033] ioctl(52, TCGETA, 0x7a6fb578) .................... ERR#25 ENOTTY
[26033] ioctl(52, TCGETA, 0x7a6fb578) .................... ERR#25 ENOTTY
[26033] sigtimedwait(0x7ac66198, NULL, 0x7ac661b8) ....... ERR#11 EAGAIN
[26033] ioctl(64, TCGETA, 0x7a6fb538) .................... ERR#25 ENOTTY
[26033] sigtimedwait(0x7a834858, NULL, 0x7a834878) ....... ERR#11 EAGAIN
[26033] ksleep(PTH_CONDVAR_OBJECT, 0x40018190, 0x40018198, 0x7f7f37bc) ... = -ETIMEDOUT
[26033] ioctl(52, TCGETA, 0x7a6fc238) .................... ERR#25 ENOTTY
[26033] ioctl(52, TCGETA, 0x7a6fc1f8) .................... ERR#25 ENOTTY
[26033] ioctl(52, TCGETA, 0x7a6fc1f8) .................... ERR#25 ENOTTY
[26033] Received signal 11, SIGSEGV, in user mode, [SIG_DFL], partial siginfo
[26033] Siginfo: si_code: I_NONEXIST, faulting address: 0x3a6f5349, si_errno: 0
[26033] PC: 0xc16f44fb, instruction: 0x0c3f1200
[26033] exit(11) [implicit] .............................. WIFSIGNALED(SIGSEGV)|WCOREDUMP

Thank you,
Ciaran

Tom Danzig · ‎09-06-2002

Just a thought ... are you running this from cron, or by some other method that is not attached to a terminal? Error #25 as defined in errno.h is ENOTTY, "Not a typewriter". Perhaps the process needs a terminal attached to it for stdin?

The ERR#25 may be tusc-specific and not related to HP-UX errors, though.
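To illustrate what ERR#25 means in practice, here is a minimal Python sketch, assuming a POSIX system with the standard termios module. It asks for terminal attributes on a pipe, which is roughly what the TCGETA ioctl in the trace does on a descriptor that is not a terminal. The helper name tty_errno is made up for this example and is not part of tusc or HP-UX.

```python
import errno
import os
import termios


def tty_errno(fd):
    """Return None if fd is a terminal, else the errno from the failed query.

    tty_errno is a hypothetical helper for illustration only.
    """
    try:
        termios.tcgetattr(fd)  # issues a terminal-attributes ioctl on fd
    except termios.error as exc:
        return exc.args[0]  # first element of args is the errno value
    return None


if __name__ == "__main__":
    read_end, write_end = os.pipe()  # a pipe is never a terminal
    try:
        code = tty_errno(read_end)
        if code is not None:
            print(code, errno.errorcode.get(code), os.strerror(code))
    finally:
        os.close(read_end)
        os.close(write_end)
```

On a pipe, socket, or regular file the query fails with ENOTTY and the program simply carries on, so the error by itself does not indicate a crash.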

Ciaran Byrne · ‎09-06-2002

Thanks for your response.
This is not run by cron; it is an application with multiple processes associated with it, i.e. multiple engines. These are dying sporadically. The npty parameter is set to 128.

Regards,
Ciaran

Rick Beldin · ‎09-09-2002

I think that the answer to this lies not in system space, but in user space:
[26033] Received signal 11, SIGSEGV, in user mode, [SIG_DFL], partial siginfo
[26033] Siginfo: si_code: I_NONEXIST, faulting address: 0x3a6f5349, si_errno: 0
[26033] PC: 0xc16f44fb, instruction: 0x0c3f1200

I would make sure that this application generates a core file and then use gdb to analyze it. If it runs setuid to root, you will never get a core. The process running the program needs write access to the directory from which it was started. In the case of some daemons on HP-UX, we do a touch core and then a chmod 666 core to make sure that there is a core file it can write to. If you've prevented a core by creating a directory called core, you'll need to remove that. Make sure that ulimit isn't cutting off the end of your core either.
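The checks above can be sketched in a few lines of Python, assuming a POSIX system with the standard resource module. This is only an illustration: core_dump_blockers is a hypothetical helper, it does not cover the setuid case, and the limit it reads belongs to the process running the sketch, so it would have to be run from the same environment that starts the application.

```python
import os
import resource


def core_dump_blockers(start_dir):
    """List reasons a core file might not be written in start_dir.

    Hypothetical helper; it covers the core size limit, write access,
    and a directory named "core", and nothing else.
    """
    problems = []
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_CORE)
    if soft_limit == 0:
        problems.append("core size limit is 0 (see ulimit -c)")
    elif soft_limit != resource.RLIM_INFINITY:
        problems.append("core size limit is finite; the core may be truncated")
    if not os.access(start_dir, os.W_OK):
        problems.append("no write access to " + start_dir)
    core_path = os.path.join(start_dir, "core")
    if os.path.isdir(core_path):
        problems.append(core_path + " is a directory and blocks the core file")
    return problems


if __name__ == "__main__":
    for problem in core_dump_blockers(os.getcwd()):
        print(problem)
```

An empty result only means none of these particular obstacles were found; it does not guarantee that a core will be produced.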

Necessary questions: Why? What? How? When?

T G Manikandan · ‎09-09-2002

What is the application you are running?

Check whether you have set these kernel parameters to the recommended values:

MAXDSIZ
MAXSSIZ

What is the memory and swap usage on the machine?

Thanks

Mike Stroyan · ‎09-12-2002

The failed system calls don't look very ominous. The ENOTTY errors on
ioctl happen all the time in code that handles both tty and non-tty file
descriptors. The other errors are just timeouts.

This program died trying to do a
"STB r31,0(r1)"

instruction into address "0x3a6f5349". That is not a reasonable
address. It looks like part of a string. Interpreted as chars it reads
":oSI". This definitely looks like a memory corruption error in the
application. It might be a simple buffer overrun with a very long
string. Some sleuthing with a debugger such as wdb could find the rest
of that string in memory to further characterize the problem. This
really is a job for the application developers.
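For anyone who wants to repeat the address-to-characters check on other faults, here is a small Python sketch. It assumes 32-bit addresses and big-endian byte order, which is how PA-RISC stores words; the function names are invented for this example.

```python
def address_as_text(address, width=4):
    """Render an integer address as big-endian bytes shown as characters."""
    raw = address.to_bytes(width, "big")
    # Show printable ASCII as-is and anything else as a dot.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


def looks_like_text(address, width=4):
    """True when every byte of the address is printable ASCII."""
    return all(32 <= b < 127 for b in address.to_bytes(width, "big"))


if __name__ == "__main__":
    print(address_as_text(0x3A6F5349))  # faulting address; prints :oSI
    print(looks_like_text(0x3A6F5349))  # prints True
    print(looks_like_text(0x7A6FB5B8))  # a pointer argument from the trace; prints False
```

A faulting address made up entirely of printable bytes is only a hint that string data overwrote a pointer, not proof, so it is best treated as a lead to follow in the debugger.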
