Operating System - HP-UX

process dying (captured by tusc)

 
Ciaran Byrne
Advisor


Hi,
I am having an issue with an application crashing in certain parts of its processing. I pointed tusc at the pid and was able to capture the process just before it failed. Here is what is in the log (tusc output showing the pid and the failed system calls).
It looks as though the application cannot handle some system call, but I am having trouble working out which one. Can anybody help interpret what this means?

[26033] ioctl(52, TCGETA, 0x7a6fb5b8) ........................................................................ ERR#25 ENOTTY
[26033] ioctl(52, TCGETA, 0x7a6fb578) ........................................................................ ERR#25 ENOTTY
[26033] ioctl(52, TCGETA, 0x7a6fb578) ........................................................................ ERR#25 ENOTTY
[26033] sigtimedwait(0x7ac66198, NULL, 0x7ac661b8) ........................................................... ERR#11 EAGAIN
[26033] ioctl(64, TCGETA, 0x7a6fb538) ........................................................................ ERR#25 ENOTTY
[26033] sigtimedwait(0x7a834858, NULL, 0x7a834878) ........................................................... ERR#11 EAGAIN
[26033] ksleep(PTH_CONDVAR_OBJECT, 0x40018190, 0x40018198, 0x7f7f37bc) ....................................... = -ETIMEDOUT
[26033] ioctl(52, TCGETA, 0x7a6fc238) ........................................................................ ERR#25 ENOTTY
[26033] ioctl(52, TCGETA, 0x7a6fc1f8) ........................................................................ ERR#25 ENOTTY
[26033] ioctl(52, TCGETA, 0x7a6fc1f8) ........................................................................ ERR#25 ENOTTY
[26033] Received signal 11, SIGSEGV, in user mode, [SIG_DFL], partial siginfo
[26033] Siginfo: si_code: I_NONEXIST, faulting address: 0x3a6f5349, si_errno: 0
[26033] PC: 0xc16f44fb, instruction: 0x0c3f1200
[26033] exit(11) [implicit] .................................................................................. WIFSIGNALED(SIGSEGV)|WCOREDUMP

Thank you,
Ciaran
Tom Danzig
Honored Contributor

Re: process dying (captured by tusc)

Just a thought ... are you running this via cron or by some other method that does not involve a terminal? Error 25, as defined in errno.h, is ENOTTY ("Not a typewriter"). Perhaps the process needs a terminal attached to it for stdin?

The ERR#25 may be tusc-specific and not related to HP-UX errors, though.
Ciaran Byrne
Advisor

Re: process dying (captured by tusc)

Thanks for your response.
This is not run by cron; it is an application with multiple associated processes (multiple engines), and these are dying sporadically. The npty parameter is set to 128.

Regards,
Ciaran
Rick Beldin
HPE Pro

Re: process dying (captured by tusc)

I think that the answer to this lies not in system space, but in user space:
[26033] Received signal 11, SIGSEGV, in user mode, [SIG_DFL], partial siginfo
[26033] Siginfo: si_code: I_NONEXIST, faulting address: 0x3a6f5349, si_errno: 0
[26033] PC: 0xc16f44fb, instruction: 0x0c3f1200

I would work to make sure that this application generates a core file and then use gdb to analyze it. If it runs setuid to root, you will never get a core. The process running the program needs write access to the directory from which it was started. In the case of some daemons on HP-UX, we do a touch core and then a chmod 666 core to make sure that there is a core file it can write to. If you've prevented a core by creating a directory called core, you'll need to remove that. Make sure that ulimit isn't cutting off the end of your core either.
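The checks listed above can be sketched as a small script. This is only an illustration, not an HP-UX tool: the function name core_dump_obstacles is made up for the example, it assumes a Python interpreter is available on the box, and it should be run as the same user and from the same directory as the application.

```python
import os
import resource


def core_dump_obstacles(directory: str = ".") -> list[str]:
    """List conditions in `directory` that could prevent or truncate a core file."""
    problems = []

    # A soft RLIMIT_CORE of 0 suppresses the core; a small value truncates it.
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft == 0:
        problems.append("core size limit is 0 (check ulimit -c)")
    elif soft != resource.RLIM_INFINITY:
        problems.append(f"core size limit is {soft}; a large core may be cut off")

    # The process needs write access to the directory it was started from.
    if not os.access(directory, os.W_OK):
        problems.append(f"{directory} is not writable by this user")

    # A directory named "core" blocks the dump; an unwritable file does too.
    core_path = os.path.join(directory, "core")
    if os.path.isdir(core_path):
        problems.append(f"{core_path} is a directory")
    elif os.path.exists(core_path) and not os.access(core_path, os.W_OK):
        problems.append(f"{core_path} exists but is not writable")

    # Differing real and effective uids indicate a setuid context.
    if os.getuid() != os.geteuid():
        problems.append("running setuid; no core will be written")

    return problems


for problem in core_dump_obstacles():
    print(problem)
```

An empty result only means none of these particular obstacles were found; it does not guarantee that a core will be written.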
Necessary questions: Why? What? How? When?
T G Manikandan
Honored Contributor

Re: process dying (captured by tusc)

What is the application you are running?

Check whether you have set these kernel parameters to the recommended values:


MAXDSIZ
MAXSSIZ

What are the memory and swap usage on the machine?

Thanks
Mike Stroyan
Honored Contributor

Re: process dying (captured by tusc)

The failed system calls don't look very ominous. The ENOTTY errors on
ioctl happen all the time in code that handles both tty and non-tty file
descriptors. The other errors are just timeouts.
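To see how harmless that ENOTTY is, here is a minimal sketch that asks a pipe for its terminal settings. It assumes a Unix-like system with Python; probe_tty is a name invented for this example, and tcgetattr may use a different ioctl request than the TCGETA in the trace, but it fails the same way on a descriptor that is not a tty.

```python
import errno
import os
import termios


def probe_tty(fd: int) -> str:
    """Return 'tty' if fd is a terminal, else the errno name of the failure."""
    try:
        termios.tcgetattr(fd)  # terminal-attribute query, analogous to TCGETA
        return "tty"
    except termios.error as exc:
        # termios.error carries (errno, message) in its args.
        return errno.errorcode.get(exc.args[0], str(exc.args[0]))


# A pipe is a perfectly valid descriptor, just not a terminal.
read_end, write_end = os.pipe()
try:
    result = probe_tty(read_end)
finally:
    os.close(read_end)
    os.close(write_end)

print(result)
```

On a typical Unix system this reports ENOTTY for the pipe, and the caller simply carries on. Libraries do this kind of probing routinely to decide whether a descriptor is interactive.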

This program died trying to do a "STB r31,0(r1)" instruction into
address "0x3a6f5349". That is not a reasonable
address. It looks like part of a string. Interpreted as chars it reads
":oSI". This definitely looks like a memory corruption error in the
application. It might be a simple buffer overrun with a very long
string. Some sleuthing with a debugger such as wdb could find the rest
of that string in memory to further characterize the problem. This
really is a job for the application developers.
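The reading of the faulting address as characters can be reproduced with a few lines. This is just a sketch; address_as_text is a name made up for the example, and it assumes the word is read in big-endian byte order, as on PA-RISC.

```python
def address_as_text(addr: int, width: int = 4) -> str:
    """Render a big-endian machine word as ASCII, using '.' for non-printables."""
    raw = addr.to_bytes(width, "big")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


# Faulting address from the tusc output.
print(address_as_text(0x3A6F5349))  # :oSI
```

When every byte of a supposed pointer falls in the printable range like this, it is a strong hint that string data has overwritten the pointer.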