
How to trace processes ?

 
SOLVED
Binu Raj
Occasional Contributor



Hi

One of our application processes dies at the customer's site without reporting any error.
This process is supposed to listen on a TCP port and spawn a new process whenever a new client tries to connect to the application.

I have analyzed the syslog file and the kernel parameters, and everything looks fine. Can someone suggest how to trace this process (that is, which system calls it makes, logged periodically) so that we can analyze how it died and find out why this happens? Also, what are the possible reasons for a process to die when there are enough resources?

The customer sent me some text which he says is a trace of the process. I could not work out which tool he used.

Following is the trace.

[27147] accept(0, 0x7f7f07a8, 0x7f7f07bc) ................ ERR#233 ENOBUFS
[27147] open("/usr/lib/nls/msg//strerror.cat", O_RDONLY, 014346) ERR#2 ENOENT
[27147] open("/usr/lib/nls////strerror.cat", O_RDONLY, 014344) ERR#2 ENOENT
[27147] write(2, "a c c e p t : N o b u f f e ".., 34) = 34
[27147] close(0) ......................................... = 0
[27147] write(2, "* * N L L A ", 8) .................. = 8
[27147] write(2, "B ", 1) ................................ = 1
[27147] write(2, "E N D N = 3 ", 8) .................. = 8
[27147] write(2, "S ", 1) ................................ = 1
[27147] write(2, "= 2 3 3 * * , ", 8) .................. = 8
[27147] write(2, " ", 1) ................................ = 1
[27147] write(2, "M a s t e r n ", 8) .................. = 8
[27147] write(2, "o ", 1) ................................ = 1
[27147] Received signal 18, SIGCLD, in write(), [caught], no siginfo
[27147] write(2, "t s t a r t e ", 8) .................. = 8
[27147] waitpid(-1, WIFEXITED(0), WNOHANG) ............... = 6309
[27147] waitpid(-1, WIFEXITED(0), WNOHANG) ............... = 0
[27147] write(2, "d ", 1) ................................ = 1
[27147] write(2, "! \n", 2) .............................. = 2
[27147] exit(4) .......................................... WIFEXITED(4)

Thanks in advance.

Binu Raj
Carsten Krege
Honored Contributor
Solution

Re: How to trace processes ?

Your customer used tusc or trace (both can be downloaded from http://gatekeep.cs.utah.edu) to trace the UNIX system calls.
The program fails in the accept() system call with error number 233 (=ENOBUFS = "No buffer space available", see /usr/include/sys/errno.h).
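
Errno numbers are platform-specific (233 is the value shown in this trace), so it can be easier to let the C library decode them than to look them up by hand. A minimal sketch in C follows; `describe_errno` is a hypothetical helper name chosen for illustration, not part of any system library:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "errno N: message" into buf.
 * Returns the number of characters snprintf() would have written. */
static int describe_errno(int err, char *buf, size_t len)
{
    return snprintf(buf, len, "errno %d: %s", err, strerror(err));
}
```

Calling `describe_errno(ENOBUFS, buf, sizeof buf)` on the affected system should produce the same text the failing program printed after its "accept:" prefix.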

In the man page of accept(2) you find:

[ENOBUFS] No buffer space is available. The accept() cannot complete. The queued socket connect request is aborted.

It is possible that the remote client sent a TCP RST to the socket before accept() completed. In that case ENOBUFS is an error that the program should handle instead of exiting.
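
One way the program could handle this is to treat ENOBUFS from accept() as a dropped connection request and keep listening, rather than closing the listening socket and exiting as the trace shows. The following is only a sketch under assumptions: the function names are hypothetical, the set of errno values treated as transient is my choice for illustration, and the retry limit is there because ENOBUFS can also mean a genuine resource shortage.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Returns 1 if an accept() failure looks like a dropped connection
 * request rather than a fatal error. This list is an assumption for
 * illustration; adjust it for your platform. */
static int accept_error_is_transient(int err)
{
    switch (err) {
    case ENOBUFS:      /* queued connect request was aborted, per accept(2) */
    case ECONNABORTED: /* connection aborted before accept() returned */
    case EINTR:        /* interrupted by a signal such as SIGCLD */
        return 1;
    default:
        return 0;
    }
}

/* Accept one connection on listen_fd, retrying up to max_retries times
 * on transient errors. Returns the new descriptor, or -1 with errno set
 * to the last failure. */
static int accept_with_retry(int listen_fd, int max_retries)
{
    int attempt;

    for (attempt = 0; attempt <= max_retries; attempt++) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd >= 0)
            return fd;
        if (!accept_error_is_transient(errno))
            return -1;  /* real error: let the caller decide */
    }
    return -1;          /* still failing after max_retries retries */
}
```

The server's main loop would call `accept_with_retry()` in place of a bare accept() and log, rather than exit, when it returns -1 with a transient errno.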


However, the description of PHNE_19110 (replaced by PHNE_23456), the s700_800 11.00 cumulative ARPA Transport patch, lists the following problem:

When a new connect request arrives at the local TCP,
and it is immediately followed by a RESET from the
remote system, the server application is awakened
twice to perform accept() calls. Each accept() call
is returned with ENOBUFS.
Resolution:
ENOBUFS is the correct return value.
The problem is the accepting server application should
not be awakened twice. This was caused by mishandling
the connection id in TPI messages. This problem is
fixed by correctly tracking the connection id between
the socket and TCP layers.

I recommend installing PHNE_23456 (plus its dependent patches!) and checking whether the problem still occurs.

Carsten



-------------------------------------------------------------------------------------------------
In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move. -- HhGttG
Vincenzo Restuccia
Honored Contributor

Re: How to trace processes ?

s700_800 11.00 LAN product cumulative patch(PHNE_21217)
s700_800 11.00 cumulative ARPA Transport patch(PHNE_21767)
s700_800 11.00 HP-PB FDDI (J2157B) product cumulative patch(PHNE_19633)



s800 10.20 EISA 100VG-AnyLAN product patch(PHNE_13650)
s800 10.20 HP-PB 100BT cumulative patch(PHNE_21884)