10-20-2009 05:30 AM - last edited on 10-08-2012 11:29 PM by Maiko-I
HPUX: program stuck in TE_do_list()
Hi,
We have a problem with a process getting stuck in a CPU loop after it appears to exit cleanly.
A tusc trace of the process shows this:
Siginfo: si_code: SEGV_MAPERR, faulting address: 0xffffffef7efdd0, si_errno: 0
PC: 00000001000000a0.0 break.m 0x16000
Received signal 11, SIGSEGV, in user mode, [0x9fffffffef54ddd0], partial siginfo
Siginfo: si_code: SEGV_MAPERR, faulting address: 0xffffffef7efdd0, si_errno: 0
PC: 00000001000000a0.0 break.m 0x16000
...
A pstack of the process does not show anything within our library:
$ /usr/ccs/bin/pstack 14855
----------------------- lwpid : 2365377 -------------------------------
0: c00000000004f561 : TE_do_list() + 0x2d1 (/usr/lib/hpux64/dld.so)
1: c000000000054e60 : TE_do_program_exit() + 0x300 (/usr/lib/hpux64/dld.so)
2: c0000000002bfc50 : (unknown) () (unknown)
-------------------------------- lwpid : 2365378 -------------------------------
0: c0000000003547d0 : (unknown) () (unknown)
Our signal handler should catch SIGSEGV and call abort(). Of course, this is not reproducible.
This is a multi-threaded 64-bit application, a mixture of C and C++, running on HP-UX 11.23.
Any clues as to what could be causing this would be very helpful.
Thanks
Alan
P.S. This thread has been moved from HP-UX > General to HP-UX > Languages and Scripts - HP Forums Moderator
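For readers following along, here is a minimal sketch of the kind of handler described in the post above: catch SIGSEGV and call abort(). It is not the poster's actual code; the names segv_abort_handler and install_segv_abort_handler and the message text are made up for illustration, and only standard POSIX calls are used.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: report the fault using async-signal-safe calls only,
 * then abort() so the process dumps core instead of carrying on. */
static void segv_abort_handler(int signo, siginfo_t *info, void *context)
{
    static const char msg[] = "Segmentation violation. Aborting\n";
    (void)signo;
    (void)info;
    (void)context;
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* nothing useful can be done about a failed write here */
    }
    abort();
}

/* Install the handler for SIGSEGV; returns 0 on success, -1 on failure. */
int install_segv_abort_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_abort_handler;
    sa.sa_flags = SA_SIGINFO;          /* ask for the siginfo_t argument */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

A program would call install_segv_abort_handler() once, early in main(). A handler like this only helps if the signal is actually delivered to it.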
10-20-2009 06:03 AM
Re: HPUX: program stuck in TE_do_list()
It seems the process is making an invalid memory reference (a segmentation fault) while it is spinning in the CPU loop.
http://www.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html
- SIGSEGV, code SEGV_MAPERR: Address not mapped to object.
- SIGSEGV, member void * si_addr: Address of faulting memory reference.
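The two fields above can be read from the siginfo_t that a handler installed with SA_SIGINFO receives. A small sketch, assuming only the POSIX <signal.h> definitions; the helper names segv_code_name and segv_fault_address are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Map a SIGSEGV si_code to the name used in <signal.h>. */
const char *segv_code_name(int si_code)
{
    switch (si_code) {
    case SEGV_MAPERR: return "SEGV_MAPERR"; /* address not mapped to object */
    case SEGV_ACCERR: return "SEGV_ACCERR"; /* invalid permissions for mapped object */
    default:          return "unknown";
    }
}

/* Return the faulting address reported for the signal, or NULL if no info. */
void *segv_fault_address(const siginfo_t *info)
{
    return info != NULL ? info->si_addr : NULL;
}
```

Inside a handler, formatting these values for output should be limited to async-signal-safe calls.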
Debugging SIGSEGV:
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=703638
Also check the pthread and libc patches, and whether there is any resource contention involving dbc_max_pct, shmmax, memory, or swap space.
Hth,
Raj.
10-20-2009 07:17 AM
Re: HPUX: program stuck in TE_do_list()
We don't think it's a problem with system resources: as far as we can tell, the system has gigabytes of memory free when this happens, and the process itself is using under 200 MB.
Our signal handler would normally catch signals such as SIGSEGV. For example, if I run under gdb and force malloc/realloc to return null, I get this:
hpi2-~/ora: gdb simple
HP gdb 5.2.03 for HP Itanium (32 or 64 bit) and target HP-UX 11.2x.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 5.2.03 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
(gdb) set heap-check null-check-size 50000
(gdb) run
Starting program: /home/alan/ora/simple
jrealloc: Error 0
Program received signal SIGSEGV, Segmentation fault
si_code: 1 - SEGV_MAPERR - Address not mapped to object.
[Switching to process 9924]
0x9fffffffef6ad4c0:1 in __milli_strlen+0x41 ()
from /home/alan/5.0_rels/jbc5.0.20/lib/libjbase.so
(gdb) c
Continuing.
jBASE: Segmentation violation. Aborting
Program received signal SIGABRT, Aborted
si_code: -1 - Unknown si_code. Report to HP..
[Switching to process 9924]
0x9fffffffec1f9890:0 in kill+0x30 () from /usr/lib/hpux64/libc.so.1
(gdb) c
Continuing.
Program received signal SIGABRT, Aborted
si_code: -1 - Unknown si_code. Report to HP..
[Switching to process 9924]
0x9fffffffec1f9890:0 in kill+0x30 () from /usr/lib/hpux64/libc.so.1
(gdb) c
Continuing.
Program terminated with signal SIGABRT, Aborted.
The program no longer exists.
(gdb) quit
hpi2-~/ora: gdb simple core
HP gdb 5.2.03 for HP Itanium (32 or 64 bit) and target HP-UX 11.2x.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 5.2.03 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
Core was generated by `simple'.
Program terminated with signal 6, Aborted.
SI_UNKNOWN - signal of unknown origin
#0 0x9fffffffec1f9890:0 in kill+0x30 () from /usr/lib/hpux64/libc.so.1
(gdb) where
#0 0x9fffffffec1f9890:0 in kill+0x30 () from /usr/lib/hpux64/libc.so.1
#1 0x9fffffffec11e1d0:0 in raise+0x30 () from /usr/lib/hpux64/libc.so.1
#2 0x9fffffffec1baf90:0 in abort+0x190 () from /usr/lib/hpux64/libc.so.1
#3 0x9fffffffef38b270:0 in SynchronousSignalHandler () at jediSignalUnix.c:531
#4
#5 0x9fffffffef6ad4c0:1 in __milli_strlen+0x41 ()
So you can see we get an error in a realloc, which causes a SIGSEGV, which leads to us calling abort.
In my real scenario, the process seems to have run to completion, and from what we can tell has called exit (all output indicates this), and then just got stuck in TE_do_list() which is in /usr/lib/hpux64/dld.so.
This is random, and only occurs rarely, and only on the production system (never on Test).
If we knew what TE_do_list() was trying to do, then we might be able to replicate it.
Thanks again
Alan
10-20-2009 07:32 AM
Re: HPUX: program stuck in TE_do_list()
>>
(gdb) set heap-check null-check-size 50000
(gdb) run
Starting program: /home/alan/ora/simple
jrealloc: Error 0
Program received signal SIGSEGV, Segmentation fault
- That makes sense: the SIGSEGV is raised with an address-not-mapped error.
I assume you are using gcc to compile the code.
- Could you check the link below:
http://www.mail-archive.com/gcc-bugs@gcc.gnu.org/msg255898.html
It looks like there is a gcc bug whose stack trace passes through the TE_do_list() function (frame #17) and similarly exits with a library error (frame #18).
Hth,
Raj.
10-20-2009 07:39 AM
Re: HPUX: program stuck in TE_do_list()
We use HP's C/C++ compiler, aCC, not gcc.
Alan
10-20-2009 08:30 AM
Re: HPUX: program stuck in TE_do_list()
The top frame shows that the dynamic loader dld.so is trying to invoke terminator functions of shared libraries on program exit.
The faulting PC shown by tusc does not seem to be a good address, but you could try and determine where it lies, through gdb.
It is possible that these are addresses that the dynamic loader is using as addresses of terminator functions, possibly because of memory corruption by the application. Please check for memory corruption using gdb and/or vaccine (cadvise).
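To make the terminator step concrete, here is a small sketch showing that a terminator runs after the program has already called exit(), which is why a process can look finished and still fault or spin during teardown. It uses the GCC/Clang `__attribute__((destructor))` syntax purely for illustration; aCC has its own way of declaring terminators, which is not shown here. The names example_terminator and terminator_runs_after_exit are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pipe end the terminator reports to; -1 means "stay silent". */
static int terminator_report_fd = -1;

/* Terminator: runs after exit() has been called, during image teardown.
 * Assumption: GCC/Clang attribute syntax, not the HP compiler's mechanism. */
__attribute__((destructor))
static void example_terminator(void)
{
    if (terminator_report_fd >= 0) {
        const char mark = 'T';
        if (write(terminator_report_fd, &mark, 1) < 0) {
            /* nothing useful to do this late in process exit */
        }
    }
}

/* Fork a child that calls exit(0) and report whether its terminator ran.
 * Returns 1 if it ran, 0 if it did not, -1 on error. */
int terminator_runs_after_exit(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        terminator_report_fd = fds[1];
        exit(0);                /* "clean" exit: terminators run from here */
    }

    close(fds[1]);
    char mark = 0;
    ssize_t n = read(fds[0], &mark, 1);   /* EOF (0) if nothing was written */
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (n == 1 && mark == 'T') ? 1 : 0;
}
```

If a terminator like this dereferenced a pointer that the application had already corrupted or freed, the fault would show up only at exit, which would match the symptom in this thread.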
ranga
--
ranga
10-21-2009 12:35 AM - edited 10-08-2012 09:07 PM
Re: HP-UX: program stuck in TE_do_list()
It looks like dld is getting a signal. Do you have the latest dld patch, PHSS_39822?
In some cases, dld blocks signals, causing infinite loops.
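One way to check from application code whether a signal is blocked at a given point (for example, logged from inside a terminator) is to query the current signal mask. This is only a sketch under that assumption and says nothing about what dld does internally; signal_is_blocked is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Return 1 if signo is blocked for the caller, 0 if not, -1 on error.
 * Note: POSIX leaves sigprocmask unspecified in multi-threaded processes;
 * pthread_sigmask is the per-thread equivalent with the same arguments. */
int signal_is_blocked(int signo)
{
    sigset_t current;
    /* With a NULL new set, the call only reports the current mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &current) != 0)
        return -1;
    return sigismember(&current, signo);
}
```

For example, signal_is_blocked(SIGSEGV) returning 1 at the point of the fault would mean the application's SIGSEGV handler cannot run there.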
Have you tried using gdb?
>This is random, and only occurs rarely, and only on the production system
Do you have a corefile? Or if it loops, can you attach with gdb?
>If we knew what TE_do_list(), was trying to do then we might be able to replicate it.
It runs the shared-library terminators and performs C++ static destruction.
>raj: passing through TE_do_list function (in #17)
#17 should be calling compiler generated routines like #16, __do_global_dtors_aux (g++).
>ranga: The faulting PC shown by tusc does not seem to be a good address
It never does for Integrity.
10-22-2009 02:03 AM
Re: HPUX: program stuck in TE_do_list()
I managed to replicate this issue on a local dev machine (running HP-UX 11.31), although it's very random (I had to run more than 10,000 processes before I got this one).
If I attach to the process using gdb and try to get a stack trace, gdb exits with a SIGSEGV :-( (the same thing happens if I generate a core file using gcore and try to use gdb on it).
CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND
0 ? 5319 alan 152 20 101M 6128K run 853:14 101.11 100.93 Loop1
hpitv3-~/ora: pstack 24764
24764: ./Loop1
-------------------------------- lwpid : 7794933 -------------------------------
-1: c000000000435e10 : [ sendsig ]
1: c00000000004f561 : TE_do_list() + 0x2d1 (/usr/lib/hpux64/dld.so)
2: c000000000054e60 : TE_do_program_exit() + 0x300 (/usr/lib/hpux64/dld.so)
3: c00000000037cd50 : (unknown) () (unknown)
-------------------------------- lwpid : 7794934 -------------------------------
0: c000000000435e10 : (unknown) () (unknown)
hpitv3-~/ora: gcore 5319
hpitv3-~/ora: ls -l core*
-rw------- 1 alan users 6042872 Oct 22 09:50 core.5319
goORA hpitv3-~/ora: file core*
core.5319: ELF-64 core file - IA64 from 'Loop1'
hpitv3-~/ora: /opt/langtools/bin/gdb Loop1 core.5319
HP gdb 5.9 for HP Itanium (32 or 64 bit) and target HP-UX 11.2x.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 5.9 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
Core was generated by `Loop1'.
warning: Load module /home/oracle/10gR2/oracle/product/client_1/lib/libclntsh.so.10.1 has been stripped.
Debugging information is not available.
warning: Load module /home/oracle/10gR2/oracle/product/client_1/lib/libnnz10.so has been stripped.
Debugging information is not available.
#0 0xe0000001085c6c60 in
(1) 0xe0000001085c6c80 ---- Signal 11 (SIGSEGV) delivered ----
(2) 0x400000000033d9e1 internalize_unwinds + 0x5c1 [/opt/langtools/bin/gdb]
(3) 0x400000000033be50 read_unwind_info + 0x2d0 [/opt/langtools/bin/gdb]
(4) 0x400000000033ba60 find_unwind_entry + 0x250 [/opt/langtools/bin/gdb]
(5) 0x4000000000473930 print_frame + 0x1530 [/opt/langtools/bin/gdb]
(6) 0x400000000046c8b0 print_frame_info_base + 0x7d0 [/opt/langtools/bin/gdb]
(7) 0x40000000004db630 print_stack_frame_stub + 0x70 [/opt/langtools/bin/gdb]
(8) 0x4000000000423740 catch_errors + 0x1a0 at ../../../Src/gnu/gdb/top.c:746 [/opt/langtools/bin/gdb]
(9) 0x40000000004db590 print_stack_frame + 0x70 [/opt/langtools/bin/gdb]
(10) 0x4000000000647340 core_open + 0x810 at ../../../Src/gnu/gdb/corelow.c:168 [/opt/langtools/bin/gdb]
(11) 0x40000000006aae30 core_file_command + 0xd0 at ../../../Src/gnu/gdb/corefile.c:114 [/opt/langtools/bin/gdb]
(12) 0x40000000004b2a00 do_captured_command + 0x60 at ../../../Src/gnu/gdb/top.c:823 [/opt/langtools/bin/gdb]
(13) 0x4000000000423740 catch_errors + 0x1a0 at ../../../Src/gnu/gdb/top.c:746 [/opt/langtools/bin/gdb]
(14) 0x40000000004b2980 catch_command_errors + 0x60 at ../../../Src/gnu/gdb/top.c:788 [/opt/langtools/bin/gdb]
(15) 0x4000000000185fd0 captured_main + 0x25a0 [/opt/langtools/bin/gdb]
(16) 0x4000000000423740 catch_errors + 0x1a0 at ../../../Src/gnu/gdb/top.c:746 [/opt/langtools/bin/gdb]
(17) 0x40000000001839e0 main + 0x60 [/opt/langtools/bin/gdb]
(18) 0xc000000000032f90 main_opd_entry + 0x50 [/usr/lib/hpux64/dld.so]
GDB crashed with signal 11! About to dump core into 'core' in the directory:
/home/alan/ora
Select one of the following options...
[N] No, do not dump core
[Y] Yes, dump core (default)
NOTE: Make sure to rename any existing core file in this
directory, as gdb's core will overwrite it.
[C] Continue execution (at your own risk)
> N
hpitv3-~/ora:
Alan
10-22-2009 02:19 AM - edited 10-08-2012 09:07 PM
Re: HP-UX: program stuck in TE_do_list()
>then gdb exits with a SIGSEGV
Try downloading gdb 6.0?
http://www.hp.com/go/wdb
>same thing happens if I generate a core file using gcore and try to use gdb on this)
At least if you can get gdb fixed, you won't have to create 10,000 processes.
10-23-2009 01:00 AM
Re: HPUX: program stuck in TE_do_list()
hpitv3-~/ora: /opt/langtools/bin/gdb Loop1 core.5319
HP gdb 6.0 for HP Itanium (32 or 64 bit) and target HP-UX 11iv2 and 11iv3.
Copyright 1986 - 2009 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 6.0 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
Core was generated by `Loop1'.
warning: Load module /home/oracle/10gR2/oracle/product/client_1/lib/libclntsh.so.10.1 has been stripped.
Debugging information is not available.
warning: Load module /home/oracle/10gR2/oracle/product/client_1/lib/libnnz10.so has been stripped.
Debugging information is not available.
#0 0xe0000001085c6c60 in
(1) 0xe0000001085c6c80 ---- Signal 11 (SIGSEGV) delivered ----
(2) 0x40000000002cc141 internalize_unwinds + 0x5e1 [/opt/langtools/bin/gdb]
(3) 0x40000000002ca4d0 read_unwind_info + 0x2d0 [/opt/langtools/bin/gdb]
(4) 0x40000000002c9ec0 find_unwind_entry + 0x260 [/opt/langtools/bin/gdb]
(5) 0x40000000003925f0 print_frame + 0x1450 at ../../../Src/gnu/gdb/stack.c:4732 [/opt/langtools/bin/gdb]
(6) 0x40000000003867d0 print_frame_info_base + 0x750 at ../../../Src/gnu/gdb/stack.c:4732 [/opt/langtools/bin/gdb]
(7) 0x400000000037c390 print_stack_frame_stub + 0x70 at ../../../Src/gnu/gdb/stack.c:4732 [/opt/langtools/bin/gdb]
(8) 0x40000000001ed0d0 catch_errors + 0x190 [/opt/langtools/bin/gdb]
(9) 0x400000000037c2f0 print_stack_frame + 0x70 at ../../../Src/gnu/gdb/stack.c:4732 [/opt/langtools/bin/gdb]
(10) 0x400000000063c020 core_open + 0x830 at ../../../Src/gnu/gdb/corelow.c:180 [/opt/langtools/bin/gdb]
(11) 0x40000000006a1bb0 core_file_command + 0xd0 at ../../../Src/gnu/gdb/corefile.c:114 [/opt/langtools/bin/gdb]
(12) 0x40000000004421c0 do_captured_command + 0x60 at ../../../Src/gnu/gdb/top.c:823 [/opt/langtools/bin/gdb]
(13) 0x40000000001ed0d0 catch_errors + 0x190 [/opt/langtools/bin/gdb]
(14) 0x4000000000442140 catch_command_errors + 0x60 at ../../../Src/gnu/gdb/top.c:788 [/opt/langtools/bin/gdb]
(15) 0x400000000024a500 captured_main + 0x29a0 [/opt/langtools/bin/gdb]
(16) 0x40000000001ed0d0 catch_errors + 0x190 [/opt/langtools/bin/gdb]
(17) 0x40000000001ecf00 main + 0x60 [/opt/langtools/bin/gdb]
(18) 0xc000000000032f90 main_opd_entry + 0x50 [/usr/lib/hpux64/dld.so]
GDB crashed with signal 11! About to dump core into 'core' in the directory:
10-23-2009 01:29 AM - edited 10-08-2012 09:08 PM
Re: HP-UX: program stuck in TE_do_list()
>Already done that, same thing :-(
You might want to contact the Response Center about your loop.
For gdb support see:
http://h21007.www2.hp.com/portal/site/dspp/menuitem.863c3e4cbcdc3f3515b49c108973a801/?ciid=a8080f1bace021100f1bace02110275d6e10RCRD
10-27-2009 03:38 AM - edited 10-08-2012 09:08 PM
Re: HP-UX: program stuck in TE_do_list()
Have you contacted the Response Center yet? I see this defect has just been fixed:
QXCR1000892132: GDB5.9 SIG11 in internalize_unwinds when reading ia64 core