Operating System - HP-UX

Core dump with BUS_ADRALN error

 
Laurent Menase
Honored Contributor

Re: Core dump with BUS_ADRALN error

br6: 0xc0000000000ba2a0

so you could look at which function is at br6

disass *0xc0000000000ba2a0
K!rn Kumr
Frequent Advisor

Re: Core dump with BUS_ADRALN error

(gdb) disas *0xc0000000000ba2a0
Attempt to dereference a non-pointer value.
(gdb) disass *0xc0000000000ba2a0
Attempt to dereference a non-pointer value.
K!rn Kumr
Frequent Advisor

Re: Core dump with BUS_ADRALN error

(gdb) disas 0xc0000000000ba2a0
Dump of assembler code for function pthread_getspecific:
0xc0000000000ba2a0:0: addl r9=0x450,r1
0xc0000000000ba2a0:1: sxt4 r14=r32
0xc0000000000ba2a0:2: addl r8=0x45c,r1
0xc0000000000ba2b0:0: cmp4.ge.unc p6=r32,r0;;
0xc0000000000ba2b0:1: ld8 r10=[r9]
0xc0000000000ba2b0:2: addl r9=-216,r1
0xc0000000000ba2c0:0: (p6) ld4 r11=[r8]
0xc0000000000ba2c0:1: mov r8=0;;
0xc0000000000ba2c0:2: shladd r15=r14,4,r10
0xc0000000000ba2d0:0: (p6) cmp4.lt.unc p6=r32,r11;;
0xc0000000000ba2d0:1: (p6) ld1 r10=[r15]
0xc0000000000ba2d0:2: nop.i 0x0;;
0xc0000000000ba2e0:0: (p6) cmp4.eq.unc p6=1,r10
0xc0000000000ba2e0:1: nop.m 0x0
0xc0000000000ba2e0:2: (p6) br.cond.dpnt.few 0xc0000000000ba300;;
0xc0000000000ba2f0:0: nop.m 0x0
0xc0000000000ba2f0:1: nop.m 0x0
0xc0000000000ba2f0:2: br.ret.sptk.few b0;;
0xc0000000000ba300:0: ld8 r8=[r9]
0xc0000000000ba300:1: nop.i 0x0;;
0xc0000000000ba300:2: add r9=r8,r13;;
0xc0000000000ba310:0: ld8 r8=[r9]
0xc0000000000ba310:1: nop.i 0x0;;
0xc0000000000ba310:2: adds r9=0x160,r8;;
0xc0000000000ba320:0: ld8 r8=[r9];;
0xc0000000000ba320:1: shladd r9=r14,4,r8
0xc0000000000ba320:2: cmp.eq.unc p7,p6=r0,r8;;
0xc0000000000ba330:0: (p6) adds r9=8,r9
0xc0000000000ba330:1: (p7) mov r8=0;;
0xc0000000000ba330:2: nop.i 0x0
0xc0000000000ba340:0: (p6) ld8 r8=[r9]
0xc0000000000ba340:1: nop.m 0x0
0xc0000000000ba340:2: br.ret.sptk.few b0;;
End of assembler dump.
(gdb)
Dennis Handly
Acclaimed Contributor
Solution

Re: Core dump with BUS_ADRALN error

>(gdb) x /i *(void**)($r44+24)
0xc0000000001a54a0:0 :

You have everything you need to blame it on the third party. :-)
gethostent(3N) says:
struct hostent *gethostbyname(const char *name);
struct hostent {
    char *h_name;                  // offset 0
    char **h_aliases;              // offset 8
    int h_addrtype; int h_length;  // offset 16
    char **h_addr_list;            // offset 24
};

Inside get_tcp_service try "info local".
See if you find a local of type struct hostent. You may have to use "ptype" on each.

I suppose you could just use:
p *(struct hostent*)$r38

Basically it seems the code in get_tcp_service thinks that the addresses in h_addr_list can be read as a long instead of an int or a struct in_addr.

gethostent(3n) has an example how to format h_addr_list:
http://docs.hp.com/en/B2355-60130/gethostent.3N.html
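
A minimal sketch of one safe way to read an address out of h_addr_list, assuming an IPv4 entry (h_addrtype of AF_INET, h_length of 4). The helper name copy_first_ipv4 is hypothetical and is not from the man page. Each entry is a char * pointing at h_length bytes with no alignment guarantee, so the bytes are copied with memcpy instead of being read through an unsigned long * cast.

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: copy the first IPv4 address of a hostent into *out.
 * Returns 0 on success, -1 if the entry is missing or is not IPv4. */
int copy_first_ipv4(const struct hostent *he, struct in_addr *out)
{
    assert(out != NULL);
    if (he == NULL || he->h_addrtype != AF_INET ||
        he->h_length != (int)sizeof(struct in_addr) ||
        he->h_addr_list == NULL || he->h_addr_list[0] == NULL)
        return -1;
    /* memcpy places no alignment requirement on the source pointer,
     * and it copies exactly 4 bytes rather than the 8 of a 64-bit long. */
    memcpy(out, he->h_addr_list[0], sizeof *out);
    return 0;
}
```

A caller would pass the result of gethostbyname and then use out->s_addr, which is already in network byte order.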


>Laurent: br6: 0xc0000000000ba2a0, so you could look what is the function at br6
>disass *0xc0000000000ba2a0

This won't work, that's why I showed my command. br6 isn't a callee save register. Also that "*" wouldn't be helpful.
K!rn Kumr
Frequent Advisor

Re: Core dump with BUS_ADRALN error

This is the info about the locals, as well as the value at the address you asked for:

(gdb) frame 0
#0 get_tcp_service (service=0x400000000000d120 "TSC720",
ip_addr=0x9fffffffffffe1d8, port=0x9fffffffffffe1d0) at min_tcp_serv.c:146
146 in min_tcp_serv.c
(gdb) info local
host_entry = (struct hostent *) 0x60000000000c9f18
service_entry = (struct servent *) 0x0
tsc_ip_addr = (unsigned long *) 0x60000000000ca974
host = 0x9fffffffffffe8e7 "TSCTBR2"
valid_host = 0
result = 0
(gdb) p *(struct hostent*)$r38
$1 = {h_name = 0x60000000000c9f38 "emsss0a0.bc",
h_aliases = 0x60000000000c9f58, h_addrtype = 2, h_length = 4,
h_addr_list = 0x60000000000c9f48}
K!rn Kumr
Frequent Advisor

Re: Core dump with BUS_ADRALN error

You are really amazing, both in your knowledge and in responding. It's really a pleasure to interact with people like you.

Could you please help me learn this by sharing info on how to develop this kind of art of debugging? Please suggest any book or online guide that would help me get this knowledge.
Dennis Handly
Acclaimed Contributor

Re: Core dump with BUS_ADRALN error

>host_entry = (struct hostent*) 0x60000000000c9f18

This is the return value. You can print some fields out:
p host_entry->h_aliases[0]
p host_entry->h_aliases[1]
(If the above doesn't fail, you can do more.)

x /4gx host_entry->h_addr_list

x /4bu host_entry->h_addr_list[0]
x /4bu host_entry->h_addr_list[1]
K!rn Kumr
Frequent Advisor

Re: Core dump with BUS_ADRALN error

(gdb) p host_entry->h_aliases[2]
$4 = 0x0

(gdb) x /4bu host_entry->h_addr_list[1]
0x0: Cannot access memory at address 0x0

But how can we know the root cause for the crash?
Dennis Handly
Acclaimed Contributor

Re: Core dump with BUS_ADRALN error

>You are really amazing and ...

Please read the following about assigning points:
http://forums.itrc.hp.com/service/forums/helptips.do?#33

>Could you please help me learning that by sharing info like how to develop such kind of art of debugging.

wdb has a bunch of documentation and whitepapers:
http://www.hp.com/go/wdb
Debugging core files using HP WDB
Debugging dynamic memory usage errors using HP WDB
Debugging threads with HP Wilde Beest debugger

And Intel has various manuals about the Itanium instruction set.
Basically you need to know assembly language, procedure calling conventions, etc.

>But how can we know the root cause for the crash?

I already mentioned my guess:
Basically it seems the code in get_tcp_service thinks that the addresses in h_addr_list can be read as an unsigned long instead of an unsigned int or a struct in_addr.

gethostent(3n) has an example how to format h_addr_list:
http://docs.hp.com/en/B2355-60130/gethostent.3N.html

It would be helpful if you could provide the definition of struct in_addr. Probably in /usr/include/netinet/in.h.
struct in_addr {
in_addr_t s_addr;
};

Where in_addr_t should be uint32_t.

This old document mentions what happens in 64 bit mode: Chapter 4 64-Bit Capable Implementation, BSD Sockets: IP Addresses
http://docs.hp.com/en/B3782-90716/ch04s11.html
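
The alignment arithmetic behind that guess can be checked directly. This is only a sketch: it assumes the faulting access went through tsc_ip_addr (0x60000000000ca974 in the "info local" output earlier in the thread) and that unsigned long is 8 bytes in 64-bit mode. is_aligned is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if addr is a multiple of size.
 * size must be a power of two (4 for uint32_t, 8 for a 64-bit long). */
int is_aligned(uint64_t addr, uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}
```

With these assumptions, is_aligned(0x60000000000ca974, 4) holds but is_aligned(0x60000000000ca974, 8) does not, because the low three bits of the address are 0x4. A 4-byte load from that address is fine; an 8-byte load is what would raise BUS_ADRALN.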
Dennis Handly
Acclaimed Contributor

Re: Core dump with BUS_ADRALN error

Some compiler documentation links:
http://www.hp.com/go/aCC
HP compilers for HP Integrity servers
Optimizing Itanium®-based applications
Inline assembly for Itanium®-based HP-UX

The first explains predication and speculation.
Dennis Handly
Acclaimed Contributor

Re: Core dump with BUS_ADRALN error

If you have nothing better to do while waiting for your third party vendor to fix their h_addr_list coding problem, you could do what is suggested in the gdb abort message:
BUS_ADRALN - Invalid address alignment. Please refer to the following link that helps in handling unaligned data:
http://docs.hp.com/en/14487/pragmas.htm#pragma-pack-ex4

By calling allow_unaligned_data_access and linking with -lunalign, you can get past that alignment fault and find the next problem.
K!rn Kumr
Frequent Advisor

Re: Core dump with BUS_ADRALN error

I have already given it a try by compiling my code with -lunalign. But it's not my code accessing it. So it is again the 3rd party code that has to be fixed, either by changing the code or compiling it with flag -lunalign (I don't know if it really works). So I really have nothing to do.

Meanwhile, I have one more question: by looking at frame 0, how can you decide the fault is in that frame? Can that be a repercussion of a fault occurring in some other frame before that? Actually frame 3 and frame 2 are third party code for sure, and frame 1 and frame 0 must be HP builtin code, I guess.
Dennis Handly
Acclaimed Contributor

Re: Core dump with BUS_ADRALN error

>I have already given it a try by compiling my code with -lunalign. But it's not my code accessing it.

It doesn't matter who is messing up, calling allow_unaligned_data_access will handle it. (You would have to do it for each thread.)

>So it is again the 3rd party code that has to be fixed either by changing the code or compiling it with flag -lunalign

No, they must fix it by changing the code. Using libunalign is only a kludge so you can make progress and find your bugs.

>I have one more question by looking at the frame 0 how can you decide its fault in that frame?

That's how the hardware works and the debugger said so. If you get a fault/signal, it stops there. Unless you have a signal handler, then it can be deeper.

>Can that be a repercussion of fault occurring in some other frame before that?

Yes, a bug in a previous frame could pass a bad address to the final frame.

>Actually frame 3 and frame 2 are third party code is for sure and frame 1 and frame 0 must be code on HP builtin source I guess.

No, frame 1 and frame 0 are likely third party code too. You can get some idea by:
(gdb) frame 0
(gdb) info source
...

K!rn Kumr
Frequent Advisor

Re: Core dump with BUS_ADRALN error

Hi,

I need your help again. My application does not crash when I create a string and reserve space for it.

string *str = new string;
str->reserve(3000);

If I don't reserve, it crashes. I need to append values to that string dynamically, with values that I get from some server response. Do you have any idea why this is happening? And from disas, how do I know which register is what? Please help me learn by telling me a few initial steps for getting started with assembly mode debugging.
Dennis Handly
Acclaimed Contributor

Re: Core dump with BUS_ADRALN error

>does not crash when I create a string and reserve a space for it.
>string *str = new string;
>str->reserve(3000);

Note that using new on a string isn't that useful, since it only allocates 8 bytes for the string object; the space for the characters is allocated later in a separate chunk.

>If I don't reserve it is crashing.
>I need to append a values to that string dynamically with the values that I get from some server response.

How are you accessing/appending to the string? Using str->append or str->operator+?

>Do you have any idea why is it happening so? and from disas how do I know which register is what?

Typically this is heap corruption and the string operation is the victim. Try compiling with +check=malloc,bounds

>pls help me in learning by telling few steps how to start with assembly mode debugging... initial steps.

You should start with a stack trace. Since C++ has lots of inlining, you may want to compile with +d so you see the names of all the functions. And remove +O2.

K!rn Kumr
Frequent Advisor

Re: Core dump with BUS_ADRALN error

Hi, I am using the string append function.

Yeah, I give bt then I see the frames. After that, all I can do is check the values of the locals and args.

How do we give the values to the disas like $pc-16*x $pc+16*x... how do we get the value of x? I have since gone through the instruction set for ops like mov etc.
How to know what lies in the address? What register is for what stuff like this?
Dennis Handly
Acclaimed Contributor

Re: Core dump with BUS_ADRALN error

>I give bt then I see the frames.

Please post them here.

>How do we give the values to the disas like $pc-16*x $pc+16*x... how do we get the value of x?

You just want a window around the failing instruction so you can track where the register got its value. And which register so you can see the value.

>how to know what lies in the address? what register is for what stuff like this?

You just have to know and build that knowledge up each time.
You can look at the contents of a valid address with the x command:
x /8gx address
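
The $pc-16*x arithmetic can be sketched as follows. The 16 comes from the Itanium bundle size (16 bytes holding three instruction slots), and x is simply how many bundles of context you want on each side. The helper names are hypothetical, and the masking assumes the low 4 bits of the pc value may carry a slot number.

```c
#include <assert.h>
#include <stdint.h>

#define BUNDLE_BYTES 16u /* one Itanium bundle: three instruction slots */

/* Hypothetical helper: address of the bundle `before` bundles ahead of pc. */
uint64_t window_start(uint64_t pc, unsigned before)
{
    /* Drop any slot number so the result is bundle aligned. */
    uint64_t bundle = pc & ~(uint64_t)(BUNDLE_BYTES - 1);
    assert(bundle >= (uint64_t)before * BUNDLE_BYTES);
    return bundle - (uint64_t)before * BUNDLE_BYTES;
}

/* Hypothetical helper: exclusive end address covering pc's own bundle
 * plus `after` further bundles. */
uint64_t window_end(uint64_t pc, unsigned after)
{
    uint64_t bundle = pc & ~(uint64_t)(BUNDLE_BYTES - 1);
    return bundle + ((uint64_t)after + 1) * BUNDLE_BYTES;
}
```

For example, three bundles before 0xc0000000000ba2d0 starts at 0xc0000000000ba2a0, and two bundles after it ends just before 0xc0000000000ba300.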
K!rn Kumr
Frequent Advisor

Re: Core dump with BUS_ADRALN error

What exactly does this line mean

0x4000000000034d00:0 :
alloc r45=ar.pfs,0,15,2,0

0x4000000000034d00 is the address at which the alloc instruction with its respective values is stored, right?

what is r45?? we have many registers like program, general, pointer, etc etc.....

what is ar.pfs,0,15,2,0
K!rn Kumr
Frequent Advisor

Re: Core dump with BUS_ADRALN error

Hi Dennis,

When I am trying to compile my code with purify, these are the messages that are displayed before the compilation fails.

libclntsh.so.9.0..........Unhandled indirect branch at 4000000000B0D828
Annot:found:@4000000000B0D828 switch: 15@40000000003108D0
Address : 4000000001153FA0
Missing BranchOut in Function at 0x4000000000429EA0
Address : 4000000001155DD0
Missing BranchOut in Function at 0x4000000000429EA0
Address : 4000000001156630
Missing BranchOut in Function at 0x4000000000429EA0
Address : 4000000001153FA0
This is only a part of a huge file. It compiles successfully without purify.
K!rn Kumr
Frequent Advisor

Re: Core dump with BUS_ADRALN error

My purify version of application crashes when loaded with core. Below is the trace for it

#0 0xc00000000cf6a0a0:0 in _KiLl+0x50 ()
from /hubd/home/id817019/cache/usr/lib/hpux64/libc.so.1_pure_p7_c0_109130125_B1123_64_4371360
(gdb) bt
#0 0xc00000000cf6a0a0:0 in _KiLl+0x50 ()
from /hubd/home/id817019/cache/usr/lib/hpux64/libc.so.1_pure_p7_c0_109130125_
#1 0xc0000000057efe60:0 in _x_kill+0x40 ()
from /opt/rational/releases/purify.hpia.7.0.0.0-012/lib64/librtlib.so
#2 0xc00000000585c690:0 in reissue_signal+0xd0 ()
from /opt/rational/releases/purify.hpia.7.0.0.0-012/lib64/librtlib.so
#3 0xc00000000585cf30:0 in pure_signal_handler_wrapper+0x530 ()
from /opt/rational/releases/purify.hpia.7.0.0.0-012/lib64/librtlib.so
#4 0xc0000000058469d0:0 in pure_sigtramp+0xf0 ()
from /opt/rational/releases/purify.hpia.7.0.0.0-012/lib64/librtlib.so
#5
#6 0x0 in ()
warning: Attempting to unwind past bad PC 0x0
#7 0xc000000007fb1460:0 in pthread_once+0x660 ()
from /hubd/home/id817019/cache/usr/lib/hpux64/libpthread.so.1_pure_p7_c0_1091
#8 0xc00000000ca9ecb0:0 in _e_thread_once () at gpthread.c:479
#9 0xc00000000c827cc0:0 in _e_ipc_gshmget () at Ushm.c:345
#10 0xc00000000c663030:0 in _tmsmcreat () at shm/tmsmcreat.c:56
#11 0xc00000000c38e650:0 in _tmusrattch () at tmbbattch.c:446
#12 0xc00000000c24bfc0:0 in _tmbbhookup () at bbhookup.c:60
#13 0xc00000000c359f20:0 in _tmenrollsvr () at beserver.c:180
#14 0xc00000000c3657c0:0 in _tmstdinit () at stdmain.c:845
#15 0xc00000000c3605d0:0 in _tmmain () at stdmain.c:169
#16 0xc00000000c319980:0 in _tmstartserver () at tmstrtsrvr.c:110
#17 0x4000000000014900:0 in main (argc=20, argv=0x9fffffffffffdeb0)
at BS-3001.c:84

can you please suggest me something on this Dennis
Srimalik
Valued Contributor

Re: Core dump with BUS_ADRALN error

Dennis Handly
Acclaimed Contributor

Re: Core dump with BUS_ADRALN error

Whatever happened to a stack trace to your string abort problem?

>What exactly does this line mean
0x4000000000034d00:0: alloc r45=ar.pfs,0,15,2,0
0x4000000000034d00 - is the address at which alloc instruction with respective values is stored right?

Yes, this is the bundle address, and the :0 after it is the instruction slot.

>what is r45?? we have many registers like program, general, pointer
>what is ar.pfs,0,15,2,0

r45 is just a scratch register, it gets a copy of the old ar.pfs register. The register frame is adjusted to 15 locals and parms and 2 outgoing registers and no rotating registers.
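
As a worked example of those operands, assuming the assembler form alloc r1=ar.pfs,i,l,o,r (inputs, locals, outputs, rotating) from the Itanium instruction set manual, the frame sizes follow by simple arithmetic. The struct and function names here are hypothetical.

```c
#include <assert.h>

/* Sizes of an Itanium register stack frame, derived from alloc operands. */
struct frame_sizes {
    unsigned sof; /* size of frame: inputs + locals + outputs */
    unsigned sol; /* size of locals region: inputs + locals */
    unsigned sor; /* rotating region, counted in groups of 8 registers */
};

/* Hypothetical helper: turn the four alloc operands into frame sizes. */
struct frame_sizes alloc_frame(unsigned in, unsigned loc,
                               unsigned out, unsigned rot)
{
    struct frame_sizes f;
    assert(in + loc + out <= 96);   /* only r32..r127 are stacked */
    assert(rot % 8 == 0);           /* rotating size is a multiple of 8 */
    assert(rot <= in + loc + out);  /* rotating region lies inside the frame */
    f.sof = in + loc + out;
    f.sol = in + loc;
    f.sor = rot / 8;
    return f;
}
```

For alloc r45=ar.pfs,0,15,2,0 this gives sof = 17, sol = 15 and sor = 0. Since stacked registers start at r32, the frame spans r32 through r48: r32 to r46 hold the 15 inputs and locals (r45 among them), and r47 and r48 are the two outgoing registers.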

>When I am trying to compile my code with purify these are the messages that are displayed before the compilation fails.
libclntsh.so.9.0 Unhandled indirect branch at 4000000000B0D828

(You'll need to talk to IBM Rational.)

This may be why you have that branch to 0 in frame 6 in your other thread? Do you know what pthread_once was called with?
http://h30499.www3.hp.com/t5/Languages-and-Scripting/Purified-version-of-application-core-dumps-during-start-up/td-p/4334173


>can you please suggest me something on this Dennis

Use gdb to check for heap corruption/leaks and not purify.

>Srikrishan: http://www.hp.com/go/ia-64 (is broken)

Try: http://www.hp.com/go/integrity ?

K!rn Kumr
Frequent Advisor

Re: Core dump with BUS_ADRALN error

Regarding strings, my colleague suggested changing to char * instead of using string, and it worked. But another colleague suggested compiling the code with -mt, as it is a multithreaded application, and when tested it works really well. Is it so that if an application is multithreaded on ia64 we should compile it with the -mt option?
Dennis Handly
Acclaimed Contributor

Re: Core dump with BUS_ADRALN error

>Is it so that if an application is a multithreaded on Integrity we should compile it with -mt option

Whether PA or Integrity, as documented it isn't should, it is MUST!
K!rn Kumr
Frequent Advisor

Re: Core dump with BUS_ADRALN error

+DSmontecito +FPD -Wl,+pi,1M -Wl,+pd,1M -Wl,+mergeseg -Wl,+s +Z +w1 -D_REENTRANT

What are these flags good for? Do I need to include them to compile my code on ia64 in 64 bit? I am able to compile without them as well, but what is the advantage of including them? Just wanted to know.