
Inexplicable Bus Error

 
Kunal_3
Occasional Advisor

I have an in-house application called adcorba that depends on some third-party libraries.

On HP-UX 11.0 and 11i, the application terminates with a bus error (core dump):
20154 Bus error (core dumped) adcorba

I've tried all the options posted on past ITRC messages. They include:
1. 64-bit kernel
2. Values of maxdsiz, etc., are maxed out at 1GB
3. Compiler is aCC 3.37
4. The system is patched with the latest patch bundles, and explicitly includes PHKL_27283 and PHKL_27282 (which apparently have been known to fix such a problem)

5. Here are the results of GDB. Since the third-party libs are not debug versions, gdb shows minimal info:
HP gdb 4.5 for PA-RISC 1.1 or 2.0 (narrow), HP-UX 11.00
and target hppa1.1-hp-hpux11.00.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 4.5 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
Core was generated by `adcorba'.
Program terminated with signal 10, Bus error.
#0 0xc75b380c in MMarshaller::pack+0x1c ()
from /tsi/ae/tools/TRA/5.1.3_L/h7_110_aCC/debug/lib/libmaverick50.sl

(libmaverick50.sl is the third-party library)

6. Tried "bt" in gdb, but all it returns is:
#0 0xc75b380c in MMarshaller::pack+0x1c ()
from /tsi/ae/tools/TRA/5.1.3_L/h7_110_aCC/debug/lib/libmaverick50.sl
#1 0xc764c02c in MRpcClientRvImpl::pack+0x2d8 ()
from /tsi/ae/tools/TRA/5.1.3_L/h7_110_aCC/debug/lib/libmaverick50.sl
#2 0xc764d718 in MRpcClientRvImpl::syncInvoke+0x90 ()
from /tsi/ae/tools/TRA/5.1.3_L/h7_110_aCC/debug/lib/libmaverick50.sl
#3 0xc7579f84 in MRpcClient::syncInvoke+0xa4 ()
from /tsi/ae/tools/TRA/5.1.3_L/h7_110_aCC/debug/lib/libmaverick50.sl
#4 0xc757c318 in MRpcOperationProxy::syncInvoke+0x764 ()
from /tsi/ae/tools/TRA/5.1.3_L/h7_110_aCC/debug/lib/libmaverick50.sl
Cannot access memory at address 0xffffde2d

7. Tried "info share" in gdb (to see if libpthread and libcma were being mixed, but no luck). Here's what info share shows me:
(gdb) info share
Shared Object Libraries
flags tstart tend dstart dend dlt
/tsi/ae/tools/TRA/5.1.3_L/h7_110_aCC/debug/lib/librepowww532.sl
0x02000201 0xc5e00000 0xc5f10000 0x67a42000 0x67a5b000 0x67a422c4
/tsi/ae/tools/tpcl/5.1.2_L/h7_110_aCC/debug/lib/libxerces-c2_1.sl
0x02000201 0xc7800000 0xc7b14000 0x67a7e000 0x67b72000 0x67a81238
/usr/lib/libnsl.1
0x02000201 0xc0280000 0xc030a000 0x67a5d000 0x67a6a000 0x67a5d420
/home/ca/tibco/adapter/adcorba/5.1/lib/libCorbautild.sl
0x02000201 0xc5600000 0xc57d7000 0x67b74000 0x67bbc000 0x67b77774
/home/ca/tibco/adapter/adcorba/5.1/lib/libCorbarvd.sl
0x02000201 0xc5000000 0xc55b6000 0x67f96000 0x68082000 0x67fa1d1c
/tsi/ae/tools/TRA/5.1.3_L/h7_110_aCC/debug/lib/libmaverick50.sl
0x02000201 0xc7000000 0xc772f000 0x67bda000 0x67d10000 0x67bdf488
/home/ca/tools/ORBacus/4.1.2/h7_110_aCC/debug/6.0.1/lib/libOB.sl.4.1.2
0x02000201 0xc6000000 0xc6d21000 0x67d3a000 0x67e91000 0x67d4ba64
/home/ca/tools/ORBacus/4.1.2/h7_110_aCC/debug/6.0.1/lib/libIDL.sl.4.1.2
0x02000201 0xc5800000 0xc5ddd000 0x67e9f000 0x67f48000 0x67ea701c
/home/ca/tools/ORBacus/4.1.2/h7_110_aCC/debug/6.0.1/lib/libCosNaming.sl.4.1.2
0x02000201 0xc4680000 0xc4745000 0x67f48000 0x67f5a000 0x67f48e28
/home/ca/tools/ORBacus/4.1.2/h7_110_aCC/debug/6.0.1/lib/libCosEvent.sl.4.1.2
0x02000201 0xc4e00000 0xc4f8a000 0x67f62000 0x67f8a000 0x67f641ac
/home/ca/tools/ORBacus/4.1.2/h7_110_aCC/debug/6.0.1/lib/libJTC.sl.2.0.1
0x02000201 0xc3050000 0xc307b000 0x67f8a000 0x67f8d000 0x67f8a2dc
/usr/lib/librt.2
0x02000201 0xc062c000 0xc0630000 0x67f8d000 0x67f8e000 0x67f8d008
/home/ca/tibco/tibrv/lib/libtibrvcmq.sl
0x02000201 0xc0eac000 0xc0eb7000 0x68083000 0x68084000 0x68083024
/home/ca/tibco/tibrv/lib/libtibrvcm.sl
0x02000201 0xc1be0000 0xc1bfb000 0x68082000 0x68083000 0x68082038
/home/ca/tibco/tibrv/lib/libtibrvft.sl
0x02000201 0xc0ea4000 0xc0eac000 0x68084000 0x68085000 0x68084024
/home/ca/tibco/tibrv/lib/libtibrvsd.sl
0x02000201 0xc0e18000 0xc0e20000 0x680af000 0x680b0000 0x680af034
/home/ca/tibco/tibrv/lib/libtibrv.sl
0x02000201 0xc3380000 0xc33fb000 0x68089000 0x6808e000 0x680891a4
/home/ca/tibco/tibrv/lib/libssl.sl
0x02000201 0xc3340000 0xc3380000 0x6808e000 0x68092000 0x6808e0e8
/home/ca/tibco/tibrv/lib/libcrypto.sl
0x02000201 0xc4500000 0xc4641000 0x68098000 0x680a8000 0x680984fc
/lib/libxnet.2
0x02000201 0xc006d000 0xc0070000 0x680ab000 0x680ac000 0x680ab004
/usr/lib/libxti.2
0x02000201 0xc00c0000 0xc00d8000 0x680a8000 0x680ab000 0x680a80d8
/lib/libpthread.1
0x02000201 0xc0040000 0xc0059000 0x680ac000 0x680af000 0x680ac0b8
/usr/lib/libstd.2
0x02000201 0xc0660000 0xc069e000 0x680b2000 0x680b5000 0x680b2528
/usr/lib/libstream.2
0x02000201 0xc0630000 0xc0658000 0x680b7000 0x680ba000 0x680b7354
/usr/lib/libCsup.2
0x02000201 0xc0520000 0xc053b000 0x680ba000 0x680bd000 0x680ba430
/usr/lib/libm.2
0x02000201 0xc0090000 0xc00bc000 0x680bd000 0x680be000 0x680bd28c
/usr/lib/libcl.2
0x02000201 0xc0540000 0xc062b000 0x680c1000 0x680cd000 0x680c1648
/usr/lib/libisamstub.1
0x02000201 0xc0067000 0xc0068000 0x680be000 0x680bf000 0x680be004
/usr/lib/libdld.2
0x02000201 0xc0007000 0xc000a000 0x680e4000 0x680e5000 0x680e400c
/usr/lib/libc.2
0x02000201 0xc0100000 0xc0246000 0x680d1000 0x680e4000 0x680d1790
/opt/graphics/OpenGL/lib/libogltls.sl
0x02ff0201 0xc0005000 0xc0007000 0x680e9000 0x680ea000 0x680e9048
/usr/lib/libnss_files.1
0x02ff0201 0xc005c000 0xc0065000 0x67a39000 0x67a3a000 0x67a39048

8. Tried "info threads" from gdb. Here's what it shows me:
(gdb) info threads
* 32 system thread 21582 0xc75b380c in MMarshaller::pack+0x1c ()
from /tsi/ae/tools/TRA/5.1.3_L/h7_110_aCC/debug/lib/libmaverick50.sl
31 system thread 21622 0xc020cf38 in _select_sys+0x10 () from /usr/lib/libc.2
30 system thread 21611 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
29 system thread 21610 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
28 system thread 21609 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
27 system thread 21608 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
26 system thread 21607 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
25 system thread 21606 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
24 system thread 21605 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
23 system thread 21604 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
22 system thread 21603 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
21 system thread 21602 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
20 system thread 21601 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
19 system thread 21600 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
18 system thread 21599 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
17 system thread 21598 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
16 system thread 21597 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
15 system thread 21595 0xc020cf38 in _select_sys+0x10 () from /usr/lib/libc.2
14 system thread 21593 0xc020cf38 in _select_sys+0x10 () from /usr/lib/libc.2
13 system thread 21587 0xc020dd38 in _accept_sys+0x10 () from /usr/lib/libc.2
12 system thread 21586 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
11 system thread 21585 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
10 system thread 21584 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
9 system thread 21583 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
8 system thread 21581 0xc020dd38 in _accept_sys+0x10 () from /usr/lib/libc.2
7 system thread 21580 0xc020cf38 in _select_sys+0x10 () from /usr/lib/libc.2
6 system thread 21579 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
5 system thread 21550 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
4 system thread 21549 0xc020cf38 in _select_sys+0x10 () from /usr/lib/libc.2
3 system thread 21548 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2
2 system thread 21547 0xc020cf38 in _select_sys+0x10 () from /usr/lib/libc.2
1 system thread 21546 0xc020b540 in __ksleep+0x10 () from /usr/lib/libc.2

9. I've tried tusc too, but I have not been able to make sense of its output. I've attached the tusc output too.

What could be the problem?
Thanks in advance,
Kunal
16 REPLIES
Manish Srivastava
Trusted Contributor

Re: Inexplicable Bus Error

Hi Kunal,

I saw the gdb output. Could you single-step in GDB and find the exact API call that causes the bus error?

The last time I had this problem, it turned out to be malloc header corruption, and the free() call was dumping core. A strcpy() call was changed to get rid of the error.
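
To make the strcpy() point concrete, here is a minimal sketch of a bounded copy. The helper name bounded_copy is hypothetical and is not taken from adcorba or from any library in this thread; it only illustrates the kind of change that keeps a copy from running past its buffer and damaging malloc's bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (an assumption for illustration, not an existing API).
// Copies at most dst_size - 1 bytes from src and always NUL-terminates dst.
// Returns true if the whole string fit, false if it had to be truncated.
bool bounded_copy(char *dst, std::size_t dst_size, const char *src)
{
    assert(dst != 0 && src != 0 && dst_size > 0);

    std::size_t len = std::strlen(src);
    bool fits = len < dst_size;               // room for the string plus NUL?
    std::size_t n = fits ? len : dst_size - 1;

    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return fits;
}
```

Unlike strcpy(), the caller has to pass the destination size, so a source string that is too long is truncated and reported instead of overwriting whatever follows the buffer on the heap.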

manish
Kunal_3
Occasional Advisor

Re: Inexplicable Bus Error

Hi Manish,
My application makes a call (syncInvoke) into the third-party library (libmaverick50.sl). That's the point after which I can't single-step anymore.
The third-party library is not really a debug version, and the third-party vendors have effectively said, "your headache buddy".
I've tried "thread apply all bt" in gdb, but again, it dumps a lot of info that I don't completely know what to do with.
Is there anything else I could try that could isolate the cause of the problem as being something stupid I'm doing? (like mixing other unmixable libs, missing a patch, etc.).
Thanks,
Kunal
Manish Srivastava
Trusted Contributor

Re: Inexplicable Bus Error

Hi Kunal,

Does this happen for all data you send to this API, or only for some of it? If it always happens, it looks like a problem in your code. Check all the mallocs and every write to the malloc'd areas. Writing more than you have allocated can cause this.
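
A minimal sketch of how a write past a malloc'd area can be detected, assuming the suspect allocations can be routed through a wrapper. The names guarded_alloc and guard_intact are hypothetical, introduced here only for illustration. The wrapper places a known byte pattern just past the requested size, and a later check shows whether anything wrote over it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Guard pattern placed immediately after the caller's bytes (arbitrary values).
static const unsigned char kGuard[8] =
    { 0xDE, 0xAD, 0xBE, 0xEF, 0xDE, 0xAD, 0xBE, 0xEF };

// Hypothetical wrapper: allocates size bytes plus a trailing guard region.
// Returns 0 if malloc fails. Release the block with std::free().
void *guarded_alloc(std::size_t size)
{
    unsigned char *p =
        static_cast<unsigned char *>(std::malloc(size + sizeof kGuard));
    if (p != 0)
        std::memcpy(p + size, kGuard, sizeof kGuard);
    return p;
}

// Returns true if the guard after the first size bytes is still untouched.
// The caller must pass the same size that was given to guarded_alloc().
bool guard_intact(const void *p, std::size_t size)
{
    assert(p != 0);
    const unsigned char *tail = static_cast<const unsigned char *>(p) + size;
    return std::memcmp(tail, kGuard, sizeof kGuard) == 0;
}
```

Calling guard_intact() before each free(), or after each call into suspect code, narrows down which buffer is being overrun. It only catches writes that land in the guard bytes, so it is a cheap first check rather than a replacement for a tool like Purify.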

If you have access to the Purify tool, you can run it on your application to get the exact location of an array bounds write.

One more thing, which is the libc patch installed on the system?

manish
ranganath ramachandra
Esteemed Contributor

Re: Inexplicable Bus Error

since you don't have a debuggable library, you may have to do some assembly level debugging, looking at some instructions before and after this point along with register values etc.

of course you are always at an advantage if you know what exactly you want to achieve when you make a call to this library function.

since there is no single remedy for SIGBUS's, each has to be analysed in its own context.
 
--
ranga
hp-ux 11i v3[i work for hpe]


Kunal_3
Occasional Advisor

Re: Inexplicable Bus Error

This API - syncInvoke(...) makes an RPC call, and its only parameter is a timeout value (time to wait for a response from the server).
The API sends a message to a server using RPC over TCP/IP.

In response to "Does this happen for all data which you send to this API", yes.
I've varied the timeout value from 0 to 100 secs, but it consistently dumps.
I've also varied the size, contents, etc., of message data being sent from 0 bytes to 1 KB, from text to alphanumeric to special characters, but again, it consistently dumps.

Unfortunately, I don't have access to purify on HP (As an aside, the same application, and the same API work perfectly on Windows and Solaris).

The libc patches installed on the system are:
# PHCO_23251 1.0 libc manpage cumulative patch
# PHCO_25569 1.0 libc cumulative header file patch
# PHCO_25898 1.0 cumulative 10.20 libc compatibility support
# PHCO_30030 1.0 libc cumulative patch

"you may have to do some assembly level debugging, looking at some instructions before and after this point along with register values". What tools would you suggest for this?

Thanks,
Regards,
Kunal
RAC_1
Honored Contributor
Solution

Re: Inexplicable Bus Error

Your tusc output only tells us that it is a bus error. What options did you use with tusc? Can you run it as tusc -vfp -o "file_name" program and post the result?

Also, as others have said, have you checked whether you are up to date on libc patches?

Anil
There is no substitute to HARDWORK
ranganath ramachandra
Esteemed Contributor

Re: Inexplicable Bus Error

gdb is sufficient as a tool as long as you can follow the disassembly and make out what is going on.
 
--
ranga
hp-ux 11i v3[i work for hpe]


Kunal_3
Occasional Advisor

Re: Inexplicable Bus Error

The tusc output is attached as a zip file. Although I'm re-reviewing the patch list, I'm pretty sure I have the latest libc patches installed.
-Kunal
Manish Srivastava
Trusted Contributor

Re: Inexplicable Bus Error

Hi,

This may have already been done but did you try type casting the value which you are passing to the API to the correct data type?

manish
RAC_1
Honored Contributor

Re: Inexplicable Bus Error

Kunal,

Did you resolve the error? If yes, what was the problem?

Anil
There is no substitute to HARDWORK
Kunal_3
Occasional Advisor

Re: Inexplicable Bus Error

Yes, I've tried typecasting the value too.
And, no, the problem is not yet solved.
RAC_1
Honored Contributor

Re: Inexplicable Bus Error

What does "file core" say?

adb -k ./core
Once on adb prompt, do
$c/$C

What does it give?

Anil
There is no substitute to HARDWORK
Kunal_3
Occasional Advisor

Re: Inexplicable Bus Error

When I run adb, I see the message:
PA-32 adb ($h help $q quit)
Failed to get PDIR
adb>

Still, running $c or $C shows:
can't unwind -- no_entry

(is there anything else I need to do to run adb?)

With gdb, the equivalent of $c (bt, backtrace) shows:
#0 0xc75b380c in MMarshaller::pack+0x1c ()
from /tsi/ae/tools/TRA/5.1.3_L/h7_110_aCC/debug/lib/libmaverick50.sl
#1 0xc764c02c in MRpcClientRvImpl::pack+0x2d8 ()
from /tsi/ae/tools/TRA/5.1.3_L/h7_110_aCC/debug/lib/libmaverick50.sl
#2 0xc764d718 in MRpcClientRvImpl::syncInvoke+0x90 ()
from /tsi/ae/tools/TRA/5.1.3_L/h7_110_aCC/debug/lib/libmaverick50.sl
#3 0xc7579f84 in MRpcClient::syncInvoke+0xa4 ()
from /tsi/ae/tools/TRA/5.1.3_L/h7_110_aCC/debug/lib/libmaverick50.sl
#4 0xc757c318 in MRpcOperationProxy::syncInvoke+0x764 ()
from /tsi/ae/tools/TRA/5.1.3_L/h7_110_aCC/debug/lib/libmaverick50.sl
Cannot access memory at address 0xffffde2d
ranganath ramachandra
Esteemed Contributor

Re: Inexplicable Bus Error

has this API call ever worked for you, say in a different application etc ? have other API calls to this shared library worked ?

make sure you are using a published API (one that has prototypes declared in a header file, for example), follow the prototype and documentation if any. the shared library's functionality (at least for this API) may require that you should call some initializer function before using any API.

there are the other usual catches - a shared library built with aCC's '-AA' option cannot be used with a program or shared library built without that option. the aCC folks say you should use '-mt' if you are building for a multi-threaded process.

then there *may* be issues if the vendor library was built with (old?) gcc/g++, especially if the library was not built with libgcc.a archived into it and depended on the program to be linked with libgcc : constructors for static global objects in shared libraries may not have been invoked (as that is usually handled by the runtime support library - like aCC's libCsup). i am saying all this about initializers because the faulting routine is called "pack" - likely to be accessing some RPC data structure - and SIGBUS indicates faulty memory access, probably through an uninitialized pointer.
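
a small sketch of the initializer issue, for code you build yourself (it cannot repair a vendor library). Registry and registry() are hypothetical names used only for illustration. the idea is to reach global state through a function-local static, which is constructed on first call, instead of relying on the runtime to run a namespace-scope constructor when the shared library is loaded.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for any global state a marshalling layer depends on.
struct Registry {
    std::string name;
    Registry() : name("ready") {}
};

// Fragile pattern (shown as a comment only):
//     Registry g_registry;
// If the load-time constructors for the shared library never run,
// g_registry is used before its constructor has executed.

// Construct-on-first-use: the object is built the first time this is called.
// Assumption to keep in mind: with pre-C++11 compilers this initialization is
// not guaranteed to be thread-safe, so call it once before starting threads.
Registry &registry()
{
    static Registry instance;
    assert(instance.name == "ready");  // the constructor must have run by now
    return instance;
}
```

every caller gets the same object, and the constructor runs no matter how the library was loaded.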

you can try seeing what exactly is happening at the faulting address : e.g. gdb's "x/i $pc" will show the instruction, then you can examine the memory addresses and registers referenced in that instruction to see what is obviously wrong. that kind of debugging *may* give you clues but you may also not get anywhere with it. if you want to do it, keep the latest copies of the gdb manual and the PA-RISC runtime architecture documents handy.
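
one thing worth checking in those registers, offered as a general observation rather than a diagnosis of this core file: on PA-RISC a load or store through an address that is not a multiple of the access size commonly raises SIGBUS. the sketch below uses two hypothetical helpers, is_aligned and load_u32, to show the check and a copy-based read that works for any address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns true if p is a multiple of alignment.
// Assumes std::size_t is wide enough to hold a pointer value.
bool is_aligned(const void *p, std::size_t alignment)
{
    assert(alignment != 0);
    return reinterpret_cast<std::size_t>(p) % alignment == 0;
}

// Reads an unsigned int from any address, aligned or not, by copying bytes
// instead of dereferencing a possibly misaligned unsigned int pointer.
unsigned int load_u32(const void *p)
{
    unsigned int value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

if the base register of the faulting instruction holds an address that fails this kind of check for the width of the load or store, a misaligned or garbage pointer is a likely suspect.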
 
--
ranga
hp-ux 11i v3[i work for hpe]


Kunal_3
Occasional Advisor

Re: Inexplicable Bus Error

I've been able to use this API successfully in a "toy" program (a tiny, much scaled-down version of my current application, about 1/200th the size). The usage of the API is the same in the actual program (adcorba) and in the toy program. I had the vendors of the library verify that I was indeed using it correctly (initialization and all).

I'm also working on gradually scaling up the toy program, step-by-step, in the hope that at one point, the toy program too, will crash. That may give me a clue as to what's going wrong.

I've requested the library vendors to try to send me the command line options that were used with aCC to build the library. If I understand what you're saying, a library that is built using -AA and/or -mt cannot be mixed with an application that is built without using these. Are there any other such caveats?

I'm currently working on trying to figure out what's getting corrupted in the memory/registers at the time of the crash, using gdb.

Re: Inexplicable Bus Error

Kunal,
Were you able to find the cause of these bus errors?