Unexpected core by signal 11

Alex Pronichev · ‎05-03-2006

Hello!
Here is the call stack from the core file:
#0 0x800003ff7fda5b98 in real_malloc+0xae8 () from /lib/pa20_64/libc.2
#1 0x800003ff7fda2a70 in _malloc+0x658 () from /lib/pa20_64/libc.2
#2 0x800003ff7fda8ddc in malloc+0x1c4 () from /lib/pa20_64/libc.2
#3 0x800003ff7fdb8d64 in strdup+0x2c () from /lib/pa20_64/libc.2
#4 0x800003ff7e7ebcc8 in DataElement::DataElement (this=0x80000001008a57f8, node=0x8000000100547c00) at data_element.cpp:161
#5 0x800003ff7e7eccbc in CurrencyDataElement::CurrencyDataElement (this=0x80000001008a57f8, node=0x8000000100547c00) at data_element.cpp:248
...and so on to main()

In frame #4 we have:
#4 0x800003ff7e7ebcc8 in DataElement::DataElement (this=0x80000001008a57f8, node=0x8000000100547c00) at data_element.cpp:161
161 CalcProc = strdup( calc_proc );

CalcProc was initially set to NULL.

p calc_proc
$1 = 0x800000010087f6e8 "res = @page_931000100000.s_931.00.009 + @page.s_931.00.021 + @page.s_931.00.022 + @page.s_931.00.023 + @page.s_931.00.024 + @page.s_931.00.025 - @page_931000100000.s_931.00.001;"

What is wrong?
What am I doing wrong?

The process sometimes ends with a core dump.

OS: HPUX 11.23 PA-RISC

RAC_1 · ‎05-03-2006

What does "file core" say?

There is no substitute to HARDWORK

Peter Godron · ‎05-03-2006

Alex,
since strdup calls malloc, could you be having a memory problem?
What are the definitions of calc_proc and CalcProc?

Try printing calc_proc and checking the length of the string.

char *strdup(const char *s);

Alex Pronichev · ‎05-03-2006

The core file says:
Core was generated by `binserver'.
Program terminated with signal 11, Segmentation fault.
#0 0xc00000001099db98 in real_malloc+0xae8 () from /lib/pa20_64/libc.2
(gdb) ba
#0 0xc00000001099db98 in real_malloc+0xae8 () from /lib/pa20_64/libc.2
#1 0xc00000001099aa70 in _malloc+0x658 () from /lib/pa20_64/libc.2
#2 0xc0000000109a0ddc in malloc+0x1c4 () from /lib/pa20_64/libc.2
#3 0xc0000000109b0d64 in strdup+0x2c () from /lib/pa20_64/libc.2
#4 0xc000000011dafcc8 in DataElement::DataElement (this=0x800000010057ea00, node=0x80000001004a4ee8) at data_element.cpp:161
#5 0xc000000011db0cbc in CurrencyDataElement::CurrencyDataElement (this=0x800000010057ea00, node=0x80000001004a4ee8) at data_element.cpp:248
#6 0xc000000011dabee0 in CreateDataElement (node=0x80000001004a4ee8, ElementCache=0x8000000100680748) at const.cpp:21
#7 0xc000000011daaa4c in CurrencyElement::CurrencyElement (this=0x80000001006356c0, node=0x80000001004a4ee8, fCont=0x800000010007f970, fRow=0x0,
ElementCache=0x8000000100680748) at const.cpp:353
and so on

The definitions of CalcProc and calc_proc are:
const char* CheckProc;
const char* calc_proc;

The value of calc_proc when the program crashes:
"res = @page_931000100000.s_931.00.009 + @page.s_931.00.021 + @page.s_931.00.022 + @page.s_931.00.023 + @page.s_931.00.024 + @page.s_931.00.025 - @page_931000100000.s_931.00.001;"

Yes. The program crashes only sometimes, about 1 time in 250-280.

Alex Pronichev · ‎05-03-2006

Before calling strdup, I check that calc_proc is not empty with the following condition:

if( calc_proc != NULL && calc_proc[0] != '\0' ) {
CalcProc = strdup( calc_proc );
}

Peter Godron · ‎05-03-2006

Alex,
assuming the reply with the definitions contained a typo (CheckProc instead of CalcProc), I still think you are hitting a memory allocation problem. According to the specification, if the underlying malloc fails, strdup returns NULL and sets errno to ENOMEM, which indicates there is not enough memory. In that case, free() any memory you no longer need.
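A minimal sketch of the check Peter describes, assuming a POSIX system where strdup is declared in <string.h>. The helper name dup_or_report is hypothetical and not part of Alex's code. Note that an allocation failure is reported as a NULL return with errno set; by itself it does not make malloc raise SIGSEGV.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>   // free(), which the caller must use to release the copy
#include <string.h>  // strdup() and strcmp(); strdup is POSIX, not ISO C++

// Hypothetical helper: duplicates src like the original strdup call, but
// reports an allocation failure instead of silently storing NULL.
// Returns nullptr for a NULL or empty source, or when strdup fails.
char* dup_or_report(const char* src) {
    if (src == nullptr || src[0] == '\0')
        return nullptr;                 // same guard as in DataElement::DataElement
    errno = 0;
    char* copy = strdup(src);
    if (copy == nullptr)
        std::perror("strdup");          // POSIX: errno is ENOMEM when malloc fails
    assert(copy == nullptr || strcmp(copy, src) == 0);
    return copy;
}
```

The caller owns the returned buffer and must release it with free(), not delete.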

Alex Pronichev · ‎05-03-2006

You are right, I made a typo when showing the definition:
const char* CalcProc;

In other words, what can I do?

Should I remove the direct call to strdup, call malloc directly, test the return value for NULL, and then memcpy, so that I catch a memory allocation problem?

Peter Godron · ‎05-03-2006

Alex,
replace your strdup call with:

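#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc */
#include <string.h>  /* strlen, memcpy */

const char *s = calc_proc;     /* the string to copy */
char *d;
size_t len = strlen (s) + 1;   /* include the terminating '\0' */

if ((d = (char *) malloc (len)) == NULL)
    printf ("Error in malloc\n");
else
    memcpy (d, s, len);        /* then assign d to CalcProc */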

Alex Pronichev · ‎05-04-2006

Here is another core with the same symptoms:
Core was generated by `binserver'.
Program terminated with signal 11, Segmentation fault.
#0 0xc00000001099db98 in real_malloc+0xae8 () from /lib/pa20_64/libc.2
(gdb) ba
#0 0xc00000001099db98 in real_malloc+0xae8 () from /lib/pa20_64/libc.2
#1 0xc00000001099aa70 in _malloc+0x658 () from /lib/pa20_64/libc.2
#2 0xc0000000109a0ddc in malloc+0x1c4 () from /lib/pa20_64/libc.2
#3 0xc000000010aeb7ec in operator new+0x4c () from /lib/pa20_64/libCsup.2
#4 0x4000000000025698 in std::allocator::allocate+0x38 ()
#5 0x4000000000024ccc in std::basic_string,std::allocator>::_C_getRep+0x594 ()
#6 0x4000000000024498 in std::basic_string,std::allocator>::replace+0x788 ()
#7 0xc000000011330b58 in std::basic_string,std::allocator>::operator+= (this=0x800003ff7fff30b0,
__s=0xc000000011d26499 "\"") at /opt/aCC/include_std/string:448
#8 0xc000000011de5758 in SaveXMLVisitor::visit_Document (this=0x800000010051b000, document=@0x800000010000a978) at savexml.cpp:122
#9 0xc000000011db9270 in Document_Impl::Accept (this=0x800000010000a978, p_visitor=@0x800000010051b000) at document.cpp:177
and so on...

(gdb) up 7
#7 0xc000000011330b58 in std::basic_string,std::allocator>::operator+= (this=0x800003ff7fff30b0,
__s=0xc000000011d26499 "\"") at /opt/aCC/include_std/string:448

(gdb) p *this
$3 = {> = {}, static npos = 18446744073709551615, static __nullref = {__ref_hdr_ = {
__mutex_ = { = {_C_mutex = {pmutex = 0x800003ff7fbcff00}}, }, __refs_ = 1, __capacity_ = 0,
__nchars_ = 0}, __eos_char_ = 0 '\000'}, _C_data = 0x8000000100a80a60 " datageneration=\""}

(gdb) p __s
$4 = 0xc000000011d26499 "\""

The crash happens while concatenating two strings with the += operator:

std::string s1 = " datageneration=\"";
std::string s2 = "\"";
s1 += s2;   // core dump :(

Has anybody had similar trouble?
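The concatenation itself is ordinary, well-defined code. The sketch below rebuilds the two strings shown in the post (the function name is hypothetical) and checks the expected result; if a statement like this crashes inside operator new and malloc, as in the trace above, the fault is unlikely to be in the += itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical reproduction of the two strings shown in the post.
std::string append_closing_quote() {
    std::string s1 = " datageneration=\"";
    std::string s2 = "\"";
    s1 += s2;                 // may grow s1's buffer via operator new -> malloc
    assert(s1.back() == '"');
    return s1;
}
```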

Don Morris_1 · ‎05-04-2006

SIGSEGV in standard library memory allocation is almost always a sign of trashing your heap.

That is usually caused by a stale pointer dereference (writing garbage to memory that has already been freed back to the heap, which trashes the metadata the library uses to manage the heap) or, and this is my bet, a buffer overrun. Heap managers usually place metadata just before or after each object returned by malloc, so an overrun corrupts the metadata of the next or previous object. When the library later walks the pointers in that metadata (or whatever it is doing), it dereferences a bad address, hence SIGSEGV.

Check your pointer usage.
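To make Don's explanation concrete, here is a toy model that assumes nothing about the real layout of HP-UX libc's malloc: a small arena where every fixed-size slot is preceded by a 4-byte header holding a magic value. All names (toyheap, unchecked_copy and so on) are hypothetical. A copy into slot 0 that is longer than the slot overwrites slot 1's header; a real allocator that trusted such a header could follow a garbage pointer and fault, while the model simply reports the failed magic check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy model only: NOT the real layout of HP-UX libc's malloc.
// Each fixed-size slot is preceded by a 4-byte header holding a magic
// value, standing in for the metadata an allocator keeps next to blocks.
namespace toyheap {

constexpr std::size_t kHeader = sizeof(std::uint32_t);
constexpr std::size_t kSlotSize = 16;
constexpr std::size_t kSlots = 4;
constexpr std::size_t kStride = kHeader + kSlotSize;
constexpr std::uint32_t kMagic = 0xC0FFEE01u;

unsigned char arena[kSlots * kStride];

// Writes an intact header in front of every slot.
void init() {
    for (std::size_t i = 0; i < kSlots; ++i)
        std::memcpy(arena + i * kStride, &kMagic, kHeader);
}

// User data of slot i begins right after its header.
unsigned char* slot(std::size_t i) {
    assert(i < kSlots);
    return arena + i * kStride + kHeader;
}

// An allocator walking its metadata trusts every header it meets.
// Returns the index of the first damaged header, or kSlots if none is.
std::size_t first_corrupt_header() {
    for (std::size_t i = 0; i < kSlots; ++i) {
        std::uint32_t m;
        std::memcpy(&m, arena + i * kStride, kHeader);
        if (m != kMagic)
            return i;
    }
    return kSlots;
}

// Buggy copy that ignores the slot size, like strcpy into a buffer that is
// too small. The assert only keeps the demo inside the arena itself.
void unchecked_copy(std::size_t i, const char* s) {
    std::size_t n = std::strlen(s) + 1;  // includes the terminating '\0'
    assert(i * kStride + kHeader + n <= sizeof arena);
    std::memcpy(slot(i), s, n);
}

}  // namespace toyheap
```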

Dennis Handly · ‎05-04-2006

Yes, Don is correct. You have corrupted the heap.

You may want to use gdb's heap commands to look for heap corruption. There are "info heap" and "info leak".

You turn it on with "set heap-check ...". See "help set heap-check".

A. Clay Stephenson · ‎05-04-2006

if( calc_proc != NULL && calc_proc[0] != '\0' ) {
CalcProc = strdup( calc_proc );
}

The above statement is dangerous because it depends on the implementation: whether an if with a logical AND stops as soon as a false condition is detected. In some implementations, both sides of the if are evaluated and then the AND is tested, which would be a killer for you.

On some platforms, the evaluation of calc_proc[0] would produce a segmentation violation if calc_proc were NULL. I've seen a few cases where code like this ran just fine until the optimizer was used. Regardless of the language (Pascal, C, C++, FORTRAN), the safer play is to break this into two distinct conditions:

if (calc_proc != NULL)
{
if (calc_proc[0] != '\0') CalcProc = strdup( calc_proc );
}

This construct will execute safely under all conditions regardless of language or platform.

If it ain't broke, I can fix that.

Dennis Handly · ‎05-04-2006

> if(calc_proc != NULL && calc_proc[0] != '\0')
>The above statement is dangerous and is implementation dependent upon whether an if with a logical and falls thru as soon as a false condition is detected. In some implementations, both sides of the if are evaluated then the AND is tested

You are confused. ANSI C and ANSI C++ REQUIRE the evaluation to be short circuited. I don't think Alex should program for broken C/C++ compilers.

>Regardless of the language (Pascal, C, C++, FORTRAN)

You are correct about Pascal.
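Dennis's point can be checked directly: the C and C++ standards specify that && evaluates its left operand first and skips the right one when the left is false. The sketch below (hypothetical names) counts how often the right operand of a guard shaped like Alex's is actually evaluated.

```cpp
#include <cassert>

// Counts evaluations of the right-hand operand of &&.
static int rhs_evaluations = 0;

static bool first_char_not_nul(const char* p) {
    ++rhs_evaluations;
    return p[0] != '\0';  // would fault if reached with p == nullptr
}

// Same shape as: if (calc_proc != NULL && calc_proc[0] != '\0')
// && is evaluated left to right and stops as soon as the left side is
// false, so first_char_not_nul is never called with a null pointer.
bool has_text(const char* p) {
    return p != nullptr && first_char_not_nul(p);
}
```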

A. Clay Stephenson · ‎05-04-2006

Nothing in this posting convinces me that this code fragment was not K&R C. I am well aware of what the language specification calls for; I am also well aware that I have seen this specific behavior in both ANSI C and C++, although I shouldn't have. I have even seen cases, as I mentioned earlier, in which all was well until the optimizer was used.

Alex Pronichev · ‎05-05-2006

The core file points to line
161 CalcProc = strdup( calc_proc );

not to
160 if( calc_proc != NULL && calc_proc[0] != '\0' )

I also ran GDB with these settings:
set heap-check bounds on
set heap-check scramble on
but they did not turn up anything.

Dennis Handly · ‎05-05-2006

>The problem in core file pointed on line
>161 CalcProc = strdup(calc_proc);
>not on
>160 if(calc_proc != NULL && calc_proc[0] !=

Clay proposed that your coding style was dangerous and I was defending you. :-) This is not your problem because HP's compilers meet the Standards requirements for ordering for && and ||.

>I also use GDB with: set heap-check bounds on scramble on
>but this haven't any result

I would have thought that would catch it. You might also try the "free on" and "string on" options; the latter may need the latest gdb.

If gdb can't catch it, you may need purify.
Otherwise you need to understand machine code and be able to guess the heap data structures that are being corrupted and set watch points to catch the corruption.

>Nothing in this posting convinces me that this code fragment was not K&R C.

The stack trace shows it is aC++!

Stephen Keane · ‎05-16-2006

What do you have maxssiz and maxssiz_64bit set to in the kernel?

Alex Pronichev · ‎05-16-2006

These parameters are set to the following values:

maxssiz = 100610048
maxssiz_64bit = 2147483648
maxtsiz = 1073741824
maxtsiz_64bit = 4398046511103
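To see the stack limit the process actually runs with, rather than only the kernel tunables, the process can ask for it through the POSIX getrlimit() call with RLIMIT_STACK. This is a sketch: the helper name is hypothetical, and how maxssiz and maxssiz_64bit map onto these soft and hard limits is an assumption to verify against the HP-UX documentation for your release.

```cpp
#include <cassert>
#include <sys/resource.h>

// Hypothetical helper: reads the calling process's stack-size limits.
// Values are in bytes; either may be RLIM_INFINITY.
bool stack_limits(rlim_t& soft, rlim_t& hard) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return false;
    soft = rl.rlim_cur;  // limit currently enforced
    hard = rl.rlim_max;  // ceiling the soft limit may be raised to
    assert(hard == RLIM_INFINITY || soft <= hard);
    return true;
}
```

Since binserver loads its libraries from /lib/pa20_64, it is presumably a 64-bit process, so maxssiz_64bit rather than maxssiz would be the relevant tunable.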

Dennis Handly · ‎05-17-2006

>Stephen Keane: What do you have maxssiz and maxssiz_64bit set to in the kernel ?

Ah, you are thinking Alex has a stack overflow and not a heap corruption?

That is very possible, especially if threads are used. Though Alex said the trace continues "...and so on to main()".

Normally a stack overflow occurs near the beginning of a function, not this far into it (real_malloc+0xae8).

Alex Pronichev · ‎05-17-2006

My application is single threaded.

Dennis Handly · ‎05-17-2006

Were you able to use gdb's heap checking options to track down the heap corruption?

Otherwise I would need access to the WHOLE application and the corefile.

Alex Pronichev · ‎05-17-2006

My application is very big.
The core file is over 100 MB.
The application has many modules and libraries and is connected to a delivery system.
I can't give you access to our test server.
What kind of information do you need?

Dennis Handly · ‎05-18-2006

>My application is very big.
>Core file size is above 100Mb
>Application has many modules and libraries, and has a connection with delivery system.
>I can't do access for you to our test server.
>What kind of information do you need?

I would need access to everything in order to see why the heap is corrupted. In the worst case I would have to run it with watch points to see what corrupts it.

But you never said whether you were able to use gdb's heap checking options.

Dennis Handly · ‎07-06-2007

Did you ever solve this?

>I also use GDB with keys: set heap-check bounds on
>but this haven't any result

The syntax may have caused this to not work. You need to do:
(gdb) set heap-check on bounds on