Sam20
Occasional Advisor

Heap corruption?

Hello,
I'm trying to understand what's going wrong with a program running on HP-UX 11.11 that dies with a SIGSEGV (signal 11, segmentation fault):

(gdb) bt
#0 0x737390e8 in _sigfillset+0x618 () from /usr/lib/libc.2
#1 0x73736a8c in _sscanf+0x55c () from /usr/lib/libc.2
#2 0x7373c23c in malloc+0x18c () from /usr/lib/libc.2
#3 0x7379e3f8 in _findbuf+0x138 () from /usr/lib/libc.2
#4 0x7379c9f4 in _filbuf+0x34 () from /usr/lib/libc.2
#5 0x7379c604 in __fgets_unlocked+0x84 () from /usr/lib/libc.2
#6 0x7379c7fc in fgets+0xbc () from /usr/lib/libc.2
#7 0x7378ecec in __nsw_getoneconfig+0xf4 () from /usr/lib/libc.2
#8 0x7378f8b8 in __nsw_getconfig+0x150 () from /usr/lib/libc.2
#9 0x737903a8 in __thread_cond_init_default+0x100 () from /usr/lib/libc.2
#10 0x737909a0 in nss_search+0x80 () from /usr/lib/libc.2
#11 0x736e7320 in __gethostbyname_r+0x140 () from /usr/lib/libc.2
#12 0x736e74bc in gethostbyname+0x94 () from /usr/lib/libc.2
#13 0x11780 in dnetResolveName (name=0x400080d8 "smtp.org.com", hent=0x737f3334) at src/dnet.c:64
..

The problem seems to be occurring somewhere inside libc! A system call trace ends with:

Connecting to server smtp.org.com on port 25
write(1, "C o n n e c t i n g t o s e ".., 51) .......................... = 51
open("/etc/nsswitch.conf", O_RDONLY, 0666) ............................... [entry]
open("/etc/nsswitch.conf", O_RDONLY, 0666) ................................... = 5
Received signal 11, SIGSEGV, in user mode, [SIG_DFL], partial siginfo
Siginfo: si_code: I_NONEXIST, faulting address: 0x400118fc, si_errno: 0
PC: 0xc01980eb, instruction: 0x0d3f1280
exit(11) [implicit] ............................ WIFSIGNALED(SIGSEGV)|WCOREDUMP

Last instruction by the program:

struct hostent *him;
him = gethostbyname(name); // name == "smtp.org.com" as shown by gdb

Is this a problem with the system, or am I missing something?
Any guidance for digging deeper would be appreciated.

Thx.
8 REPLIES
Dennis Handly
Acclaimed Contributor

Re: Heap corruption?

As you said, you have heap corruption if you abort after you call malloc/free.
Frames #0 and #1, with the bogus names, are static functions used with malloc.

>The problem seems to be occurring somewhere inside libc!

libc is just the victim here. You have likely corrupted the heap and gethostbyname was the first one to step in it.

You need to get the latest WDB and use the heap checking options to see where the heap is first corrupted:
http://www.hp.com/go/wdb

Under the documentation link there is a PDF of:
Debugging dynamic memory usage errors using HP WDB
Sam20
Occasional Advisor

Re: Heap corruption?

I already have WDB.
How can I use it to track the heap corruption event?
Turning heap-check on, info heap only displays a list of allocations.. what can I do with that?

Thank you for your help.
Dennis Handly
Acclaimed Contributor
Solution

Re: Heap corruption?

>I already have WDB.

Make sure you download the latest.

>How can I use it to track the heap corruption event? Turning heap-check on

That should do it. Try:
(gdb) set heap-check on
(gdb) set heap-check string on footer-size 8

I'm not sure if "info corruption" works on 11.11?
Sam20
Occasional Advisor

Re: Heap corruption?

>Make sure you download the latest.
Well, I didn't have the latest! I do now.

>(gdb) set heap-check on
>(gdb) set heap-check string on footer-size 8
>
>I'm not sure if "info corruption" works on 11.11?

"info corruption" works. And it nailed it.
I stepped through the program starting a few instructions before the location where "bt" showed the SIGSEGV was raised.
Slowing down and running "info corruption" after every other instruction showed clearly that this one was the culprit:

226 actualLen = vsnprintf(dsb->str+dsb->len, size, fmt, ap);
(gdb) info corruption
Analyzing heap ...

No blocks were found.

(gdb) n
229 if (actualLen > -1 && actualLen < size) {
(gdb) info corruption
Analyzing heap ...

Following blocks appear to be corrupted
No.   Total bytes   Blocks   Corruption     Address      Function
0     5669647       1        End of block   0x4053f9b8   xrealloc()
(gdb) p size
$1 = 0

Here vsnprintf is called with its 2nd argument == 0.
vsnprintf was introduced in C99 (ISO/IEC 9899:1999) and "is equivalent to snprintf, with the variable argument list" (§7.19.6.12.2); for snprintf (§7.19.6.5.2): "If n is zero, nothing is written".
Well, HP-UX 11.11 doesn't comply with this specification. Even though size == 0, the formatted arguments are written at the end of dsb->str, which of course corrupts the heap (no space is allocated when size == 0, given that nothing should be written).
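
For comparison, here is a minimal sketch of what a C99-conforming library is expected to do with a zero size. The helper name `measure` is made up for illustration; it is not part of my program or of any library. On a conforming system the buffer should be left untouched and the return value should be the length the output would have had; on HP-UX 11.11 the buffer is presumably overwritten instead.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (illustration only): calls vsnprintf with n == 0
 * on a caller-supplied buffer. Under C99 nothing may be written to buf,
 * and the return value is the number of characters the formatted output
 * would have needed, not counting the terminating null. */
static int measure(char *buf, const char *fmt, ...)
{
    va_list ap;
    int len;

    assert(fmt != NULL);

    va_start(ap, fmt);
    len = vsnprintf(buf, 0, fmt, ap);
    va_end(ap);

    return len;
}
```

Calling measure(buf, "%s", "fooo") on a pre-filled buf is a quick way to check whether a given libc honours the n == 0 rule.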

The HP manual pages are unclear ("It is the user's responsibility to ensure that enough storage is available." http://docs.hp.com/en/B2355-90695/printf.3S.html) and say nothing about the case of maxsize==0. Nice trap.. at the very least, the WARNINGS section of the man page should warn std-compliant users..

Anyway, the workaround is trivial (never call v/snprintf under HP-UX with maxsize==0!) and I now have a stable running program!
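
A rough sketch of how that workaround could be wrapped up, in case it helps someone else. The names `guarded_vsnprintf` and `guarded_snprintf` are hypothetical (not an HP-UX or libc API), and returning -1 for size == 0 is my own assumption: it suits callers that, like the `actualLen > -1 && actualLen < size` test above, treat a negative result as "did not fit, grow the buffer and retry". It does not report the needed length the way C99 would.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper (illustration only): never hands size == 0 to the
 * system vsnprintf. With size == 0 it leaves buf untouched and returns -1,
 * which the caller is assumed to interpret as "output did not fit". */
static int guarded_vsnprintf(char *buf, size_t size, const char *fmt, va_list ap)
{
    assert(fmt != NULL);

    if (size == 0)
        return -1;              /* skip the call that corrupts the heap */

    return vsnprintf(buf, size, fmt, ap);
}

/* Variadic front end for the wrapper above. */
static int guarded_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = guarded_vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    return n;
}
```

With this in place, the call in my code would become guarded_vsnprintf(dsb->str+dsb->len, size, fmt, ap), and the existing length check decides whether to reallocate.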

Thank you very much Dennis.

--------------------------
Heap corruption through vsnprintf under HP-UX B11.11
This program prints "@@" under Linux/Cygwin/..
It prints "@fooo@" under HP-UX B11.11:

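#include <stdio.h>
#include <stdarg.h>
#include <strings.h>   /* bzero */

const int S=2;

void f ( const char *fmt, ...) {
    va_list ap;
    int actualLen=0;
    char buf[S];

    bzero( buf, S );   /* buf starts out as an empty string */

    va_start( ap, fmt );
    /* size == 0: per C99, nothing may be written to buf */
    actualLen = vsnprintf( buf, 0, fmt, ap );
    va_end( ap );

    printf( "@%s@\n", buf );
}

int main () {
    f( "%s", "fooo" );
    return 0;
}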

Dennis Handly
Acclaimed Contributor

Re: Heap corruption?

>11.11 doesn't comply with this specification. Even though size == 0, arguments are written at the end of dsb->str.

This may be fixed in a patch. It should be fixed on 11.31.

>bzero(buf, S);

You should be using the ANSI C function memset and the header <string.h>.

>Thank you very much Dennis.

If you are happy with the answers, please read the following about assigning points:
http://forums.itrc.hp.com/service/forums/helptips.do?#33
Sam20
Occasional Advisor

Re: Heap corruption?

>This may be fixed in a patch. It should be fixed on 11.31.

Good to know, even though recent versions of the vprintf man page look unchanged..

>>bzero(buf, S);
>
>You should be using the ANSI C function memset and the header <string.h>.

Yes I should :-]

>If you are happy with the answers, please read the following about assigning points:

Points assigned. You definitely deserve them.

Thanks again.
---------------------
This topic is now closed.
Sam20
Occasional Advisor

Re: Heap corruption?

Solved.
Dennis Handly
Acclaimed Contributor

Re: Heap corruption?

>ME: It should be fixed on 11.31.

Yes, it works fine there. My 11.23 system still has it failing.