Operating System - HP-UX

Re: Attempt to access item beyond bounds of memory (Signal 11)

 
SOLVED
Aneesh Mohan
Honored Contributor

Attempt to access item beyond bounds of memory (Signal 11)

Hi all,

We are getting a COBOL application error on our application server after a reboot of the application server (Vpar 2) and a memory upgrade on the database server (Vpar 1).

Server :- HP-UX 11i v3, rx8640
Application :- COBOL
Error Message :- Attempt to access item beyond bounds of memory (Signal 11)


The error started after the following tasks were done on the hardware side:

Database Server (Vpar 1) :- Upgraded Memory
Application Server (Vpar 2) :- Only just rebooted

Both Vpar 1 and Vpar 2 are in the same Npar.

There was no change in the kernel parameter values of the application and database servers before and after the reboot.

If anyone has faced this problem before, please let me know.

Aneesh
Dennis Handly
Acclaimed Contributor

Re: Attempt to access item beyond bounds of memory (Signal 11)

This could just be a latent bug that you need to track down.
Aneesh Mohan
Honored Contributor

Re: Attempt to access item beyond bounds of memory (Signal 11)

Dear Dennis,

Could you please throw more light on your previous comment?

Aneesh
Dennis Handly
Acclaimed Contributor

Re: Attempt to access item beyond bounds of memory (Signal 11)

>Could you please throw more light on your previous comment?

Any change on the system could alter configuration files or the contents of uninitialized memory, and that can be enough to make the application abort.

You need to assume this is a bug in the application and get a stack trace to see where the bad value is coming from.
Do you have wdb/gdb installed?
Aneesh Mohan
Honored Contributor

Re: Attempt to access item beyond bounds of memory (Signal 11)

Dear Dennis,

I have gdb installed on the affected server.

I have asked the application team to generate a stack trace. Meanwhile, I have traced the application job with tusc.

TUSC o/p
========
hpus72^root:/fnsqa1 > grep brk bank.out | grep signal
1285769710.2588030 [28314]{812628} Received signal 18, SIGCLD, in brk(), [caught], no siginfo
1285769721.5492824 [29067]{814838} Received signal 18, SIGCLD, in brk(), [caught], no siginfo
1285769721.6267955 [29074]{814854} Received signal 18, SIGCLD, in brk(), [caught], no siginfo
1285769721.6460949 [29076]{814859} Received signal 18, SIGCLD, in brk(), [caught], no siginfo
hpus72^root:/fnsqa1 >

Could this be the reason for the memory error?

Aneesh
Dennis Handly
Acclaimed Contributor

Re: Attempt to access item beyond bounds of memory (Signal 11)

>Received signal 18, SIGCLD, in brk(), [caught], no siginfo

This just says that a child process died/exited while you were busy working. This is normal.

>Could this be the reason for the memory error?

No, unrelated. Note: This Signal 11 is a software error, nothing to do with RAM failing.
Aneesh Mohan
Honored Contributor

Re: Attempt to access item beyond bounds of memory (Signal 11)

Hi Dennis,

I have noticed that the segmentation fault always occurs at the same faulting address:

faulting address: 0xffffffffffff7824


First run:-
===========

834898 1286007085.1875483 [16711]{1483418} brk(0x40016000) ..................................... [entry]
834899 1286007085.1875832 [16701]{1483391} sigprocmask(SIG_SETMASK, 0x200000007fffd380, NULL) .. [entry]
834900 1286007085.1876389 [16212]{1482166} stat("extfh.cfg", 0x9fffffffffff9940) ............... = 0
834901 1286007085.1877169 [16256]{1482285} brk(0x6000000005335000) ............................. = 0
834902 1286007085.1877989 [16305]{1482422} read(10, 0x600000000042bf36, 2064) .................. [entry]
834903 1286007085.1878552 [16416]{1482619} read(10, "0496\0\006\0\0\0\0\0060102\001L ".., 2064) = 1174
834904 1286007085.1879359 [16474]{1482768} semop(70, 0x9fffffffffffc470, 1) .................... [entry]
834905 1286007085.1879896 [16516]{1482885} semop(70, 0x9fffffffffffc470, 2) .................... [entry]
834906 1286007085.1880437 [16622]{1483174} write(10, 0x600000000042c766, 40) ................... [entry]
834907 1286007085.1881253 [16566]{1483027} read(10, "04ab\0\006\0\0\0\0\0060102\001L ".., 2064) = 1195
834908 1286007085.1882174 [16707]{1483406} getmount_cnt(0x9fffffffffff7010) .................... = 12
834909 1286007085.1882796 [16667]{1483298} ftruncate(28, 0) .................................... = 0
834910 1286007085.1883271 [16711]{1483418} brk(0x40016000) ..................................... = 0
834911 1286007085.1883590 [16701]{1483391} sigprocmask(SIG_SETMASK, 0x7fffd380, NULL) .......... = 0
834912 1286007085.1884248 [16212]{1482166} Received signal 11, SIGSEGV, in user mode, [caught], partial siginfo
834913 1286007085.1884298 [16212]{1482166} Siginfo: si_code: SEGV_ACCERR, faulting address: 0xffffffffffff7824, si_errno: 0
834914 1286007085.1884368 [16212]{1482166} PC: 00000001000000a0.0 break.m 0x16000
834915 1286007085.1886208 [16256]{1482285} brk(0x6000000005337000) ............................. [entry]
834916 1286007085.1886807 [16305]{1482422} read(10, "06` \0\006\0\0\0\0\0060102\0\0' ".., 2064) = 1632
834917 1286007085.1887719 [16622]{1483174} write(10, "\0( \0\006\0\0\0\0\003N ca\0\0\0".., 40) . = 40




Second run:-
==========

285772484.1635832 [29623]{816256} fcntl(49, F_SETLKW, 0x9fffffffffff9f00) [entry]
1285772484.1636261 [29559]{816105} fcntl(46, F_SETLK, 0x9fffffffffff95b0) = 0
1285772484.1636761 [29596]{816194} stat(0x60000000000477e0, 0x9fffffffffff9610) [entry]
1285772484.1637118 [29091]{814897} fstat(49, 0x9fffffffffff9630) [entry]
1285772484.1637533 [29141]{815022} stat(0x60000000000477e0, 0x9fffffffffff9610) [entry]
1285772484.1637878 [29191]{815151} fstat(49, 0x9fffffffffff9e20) [entry]
1285772484.1638263 [29245]{815292} Received signal 11, SIGSEGV, in user mode, [caught], partial siginfo
1285772484.1638336 [29245]{815292} Siginfo: si_code: SEGV_ACCERR, faulting address: 0xffffffffffff7824, si_errno: 0
1285772484.1638398 [29245]{815292} PC: 00000001000000a0.0 break.m 0x16000
1285772484.1640003 [29352]{815571} close(48) ............. = 0
1285772484.1640376 [29303]{815440} stat("/fns/pd/r/data/file/ACCTCLSE_E", 0x9fffffffffff9550) = 0


I would like to add one more point. I copied the complete application to one of my test servers and ran the jobs successfully (5 times). So I suspect the issue is specific to this server, since there has been no code change.

Aneesh

Could any kernel parameter be causing this restriction?

Aneesh
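
For anyone comparing runs like the two above, a short script can pull the SIGSEGV records out of a tusc trace and check whether the faulting address repeats. This is only a sketch in Python and was not used in this thread; the name `find_segv_records` and the regular expression are assumptions about the line layout, based on the Siginfo lines pasted above (with wrapped lines joined).

```python
import re

# Assumed layout, taken from the tusc lines pasted in this thread:
#   <timestamp> [pid]{lwpid} Siginfo: si_code: SEGV_ACCERR, faulting address: 0x..., si_errno: 0
SIGINFO_RE = re.compile(
    r"\[(?P<pid>\d+)\]\{(?P<lwp>\d+)\}\s+Siginfo:\s+si_code:\s+(?P<code>\w+),"
    r"\s+faulting address:\s+(?P<addr>0x[0-9a-fA-F]+)"
)


def find_segv_records(trace_text):
    """Return (pid, si_code, faulting_address) for each SEGV Siginfo line."""
    records = []
    for line in trace_text.splitlines():
        match = SIGINFO_RE.search(line)
        if match and match.group("code").startswith("SEGV"):
            records.append(
                (int(match.group("pid")), match.group("code"),
                 int(match.group("addr"), 16))
            )
    return records


# Sample lines copied from the two runs in this post.
FIRST_RUN = """\
834912 1286007085.1884248 [16212]{1482166} Received signal 11, SIGSEGV, in user mode, [caught], partial siginfo
834913 1286007085.1884298 [16212]{1482166} Siginfo: si_code: SEGV_ACCERR, faulting address: 0xffffffffffff7824, si_errno: 0
"""
SECOND_RUN = """\
1285772484.1638263 [29245]{815292} Received signal 11, SIGSEGV, in user mode, [caught], partial siginfo
1285772484.1638336 [29245]{815292} Siginfo: si_code: SEGV_ACCERR, faulting address: 0xffffffffffff7824, si_errno: 0
"""

records = find_segv_records(FIRST_RUN) + find_segv_records(SECOND_RUN)
addresses = {addr for _pid, _code, addr in records}
for pid, code, addr in records:
    print(f"pid {pid}: {code} at {addr:#x}")
print("same address in every run:", len(addresses) == 1)
```

Different PIDs faulting at one identical address point toward something deterministic rather than random corruption, though that alone does not say where the bad value comes from.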
Dennis Handly
Acclaimed Contributor

Re: Attempt to access item beyond bounds of memory (Signal 11)

>I have noticed the memory segmentation error is same always. faulting address: 0xffffffffffff7824

gdb is a better tool to track this down. You can find where this address is coming from.
You can also look at the registers and memory.

>Could any kernel parameter be causing this restriction?

The size of maxssiz could affect the memory region layout and the relative positions of each shared lib data area.

Are the libc (etc) versions the same on both systems?
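
One way to read the faulting address, offered as a hint rather than a diagnosis: 0xffffffffffff7824 has all of its upper bits set, so as a signed 64-bit value it is a small negative number. A value like that can come from a negative offset applied to a zero base or from a sign-extended 32-bit value, but that interpretation is an assumption until the stack trace confirms it. A short Python sketch of the arithmetic (the helper name `as_signed64` is hypothetical):

```python
def as_signed64(value):
    """Interpret an unsigned 64-bit integer as a two's-complement signed value."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >> 63 else value


fault = 0xFFFFFFFFFFFF7824   # faulting address reported by tusc
offset = as_signed64(fault)
print(offset)        # -34780
print(hex(-offset))  # 0x87dc
```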
Aneesh Mohan
Honored Contributor

Re: Attempt to access item beyond bounds of memory (Signal 11)

Hi Dennis,

TEST SERVER:
============
hpus102^root:/ > what /usr/lib/libc.sl
/usr/lib/libc.sl:
$ PATCH_11.31/PHCO_38658 Apr 7 2009 21:48:10 $
hpus102^root:/ > what /usr/lib/libc.0
/usr/lib/libc.0:
PATCH-PHCO_32718 for 10.20; for 10.30, 11.x compatibility libc.1_ID@@/main/r10dav/libc_dav/libc_dav_cpe//1
/ux/core/libs/libc/shared_pa1/libc.1_ID
Feb 4 2005 10:03:03
hpus102^root:/ > what /usr/lib/libc.1
/usr/lib/libc.1:
PATCH-PHCO_32718 for 10.20; for 10.30, 11.x compatibility libc.1_ID@@/main/r10dav/libc_dav/libc_dav_cpe//1
/ux/core/libs/libc/shared_pa1/libc.1_ID
Feb 4 2005 10:03:03
hpus102^root:/ >

hpus102^root:/ > kctune |grep ^max
max_acct_file_size 2560000 Default Immed
max_async_ports 4096 Default Immed
max_mem_window 0 Default Immed
max_thread_proc 1100 1100 Immed
maxdsiz 0x80000000 0x80000000 Immed
maxdsiz_64bit 0xf0000000 0xf0000000 Immed
maxfiles (now) 2048 Default
maxfiles_lim 60000 60000 Immed
maxrsessiz 8388608 Default
maxrsessiz_64bit 8388608 Default
maxssiz 134217728 134217728 Immed
maxssiz_64bit 1073741824 1073741824 Immed
maxtsiz 1073741824 1073741824 Immed
maxtsiz_64bit 0x40000000 0x40000000 Immed
maxuprc 256 Default Immed
hpus102^root:/ >


Production Server:-
==================

hpus72^root:/ > what /usr/lib/libc.sl
/usr/lib/libc.sl:
$ PATCH_11.31/PHCO_38658 Apr 7 2009 21:48:10 $
hpus72^root:/ > what /usr/lib/libc.0
/usr/lib/libc.0:
PATCH-PHCO_32718 for 10.20; for 10.30, 11.x compatibility libc.1_ID@@/main/r10dav/libc_dav/libc_dav_cpe//1
/ux/core/libs/libc/shared_pa1/libc.1_ID
Feb 4 2005 10:03:03
hpus72^root:/ > what /usr/lib/libc.1
/usr/lib/libc.1:
PATCH-PHCO_32718 for 10.20; for 10.30, 11.x compatibility libc.1_ID@@/main/r10dav/libc_dav/libc_dav_cpe//1
/ux/core/libs/libc/shared_pa1/libc.1_ID
Feb 4 2005 10:03:03
hpus72^root:/ >

hpus72^root:/ > kctune |grep ^max
max_acct_file_size 2560000 Default Immed
max_async_ports 4096 Default Immed
max_mem_window 0 Default Immed
max_thread_proc 1100 1100 Immed
maxdsiz 0x80000000 0x80000000 Immed
maxdsiz_64bit 0xf0000000 0xf0000000 Immed
maxfiles 4096 4096
maxfiles_lim 60000 60000 Immed
maxrsessiz 8388608 Default
maxrsessiz_64bit 8388608 Default
maxssiz 134217728 134217728 Immed
maxssiz_64bit 1073741824 1073741824 Immed
maxtsiz 1073741824 1073741824 Immed
maxtsiz_64bit 0x40000000 0x40000000 Immed
maxuprc 13324 13324 Immed
hpus72^root:/ >
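
Comparing the two listings by eye is error-prone, so here is a minimal Python sketch that diffs two saved `kctune` listings. It assumes the column layout shown above (tunable name first, current value next, with an optional `(now)` marker); `parse_kctune` and `diff_params` are illustrative names, and only a few of the lines above are reproduced as sample data.

```python
import re


def parse_kctune(text):
    """Map tunable name -> current value from kctune-style output."""
    params = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, shell prompts and anything that is not a tunable name.
        if len(fields) < 2 or not re.fullmatch(r"[a-z0-9_]+", fields[0]):
            continue
        values = [f for f in fields[1:] if f != "(now)"]
        if values:
            params[fields[0]] = values[0]
    return params


def diff_params(left, right):
    """Return {name: (left_value, right_value)} for tunables that differ."""
    names = sorted(set(left) | set(right))
    return {n: (left.get(n), right.get(n)) for n in names
            if left.get(n) != right.get(n)}


# A few lines copied from the two listings in this post.
TEST_SERVER = """\
maxdsiz 0x80000000 0x80000000 Immed
maxfiles (now) 2048 Default
maxssiz 134217728 134217728 Immed
maxuprc 256 Default Immed
"""
PRODUCTION = """\
maxdsiz 0x80000000 0x80000000 Immed
maxfiles 4096 4096
maxssiz 134217728 134217728 Immed
maxuprc 13324 13324 Immed
"""

differences = diff_params(parse_kctune(TEST_SERVER), parse_kctune(PRODUCTION))
for name, (test_value, prod_value) in differences.items():
    print(f"{name}: test={test_value} production={prod_value}")
```

On the listings in this post, the only `max*` tunables that differ are `maxfiles` and `maxuprc`; `maxssiz`, `maxdsiz` and the other size limits are identical on both hosts.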
Dennis Handly
Acclaimed Contributor

Re: Attempt to access item beyond bounds of memory (Signal 11)

>what /usr/lib/libc.sl

Unless you are running Aries for your COBOL app, you should be looking at /usr/lib/hpux32/libc.so.1.
Aneesh Mohan
Honored Contributor

Re: Attempt to access item beyond bounds of memory (Signal 11)

Hi Dennis,


hpus102^root:/ > what /usr/lib/hpux32/libc.so.1
/usr/lib/hpux32/libc.so.1:
$ PATCH_11.31/PHCO_38658 Apr 7 2009 22:01:37 $
You have mail in /var/mail/root
hpus102^root:/ >



hpus72^root:/ > what /usr/lib/hpux32/libc.so.1
/usr/lib/hpux32/libc.so.1:
$ PATCH_11.31/PHCO_38658 Apr 7 2009 22:01:37 $
hpus72^root:/ >


Aneesh

Aneesh Mohan
Honored Contributor

Re: Attempt to access item beyond bounds of memory (Signal 11)

Hi ,

How can I find the memory address range available to user processes (UAM)?

I suspect that if the application uses pointers, a reference through a pointer to an address that is not allocated to the partition may cause the problem.


Aneesh
Aneesh Mohan
Honored Contributor

Re: Attempt to access item beyond bounds of memory (Signal 11)

Hi,

Please find the gdb o/p for the application core dump

(fnsonlpd)hpus72:/fns/p/r/exe>gdb
HP gdb 5.7 for HP Itanium (32 or 64 bit) and target HP-UX 11.2x.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 5.7 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.

(gdb) file /opt/microfocus/cobol/bin/cobrun
Reading symbols from /opt/microfocus/cobol/bin/cobrun...done.
(gdb) core core
warning: core file may not match specified executable file.
Core was generated by `rtsora'.
Program terminated with signal 11, Segmentation fault.
SEGV_ACCERR - Invalid Permissions for object
Error while reading in load map pointer.
warning: Unable to read the load_info structure address from .
#0 0x1ffffffffc883171 in ()
(gdb)


Aneesh
Dennis Handly
Acclaimed Contributor
Solution

Re: Attempt to access item beyond bounds of memory (Signal 11)

>Can I know How I can find the memory address area available to the users?

If you have a core file, that contains all of the address ranges except for read only regions:
elfdump -o -S core-file

In gdb, you can do "info file" and "info shared" for other regions.

>I am suspecting if there is any pointers in application, then if any call to the pointer address which is not allocated to the partition may cause the problem.

Yes. Or if the pointer is corrupted or never initialized.

>Please find the gdb o/p for the application core dump

You need to use "bt" to get a trace.

>HP gdb 5.7 for HP Itanium (32 or 64 bit) and target HP-UX 11.2x.

Please download the latest, 6.1.

>file /opt/microfocus/cobol/bin/cobrun
>warning: core file may not match specified executable file.
>Core was generated by `rtsora'.

If rtsora is an executable, you need to use that in the "file" command.

>Error while reading in load map pointer.
>warning: Unable to read the load_info structure address from

Probably because you didn't use the right executable?
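
The steps above (load the executable that produced the core, then ask for a trace) can be scripted so the application team gets the same output every time. The Python sketch below only builds the gdb command file text and the command line; it does not run gdb. The path `/path/to/rtsora` is a placeholder because the thread does not say where `rtsora` lives, the file name `bt.gdb` and the helper `build_gdb_batch` are hypothetical, and it assumes HP's gdb accepts the standard GDB `-batch` and `-x` options.

```python
import shlex


def build_gdb_batch(executable, core_file, command_file="bt.gdb"):
    """Return (command-file text, argv) for a non-interactive gdb session."""
    # Commands mentioned in this thread, plus "info registers" to inspect registers.
    commands = "\n".join(["bt", "info registers", "info shared", "info file"]) + "\n"
    # Assumption: -batch and -x behave as in stock GDB.
    argv = ["gdb", "-batch", "-x", command_file, executable, core_file]
    return commands, argv


# Placeholder path: substitute the real location of the rtsora executable.
commands, argv = build_gdb_batch("/path/to/rtsora", "core")
print(commands, end="")
print(" ".join(shlex.quote(arg) for arg in argv))
```

Save the printed commands to the command file, then run the printed command line from the directory that holds the core file.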
Aneesh Mohan
Honored Contributor

Re: Attempt to access item beyond bounds of memory (Signal 11)

Thanks Dennis..

We have found the problem. It was a faulty CPU register. The HP Language Support Team analyzed the application core file and informed us that the problem was with one of our 10 CPUs. We isolated it with mpsched and replaced the faulty CPU.


Closing the thread.....



Aneesh Mohan
Honored Contributor

Re: Attempt to access item beyond bounds of memory (Signal 11)


Mentioned Above,.....


Aneesh