Operating System - HP-UX

Memory fault HP-UX / Database ZIM

 
Tiago_BP
Occasional Advisor

Memory fault HP-UX / Database ZIM

We are migrating from an HP rx5470 system (PA-RISC, HP-UX 11.11). The new system is an rx2800 i2 with HP-UX 11.31 on Itanium. Our ERP runs on ZIM (www.zim.biz). We are encountering the following error when running the application:

Pid 3407 received a SIGSEGV for stack growth failure.
Possible causes: Insufficient memory or swap space,
or stack size exceeded maxrsessiz
or stack size limit is set too small.
Memory fault (coredump)

The version of ZIM is the correct one (for Itanium), and ZIM support has not found a solution. It seems to need some adjustment of the HP-UX OS, perhaps some kernel tuning.

Sorry for the bad writing; I am writing via Google Translate.

 

Tiago Bohrz

Patrick Wallek
Honored Contributor

Re: Memory fault HP-UX / Database ZIM

How much RAM in your server?

How much swap space (show output of 'swapinfo -tam')?

What are the values of the 'max*siz' and 'max*siz_64bit' kernel parameters? Use kctune to find the info.

 

# kctune maxdsiz maxdsiz_64bit maxssiz maxssiz_64bit maxtsiz maxtsiz_64bit


Dennis Handly
Acclaimed Contributor

Re: Memory fault HP-UX / Database ZIM

>Memory fault (coredump)

 

Unfortunately you need to go through each of the possible causes and figure out which was exceeded.

Since you have a core file above, you can use elfdump and gdb to tell you the cause.

 

Include the output of these commands.

elfdump -S -o core

gdb path-to-ZIM core

bt

q

 

An example core file, not stack overflow:

$ file core
core:           ELF-32 core file - IA64 from 'a.out' - received SIGBUS
$ elfdump -o -S  core
                *** Program Header ***
Type     Offset   Vaddr    FSize    Memsz
CoreVer  00000314 00000000 00000004 00000004
CoreKern 00000318 00000000 00000008 00000008
...
CoreStck 0004d930 658ff000 00001000 00001000  RSE stack
CoreStck 0004e930 74684000 0b97c000 0b97c000  user stack

 

If the Memsz of a CoreStck segment matches maxrsessiz (for the RSE stack) or maxssiz (for the user stack; these are the tunables for the above 32 bit case), then that limit was hit and you have to increase the corresponding kernel parm.

 

$ typeset -i10 x=16#00001000; echo $x
4096
$ typeset -i10 x=16#0b97c000; echo $x
194494464

$ /usr/sbin/kctune maxssiz maxrsessiz
Tunable          Value  Expression  Changes
maxrsessiz   0x2800000  0x2800000
maxssiz     0x17f00000  0x17f00000  Immed
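
The comparison can also be scripted. This is only a sketch, assuming a shell whose $(( )) arithmetic accepts 0x-prefixed constants (bash, ksh93, dash); on other shells use the typeset form above. The function name hit_limit is made up for this example, and the values are copied from the sample elfdump and kctune output.

```shell
# Hypothetical helper: report whether a CoreStck Memsz has reached a kernel limit.
# Both arguments are hex constants such as 0x0b97c000.
hit_limit() {
    memsz=$(( $1 ))
    limit=$(( $2 ))
    if [ "$memsz" -ge "$limit" ]; then
        echo "at limit"
    else
        echo "below limit"
    fi
}

# User stack of the sample core vs. maxssiz
hit_limit 0x0b97c000 0x17f00000
# RSE stack of the sample core vs. maxrsessiz
hit_limit 0x00001000 0x2800000
```

Both calls report "below limit", which is why the sample core is not a stack overflow.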

Tiago_BP
Occasional Advisor

Re: Memory fault HP-UX / Database ZIM

The server is HP INTEGRITY RX2800 I2, 2 X ITANIUM QUAD CORE 9320,   32 GB RAM PC3.

 

RX1# swapinfo -tam
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev        8192       0    8192    0%       0       -    1  /dev/vg00/lvol2
dev       16384       0   16384    0%       0       -    1  /dev/vg00/lvol11
reserve       -     822    -822
memory    31069    3574   27495   12%
total     55645    4396   51249    8%       -       0    -

RX1# kctune maxdsiz maxdsiz_64bit maxssiz maxssiz_64bit maxtsiz maxtsiz_64bit
Tunable             Value  Expression  Changes
maxdsiz        4294963200  4294963200  Immed
maxdsiz_64bit  4294967296  Default     Immed
maxssiz           8388608  Default     Immed
maxssiz_64bit   268435456  Default     Immed
maxtsiz         100663296  Default     Immed
maxtsiz_64bit  1073741824  Default     Immed

 

Thank you for your attention
Tiago_BP
Occasional Advisor

Re: Memory fault HP-UX / Database ZIM

Output of the commands:

 

$ file core
core: ELF-32 core file - IA64 from 'zim' - received SIGSEGV

 

$ elfdump -S -o core
core:
                *** Program Header ***
Type     Offset   Vaddr    FSize    Memsz
CoreVer  00000334 00000000 00000004 00000004
CoreKern 00000338 00000000 00000008 00000008
CoreUTS  00000340 00000000 00000808 00000808
CoreComm 00000b48 00000000 00000003 00000003
CoreProc 00000b50 00000000 0000be00 0000be00
CoreLoad 0000c950 40010000 00270000 00270000
CoreMMF  0027c950 678c0000 00004000 00004000
CoreMMF  00280950 678c4000 00004000 00004000
CoreMMF  00284950 678c8000 00008000 00008000
CoreMMF  0028c950 678d0000 0000c000 0000c000
CoreMMF  00298950 678dc000 00004000 00004000
CoreMMF  0029c950 678e0000 00003000 00003000
CoreMMF  0029f950 678e4000 00003000 00003000
CoreMMF  002a2950 678e8000 00006000 00006000
CoreMMF  002a8950 678ef000 00001000 00001000
CoreMMF  002a9950 678f0000 00004000 00004000
CoreMMF  002ad950 678f4000 00002000 00002000
CoreMMF  002af950 678f6000 00001000 00001000
CoreMMF  002b0950 678f7000 00003000 00003000
CoreMMF  002b3950 678fa000 00002000 00002000
CoreMMF  002b5950 678fc000 00002000 00002000
CoreMMF  002b7950 678fe000 00001000 00001000
CoreStck 002b8950 678ff000 00002000 00002000
CoreStck 002ba950 7fff8000 00008000 00008000

 

 

I do not know if I used the commands correctly
gdb path-to-core ZIM
bt
q

 


$ gdb zim
HP gdb 6.2 for HP Itanium (32 or 64 bit) and target HP-UX 11iv2 and 11iv3.
Copyright 1986 - 2011 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 6.2 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
warning: Load module /opt/zim/712/zim has been stripped.
Debugging information is not available.

(no debugging symbols found)...
(gdb) bt
No stack.
(gdb) q

 

commands are valid in this case above?
$ typeset -i10 x=16#00001000; echo $x
4096
$ typeset -i10 x=16#0b97c000; echo $x
194494464

 

Today's kctune output:


$ /usr/sbin/kctune maxssiz maxrsessiz
Tunable         Value  Expression  Changes
maxrsessiz  401604608  401604608
maxssiz       8388608  Default     Immed


Any solution?

 

sorry, writing via google translator

thanks for the help

Dennis Handly
Acclaimed Contributor

Re: Memory fault HP-UX / Database ZIM

>maxssiz             8,388,608

 

This may be too small.

 

>CoreStck 002b8950 678ff000 00002000 00002000 RSE stack
>CoreStck 002ba950 7fff8000 00008000 00008000 User stack

 

These length values are way too small for a stack overflow.  This means you need to debug it with gdb.
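
To put numbers on "way too small", here is a rough check of the two CoreStck sizes against the limits kctune reported. It is only a sketch, assuming a shell whose $(( )) arithmetic accepts 0x-prefixed constants (bash, ksh93, dash); the variable names are just for illustration.

```shell
# Memsz of the two CoreStck segments in the core, taken from the elfdump output
rse_stack=$(( 0x00002000 ))
user_stack=$(( 0x00008000 ))

# Limits reported by kctune at the time of the core
maxrsessiz=401604608
maxssiz=8388608

echo "RSE stack:  $rse_stack of $maxrsessiz bytes"
echo "user stack: $user_stack of $maxssiz bytes"
```

Both segments are a few kilobytes, nowhere near either limit.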

 

>gdb path-to-core ZIM

 

That's not what I said.

 

>$ gdb zim

 

You need to do:

gdb zim core

bt

info reg

disas $pc-16*16 $pc+16*4

q

 

>commands are valid in this case above?

 

You have to use the sizes from your elfdump output.

 

>$ /usr/sbin/kctune maxssiz maxrsessiz
>maxrsessiz       401,604,608

 

This is WAY too big!  Please set it back to 16 MB and reboot when convenient.

I can count on one hand the times I've seen it need to be increased from the default 8 MB.

 

Perhaps you mistakenly increased it instead of maxssiz?

Tiago_BP
Occasional Advisor

Re: Memory fault HP-UX / Database ZIM

Does changing maxssiz need a reboot to take effect? I changed it to 134217728 without rebooting the system (the output says: "Can Change  Immediately or at Next Boot").

 

With this change (without restarting), the error still occurs.

 

$ elfdump -S -o core
core:
                *** Program Header ***
Type     Offset   Vaddr    FSize    Memsz
CoreVer  00000334 00000000 00000004 00000004
CoreKern 00000338 00000000 00000008 00000008
CoreUTS  00000340 00000000 00000808 00000808
CoreComm 00000b48 00000000 00000003 00000003
CoreProc 00000b50 00000000 0000be00 0000be00
CoreLoad 0000c950 40010000 002b0000 002b0000
CoreMMF  002bc950 600c0000 00004000 00004000
CoreMMF  002c0950 600c4000 00004000 00004000
CoreMMF  002c4950 600c8000 00008000 00008000
CoreMMF  002cc950 600d0000 0000c000 0000c000
CoreMMF  002d8950 600dc000 00004000 00004000
CoreMMF  002dc950 600e0000 00003000 00003000
CoreMMF  002df950 600e4000 00003000 00003000
CoreMMF  002e2950 600e8000 00006000 00006000
CoreMMF  002e8950 600ef000 00001000 00001000
CoreMMF  002e9950 600f0000 00004000 00004000
CoreMMF  002ed950 600f4000 00002000 00002000
CoreMMF  002ef950 600f6000 00001000 00001000
CoreMMF  002f0950 600f7000 00003000 00003000
CoreMMF  002f3950 600fa000 00002000 00002000
CoreMMF  002f5950 600fc000 00002000 00002000
CoreMMF  002f7950 600fe000 00001000 00001000
CoreStck 002f8950 600ff000 17f00000 17f00000
CoreStck 181f8950 7fff8000 00008000 00008000

 

---------------------------------------------------------------------------------------------

 

Output of gdb zim core:

$ gdb zim core
HP gdb 6.2 for HP Itanium (32 or 64 bit) and target HP-UX 11iv2 and 11iv3.
Copyright 1986 - 2011 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 6.2 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
warning: Load module /opt/zim/712/zim has been stripped.
Debugging information is not available.

(no debugging symbols found)...
Core was generated by `zim'.
Program terminated with signal 11, Segmentation fault.
SEGV_MAPERR - Address not mapped to object
(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...(no debugging symbols found)...
#0 0x424b550:0 in <unknown_procedure> + 0 ()

 

(gdb) bt
#0 0x424b550:0 in <unknown_procedure> + 0 ()
#1 0x424b7e0:0 in <unknown_procedure> + 0x290 ()
#2 0x424b7e0:0 in <unknown_procedure> + 0x290 ()
#3 0x424b7e0:0 in <unknown_procedure> + 0x290 ()
#4 0x424b7e0:0 in <unknown_procedure> + 0x290 ()
#5 0x424b7e0:0 in <unknown_procedure> + 0x290 ()
#6 0x424b7e0:0 in <unknown_procedure> + 0x290 ()
#7 0x424b7e0:0 in <unknown_procedure> + 0x290 ()
#8 0x424b7e0:0 in <unknown_procedure> + 0x290 ()
................

................

#6001 0x424b7e0:0 in <unknown_procedure> + 0x290 ()
#6002 0x424b7e0:0 in <unknown_procedure> + 0x290 ()

................

I have not found the end.....

 

---------------------------------------------------------------------------------------------

(gdb) info reg
pr0: 0x1
pr1: 0x1
pr2: 0x1
pr3: 0
pr4: 0
pr5: 0
pr6: 0
pr7: 0
pr8: 0
pr9: 0x1
pr10: 0
pr11: 0
pr12: 0
pr13: 0
pr14: 0
pr15: 0
pr16: 0
pr17: 0
pr18: 0

pr19: 0
pr20: 0
pr21: 0
pr22: 0

pr23: 0
pr24: 0
pr25: 0
pr26: 0
pr27: 0
pr28: 0
pr29: 0
pr30: 0
pr31: 0
pr32: 0
pr33: 0
pr34: 0
pr35: 0
pr36: 0
pr37: 0
pr38: 0
pr39: 0
pr40: 0
pr41: 0
pr42: 0
pr43: 0
pr44: 0
pr45: 0

pr46: 0
pr47: 0
pr48: 0
pr49: 0
pr50: 0
pr51: 0
pr52: 0
pr53: 0
pr54: 0
pr55: 0
pr56: 0
pr57: 0
pr58: 0
pr59: 0
pr60: 0
pr61: 0
pr62: 0
pr63: 0
gr0: 0
gr1: 0x200000004001e2e0
gr2: 0x9fffffff3ffe7c00
gr3: 0x9fffffff3ffe7c00
gr4: 0

gr5: 0xc000000000000408
gr6: 0x60000000c0042750
gr7: 0x20000000600f9750
gr8: 0x1ff
gr9: 0x2000000040154df0
gr10: 0x40154df0
gr11: 0x40154bd0
gr12: 0x200000007b94da80
gr13: 0x20000000600c0fe0
gr14: 0x20000000600c0ce2
gr15: 0x48bf
gr16: 0x17777c
gr17: 0x1010
gr18: 0x20000000400d5180
gr19: 0
gr20: 0x20000000400d3164
gr21: 0x20000000400d31c8
gr22: 0x4
gr23: 0x40288b3c
gr24: 0x20000000400d31c4
gr25: 0
gr26: 0x402a1c3c
gr27: 0x1e3c0

gr28: 0x400d31c8
gr29: 0x400d31c4
gr30: 0xd00
gr31: 0xc000000000000793
gr32: 0x40154df0
gr33: 0
gr34: 0x40155070
gr35: 0x40154e90
gr36: 0x424b590
gr37: 0
br0: 0x424b7e0
br1: 0x60000000c00447c0
br2: 0
br3: 0
br4: 0
br5: 0
br6: 0x60000000c0251000
br7: 0x60000000c013a340
rsc: 0x1f
bsp: 0x2000000077fff2b0
bspst: 0x2000000077fff000
rnat: 0
ccv: 0x100000000

unat: 0
fpsr: 0x9804c9a74433f
pfs: 0xc000000000001026
(sor:0, sol:32, sof:38)
lc: 0
ec: 0
ip: 0x424b550:0
cfm: 0x6
(sor:0, sol:0, sof:6)
psr: 0x436f6d6d61

--------------------------------------------------------------------------------------------

 

(gdb) disas $pc-16*16 $pc+16*4
Dump of assembler code from 0x424b450:0 to 0x424b590:0:
0x424b450:0 <??>:  nop.m 0x0                    MMB,
0x424b450:1 <??>:  nop.m 0x0
0x424b450:2 <??>:  br.cond.dptk.few XSVarAlign+0x9d0;;
0x424b460:0 <??>:  nop.m 0x0                    MMB,
0x424b460:1 <??>:  nop.m 0x0
0x424b460:2 <??>:  br.cond.dptk.few XSVarAlign+0x930;;
0x424b470:0 <??>:  adds r37=1,r37               MI,I,
0x424b470:1 <??>:  sxt2 ret1=r38;;
0x424b470:2 <??>:  sxt2 ret0=r37;;
0x424b480:0 <??>:  cmp4.lt p6=ret0,ret1         MMB,
0x424b480:1 <??>:  nop.m 0x0
0x424b480:2 <??>:  (p6) br.cond.dptk.few XSVarAlign+0x6e0;;
0x424b490:0 <??>:  nop.m 0x0                    MMB,
0x424b490:1 <??>:  nop.m 0x0
0x424b490:2 <??>:  br.cond.dptk.few XSVarAlign+0x960;;
0x424b4a0:0 <??>:  mov r43=0                    MMI
0x424b4a0:1 <??>:  mov r45=r35
0x424b4a0:2 <??>:  zxt2 r44=r33
0x424b4b0:0 <??>:  mov r46=r36                  MMB,
0x424b4b0:1 <??>:  mov r32=0x1ff
0x424b4b0:2 <??>:  br.call.dptk.few rp=XSVarAlign+0xa10;;
0x424b4c0:0 <??>:  mov ret1=ret0;;              M,MI,
0x424b4c0:1 <??>:  mov r33=ret1
0x424b4c0:2 <??>:  zxt2 ret1=ret1;;
0x424b4d0:0 <??>:  cmp4.ne p6=r32,ret1          MMB,
0x424b4d0:1 <??>:  nop.m 0x0
0x424b4d0:2 <??>:  (p6) br.cond.dptk.few XSVarAlign+0x9b0;;
0x424b4e0:0 <??>:  nop.m 0x0                    MMB,
0x424b4e0:1 <??>:  nop.m 0x0
0x424b4e0:2 <??>:  br.cond.dptk.few XSVarAlign+0x9f0;;
0x424b4f0:0 <??>:  nop.m 0x0                    MII,
0x424b4f0:1 <??>:  mov rp=r42
0x424b4f0:2 <??>:  zxt2 ret1=r33;;
0x424b500:0 <??>:  mov ret0=ret1                MIB,
0x424b500:1 <??>:  mov.i ar.pfs=r41
0x424b500:2 <??>:  br.ret.dptk.few rp;;
0x424b510:0 <??>:  mov r43=0x1773               MMB,
0x424b510:1 <??>:  nop.m 0x0
0x424b510:2 <??>:  br.call.dptk.few rp=ERUErr+0x0;;
0x424b520:0 <??>:  mov gp=r40
0x424b520:1 <??>:  nop.m 0x0
0x424b520:2 <??>:  br.cond.dptk.few XSVarAlign+0x9f0;;
0x424b530:0 <??>:  mov ret1=0x1ff               MI,I
0x424b530:1 <??>:  mov rp=r42;;
0x424b530:2 <??>:  mov.i ar.pfs=r41
0x424b540:0 <??>:  mov ret0=ret1                MMB,
0x424b540:1 <??>:  nop.m 0x0
0x424b540:2 <??>:  br.ret.dptk.few rp;;
0x424b550:0 <??>:  alloc r62=ar.pfs,0,32,6,0    MMI
0x424b550:1 <??>:  adds sp=-48,sp
0x424b550:2 <??>:  mov r63=rp
0x424b560:0 <??>:  mov r61=gp                   MMI,
0x424b560:1 <??>:  mov r39=r35
0x424b560:2 <??>:  mov r38=r34;;
0x424b570:0 <??>:  mov r37=r33                  MI,I
0x424b570:1 <??>:  mov r36=r32;;
0x424b570:2 <??>:  zxt2 r64=r37
0x424b580:0 <??>:  nop.m 0x0                    MMB,
0x424b580:1 <??>:  nop.m 0x0
0x424b580:2 <??>:  br.call.dptk.few rp=GetAddr+0x0;;
End of assembler dump.

 

 


I will decrease maxrsessiz to 16 MB.

>Perhaps you mistakenly increased it instead of maxssiz?

Probably, but it was not me.

>You have to use the sizes from your elfdump output.

Given the data above, which parameters should I change? Did I miss something in the elfdump output?

Thanks for the help, and sorry for the bad writing.

Dennis Handly
Acclaimed Contributor

Re: Memory fault HP-UX / Database ZIM

>Does changing maxssiz need a reboot to take effect? I changed it to 134,217,728

 

You don't need to reboot to change max[sd]siz*.  Only for maxrsessiz*.


>CoreStck 002f8950 600ff000 17f00000 17f00000 RSE stack
>CoreStck 181f8950 7fff8000 00008000 00008000 user stack

 

This indicates a RSE stack overflow.  The value of 0x17f00000 matches your maxrsessiz and $bspst.
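
A quick way to see the match, as a sketch only: it assumes a shell whose $(( )) arithmetic accepts 0x-prefixed constants and is 64-bit (bash, ksh93, dash), and it assumes the low 32 bits of the 64-bit bspst value correspond to the 32-bit Vaddr that elfdump prints, which is what the numbers here suggest. The variable names are just for illustration.

```shell
rse_memsz=$(( 0x17f00000 ))      # Memsz of the first CoreStck segment
rse_vaddr=$(( 0x600ff000 ))      # Vaddr of that segment
bspst=$(( 0x2000000077fff000 ))  # bspst from "info reg"

# Size in decimal, to compare with the kctune maxrsessiz value
echo "RSE stack size: $rse_memsz"

# End of the segment vs. the low 32 bits of bspst
rse_end=$(( rse_vaddr + rse_memsz ))
bspst_low=$(( bspst & 0xffffffff ))
printf 'segment end 0x%x, bspst low 32 bits 0x%x\n' "$rse_end" "$bspst_low"
```

The size comes out as 401604608, the maxrsessiz value from kctune, and the segment ends exactly at bspst.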

 

>#1 0x424b7e0:0 in <unknown_procedure> + 0x290

>#6002 0x424b7e0:0 in <unknown_procedure> + 0x290

>I have not found the end

 

So there is your infinite recursion.  If you run this in gdb, you can calculate how many frames there are:

disas 0x424b7e0-0x290 0x424b7e0-0x290+16*4

 

But that is most likely:

0x424b550:0 <??>:  alloc r62=ar.pfs,0,32,6,0 RSE frame

0x424b550:1 <??>:  adds sp=-48,sp               User frame

 

So each call takes (32+6) registers * 8 bytes on the RSE stack.  Allowing 63/64 for the NaT collection word stored after every 63 registers, this gives 1.3 million frames:

$ typeset -i10 x=16#17f00000; echo $x
401604608
$ echo $((x *63/64 / ((32+6)*8) ))
1300426

 

This number of frames would give for the user stack:

$ echo $(( 1300426 * 48 ))
62420448


But the size of the user stack is 32768.

 

This most likely indicates a bug, rather than an algorithm flaw of using too much recursion.

Since the two stacks don't match, there may be something wrong with the machine code??

 

>bsp:    0x2000000077fff2b0
>bspst: 0x2000000077fff000

 

This indicates how much you have gone beyond the end of the RSE stack.
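
The difference can be worked out with shell arithmetic. A small sketch, assuming a shell whose $(( )) arithmetic accepts 0x-prefixed constants and is 64-bit (bash, ksh93, dash); the variable names are just for illustration.

```shell
bsp=$(( 0x2000000077fff2b0 ))    # bsp from "info reg"
bspst=$(( 0x2000000077fff000 ))  # bspst from "info reg"

overrun=$(( bsp - bspst ))
echo "$overrun bytes past bspst ($(( overrun / 8 )) 8-byte register slots)"
```

That is 0x2b0, or 688 bytes.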

 

>and the support of ZIM not find the solution.

 

You're in contact with ZIM for support?  You may want to show them this topic.

Tiago_BP
Occasional Advisor

Re: Memory fault HP-UX / Database ZIM

Dennis, thank you for your attention and cooperation. 

 

We are in contact with ZIM support, but we have not found a solution.

 

I will show the topic so they can check.

 

One detail I wanted to mention: zim can run runtime on (compiled) and runtime off (source).

At runtime off, the ERP works. At runtime on, it does not work.

 

This can be a problem in the ZIM compiler?

 

We're out of ideas for tests to find the problem. Can we rule out kernel tuning problems?

 

Once again, thank you for your attention.

 


Dennis Handly
Acclaimed Contributor

Re: Memory fault HP-UX / Database ZIM

>zim can run runtime on (compiled) and runtime off (source).

>At runtime off, the ERP works. At runtime on, it does not work.

 

I'm not sure what these terms mean?  Compiled in terms of something like byte code?

Or are native IA64 instructions generated?

 

>This can be a problem in the ZIM compiler?

 

I suppose it's possible.

Since the two stacks don't match, there may be something wrong with the machine code??

 

>We're out of ideas for tests to find the problem. Can we rule out kernel tuning problems?

 

I would say so.  You shouldn't need that much recursion.

 

>Once again, thank you for your attention.

 

If you are happy with the answers, please click on the Kudos star for each helpful post.