MF COBOL Server Express 4.0 - Loop on STOP RUN?

 
SOLVED
Michael Mike Reaser
Valued Contributor

I've got a small-ish (600 line) COBOL program that we're executing on an rp4440 running 11.11 (yeah, I know, I can't do anything about it, so please don't start with "Get on a supported version of the OS") using Micro Focus COBOL Server Express v4.0, and the thing is looping on its STOP RUN statement.

It's obvious to me that the stack has become corrupted, but I've checked and double-checked and quintuple-checked the program and its two subroutines, and all the parameter sizes match up, their order matches up, etc.

Here's what I saw from gdb and lsof on the looping process:

(gdb) ba
#0 0xc00000000047b120 in locate_envvar_entry_p+0xa0 () from /opt/lib/cobol/lib/libcobrts64.sl.2
#1 0xc00000000047b778 in cobputenv+0x58 () from /opt/lib/cobol/lib/libcobrts64.sl.2
#2 0xc000000000478134 in open_msg_file+0x234 () from /opt/lib/cobol/lib/libcobrts64.sl.2
#3 0xc000000000478500 in get_msg+0x28 () from /opt/lib/cobol/lib/libcobrts64.sl.2
#4 0xc000000000478584 in get_message+0x2c () from /opt/lib/cobol/lib/libcobrts64.sl.2
#5 0xc00000000044f6f4 in _mFgetrtsmsg+0x84 () from /opt/lib/cobol/lib/libcobrts64.sl.2
#6 0xc00000000045435c in memerr+0x54 () from /opt/lib/cobol/lib/libcobrts64.sl.2
#7 0xc0000000004537a8 in process_handler+0x48 () from /opt/lib/cobol/lib/libcobrts64.sl.2
#8 0xc000000000450a00 in mFt_rt_proc_exec_range+0x138 () from /opt/lib/cobol/lib/libcobrts64.sl.2
#9 0xc0000000004508b4 in mFt_rt_proc_exec+0x1c () from /opt/lib/cobol/lib/libcobrts64.sl.2
#10 0xc000000000453824 in signal_catch+0x54 () from /opt/lib/cobol/lib/libcobrts64.sl.2
#11 0xc0000000004540fc in signal_handler_catch+0x184 () from /opt/lib/cobol/lib/libcobrts64.sl.2
#12
#13 0x3017322c3032342c in ()
#14 0x3637322c3032342c in ()
#15 0xc00000000047b0b4 in locate_envvar_entry_p+0x34 () from /opt/lib/cobol/lib/libcobrts64.sl.2
#16 0x20203736312c3234 in ()
#17 0xc00000000047b0b4 in locate_envvar_entry_p+0x34 () from /opt/lib/cobol/lib/libcobrts64.sl.2
#18 0x37d93ed137da36d0 in ()
#19 0xc00000000047b0b4 in locate_envvar_entry_p+0x34 () from /opt/lib/cobol/lib/libcobrts64.sl.2
Error accessing memory address 0x53db2e71878a3c77: Bad address.
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /opt/lib/cobol/bin/rts64, process 10068
$ lsof -p 10068
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rts64 10068 mreaser cwd DIR 64,0x10003 16384 692138 /data1/mreaser/test/cgi_node/bin
rts64 10068 mreaser txt REG 64,0x6 26208 41384 /opt/lib/cobol/bin/rts64
rts64 10068 mreaser mem REG 64,0x6 36768 41524 /opt/lib/cobol/lib/cobmfini64.sl.2
rts64 10068 mreaser mem REG 64,0x6 479592 41516 /opt/lib/cobol/lib/cobmffh64.sl.2
rts64 10068 mreaser mem REG 64,0x7 15802 44683 /usr/lib/tztab
rts64 10068 mreaser mem REG 64,0x7 274664 4600 /usr/lib/pa20_64/libm.2
rts64 10068 mreaser mem REG 64,0x6 75400 41480 /opt/lib/cobol/lib/libcobscreen64.sl.2
rts64 10068 mreaser mem REG 64,0x7 23936 4583 /usr/lib/pa20_64/libdl.1
rts64 10068 mreaser mem REG 64,0x7 1869096 4576 /usr/lib/pa20_64/libc.2
rts64 10068 mreaser mem REG 64,0x7 1072416 4578 /usr/lib/pa20_64/libcl.2
rts64 10068 mreaser mem REG 64,0x6 449176 41491 /opt/lib/cobol/lib/libcobmisc64.sl.2
rts64 10068 mreaser mem REG 64,0x6 479368 41484 /opt/lib/cobol/lib/libcobcrtn64.sl.2
rts64 10068 mreaser mem REG 64,0x6 1131968 41504 /opt/lib/cobol/lib/libcobrts64.sl
rts64 10068 mreaser mem REG 64,0x7 236632 4567 /usr/lib/pa20_64/dld.sl
rts64 10068 mreaser 0r CHR 3,0x2 0t0 66 /dev/null
rts64 10068 mreaser 1u REG 64,0x10003 11022 684710 /data1/mreaser/test/cgi_node/log/20090430/neym05MR_20090501_085108.log
rts64 10068 mreaser 2u REG 64,0x10003 0 684806 /data1 (/dev/vg01/lvol3)
rts64 10068 mreaser 4ur REG 64,0x6 52 39072 /opt/lib/cobol/etc/extfh.cfg
rts64 10068 mreaser 5ur REG 64,0x6 52 39072 /opt/lib/cobol/etc/extfh.cfg
$

The "/data1 (/dev/vg01/lvol3)" for stderr is what to me confirmed the stack corruption, since that *SHOULD* be pointing to /data1/mreaser/test/cgi_node/log/20090430/neym05MR_20090501_085108.err but instead thinks it's pointing to a generic lvol???
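
One more thing that stands out in the backtrace: frames #13, #14 and #16 are not code addresses at all. Read as big-endian bytes (PA-RISC is big-endian), they look like ASCII digits and commas, which would fit character data having been written over saved return addresses. That is only a reading of the dump, not a confirmed cause. A minimal C sketch for decoding such a word follows; decode_frame_word is a made-up helper name, not part of gdb or the COBOL runtime:

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical helper: write the 8 bytes of `value`, most significant first
 * (big-endian byte order), into out[0..7] as characters.
 * Non-printable bytes become '.', and out[8] is the terminating NUL. */
void decode_frame_word(uint64_t value, char out[9])
{
    for (int i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)(value >> (56 - 8 * i));
        out[i] = isprint(byte) ? (char)byte : '.';
    }
    out[8] = '\0';
}
```

With this, frame #14 (0x3637322c3032342c) decodes to "672,024," and frame #16 (0x20203736312c3234) to "  761,24", so it may be worth looking for a field in the program or its subroutines that holds comma-separated numbers like these.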

So, anyone got any Magical Pointers for me, other than "Your software is older than dirt"? If an upgrade will absolutely, positively fix this I'll get the wheels rolling at my end, but I'm not in any position to "upgrade and then see if the problem persists". My hands are tied - neither my admins nor my management chain will approve any change to the software on the system without total assurances that any such change WILL fix the bug. :-/
There's no place like 127.0.0.1

HP-Server-Literate since 1979
2 REPLIES
Dennis Handly
Acclaimed Contributor
Solution

Re: MF COBOL Server Express 4.0 - Loop on STOP RUN?

>It's obvious to me that the stack has become corrupted

Using the word "stack" is outdated/incorrect. Some data structure or just memory has been corrupted.

Have you talked to MF yet?

>Here's what I saw from gdb and lsof on the looping process:

Looping where? You need to describe the movement of the PC. Either in words or attach a movie. ;-) Does it stay in frame 0? Does a gdb finish ever return?
Or did you mean it has infinite recursion?

>#12

What signal was this? You may have to run the application in gdb.

>The "/data1 (/dev/vg01/lvol3)" for stderr is what to me confirmed the stack corruption
>anyone got any Magical Pointers for me,

(Good catch.) This can't be stack corruption. (FDs aren't stored on the stack like on old MPE.) It is a logic corruption, some evil person closed stderr. I would suggest you use tusc to track this. Or set a breakpoint on close:
(gdb) b close if ($r26 == 2)
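
The breakpoint condition tests $r26 because, under the PA-RISC calling convention, the first argument (here the descriptor passed to close) arrives in r26. As for why a closed stderr can later show up in lsof attached to some unrelated file: POSIX requires open() to return the lowest-numbered free descriptor, so once fd 2 is closed, the next open in the process lands on fd 2. Whether that is what happened in this program is an assumption until tusc or the breakpoint confirms it. A minimal C sketch of the reuse behavior, with close_then_open as a hypothetical helper name:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: close `fd`, then open `path` read-only and return the
 * descriptor the kernel hands back (or -1 on failure). POSIX requires open()
 * to return the lowest-numbered descriptor not currently open. */
int close_then_open(int fd, const char *path)
{
    if (close(fd) != 0)
        return -1;
    return open(path, O_RDONLY);
}
```

Called as close_then_open(2, path) in a single-threaded process whose descriptors 0 and 1 are still open, this returns 2, and from then on anything that believes it is using stderr is really using that file.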
Michael Mike Reaser
Valued Contributor

Re: MF COBOL Server Express 4.0 - Loop on STOP RUN?

>Using the word "stack" is outdated/incorrect. Some data structure or just memory has been corrupted.

Yah, I'm letting my "worked on HP systems since a 3000 Series III on MPE III" background show. LOL

>Have you talked to MF yet?

That's the next step. I'm waiting on my boss and/or his boss to get me the support ID(s) and information I need to do so.

>Looping where? You need to describe the movement of the PC. Either in words or attach a movie. ;-) Does it stay in frame 0? Does a gdb finish ever return?
>Or did you mean it has infinite recursion?

I dunno, that's why I was hoping for help. All I can tell is that top(1) says that this program is #1 on Our Hit Parade of CPU hogs - I'm basically taking over one of the CPU cores at 99.5+ percent, and stay there until the process is killed.

>It is a logic corruption, some evil person closed stderr. I would suggest you use tusc to track this. Or set a breakpoint on close:
>(gdb) b close if ($r26 == 2)

Good idea about the breakpoint. I'll try this when I get the chance.