Operating System - OpenVMS

Crash in DCE...

Willem Grooters
Honored Contributor

Crash in DCE...

One of the DCE programs (sender) may crash occasionally:

$ run/nodebug WPSDSND
%CMA-F-EXCCOP, exception raised; VMS condition code follows
-SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=00000000, PC=001EE8C5, PSL=03C00000
%TRACE-F-TRACEBACK, symbolic stack dump follows
module name routine name line rel PC abs PC
0003D718 0003D718
0005AD2E 0005AD2E
0002F42B 0002F42B
002A86E3 002A86E3
00065734 00065734
0005AE87 0005AE87
00000000 00000000

and since this is in a batch job, it shows up in accounting:

14-NOV-2008 22:36:48 PROCESS BATCH TGROOTERPROD 00033979 10408014

This error code translates to:

TKS_PROD» write sys$output f$message (%x10408014)
%CMA-F-NOMSG, Message number 10408014
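
F$MESSAGE knows the facility but has no text for this code. The value can still be taken apart by hand, because an OpenVMS condition value packs severity, message number and facility number into fixed bit fields (the STS$ layout in STSDEF). Below is a minimal C sketch. It assumes the mask values shown, which are copied as literals so it builds on any platform instead of including the VMS header. The names decode_condition and severity_letter are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Field masks of an OpenVMS condition value (same values as the
   STS$M_* symbols in STSDEF), written as literals for portability. */
#define STS_M_SEVERITY   0x00000007u  /* bits 0-2   */
#define STS_M_MSG_NO     0x0000FFF8u  /* bits 3-15  */
#define STS_M_FAC_NO     0x0FFF0000u  /* bits 16-27 */
#define STS_M_INHIB_MSG  0x10000000u  /* bit 28: message already displayed */

/* The fields must not overlap: the sum equals the OR only if no bits are shared. */
static_assert(STS_M_SEVERITY + STS_M_MSG_NO + STS_M_FAC_NO + STS_M_INHIB_MSG ==
              (STS_M_SEVERITY | STS_M_MSG_NO | STS_M_FAC_NO | STS_M_INHIB_MSG),
              "condition value fields overlap");

typedef struct {
    unsigned severity;  /* 0=W, 1=S, 2=E, 3=I, 4=F */
    unsigned msg_no;    /* message number field, bits 3-15 */
    unsigned facility;  /* facility number, bits 16-27 */
    int      inhibit;   /* nonzero if the inhibit-message bit is set */
} condition_fields;

/* Split a 32-bit condition value into its fields. */
static condition_fields decode_condition(uint32_t code)
{
    condition_fields f;
    f.severity = code & STS_M_SEVERITY;
    f.msg_no   = (code & STS_M_MSG_NO) >> 3;
    f.facility = (code & STS_M_FAC_NO) >> 16;
    f.inhibit  = (code & STS_M_INHIB_MSG) != 0;
    return f;
}

/* Severity letter as shown in %FAC-x-IDENT messages. */
static char severity_letter(unsigned severity)
{
    static const char letters[] = "WSEIF???";
    return letters[severity & 7u];
}
```

For %X10408014 this gives severity 4 (F, fatal), facility 64 (the one F$MESSAGE reported as CMA) and message number %X1002. Bit 28 (inhibit message) is also set. That control bit only means the message had already been displayed, so it was not printed again at image exit. It says nothing about the error itself.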

That doesn't tell much, except that the failure is in some multi-threaded part of the program, which examining the map file has confirmed.

It can happen after any number of messages have been sent. I don't think it depends on the message itself. Because of the way the program works, it restarts automatically, picks up the first message eligible for sending (the one it just crashed on), and happily sends it.

The program is NOT linked /NOTRACEBACK. Because the traceback shows no module or routine names, I guess the error occurs in some multi-threaded code. I found that CMA stands for 'Concert Multithread Architecture'.
It _could_ be that PTHREADS isn't linked with the program. There have been issues with PTHREADS in some applications, and this might be one of them.

We couldn't find any clue to the cause. It is not caused by hibernation within the program itself. The program uses LIB$SPAWN (WAIT) (see http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1288314)

So we're wondering where to look for clues and hints on what may cause the problem.
Same environment: VAX, VMS 7.1/7.2, DCE 1.5 + ECO1
Willem Grooters
OpenVMS Developer & System Manager
Volker Halle
Honored Contributor
Solution

Re: Crash in DCE...

Willem,

%CMA-F-EXCCOP indicates that an exception was raised from a routine that was executing within a pthreads thread context.

You have the failing PC = 001EE8C5, so find out which module/routine this code is in. If the address is not within your program, consider using SDA> SHOW PROC/IMAGE to find out which library (shareable image) the code is in.
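
Once you have the address ranges of the activated images (for example from SDA> SHOW PROC/IMAGE), locating the PC is a simple range lookup. Below is a sketch in C. The image names and addresses in example_map are made up for illustration and do not come from this system, and find_image and offset_in_image are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of an image map: name plus the address range it occupies. */
typedef struct {
    const char *name;
    uint32_t    start;  /* first address occupied by the image */
    uint32_t    end;    /* last address occupied by the image (inclusive) */
} image_range;

/* HYPOTHETICAL address map; replace with the ranges SDA reports. */
static const image_range example_map[] = {
    { "IMAGE_A", 0x00000200u, 0x0001FFFFu },
    { "IMAGE_B", 0x001E0000u, 0x0020FFFFu },
};

/* Return the image whose [start, end] range contains pc, or NULL if none does. */
static const image_range *find_image(const image_range *images, size_t count,
                                     uint32_t pc)
{
    for (size_t i = 0; i < count; i++) {
        if (pc >= images[i].start && pc <= images[i].end)
            return &images[i];
    }
    return NULL;
}

/* Offset of pc relative to the start of the image that contains it. */
static uint32_t offset_in_image(const image_range *img, uint32_t pc)
{
    return pc - img->start;
}
```

With this made-up map, PC 001EE8C5 falls in IMAGE_B at offset %XE8C5. That offset can then be looked up in the image's link map, if one is available.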

Consider creating a process dump by entering $ SET PROC/DUMP before running your image. Then analyze the resulting process dump with ANAL/PROC.

Volker.

Willem Grooters
Honored Contributor

Re: Crash in DCE...

I found the error's origin deep inside a package I have no access to, in code that calls into the PTHREADS library. I discussed the problem with colleagues who know this software better:
1. It does a refresh of 'user credentials' in regular intervals and somewhere in that routine it runs off the rails.
2. There are two versions of this shared image: one based on OpenVMS 6.2 (without PTHREADS) and one based on OpenVMS 7.2 (with PTHREADS) and we run the latter - on a 7.1 machine. That might well be the problem:

DBG> show call
module name routine name line rel PC abs PC
SHARE$LIBRTL 00000000 0003DB18
SHARE$PTHREAD$RTL 00000000 0005B12E
SHARE$CMA$RTL 00000000 0002F82B
SHARE$HAD_SHR 00000000 002A8AE3
SHARE$PTHREAD$RTL 00000000 00065B34
SHARE$PTHREAD$RTL 00000000 0005B287
00000000 00000000
00000000 83AB5422

where there may be:
* some data that is not initialized
* a failing dynamic link
* some other error that is not handled (because the call is assumed never to fail?)
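
Comparing the two stack dumps is also informative. Every absolute PC in the debugger's SHOW CALL output is exactly %X400 higher than the corresponding PC in the original traceback. That fits the same code having been mapped at a slightly different address when running under the debugger, although this is an assumption and the thread does not explain the shift. With a constant shift, the debugger's image names can be attached to the anonymous traceback frames. Below is a small C sketch that uses the PCs from this thread. The function names common_shift and image_for_traceback_pc are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Absolute PCs from the original traceback (first six frames). */
static const uint32_t traceback_pc[] = {
    0x0003D718u, 0x0005AD2Eu, 0x0002F42Bu,
    0x002A86E3u, 0x00065734u, 0x0005AE87u
};
/* The same frames as reported by DBG> SHOW CALL. */
static const char *const debugger_image[] = {
    "SHARE$LIBRTL", "SHARE$PTHREAD$RTL", "SHARE$CMA$RTL",
    "SHARE$HAD_SHR", "SHARE$PTHREAD$RTL", "SHARE$PTHREAD$RTL"
};
static const uint32_t debugger_pc[] = {
    0x0003DB18u, 0x0005B12Eu, 0x0002F82Bu,
    0x002A8AE3u, 0x00065B34u, 0x0005B287u
};

#define FRAME_COUNT (sizeof traceback_pc / sizeof traceback_pc[0])

/* Return the shift if every frame moved by the same amount, otherwise 0. */
static uint32_t common_shift(void)
{
    uint32_t shift = debugger_pc[0] - traceback_pc[0];
    for (size_t i = 1; i < FRAME_COUNT; i++) {
        if (debugger_pc[i] - traceback_pc[i] != shift)
            return 0;
    }
    return shift;
}

/* Name the image for a traceback PC by applying the common shift;
   returns NULL if there is no consistent shift or no matching frame. */
static const char *image_for_traceback_pc(uint32_t pc)
{
    uint32_t shift = common_shift();
    if (shift == 0)
        return NULL;
    for (size_t i = 0; i < FRAME_COUNT; i++) {
        if (debugger_pc[i] == pc + shift)
            return debugger_image[i];
    }
    return NULL;
}
```

With these values the shift comes out as %X400, and the traceback frame at 002A86E3 lands in SHARE$HAD_SHR. That matches the package image named above.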
Willem Grooters
OpenVMS Developer & System Manager
Willem Grooters
Honored Contributor

Re: Crash in DCE...

Since this communication channel is actually obsolete, the program having this problem will soon be removed from the suite; so we leave it as it is.
Willem Grooters
OpenVMS Developer & System Manager