VMS upgrade v7.2-2 to v7.3-2 - subsequent application failure

 
Art Wiens
Respected Contributor

This isn't my problem to solve (I am not the programmer), but I wanted to get some insight if possible ...

Environment:
- ES47 model 4, 8GB memory
- VMS v7.2-2 restored from an existing Alpha 800, 512MB memory
- Upgraded to v7.3-2, all VMS patches applied

I tried one of the applications (written in BASIC v1.3 and uses FMS v2.4 screens). I am able to launch the app and navigate its menus, choose a function that I have a bit of familiarity with and inquire/retrieve a record ... all seems to work fine. When I try to exit the application (Gold PF1), the process gets an ACCVIO (see attached).

Is there anything that can be gleaned from this little bit of info? Is there anything I can do/try from a VMS perspective? I don't have access to the source code to answer any such specific questions.

Cheers,
Art
Hein van den Heuvel
Honored Contributor
Solution

It is not unlikely to be a badly handled error.

Those addresses do not mean much to anyone.
The very least you need to provide/consider is some crude memory map.
And I would also run once with $SET WATCH FILE/CLA=MAJOR. It is not unlikely to be an environment / file problem.

$SET PROC/NAM=TEST
$SET WATCH FILE/CLA=MAJOR
$RUN program
:
! maneuver around, until just before exit
! Watch for errors in set watch output all along

^Y SPAWN ! Or second window

$ANAL/SYSTEM
SDA> SET PROC TEST
SDA> SHOW PROC/CHAN
SDA> SHOW PROC/IMAGE
SDA> EXAM/INSTR interesting-address-from-accvio - 40 ; 80

Does SDA in 7.3-2 have LNM TRACE?
Again, just in case it is environmental and the failing access happens to involve a logical name, use that.

LNM LOAD
LNM START TRACE

Back to main window and try the exit.

LNM STOP TRACE
LNM SHOW TRACE (filter for the right PID)

Good luck!
Hein.






DECxchange
Regular Advisor

Hello,
I doubt if something like this has anything to do with your OS upgrade. It is probably an application level bug. So there's no way you can recompile and relink that particular program with debug?

$ basic/debug/noopt
$ link/deb /opt

If there is an option file? This may not be the exact syntax for recompiling and relinking on your system, just a general suggestion.

Otherwise, I think those SDA suggestions are very informative and useful. I think though at some point, you want to be able to get access to source code or somebody who does have access to source code and solve the problem that way.

Another thought, you might want to check the process quotas, if they changed with the upgrade. Do you have a copy of the operating system before the upgrade you can fall back on to see if you still get this error? And you can check for any changes in quotas?
Hoff
Honored Contributor

This ACCVIO looks comparatively trivial to find, given access to the source code and a debug build.

On zero evidence, my first target would be for an error in a declared exit handler ($dclexh), or something in the main exit path. But again, a reproducible error is a wonderful thing.

And no, it's not clear if this is a bug introduced in the upgrade, or a latent bug in the application. My bet would be on the latter.
Volker Halle
Honored Contributor

Art,

issue a SET PROC/DUMP and then run your program. Once it hits the ACCVIO, a process dump will be written. Consider doing the run with the CMKRNL privilege set, so that all of the process address space and important parts of system address space are dumped.

$ ANAL/PROC imagename.DMP
DBG>

You can now debug the failing instruction. You can also invoke SDA (type SDA at the DBG> prompt) to look at the images activated in the process and its channels. That gives you all the information you need, except a direct mapping of addresses to source code lines.

Volker.
Ian Miller.
Honored Contributor

Would running with CMEXEC not be enough, since system data structures are often protected ERKW?
____________________
Purely Personal Opinion
Volker Halle
Honored Contributor

Ian,

you're right. Running the image (creating the process dump) with CMEXEC prevents the following error message:

%SDA-W-EXCLDATA, data excluded from dump due to insufficient privilege

when trying to analyze the process dump with SDA.

Merry Christmas,

Volker.
Art Wiens
Respected Contributor

So I was just about to try some of your suggestions and voila ... the problem is gone!(?) I did nothing to it over the weekend, and no one else knows the system is up yet. I went back to make sure I was using the right account ... yes. Tried with and without privs, no diff. How can this be?! Self healing applications?!

What I've found is that the problem only occurs if "the user" is on an RTA device, i.e. a Decnet session (SET HOST 0), which is what I was doing on Friday. If I Telnet in, it works fine. Did something change in Decnet IV in VMS 7.3-2?

I will continue the investigation.

Cheers,
Art
Art Wiens
Respected Contributor

Using the SET WATCH FILE command and side by side comparing a Telnet vs Decnet session: at the point where it should give a message saying "Exiting the xxx application", on the Decnet session there is an extra file access then the error:

%XQP, Thread #0, Access (0,0,0) Status: 00000910
%XQP, Thread #0, Access SORTMSG.EXE;1 (2232,3,0) Status: 00000001
%XQP, Thread #0, Control function (2232,3,0) Status: 00000001
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000009, PC=0000000000EE3298, PS=0000001B

The Telnet session does not access this file - other than that, all other accesses are the same.
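For anyone repeating this comparison, here is a minimal offline sketch of the side-by-side check, assuming each session's SET WATCH FILE output has been captured to a text file and copied to a machine with Python. The function names and the regular expression are my own and are based only on the three %XQP lines quoted above, so other trace line shapes may need a different pattern.

```python
import re
import sys

# Pattern derived from the %XQP lines quoted in this thread, e.g.
#   %XQP, Thread #0, Access SORTMSG.EXE;1 (2232,3,0) Status: 00000001
# Assumption: other SET WATCH line shapes are simply skipped.
XQP_LINE = re.compile(
    r"%XQP, Thread #\d+, (?P<op>.+?) \((?P<fid>\d+,\d+,\d+)\)\s+"
    r"Status: (?P<status>[0-9A-Fa-f]{8})"
)


def parse_watch_log(text):
    """Return (operation, file id, status) tuples for each %XQP line found."""
    events = []
    for line in text.splitlines():
        match = XQP_LINE.search(line)
        if match:
            events.append(
                (match.group("op").strip(),
                 match.group("fid"),
                 match.group("status").upper())
            )
    return events


def only_in_first(first, second):
    """Events present in the first log but absent from the second, in order."""
    seen = set(second)
    return [event for event in first if event not in seen]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python watchdiff.py <failing-session-log> <working-session-log>
    with open(sys.argv[1]) as failing, open(sys.argv[2]) as working:
        extra = only_in_first(parse_watch_log(failing.read()),
                              parse_watch_log(working.read()))
    for op, fid, status in extra:
        print(op, fid, status)
```

Run with the Decnet capture as the first argument and the Telnet capture as the second (both file names are whatever you chose when saving the output). It prints the accesses that appear only in the failing session. The comparison ignores ordering and repeat counts, so it is a first-pass filter rather than a true diff.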

Art
Hoff
Honored Contributor

This smells like a sensitivity to whatever cruft happens to be left sitting on the stack, or similar.

Exit and error handlers are sensitive to these sorts of subtleties, and ASTs can encounter similar sensitivities.

Errors that move are errors that involve uninitialized values, or stack values, or race conditions, unsynchronized completions or other such.
Hein van den Heuvel
Honored Contributor

Well that's all odd/fun.

btw... the status in the $set watch log shown is hex:
$ exit %x910
%SYSTEM-W-NOSUCHFILE, no such file

but that's probably reflecting on the lines just before that.
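To show why 00000910 reads as a warning while 00000001 reads as success, here is a small sketch that splits a status longword into its fields. It assumes the usual OpenVMS condition value layout (severity in bits 0-2, message number in bits 3-15, facility in bits 16-27, control bits in 28-31). The function name is my own, and it does not look up message text; `$ exit %x910` remains the way to do that on the system itself.

```python
# Severity codes as used in OpenVMS condition values.
SEVERITY_NAMES = {
    0: "W (warning)",
    1: "S (success)",
    2: "E (error)",
    3: "I (informational)",
    4: "F (severe/fatal)",
}


def decode_condition(value):
    """Split a 32-bit condition value into its bit fields."""
    severity = value & 0x7
    return {
        "severity": severity,
        "severity_name": SEVERITY_NAMES.get(severity, "reserved"),
        "message_number": (value >> 3) & 0x1FFF,
        "facility": (value >> 16) & 0xFFF,
        "control": (value >> 28) & 0xF,
        # The low bit set means the call is treated as successful.
        "success": bool(value & 1),
    }


if __name__ == "__main__":
    for status in (0x00000910, 0x00000001):
        print(f"{status:08X}", decode_condition(status))
```

For 00000910 the facility field is 0 (the SYSTEM facility) and the severity is 0, which matches the %SYSTEM-W- prefix shown above.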

Odd how SORTMSG can be touched or not.

Are the paths through (sy)login.com exactly the same?
Toss in a $SHOW TERM/FULL, and maybe a $SHOW LOG/PROC, $SHOW LOG/JOB ?

fwiw,
Hein.

Volker Halle
Honored Contributor

Art,

did you produce a process dump? This should at least tell you in which piece of software the ACCVIO is happening (PC=0EE3298), and what's on the call stack at that point in time.

Volker.

Art Wiens
Respected Contributor

The app is compiled/linked with debug and I found the source. I've traced it as far as I can and have included the output. I also traced it with a working Telnet session and it just moves past where the Decnet session fails. Google hasn't turned up much on FDV$DTERM.

Art
Hein van den Heuvel
Honored Contributor

Hi again Art,

I'm afraid I can not try just now, but I would guess that FDV$DTERM (Detach Terminal) is reacting to a bad 'TCA'.

I'd check the return status on the FDV$ATERM (Attach Terminal) for both situations.
It may have failed. Of course that's no excuse for DTERM to ACCVIO, but it may help your quest.

btw... odd for a program to mix direct screen IO (Escape sequence to line 24) and FMS calls.


fwiw,
Hein.
Hoff
Honored Contributor

FDV$DTERM is part of the rundown processing for DEC FMS.

Something has romped on the stack, or there's an exit handler messed up, the TCA has been sat upon, etc.

http://h18000.www1.hp.com/support/asktima/appl_tools/0090948E-B0A673E0-0801E7.html

Art Wiens
Respected Contributor

I found in the source where FMS is initialized:

FMS::FDV$Status_l = FDV$ATERM( FMS::TCA, ! Terminal Control Area &
                               "5000"L,  ! Size of TCA &
                               "1"L)     ! Logical I/O Channel number
if FMS::FDV$Status_l <> FDV$_SUC         ! FDV status code
then Print "FDV$ATERM failed ";FMS::FDV$Status_l;" in ";PROGNAME;VERSION;RELEASEDATE;BEL
     Exit Function

i.e. it checks the status after it sets it up and doesn't fail there. I looked at FMS::TCA and it doesn't seem to change value during or after the failure.

DBG> dump fms::tca
-1776 17661952 4236 .........ù.. 00000000006643F9

Hoff, I read that article you gave ... not an "exhaustive" analysis on my part, but the variables seem to be different in the various functions.

I did go back to the v7.2-2 system and tried it ... works fine by Telnet or Decnet. The code is circa 1985 and doesn't look like it's been touched since 1992 ... how long can a bug be "dormant"?! 15 years I guess! ;-)

I should just hand this over to "the programmers" but I'm not sure they won't be back at my desk shortly thereafter. What else can I check for? (and how?)

Cheers,
Art
Hein van den Heuvel
Honored Contributor

Art,

That looks like relatively solid BASIC/FMS coding judging by the clear type casting and such. Still, stuff can slip in.
The size '5000' is a little suspect. I don't think I've ever used anything but '12'. For the AWKSP, yes: there a larger value helped avoid re-allocations.

I now recall issues with this set of calls, way back when (1991!). At that time they resulted in an 'Illegal String Class Error' on program 'end', but it was triggered by FDV$DTERM being passed a bad/stale TCA and just going with it, corrupting variables on the stack. This sort of thing could easily be version dependent if random stack addresses are in use at that point.
And an ACCVIO is only a bit away from an illegal string, so to speak.

The TCA and WORKSPACE are arrays of 3 LONGwords, or 12 byte strings passed by descriptor. The contents are 'opaque', but the data must of course stay valid. A static variable is safest. You can use (dynamic) strings, but they must be pre-extended to the 12 bytes.
Looks like a record structure is used.
You may want to double-check the details.
You may want to double-check that 'option type=explicit' is in effect.

From the FMS manuals:
"The locations for workspace, terminal control area, run-time
memory-resident form area, and status recording variables
must all continue to exist while the Form Driver is using
them. They must remain allocated until the workspace and
terminal control area are detached, until forms in memory
location are deleted, and until the status reporting variables
are no longer used. Protect the variables by placing them in
a common storage area; otherwise, the compiler might place
them in dynamic storage."

http://www.sysworks.com.au/disk$axpdocjun042/progtool/dyy4aaa6.bkb

http://www.sysworks.com.au/disk$axpdocjun042/progtool/dyy4aaa6.p109.bkb

ATERM FDV$ATERM (tca.ml.da,size.rl.r,channel.rl.r[,trmnal.rt.dx1 [,faketrmtyp.rt.dx1[,options.rl.r]]])

good luck!
Hein.
Hoff
Honored Contributor

{{ how long can a bug be "dormant"?! 15 years I guess! ;-) }}

For as long as the code is old, technically.

{{ I should just hand this over to "the programmers" but I'm not sure they won't be back at my desk shortly thereafter. What else can I check for? (and how?) }}

The trigger in that support article was stuff messing about within the TCA, intentionally or otherwise. A stack bug can, for instance, slam other storage (such as the TCA) on the stack. An IOSB that gets written to storage no longer allocated, etc.

I'd first review the whole of the code, and add explicit error checking throughout. Then head off after any asynchronous processing and the associated coding errors that can lurk. There exists a list of some of the more common coding errors over in topic (1661) of the old Ask The Wizard area. If things get weird, start looking for subtle and latent errors, and solidify the code.

Volker Halle
Honored Contributor

Art,

you might want to use DBG> DUMP/LONG to make sure you see all the binary data in the TCA area.

Volker.
Art Wiens
Respected Contributor

Hey ... 10 points for me!! Look what I found:

http://ftp.support.compaq.com.au/pub/patches/vms/axp/v7.1/fms/2.4/decfmseco5024.README

It mentions a couple of cases where ACCVIOs can occur; none match my scenario "exactly", but it's worth a shot.

It looks like our FMS installation was the original FMS v2.4, i.e. no patches applied.

I installed the patch and tried the app a few times Decnet and Telnet ... no crashes on exiting! Hopefully it's "fixed" now.

Thanks all for the help/advice, learned quite a few things along the way - never used the debugger interface before - never looked at the app's mainmenu source before.

Cheers,
Art