Operating System - OpenVMS

Re: Alpha 4100 5/466 1GB memory

Chris Smith_23
Frequent Advisor

Volker

OK. I visited the site yesterday afternoon and took copies of the CLUE$VMP4_...LIS file and the clue.txt produced by the SDA sequence you suggested. These are in the form of session logs from a SecureCRT serial connection to the Alpha. Both are zipped into one file called VMP4.zip.

I also managed to ftp the sysdump.dmp file to my OVMS Alpha system here, so I can gather any further info without a trip to the site.

Many thanks

Chris
Volker Halle
Honored Contributor

Chris,

here is a summary of the crash:

Bugcheck Type: SSRVEXCEPT, Unexpected system service exception
VMS Version: V7.2-1
Current Process: _FTA8:
Current Image:
Failing PC: FFFFFFFF.92CA27C4 IMG$ADD_PRIVILEGED_VECTOR_ENTRY+00544
Failing PS: 00000000.0000000A
Module: IMAGE_MANAGEMENT (Link Date/Time: 28-MAY-1999 23:35:15.10)
Offset: 0000C7C4

failing instruction stream:

IMG$ADD_PRIVILEGED_VECTOR_ENTRY+00524: LDL R17,#X0040(R23)
IMG$ADD_PRIVILEGED_VECTOR_ENTRY+00528: LDL R18,#X003C(R23)
IMG$ADD_PRIVILEGED_VECTOR_ENTRY+0052C: BIS R31,R31,R2
IMG$ADD_PRIVILEGED_VECTOR_ENTRY+00530: SUBL R17,#X01,R3
IMG$ADD_PRIVILEGED_VECTOR_ENTRY+00534: ADDL R23,R18,R23
IMG$ADD_PRIVILEGED_VECTOR_ENTRY+00538: BLT R3,#X00002D
IMG$ADD_PRIVILEGED_VECTOR_ENTRY+0053C: LDQ_U R31,(SP)
IMG$ADD_PRIVILEGED_VECTOR_ENTRY+00540: LDL R19,#X0010(R24)
IMG$ADD_PRIVILEGED_VECTOR_ENTRY+00544: LDL R5,#X0018(R23)

R3 = 00000000.6C6C6C6B
...
R17 = 00000000.6C6C6C6C
R18 = 00000000.6C6C6C6C
...
R23 = 00000000.6F10EC6C

The system crashes with an ACCVIO in EXEC mode when executing the instruction LDL R5,#X0018(R23), because R23 contains an invalid address.

R23 had a value of 02A48000 on entry into the above instruction stream and should have pointed to some data structure. R17 and R18 (a queue header?) have been loaded from this data structure, but the data structure contains invalid data:
6C6C6C6C = 'llll' in ASCII. When these values are used as addresses, the system crashes (it now crashes, as BUGCHECKFATAL is set to 1).
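To make the register arithmetic concrete, here is a minimal Python sketch (an illustration only, nothing to do with SDA itself) that models just the longword behaviour of the instructions above: 32-bit results sign-extended to 64 bits. Starting from R23 = 02A48000 and the 6C6C6C6C values loaded into R17/R18, it reproduces the R3 and R23 values from the register dump, computes the address the faulting LDL tried to read, and decodes the pattern as ASCII.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def sext32(value: int) -> int:
    """Keep the low 32 bits and sign-extend them to 64 bits, as Alpha
    longword instructions (LDL, ADDL, SUBL) do with their result."""
    value &= MASK32
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value

def addl(a: int, b: int) -> int:
    return sext32(a + b)

def subl(a: int, b: int) -> int:
    return sext32(a - b)

r23_entry = 0x02A48000            # R23 on entry to the instruction stream
r17 = r18 = sext32(0x6C6C6C6C)    # LDL R17/R18 from the corrupted structure

r3 = subl(r17, 0x01)              # SUBL R17,#X01,R3
r23 = addl(r23_entry, r18)        # ADDL R23,R18,R23
blt_taken = (r3 >> 63) == 1       # BLT R3 branches only if R3 is negative
fault_va = (r23 + 0x18) & MASK64  # effective address of LDL R5,#X0018(R23)

print(f"R3  = {r3 >> 32:08X}.{r3 & MASK32:08X}")
print(f"R23 = {r23 >> 32:08X}.{r23 & MASK32:08X}")
print(f"VA  = {fault_va >> 32:08X}.{fault_va & MASK32:08X}")
print("BLT taken:", blt_taken)
print("6C6C6C6C as ASCII:", bytes.fromhex("6C6C6C6C").decode("ascii"))
```

It prints R3 = 00000000.6C6C6C6B and R23 = 00000000.6F10EC6C, matching the dump, so the BLT falls through and the LDL reads 00000000.6F10EC84, an address derived purely from the fill pattern.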

This address space is occupied by the global image DXML$FGS_BLAS1E, so the problem could be in that code. It could also be a runaway data copy/move that has overwritten the address space.

Once you can read the dump on your OpenVMS system, you need to find out which part of memory has been overwritten with 6C6C6C6C. Start with

SDA> EXA 02A48000;50 ! should show corruption

then work backwards by 100 or 1000, e.g.

SDA> EXA 02A48000-100;50 etc.

to find where the corruption starts.
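SDA's EXAMINE is interactive, but the search strategy itself (jump back in coarse steps while memory still shows the pattern, then narrow down to the exact byte) can be sketched in Python. This assumes the relevant memory is available as a plain byte string with a known base address; the helper fill_run and the synthetic image are hypothetical, made up for illustration, and are not data from this dump.

```python
FILL = 0x6C  # the byte value seen in the crash (6C6C6C6C)

def fill_run(image: bytes, base: int, addr: int,
             fill: int = FILL, step: int = 0x100) -> tuple[int, int]:
    """Return (start, end) addresses of the contiguous run of `fill` bytes
    containing `addr`; `end` is exclusive. `image[0]` is at address `base`."""
    off = addr - base
    if not 0 <= off < len(image) or image[off] != fill:
        raise ValueError("addr is not inside the fill pattern")
    # Coarse pass: step back while a whole chunk is still pattern,
    # like repeating EXA addr-100;50 by hand.
    start = off
    while start >= step and all(b == fill for b in image[start - step:start]):
        start -= step
    # Fine pass: back up byte by byte to the first pattern byte.
    while start > 0 and image[start - 1] == fill:
        start -= 1
    # Forward scan for the end of the run.
    end = off
    while end < len(image) and image[end] == fill:
        end += 1
    return base + start, base + end

# Synthetic example (invented data, not from the real dump):
base = 0x02A40000
image = bytes(0x7A34) + bytes([FILL]) * 0x1000 + bytes(0x200)
start, end = fill_run(image, base, 0x02A48000)
print(f"corruption from {start:08X} to {end:08X} ({end - start:#x} bytes)")
```

The coarse step keeps the number of probes small for a large overwrite, and the byte-wise pass then gives the exact first corrupted address.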

Volker.
Volker Halle
Honored Contributor

Chris,

all the files in this process are from DKB0:. You can find the file-ids in the SDA> SHOW PROC/CHAN output.

You could use the following command, to dump the file headers to determine the file names:

$ dump/head/block=count=0 DKB0:/ident=<fid>

where <fid> is the file-id: the first number from VMP4$DKB0:(xxx,xxx,0)
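If the process has many channels open, the DUMP commands can be generated rather than typed. A minimal Python sketch, assuming the channel list contains device/file-id strings in the VMP4$DKB0:(xxx,xxx,0) form quoted above; the sample lines and file-id numbers are invented, and the real SHOW PROC/CHAN column layout is not modelled.

```python
import re

# Assumed form of the channel device/file-id string, as quoted above:
# VMP4$DKB0:(xxx,xxx,0) -> optional node/allocation prefix, device, FID.
FID_RE = re.compile(
    r"(?:\w+\$)?(?P<dev>[A-Z]+\d+):\((?P<num>\d+),(?P<seq>\d+),(?P<rvn>\d+)\)")

def dump_commands(chan_text: str) -> list[str]:
    """Build one DUMP/HEADER command per distinct file, using the first
    number of the file-id, as suggested in the post."""
    commands, seen = [], set()
    for m in FID_RE.finditer(chan_text):
        key = (m["dev"], m["num"])
        if key in seen:
            continue  # the same file can be open on several channels
        seen.add(key)
        commands.append(f"$ dump/head/block=count=0 {m['dev']}:/ident={m['num']}")
    return commands

# Invented sample lines; real SHOW PROC/CHAN output has more columns.
sample = """
  0010  VMP4$DKB0:(1234,5,0)
  0020  VMP4$DKB0:(1234,5,0)
  0030  VMP4$DKB0:(4567,12,0)
"""
for cmd in dump_commands(sample):
    print(cmd)
```

The generated lines can be pasted into a DCL session or a command procedure; any resulting .EXE names then go to ANAL/IMAGE.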

If any of those files are .EXE files, run ANAL/IMAGE on them.

It may be that one of the image files is corrupted (by that 6C6C6C6C pattern!).

SDA> SHOW PROC/IMAGE shows bad start/end address values for DXML$FGS_BLAS1

Volker.
Chris Smith_23
Frequent Advisor

Volker

Well, you don't hang about do you? I'd barely had time to shut my laptop down when your reply came in!

The bad news is that, having ftp'ed the sysdump.dmp file from my laptop to my OVMS Alpha system, when I ran SDA I got a message about wrong header type for this version of SDA!

As I posted earlier, I did find a lot of files on DKB0: with inconsistent headers and possible bad blocks. So the chances are that one of the processing applications has been corrupted, hence causing the violation. I'm due back there tomorrow and will let you know what I find.

Many, many thanks for your help.

Cheers

Chris
Chris Smith_23
Frequent Advisor

Volker

You are a genius! I did a dump/head... on all the FIDs for this process, then ana/ima on the executables, without finding any problems. Then I did ANA/IMA on SYS$LIBRARY:DXML$FGS_BLAS1.EXE. That threw up a very interesting error which showed the string of 6Cs we were looking for. I reinstalled the DXML run-time library from the layered products CD and the process now runs to a point beyond the original failure. We are not completely out of the woods yet, but I think the rest of the problems are related to disk header corruption. I've attached the console session log of the ana/ima so you can see the error. I guess the module image may have got corrupted when the original memory problem occurred. Who knows?

Anyway the customer is very happy now.

Thank you very much for your hard work in analysing this problem and pointing me in the right direction.

Cheers

Chris
Volker Halle
Honored Contributor

Chris,

the ANAL/IMA output shows a corruption in the IMAGE ACTIVATOR FIXUP SECTION:

EIAF$L_QRELFIXOFF : 6C6C6C6C
EIAF$L_LRELFIXOFF : 6C6C6C6C
EIAF$L_QDOTADROFF : 6C6C6C6C
EIAF$L_LDOTADROFF : 6C6C6C6C
EIAF$L_CODEADROFF : 6C6C6C6C
EIAF$L_LPFIXOFF : 6C6C6C6C
EIAF$L_CHGPRTOFF : 6C6C6C6C
EIAF$L_SHLSTOFF : 6C6C6C6C
EIAF$L_SHRIMGCNT : 6C6C6C6C
EIAF$L_SHLEXTRA : 6C6C6C6C
EIAF$L_PERMCTX : 6C6C6C6C
EIAF$L_LPPSBFIXOFF : 6C6C6C6C

The crash happened in IMAGE_MANAGEMENT code, which processes exactly this kind of information. Bingo!
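The pattern is easy to spot by eye here, but in a long ANAL/IMAGE listing a quick filter can help. A hedged Python sketch that flags EIAF$ fields whose longword value is a single byte repeated four times (such as 6C6C6C6C). The heuristic is an assumption of this sketch and can give false positives (a legitimate 01010101 would also be flagged); zero is skipped on the assumption that it simply marks an unused field.

```python
import re

FIELD_RE = re.compile(r"^\s*(EIAF\$\w+)\s*:\s*([0-9A-Fa-f]{8})\s*$")

def suspicious_fields(listing: str) -> dict[str, str]:
    """Return EIAF$ fields whose value is one byte repeated four times."""
    hits = {}
    for line in listing.splitlines():
        m = FIELD_RE.match(line)
        if not m:
            continue
        name, value = m.group(1), m.group(2).upper()
        if value[:2] * 4 == value and value != "00000000":
            hits[name] = value
    return hits

# First two lines copied from the corrupted image; the SHRIMGCNT value
# is invented to show what a healthy, unflagged field looks like.
listing = """\
EIAF$L_QRELFIXOFF : 6C6C6C6C
EIAF$L_LRELFIXOFF : 6C6C6C6C
EIAF$L_SHRIMGCNT : 00000003
"""
for name, value in suspicious_fields(listing).items():
    print(f"{name} = {value}  <- looks like a fill pattern")
```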

Volker.

PS: Again, this example shows that there is always a way in OpenVMS to DIAGNOSE a problem and come up with a solution (sometimes just a workaround) for the underlying problem. Diagnosis works much better than speculation. RE-INSTALL VMS would also have worked, but that's not the way we solve problems in OpenVMS ;-)
Chris Smith_23
Frequent Advisor

Volker

As soon as I saw the strings of 6Cs in the fix-up area I knew we'd found the culprit.

Back in the 1980s & 90s I spent many hours staring at crash dumps from VAX-VMS, but I haven't done much of that recently. We used to use VAX-VMS as a vehicle for hosting real-time aircraft & systems models. If I phoned DEC support at the Viables in Basingstoke (now just a memory), I used to be told that I probably knew more about VMS than they did!!!

If I may, a supplementary question regarding my OVMS Alpha system here?

As I mentioned long ago in the thread, I am unable to log in using CDE or DECwindows. I can log in in Failsafe Mode. In the other modes the system seems to hang for many minutes with the blue screen of CDE or the black screen of DECwindows, before eventually displaying a grey patterned screen with an up/left (diagonal) pointer arrow which I can move about, but that is all. I can log in via the network using telnet, but the console is useless. Any idea what may be causing this?

Cheers

Chris
Volker Halle
Honored Contributor

Chris,

check the SYS$MANAGER:DECW$*.LOG files for any errors. Consider opening another thread to troubleshoot your DECwindows problem.

Volker.
Chris Smith_23
Frequent Advisor

Volker

OK. Thanks very much. I'll close this thread.

All the best.

Chris
Chris Smith_23
Frequent Advisor

Thanks to Volker's analysis of the crash dump data, a solution was found which, luckily, only involved reinstalling the DXML runtime libraries.