Operating System - OpenVMS

System Crash, Patch already downloaded

 
Mohamed  K Ahmed
Trusted Contributor

Dear All,

Today I came to work and found my OpenVMS V7.3-1 ES40 system crashed. No hardware error was detected, so I rebooted the system. When I looked at the error log file, I found this error: "System Assert Failure Detected, error during CTR processing of EVT seg"

I searched the Compaq site and found that this is a known bug, fixed by the following patch:
VMS731_SYS-V0500 Facility Kit For OpenVMS Alpha V7.3-1

I checked my system, and I have this patch as well as VMS731_SYS-V0600.

My question: why did this error happen and crash the system when the patch that fixes it is already installed?

Please help

Mohamed
Ian Miller.
Honored Contributor

This sort of issue is best handled by logging a call with HP and supplying the CLUE file. The crash may be similar to, but different from, the one fixed by VMS731_SYS-V0500.
____________________
Purely Personal Opinion
Volker Halle
Honored Contributor

Mohamed,

System crashes can have many different causes. The output of the SDA> CLUE CRASH command gives a first idea of the type of crash. Looking at the error log is typically only necessary for MACHINECHK crashes.

The only ASSERTFAIL crash reference found in VMS731_SYS-V0500 is a crash at CACHE$RESUME_C+00AC0 in SYS$XFCACHE+offset.
If this really is the crash you experienced on your machine, then either another problem led to a crash at the same location, or the patch did not work (at least not under all circumstances).

Feel free to post the output of CLUE CRASH, and I may be able to tell you whether it's the same problem.

Volker.
Mohamed  K Ahmed
Trusted Contributor

Somehow, CLUE CRASH does not work. I used the command ana/crash sysdump.dmp, and it says that the dump file was 1981582 blocks too small when the dump was written; analysis may not be possible.
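For a sense of scale: OpenVMS disk blocks are 512 bytes, so the shortfall reported by ANALYZE/CRASH can be converted to bytes. This is only a minimal arithmetic sketch; the helper name `blocks_to_mib` is hypothetical.

```python
# OpenVMS disk blocks are 512 bytes each.
VMS_BLOCK_BYTES = 512

def blocks_to_mib(blocks: int) -> float:
    """Convert a count of 512-byte disk blocks to mebibytes (MiB)."""
    return blocks * VMS_BLOCK_BYTES / (1024 * 1024)

# Shortfall reported by ANALYZE/CRASH in this thread.
shortfall_blocks = 1981582
print(f"{shortfall_blocks} blocks = {shortfall_blocks * VMS_BLOCK_BYTES} bytes "
      f"(about {blocks_to_mib(shortfall_blocks):.0f} MiB)")
```

In other words, the dump file would have needed to be close to another gigabyte larger to hold the full dump as written.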

The error log doesn't say anything about the XFC cache, so maybe it is another problem.

I will call HP and try to get them something they can analyze.

Mohamed
Volker Halle
Honored Contributor

Mohamed,

You can get something out of the bugcheck entry in the error log: obtain the PC value from that entry, then run ANAL/SYS and do SDA> EXA/INS on it. This will at least tell you which module or execlet the crash was in; ASSERTFAIL crashes are only seen in SYS$XFCACHE. This works for inline crashes (not for exception-related ones) and assumes that the execlet is the same in the dump and in the running system (that is, no patches were installed and activated during the reboot).

You should make sure that you have a dump file big enough for your system's memory configuration. Make sure you have set DUMPSTYLE bits 0 (selective dump) and 3 (compressed dump) in MODPARAMS.DAT (the default for DUMPSTYLE is 9). Then run @AUTOGEN GETDATA TESTFILES to see what size of SYSDUMP.DMP AUTOGEN would suggest (make sure there is no DUMPFILE=0 entry in MODPARAMS.DAT).
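DUMPSTYLE is a bit mask, which is why setting bit 0 and bit 3 gives the value 9 (1 + 8). A minimal Python sketch of that arithmetic, assuming that a clear bit 0 means a full dump and a clear bit 3 means an uncompressed dump; only the two bits named above are modelled, and the helper names are hypothetical.

```python
# Only the two DUMPSTYLE bits discussed in this thread are modelled;
# other bits exist but are deliberately not decoded here.
SELECTIVE_BIT = 1 << 0   # bit 0: selective (rather than full) dump
COMPRESSED_BIT = 1 << 3  # bit 3: compressed dump

def dumpstyle(selective: bool, compressed: bool) -> int:
    """Build a DUMPSTYLE value from the two flags."""
    value = 0
    if selective:
        value |= SELECTIVE_BIT
    if compressed:
        value |= COMPRESSED_BIT
    return value

def describe(value: int) -> str:
    """Describe bits 0 and 3 of a DUMPSTYLE value."""
    kind = "selective" if value & SELECTIVE_BIT else "full"
    packing = "compressed" if value & COMPRESSED_BIT else "uncompressed"
    return f"DUMPSTYLE={value}: {kind}, {packing}"

# 0 and 1 are the values found in this thread's MODPARAMS.DAT; 9 is the target.
for v in (0, 1, dumpstyle(selective=True, compressed=True)):
    print(describe(v))
```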

Without a valid CLUE file or dump, even HP cannot help you.

Volker.
Mohamed  K Ahmed
Trusted Contributor

Volker,

You were right:
SYSTEM=>ana/sys

OpenVMS (TM) Alpha system analyzer

SDA> exa/ins FFFFFFFF8031C508
SYS$XFCACHE+02508: LDL R17,#X0064(R3)
SDA> exit

Regarding the MODPARAMS.DAT file, I found both DUMPSTYLE = 0 and DUMPSTYLE=1, so I replaced them with a single line: DUMPSTYLE=9.
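Conflicting entries like the two DUMPSTYLE lines are easy to miss in a long MODPARAMS.DAT. A minimal Python sketch that flags parameters assigned more than once; it assumes simple `NAME = value` lines with `!` starting a comment, ignores other MODPARAMS.DAT syntax, and the function name is hypothetical.

```python
from collections import defaultdict

def find_duplicate_params(modparams_text: str) -> dict[str, list[str]]:
    """Return parameters assigned more than once, mapped to their values in order."""
    seen: dict[str, list[str]] = defaultdict(list)
    for raw in modparams_text.splitlines():
        line = raw.split("!", 1)[0].strip()  # '!' starts a comment
        if "=" not in line:
            continue  # blank line, pure comment, or something we don't model
        name, value = line.split("=", 1)
        # Parameter names are case-insensitive, so normalise to upper case.
        seen[name.strip().upper()].append(value.strip())
    return {name: values for name, values in seen.items() if len(values) > 1}

# Hypothetical excerpt reproducing the situation described above.
sample = """\
! hypothetical MODPARAMS.DAT excerpt
DUMPSTYLE = 0
dumpstyle=1
"""
print(find_duplicate_params(sample))  # {'DUMPSTYLE': ['0', '1']}
```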

Mohamed

Volker Halle
Honored Contributor

Mohamed,

we're getting there...

If you do SDA> EXA/INS 8031C508-10;10, you can even verify conclusively that there is a BUGCHK instruction at SYS$XFCACHE+02504, which took down your system.

It looks like this crash is in routine XfcExtentReadUnpin; this is the only BUGCHK at low addresses in SYS$XFCACHE that is followed by the instruction shown in your previous reply.
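The address arithmetic behind this can be sketched in Python. Every Alpha instruction is one 4-byte longword, and, as the reasoning above implies, the PC in the bugcheck entry points at the instruction after the BUGCHK. The SYS$XFCACHE load address is simply derived from the PC and the offset SDA printed; the variable names are illustrative.

```python
ALPHA_INSN_BYTES = 4  # every Alpha instruction is one 32-bit longword

pc = 0xFFFFFFFF8031C508   # PC from the bugcheck error log entry
pc_offset = 0x2508        # SDA showed this PC as SYS$XFCACHE+02508
base = pc - pc_offset     # load address of SYS$XFCACHE in this boot

# The reported PC is the instruction after the BUGCHK, so step back one.
bugchk_offset = pc_offset - ALPHA_INSN_BYTES

# EXA/INS 8031C508-10;10 starts 0x10 bytes before the PC; list the
# instruction offsets from there up to and including the PC itself.
start = pc - 0x10
listing = [addr - base for addr in range(start, pc + 1, ALPHA_INSN_BYTES)]

print(f"base = {base:#x}")
print(f"BUGCHK at SYS$XFCACHE+{bugchk_offset:05X}")
print(" ".join(f"{off:05X}" for off in listing))
```

The offsets printed on the last line (024F8 through 02508) match the five instructions in the EXA/INS listing posted in this thread.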

Volker.
Mohamed  K Ahmed
Trusted Contributor

SYSTEM=>ana/sys

OpenVMS (TM) Alpha system analyzer

SDA> EXA/INS 8031C508-10;10
SYS$XFCACHE+024F8: BSR R26,#X000559
SYS$XFCACHE+024FC: LDL R0,#XFEF0(R2)
SYS$XFCACHE+02500: BIS R0,#X05,R16
SYS$XFCACHE+02504: BUGCHK
SYS$XFCACHE+02508: LDL R17,#X0064(R3)
SDA>

So it really is a bugcheck, and it should have been corrected by the patch, but it wasn't?
Volker Halle
Honored Contributor
Solution

Mohamed,

This is an inline bugcheck (one of many within XFC). XFC detected an internal inconsistency and declared a bugcheck to take down the system, giving XFC engineering a chance to diagnose and solve the underlying problem. However, without a valid dump, there is no hope of determining the problem...

This is NOT the problem described in VMS731_SYS-V0500 (note that the SYS$XFCACHE offset is quite different).

Crashes in XFC are typically addressed by XFC patches; the most recent one for V7.3-1 is VMS731_XFC-V0300. Is that patch installed? Check with SDA> SHOW EXEC SYS$XFCACHE for a link date of 24-JUN-2004.
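The link-date check is just a date comparison. A minimal Python sketch, assuming the link date has already been copied out of the SHOW EXEC output as a DD-MMM-YYYY string (the output format itself is not parsed here); the function names are hypothetical, and a link date on or after 24-JUN-2004 is only a hint, not proof, of which patch is installed.

```python
from datetime import datetime

# Link date of SYS$XFCACHE from VMS731_XFC-V0300, as stated in this thread.
XFC_V0300_LINK_DATE = datetime(2004, 6, 24)

def parse_vms_date(text: str) -> datetime:
    """Parse an OpenVMS-style DD-MMM-YYYY date such as '24-JUN-2004'."""
    # title() turns 'JUN' into 'Jun' so it matches %b in the C/English locale.
    return datetime.strptime(text.strip().title(), "%d-%b-%Y")

def xfc_v0300_or_later(link_date: str) -> bool:
    """True if the execlet link date is on or after the V0300 link date."""
    return parse_vms_date(link_date) >= XFC_V0300_LINK_DATE

print(xfc_v0300_or_later("24-JUN-2004"))
print(xfc_v0300_or_later("01-OCT-2003"))  # hypothetical day in October 2003
```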

Volker.
Volker Halle
Honored Contributor

Mohamed,

This problem has been seen before. If you call HP, ask them for the solution to QXCM1000087908; the answer may just be: install VMS731_XFC-V0300...

Volker.
Martin P.J. Zinser
Honored Contributor

Hello Mohamed,

Volker gave you very sound advice. The first thing you have to make sure of is that you can take a valid crash dump the next time you have a problem. With a valid dump, HP has a very good chance of finding out in detail what went wrong; without one, things are much more difficult. If you have the opportunity, try to test the new DUMPSTYLE settings (MCR OPCCRASH can be used to force a crash).

Having said that, and given that you experienced an inline bugcheck, I still suggest opening a call with HP. Since an inline bugcheck is a deliberate decision by the programmer to respond to a specific corruption, someone with access to the source should be able to at least determine what went wrong, and maybe guess why...

Greetings, Martin

Mohamed  K Ahmed
Trusted Contributor

Volker,
I checked the link date and it was October 2003.
The latest update patch I installed included VMS731_XFC-V0200. I am going to install VMS731_XFC-V0300 and then monitor the system to see whether it happens again.
At least now I know what to do if it does.

Thanks for your help

Mohamed
Volker Halle
Honored Contributor

Mohamed,

The crash case mentioned in my previous reply also happened with VMS731_XFC-V0200 installed. If you really want to make sure that installing VMS731_XFC-V0300 solves this problem, log a call with HP and just ask for the solution to that escalated problem.

I assume you ran AUTOGEN up to the SETPARAMS phase after setting DUMPSTYLE=9 in MODPARAMS.DAT, to make sure that the system parameter gets set in ALPHAVMSSYS.PAR and that a sufficiently large SYS$SYSTEM:SYSDUMP.DMP has been created. You need to reboot the system to make use of the newly created or extended dump file, but this can be combined with the reboot after installing the XFC patch.
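AUTOGEN runs every phase from the start phase through the end phase given on the command line. A simplified Python model of that ordering, showing which phases `@AUTOGEN GETDATA TESTFILES` and a run up to SETPARAMS cover; the phase list reflects the usual documented order, and the sketch deliberately ignores AUTOGEN's own rules about which phases are valid start points or get skipped. The function name is hypothetical.

```python
# Usual AUTOGEN phase order (simplified model; special cases not modelled).
AUTOGEN_PHASES = [
    "SAVPARAMS", "GETDATA", "GENPARAMS", "TESTFILES",
    "GENFILES", "SETPARAMS", "SHUTDOWN", "REBOOT",
]

def phases_run(start: str, end: str) -> list[str]:
    """Return the phases from start through end, inclusive."""
    i = AUTOGEN_PHASES.index(start.upper())
    j = AUTOGEN_PHASES.index(end.upper())
    if i > j:
        raise ValueError("start phase must not come after end phase")
    return AUTOGEN_PHASES[i : j + 1]

print(phases_run("GETDATA", "TESTFILES"))  # report suggested file sizes only
print(phases_run("GETDATA", "SETPARAMS"))  # also write the new parameters
```

The second call shows why stopping at SETPARAMS leaves the reboot to be done separately, which is what allows it to be combined with the patch reboot.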

Volker.
Jeff Chisholm
Valued Contributor

Mohamed,

Hopefully you've got a system dumpfile to look at.

This is a new bugcheck in OpenVMS Alpha V7.3 for tracking problems in the eXtended File Cache (XFC). When the XFC code detects an inconsistency, it invokes the XFC_ASSERT macro, which calls the XFC_BUGCHECK routine. This is the only place where an ASSERTFAIL bugcheck is taken.

Look on the stack for the preserved value of R16, and in register RA (R26) for the address from which XFC_BUGCHECK was called.

The important symptom of the problem is the calling routine name and offset in register RA.
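Turning RA into that routine-plus-offset form is again simple arithmetic: a BSR or JSR on Alpha stores the address of the following instruction in RA, so the call itself sits 4 bytes earlier. A minimal Python sketch; the execlet base reuses the SYS$XFCACHE load address implied by the PC and offset in Mohamed's SDA output, while the RA value is purely hypothetical.

```python
ALPHA_INSN_BYTES = 4  # every Alpha instruction is one 32-bit longword

def call_site(ra: int, execlet_base: int) -> tuple[int, int]:
    """Return (offset of RA, offset of the calling instruction) in the execlet."""
    ra_offset = ra - execlet_base
    # BSR/JSR save the address of the next instruction, so step back one.
    return ra_offset, ra_offset - ALPHA_INSN_BYTES

XFCACHE_BASE = 0xFFFFFFFF8031A000  # derived from PC 8031C508 = SYS$XFCACHE+02508
ra = 0xFFFFFFFF8031B1D4            # hypothetical RA (R26) value from a dump

ra_off, call_off = call_site(ra, XFCACHE_BASE)
print(f"RA = SYS$XFCACHE+{ra_off:05X}, call at SYS$XFCACHE+{call_off:05X}")
```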

Log a case to the HP customer support center. We'll be happy to have a look at this and get you fixed up.

Regards,
Jeff Chisholm
VMS Internals Support
Colorado Springs
le plus ca change...
Volker Halle
Honored Contributor

Jeff,

The XFC_ASSERT macro has been changed (in versions of XFC later than the V7.3 SSB) to generate a true inline BUGCHK, which makes ASSERTFAIL crashes much easier to analyze: the offset in SYS$XFCACHE points directly to the XFC_ASSERT macro invocation, so you no longer have to use R26 (RA) to find the caller of the XFC_BUGCHECK routine.

Volker.
Mohamed  K Ahmed
Trusted Contributor

Closing the thread