ERRFMT Crashes - V8.2 Alpha

 
SOLVED
Troodon
Frequent Advisor

ERRFMT Crashes - V8.2 Alpha

When the system is booted, ERRFMT loops opening new ERRLOG.SYS files. It creates 10 of them and then crashes, mailing SYSTEM and leaving a .DMP file. The only message is "break on unhandled exception preceding SHARE$ERRFMT+142844".

I think this began after installing the patches F11X V1.0, INSTAL V1.0 and LOADSS V1.0 in April.

I wonder if anyone else is having this problem.
13 REPLIES
Ian Miller.
Honored Contributor

Re: ERRFMT Crashes - V8.2 Alpha

And what does HP say about this problem?
____________________
Purely Personal Opinion
Jeff Chisholm
Valued Contributor

Re: ERRFMT Crashes - V8.2 Alpha

We used to get a fair number of these back in the days of V5.n, but I have not seen one in ages. Here's what I wrote up at the time; I think it covers all of the bases. Please try the workaround laid out here and report back on the results. /jeff

"RMS-F-RSZ" Error When ERRFMT Fails At Startup

COMPONENT: ANALYZE/ERROR Utility

SOURCE: Customer Support Center / USA


SYMPTOM:

During VMS startup after a reboot, the error formatter process ERRFMT
fails to start. VMS displays the following error messages on the
console:

%%%%%%%%%%% OPCOM %%%%%%%%%%%
Message from user SYSTEM on XXXXXX
ERRFMT - ERROR ACCESSING ERROR LOG FILE
%RMS-F-RSZ, invalid record size

%%%%%%%%%%% OPCOM %%%%%%%%%%%
Message from user SYSTEM on XXXXXX
ERRFMT - DELETING ERRFMT PROCESS
ERROR LOG FILE UNWRITEABLE
TO RESTART ERRFMT PROCESS, USE "@SYS$SYSTEM:STARTUP ERRFMT"


GENERAL ANALYSIS:

When a VMS system is shut down or crashes, VMS copies the error log
buffers from memory to disk. The buffers are copied to the system
dumpfile, SYS$SYSTEM:SYSDUMP.DMP, if the file exists. If it doesn't,
VMS writes the buffers to the primary pagefile on the system disk,
SYS$SYSTEM:PAGEFILE.SYS, if that file exists.

On reboot, the SYSINIT process looks for the SYSDUMP.DMP and the
PAGEFILE.SYS file on the system disk, and copies these error log
buffers from the first file it finds into the new error log buffers in
memory. Later, when STARTUP starts the ERRFMT process, the error log
buffers are copied from memory to the error log file,
SYS$ERRORLOG:ERRLOG.SYS.

If SYSINIT reads invalid data, either because the data is corrupt or
the file is new and has never had valid error log information written
to it, then the ERRFMT process aborts with the RSZ error when it tries
to read the data and copy it to the ERRLOG.SYS file.

The data can be invalid for any of these reasons:


ANALYSIS #1:

The data is corrupt; it may have been corrupt when copied or have been
corrupted later. This can occur if the disk has hardware errors.


SOLUTION #1:

Create a new version of the file, either SYSDUMP.DMP or PAGEFILE.SYS,
and reboot. When you create a new file, you no longer use the same
disk blocks, which may have become corrupt.

Create a file of the same size as the old one, but with the next
higher revision number. This example shows how to create a new
dumpfile; to create a new pagefile, substitute the filename
PAGEFILE.SYS.

$ DIR SYS$SYSTEM:SYSDUMP.DMP/SIZE=ALL ! get size/version number

SYSDUMP.DMP;n x blocks

$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> CREATE SYS$SYSTEM:SYSDUMP.DMP;n+1 /SIZE = x
SYSGEN> EXIT

Reboot, so that VMS can identify the new file as the dump file. Then
purge the older version of the file.

CAUTION: Do not remove the old version until AFTER you reboot.
Otherwise further data corruption may occur.

If NO errors are reported for the disk, and if you already had a valid
dumpfile or pagefile, reboot the system again.


ANALYSIS #2:

VMS had not identified the file as a dumpfile before the previous
system boot. For example, if you create a new dumpfile, you must
shut down and reboot for VMS to recognize it as a dumpfile. When you
shut down, VMS does not yet recognize the new dumpfile and so cannot
write the buffers to it. When you reboot, VMS does map the dumpfile
and so tries to read the buffers from it. Because the buffers are not
in the file, VMS just reads whatever random bit patterns are at the
buffer location. The random data often triggers the RSZ error.


SOLUTION #2:

If you created a new file, either SYSDUMP.DMP or PAGEFILE.SYS, and
rebooted so that VMS could map it as a dumpfile, then reboot again.
During this next shutdown, VMS writes valid data to the file.


ANALYSIS #3:

The dumpfile is restored from a backup, either as an individual file
or as part of restoring an image backup. The dumpfile is usually
marked "Backups disabled", as seen in a DIRECTORY/FULL of the file.
BACKUP does not save the file contents of such a file, but it does
save the file header information. During a restore operation, BACKUP
re-creates this file based on the information in the file header,
allocating enough disk space so the file is the same size as before.
The allocated disk blocks do not contain dumpfile information, so the
error formatter may not be able to read them, and it fails with the
RSZ error.

SOLUTION #3:

Reboot twice after restoring the dumpfile. During the first reboot,
VMS maps the file as a dumpfile. During the second shutdown VMS
writes valid data to the file, so the error formatter can read it
during the second reboot.


ANALYSIS #4:

The disk might have hardware errors.

SOLUTION #4:

If the error persists, examine the disk for hardware errors:

$ SHOW ERROR

Call your Field Service Representative to further investigate any
reported errors.
le plus ca change...
Volker Halle
Honored Contributor

Re: ERRFMT Crashes - V8.2 Alpha

Troodon,

you have the ERRFMT.DMP process dump, you can analyze it with $ ANAL/PROC ERRFMT.DMP. Please also look at possible OPCOM messages issued from ERRFMT.

The article given by Jeff does not include the reference to SYS$ERRLOG.DMP - which is being used in more recent versions of OpenVMS (since V7.1) to save the errlog buffers. You may want to try re-creating that file as well.

Is your system disk shadowed?

Volker.
Jeff Chisholm
Valued Contributor

Re: ERRFMT Crashes - V8.2 Alpha

Thanks Volker, you are quite correct.
le plus ca change...
Troodon
Frequent Advisor

Re: ERRFMT Crashes - V8.2 Alpha

Recreated SYSDUMP, PAGEFILE & SYS$ERRLOG with no effect on the error.

The system disk is not shadowed and there are no errors on the device or controller.

The hardware is standard.

The error in ANALYZE/PROCESS is as I indicated in the original post; it is not "invalid record size", it is "unhandled exception".

If I had to hazard a guess based on the behavior, I would think it is having trouble with writing ERRLOG.SYS after opening it, so it tries to create a new one and loops on failure.
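
If ERRFMT is looping as guessed here, it should show up as a run of ERRLOG.SYS versions created within seconds of each other. A minimal sketch for checking, using only standard DIRECTORY qualifiers; how to read the output is an assumption, not something confirmed in this thread:

```dcl
$! List every version of the error log with size and creation date.
$! Ten or so versions stamped around boot time would match the
$! "creates 10 of them and then crashes" symptom.
$ DIRECTORY/DATE/SIZE SYS$ERRORLOG:ERRLOG.SYS;*
```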
Jeff Chisholm
Valued Contributor

Re: ERRFMT Crashes - V8.2 Alpha

I talked this over with a few other people. It's not a common issue; please log a call with your CSC. Regards, Jeff
le plus ca change...
Volker Halle
Honored Contributor

Re: ERRFMT Crashes - V8.2 Alpha

Troodon,

ANAL/PROCESS should show you the exception that caused ERRFMT to write an image dump. What is it? What are the current PC and instruction stream?

Could you post the ANAL/PROC output and then

DBG>SDA
SDA> EXA/INS @PC-20;30
SDA> SHOW PROC/CHAN
SDA> EXIT
DBG> EXIT

This must be some very specific HW/SW condition. Did you run @AUTOGEN, and did SYS$ERRLOG.DMP get sized correctly based on the system parameters ERRORLOGBUFFERS * ERLBUFFERPAGES?
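
One way to collect the numbers asked about here, as a sketch. The parameter names are spelled as SYSGEN lists them (ERRORLOGBUFFERS and ERLBUFFERPAGES). The location of SYS$ERRLOG.DMP under SYS$SYSTEM is an assumption; adjust the file specification if it lives elsewhere on your system.

```dcl
$! Show the two system parameters that drive the errorlog dump file size
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> SHOW ERRORLOGBUFFERS
SYSGEN> SHOW ERLBUFFERPAGES
SYSGEN> EXIT
$! Show the used and allocated size of the errorlog dump file
$! (file location assumed, see above)
$ DIRECTORY/SIZE=ALL SYS$SYSTEM:SYS$ERRLOG.DMP
```

The AUTOGEN TESTFILES phase reports whether it considers the current file size adequate.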

Volker.
Troodon
Frequent Advisor

Re: ERRFMT Crashes - V8.2 Alpha

DBG> exa pc
0\%PC: 208380
DBG> exa/ins @pc-20:@pc+30
SHARE$ERRFMT+142824: LDA R16,#X04FC(R31)
SHARE$ERRFMT+142828: LDAH R17,#X000F(R31)
SHARE$ERRFMT+142832: BIS R31,#X02,R25
SHARE$ERRFMT+142836: LDQ R27,#X0090(R2)
SHARE$ERRFMT+142840: JSR R26,(R26)
SHARE$ERRFMT+142844: LDQ R26,#X0068(R2)
SHARE$ERRFMT+142848: LDQ R16,#X0058(R2)
SHARE$ERRFMT+142852: BIS R31,R31,R17
SHARE$ERRFMT+142856: BIS R31,R31,R18
SHARE$ERRFMT+142860: LDQ R27,#X0070(R2)
SHARE$ERRFMT+142864: BIS R31,#X03,R25
SHARE$ERRFMT+142868: JSR R26,(R26)
SHARE$ERRFMT+142872: BIS R31,R0,R3

SDA> exa/ins @pc
%SDA-E-NOREAD, unable to access location FFFFFFFF.FFFF0000
SDA> exa/ins @pc-20;30
%SDA-E-NOREAD, unable to access location FFFFFFFF.FFFEFFE0
SDA>

Channel CCB Window Status Device/file accessed
------- --- ------ ------ --------------------
0010 7FF7C000 00000000 SIREN$DKA0:
0020 7FF7C020 81CCDF40 SIREN$DKA0:(772,22,0)
0030 7FF7C040 81CA6C40 SIREN$DKA0:(2182,6,0) (section file)
0040 7FF7C060 81CA6BC0 SIREN$DKA0:(9435,6,0) (section file)
0050 7FF7C080 81CA9EC0 SIREN$DKA0:(9552,14,0) (section file)
0060 7FF7C0A0 81CA9940 SIREN$DKA0:(8495,11,0) (section file)
0070 7FF7C0C0 81CA87C0 SIREN$DKA0:(8437,9,0) (section file)
0080 7FF7C0E0 81CA7BC0 SIREN$DKA0:(6726,10,0) (section file)
0090 7FF7C100 81CA58C0 SIREN$DKA0:(6740,12,0) (section file)
00A0 7FF7C120 81E081C0 SIREN$DKA0:(8271,8,0)
00B0 7FF7C140 81CA8EC0 SIREN$DKA0:(8232,13,0) (section file)
00C0 7FF7C160 81E069C0 SIREN$DKA0:(2455,21,0)
00D0 7FF7C180 81CB2A40 SIREN$DKA0:(2401,6,0) (section file)
00E0 7FF7C1A0 81CB1F40 SIREN$DKA0:(2382,6,0) (section file)
00F0 7FF7C1C0 81CB25C0 SIREN$DKA0:(7386,11,0) (section file)

Total number of open channels : 15.
SDA>

If I look at the channels with SDA when it's running, they are:
0010 7FF7C000 00000000 SIREN$DKA0:
0020 7FF7C020 81E4FDC0 SIREN$DKA0:[VMS$COMMON.SYSEXE]ERRFMT.EXE;1
0030 7FF7C040 81CA6C40 SIREN$DKA0:[VMS$COMMON.SYSLIB]LIBOTS.EXE;1 (section file)
0040 7FF7C060 81CA6BC0 SIREN$DKA0:[VMS$COMMON.SYSLIB]LIBRTL.EXE;1 (section file)
0050 7FF7C080 81CA9EC0 SIREN$DKA0:[VMS$COMMON.SYSLIB]DECC$SHR.EXE;1 (section file)
0060 7FF7C0A0 81CA9940 SIREN$DKA0:[VMS$COMMON.SYSLIB]DPML$SHR.EXE;1 (section file)
0070 7FF7C0C0 81CA87C0 SIREN$DKA0:[VMS$COMMON.SYSLIB]CMA$TIS_SHR.EXE;1 (section file)
0080 7FF7C0E0 81CA7BC0 SIREN$DKA0:[VMS$COMMON.SYSLIB]MAILSHR.EXE;1 (section file)
0090 7FF7C100 81CA58C0 SIREN$DKA0:[VMS$COMMON.SYSLIB]MAILSHRP.EXE;1 (section file)

Total number of open channels : 9.

%AUTOGEN-I-BEGIN, TESTFILES phase is beginning.

+---+

Calculations for page, swap, and dump files.
--------------------------------------------

Errorlog dumpfile calculations:

No errorlog dump file modifications should be made.
Errorlog dumpfile will remain at 42 blocks.
Troodon
Frequent Advisor

Re: ERRFMT Crashes - V8.2 Alpha

Although this is pasted as part of the question, here's the output from ANA/PROC.

Siren$ ana/proc sys$errorlog:errfmt.dmp

OpenVMS Alpha Debug64 Version V8.2-017


%DEBUG-I-NODSTS, no Debugger Symbol Table: no DSF file found and
-DEBUG-I-NODSTIMG, no symbols in DISK$SIREN:[VMS$COMMON.SYSEXE]ERRFMT.EXE;1
%DEBUG-I-NOLOCALS, image does not contain local symbols
%DEBUG-I-NOGLOBALS, some or all global symbols not accessible
%SYSTEM-F-IMGDMP, dynamic image dump signal at PC=00000000000F0000, PS=00032DFC
-RUF-W-NORMAL, normal successful completion
break on unhandled exception preceding SHARE$ERRFMT+142844
Volker Halle
Honored Contributor

Re: ERRFMT Crashes - V8.2 Alpha

Troodon,

I'm currently on vacation with limited internet access. Expect a more detailed answer after 26-JUN-2006.

Sorry,

Volker.
Kenneth G Lang
New Member
Solution

Re: ERRFMT Crashes - V8.2 Alpha

I have experienced the same issues with both 8.2 and 8.3 during my engineering testing. HP engineering concluded the problem was with the logical DECC$FILE_SHARING "ENABLE" being set at the SYSTEM Level in sys$manager:SYLOGICALS.com.

They relinked the ERRFMT.exe file with noshare for both 8.2 and 8.3. I did tests on both and the problem is resolved. (Another workaround is to set the logical at the process level rather than at the system level.)
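
A minimal DCL sketch of the process-level workaround, assuming the logical is currently defined system-wide from SYLOGICALS.COM and that the application needing it has its own startup procedure. The ERRFMT restart command is the one quoted in the OPCOM message earlier in this thread. Where the DEFINE/PROCESS line goes is hypothetical and depends on the local setup.

```dcl
$! Confirm the logical is defined in the system logical name table
$ SHOW LOGICAL/SYSTEM DECC$FILE_SHARING
$! Find the line that defines it at boot time
$ SEARCH SYS$MANAGER:SYLOGICALS.COM DECC$FILE_SHARING
$! Remove the system-wide definition from the running system
$! (requires SYSNAM privilege)
$ DEASSIGN/SYSTEM DECC$FILE_SHARING
$! Restart the error formatter
$ @SYS$SYSTEM:STARTUP ERRFMT
$! In the startup procedure of the application that needs the logical
$! (hypothetical placement), define it for that process only:
$ DEFINE/PROCESS DECC$FILE_SHARING ENABLE
```

The definition in SYLOGICALS.COM also has to be removed or commented out, or the system-wide logical comes back at the next reboot.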

There will be an ECO update released later this year for 8.2, and I believe the fix will be incorporated into the production release of 8.3, but do not quote me on that.

We logged these in the PTR system for HP engineering.

cheers,

Ken
Troodon
Frequent Advisor

Re: ERRFMT Crashes - V8.2 Alpha

That would seem to be *the* answer.

I have the share logical enabled for another application which was ported from opensource.

I will check this out later and report back!

W
Troodon
Frequent Advisor

Re: ERRFMT Crashes - V8.2 Alpha

That solved the problem.

Thanks, all.

I'm closing the thread and assigning points.