Re: bad dumpfile

 
Volker Halle
Honored Contributor

Re: bad dumpfile

Willem,

to confirm whether the system has crashed, try to look at ERRLOG.SYS first. If SYS$ERRLOG.DMP on the system disk could be written during the crash - it will be written BEFORE writing SYSDUMP.DMP - you'll find the crash entry in ERRLOG.SYS. SYS$ERRLOG.DMP is written via the boot path (BOOTED_DEV or via the entries in DUMP_DEV).
If you find no crash entry, consider connecting something to the console line to capture the console output when this reoccurs. Also consider a possible power failure causing a reboot without a crash entry!

You can look at the definitions of DUMP_DEV with F$GETENV("DUMP_DEV") in the running system.

Did this system EVER write something to SYSDUMP.DMP (try DUMP...) ?

Volker.
John Eerenberg
Valued Contributor

Re: bad dumpfile

Willem,

Let me know if you want to copy the crash dump during boot. I have some details about when CLUE starts relative to systartup_vms (they start at almost the same time and run in parallel). So I stall systartup_vms based on CLUE startup's activities and snag the crash dump as early in the boot as possible. I had to do that when one system kept crashing and it was (at that time) impossible to get a proper copy.

The details for setting up SYS$Errorlog.dmp, sysgen, etc. are not real obvious. Let me know if you have questions.

john
It is better to STQ then LDQ
Willem Grooters
Honored Contributor

Re: bad dumpfile

Back home from vacation so I could access my system directly.
I wanted to do some examination following your suggestions and, again, ran into a problem, but now I have at least a clue what's wrong.

I tried to log in, and that switched the monitor to the console, showing a message (System Service Exception), and a partial dump is written (so the console tells me, including the expanding strip of dots, so I bet SYSDUMP.DMP is written...). However, the message is NOT written to OPERATOR.LOG.

Stay tuned for more information.
Willem Grooters
OpenVMS Developer & System Manager
Volker Halle
Honored Contributor

Re: bad dumpfile

Willem,

when the system crashes, output ONLY goes to the physical console terminal (OPA0:).

Now check SYS$MANAGER:CLUE$STARTUP.LOG when the system comes up after the crash for any error messages from reading the dump file. OR simply $ TYPE CLUE$HISTORY to see the most recent system crash information (1 line per crash - 132 columns wide).

Volker.
David B Sneddon
Honored Contributor

Re: bad dumpfile

Willem,

You said...
"I tried to login and that switched the monitor to console showing a message (System Service Exception), and a partial dump is written (so tells me the console - including expanding dot-strip, so I bet SYSDUMP.DMP is written...)"

Is it a "partial" dump i.e. incomplete?
If it is in fact incomplete then you will
have problems analyzing it.

Dave
Volker Halle
Honored Contributor

Re: bad dumpfile

Dave,

a partial (or selective) memory dump is controlled by parameter DUMPSTYLE bit 0 (0=full dump, 1=selective dump).

The output for a selective dump looks like this on the console terminal:

**** OpenVMS I64 Operating System E8.2 - BUGCHECK ****

** Bugcheck code = 00000965: DEBUGCRASH, Debugger forced system crash
** Crash CPU: 00 Primary CPU: 00 Active CPUs: 00000001
** Current Process = NULL
** Current PSB ID = 00000001
** Image Name =

**** Starting compressed selective memory dump at 10-AUG-2004 07:54
.............................................................................................................
** System space, key processes, and key global pages have been dumped.
** Now dumping remaining processes and global pages...
.........................................................
...Complete ****


Volker.
David B Sneddon
Honored Contributor

Re: bad dumpfile

Volker,

I am aware of "selective" dumps and all the
documentation uses the word "selective" and
not "partial" -- I assumed that the use of
the word "partial" was indicative of an
incomplete (not selective) dump.

Dave
Willem Grooters
Honored Contributor

Re: bad dumpfile

Had a problem accessing yesterday....

Just some answers that were not yet given:

Wim - Too small: could be. It's 5000 blocks (from memory, will check) for a 512MB machine. No message in OPERATOR.LOG, but the console says it did write a dump, though not whether it was complete ("writing partially dump", from what I've seen). But the latest AUTOGEN did not suggest an increase, so I kept it as it was.

(Given above response on "partial dump" it might well be the problem)

Mobeen - this is an option, but I don't think the disk is the problem: 50% free space (about)

Helmuth - Not from the console, but I checked all disks attached to the system and only DKA0 (the system disk) holds SYSDUMP.DMP. It's not a member of a shadowset.

Volker - One of the things to look at. In the early days, it DID write a system dump (7.3-1, before squeezing clustersize and upgrading to 7.3-2). I'll check the variables and logicals

John - In case of a system error, I definitely want the dumpfile, just to see what was wrong, be it hardware (a machine check, which is not really impossible given the current temperatures) or software. As far as I can see, CLUE is used to analyze the dumpfile (there is a CLUE logfile stating failure of analysis), but copying it to another location is a good idea - whenever I feel the need to analyze it, it will be present (good exercise ;-))

Volker - CLUE logging just says the sysdump does not contain an OpenVMS ALPHA dump and that the blocksize is invalid.

Willem
Willem Grooters
OpenVMS Developer & System Manager
Volker Halle
Honored Contributor

Re: bad dumpfile

Willem,

5000. blocks is way too small! For a 128 MB system with selective, compressed dumps, we have a dumpfile of 83000 blocks. Remove any DUMPFILE=n or DUMPSTYLE=n statements from MODPARAMS.DAT, re-run @SYS$UPDATE:AUTOGEN GETDATA TESTFILES NOFEEDBACK, and see what AUTOGEN would suggest. DUMPSTYLE=9 is the default for OpenVMS Alpha (selective and compressed).
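As a quick illustration of how a DUMPSTYLE value breaks down, here is a minimal Python sketch. It only decodes the two flags mentioned in this thread: bit 0 (selective) as stated above, and bit 3 (compressed), which is inferred here from "DUMPSTYLE=9 = selective and compressed". The function name `describe_dumpstyle` is hypothetical, and any other bits are reported undecoded rather than guessed at.

```python
# Hypothetical helper: decode the DUMPSTYLE flags discussed in this thread.
SELECTIVE_BIT = 0x1   # bit 0: 0 = full dump, 1 = selective dump (stated in the thread)
COMPRESSED_BIT = 0x8  # bit 3: assumed from "DUMPSTYLE=9 is selective and compressed"


def describe_dumpstyle(value: int) -> dict:
    """Return the two known flags plus any remaining, undecoded bits."""
    return {
        "selective": bool(value & SELECTIVE_BIT),
        "compressed": bool(value & COMPRESSED_BIT),
        "other_bits": value & ~(SELECTIVE_BIT | COMPRESSED_BIT),
    }


if __name__ == "__main__":
    for style in (0, 1, 9):
        print(style, describe_dumpstyle(style))
```

Check the value actually in effect on the system (SYSGEN SHOW DUMPSTYLE) before drawing conclusions from a decode like this.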

If you have DUMPSTYLE=0 and the dumpfile is too small, you get:

** Dumping memory to HBVS unit 300
**** Starting full memory dump at 10-AUG-2004 09:01...
................................................................................
..
**** Memory dump incomplete - dump file is 178565 blocks too small

Still I don't see the word 'partial' in there.
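If the console output is captured to a file, as suggested earlier in the thread, the outcome of a dump can be checked mechanically. The following is a rough Python sketch that only recognizes the two message forms quoted in this thread; the function name `dump_outcome` is hypothetical, and other OpenVMS versions may word these messages differently.

```python
import re

# Patterns taken from the console excerpts quoted in this thread (assumed stable wording).
INCOMPLETE = re.compile(
    r"Memory dump incomplete - dump file is (\d+) blocks too small"
)
COMPLETE = re.compile(r"\.\.\.Complete \*\*\*\*")


def dump_outcome(console_text: str):
    """Classify captured console text as incomplete, complete, or unknown."""
    match = INCOMPLETE.search(console_text)
    if match:
        # The number is how many more blocks the dump file would have needed.
        return ("incomplete", int(match.group(1)))
    if COMPLETE.search(console_text):
        return ("complete", 0)
    return ("unknown", None)


if __name__ == "__main__":
    sample = "**** Memory dump incomplete - dump file is 178565 blocks too small"
    print(dump_outcome(sample))
```

An "unknown" result means neither message was found, which is itself useful information: the console capture may have been cut off, or the dump may never have started.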

Did you try a DUMP/BL=COUNT=1 SYSDUMP.DMP to see if anything had been written to the file at all ?

Volker.

Willem Grooters
Honored Contributor

Re: bad dumpfile

Surprise,surprise.

The last crash was last night, and that gave me some valid data in SYSDUMP.DMP. However, it could not yet be analyzed. See the attachment for details on the CLUE logging and data on SYSDUMP.DMP. I also added the SYSGEN parameters DUMP*.

I looked at CLUE$HISTORY: there is 7.3 and 7.3-1 data in it, but no data exists for VMS 7.3-2, and indeed (as observed thanks to Volker's suggestion to look in SYSERR.LOG) some crashes have occurred (machine checks, mainly, in the last (too hot) days).

I followed Volker's advice to run AUTOGEN (no DUMP parameters in MODPARAMS.DAT) and that tells me:

Dump file calculations:

A 206329 block dump file should be created.

This is smaller than the dumpfile now on disk! That file has a tremendous number of retrieval headers, I estimate one for each block. I stopped DUMP/HEADER on this file after the first block of data.
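To put the block counts mentioned in this thread into perspective, here is a small Python sketch that converts between OpenVMS disk blocks (512 bytes each) and MiB. The helper names are hypothetical, and the arithmetic ignores dump header overhead and the effect of compression or selective dumping, so treat the results as orders of magnitude only.

```python
BLOCK_BYTES = 512          # size of one OpenVMS disk block
MIB = 1024 * 1024


def blocks_to_mib(blocks: int) -> float:
    """Convert a block count to MiB."""
    return blocks * BLOCK_BYTES / MIB


def mib_to_blocks(mib: int) -> int:
    """Blocks needed to hold the given number of MiB, uncompressed."""
    return mib * MIB // BLOCK_BYTES


if __name__ == "__main__":
    # Figures quoted in this thread.
    for label, blocks in [
        ("size quoted from memory", 5000),
        ("128 MB system, selective+compressed", 83000),
        ("AUTOGEN suggestion", 206329),
    ]:
        print(f"{label}: {blocks} blocks = {blocks_to_mib(blocks):.1f} MiB")
    print("blocks for 512 MiB of memory, uncompressed:", mib_to_blocks(512))
```

By this arithmetic a full, uncompressed dump of 512 MiB of memory needs over a million blocks, which is why a 5000-block file could never hold one and why AUTOGEN's figure assumes a selective, compressed dump.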

I have no DUMP_DEV environment variable, not in VMS nor in SRM.

Anyway, how do I create a new SYSDUMP.DMP that fits my (and the system's) needs?

Willem
Willem Grooters
OpenVMS Developer & System Manager