
bad dumpfile

 
Willem Grooters
Honored Contributor

bad dumpfile

VMS7.3-2:

Had a crash and wanted to analyse it, but SDA cannot open the dumpfile:

$ ana/crash sysdump.dmp

OpenVMS (TM) system dump analyzer
%SDA-E-NOTALPHADUMP, dump file does not contain an OpenVMS Alpha dump

Same error was in CLUE log generated after booting from this crash.

Looked in HELP and found /OVERRIDE switch, but that didn't get me any further:

$ ana/crash sysdump.dmp/over

OpenVMS (TM) system dump analyzer
%SDA-W-NOTALPHADUMP, dump file does not contain an OpenVMS Alpha dump
%SDA-W-DUMPEMPTY, dump file contains no valid dump
%SDA-W-INCDUMPFORM, dump file format incompatible with this version of SDA
...analyzing an Alpha full memory dump in override mode...

%LIB-F-BADBLOSIZ, bad block size

I presume I need to create a new dumpfile, so do I need to crash the machine for that?

Willem
Willem Grooters
OpenVMS Developer & System Manager
29 REPLIES
John Gillings
Honored Contributor

Re: bad dumpfile

Willem,

Was this system rebooted between the crash and the attempt to analyze the dump?

It looks like exactly what the error says: there's no dump information to analyze.

You don't need to create a new dump file. The file itself can't be "corrupt" - it's just a bunch of blocks. If the system crashes, the dump will be written to the file, and should be analyzable. HOWEVER, you MUST analyze it or use the SDA COPY command to copy it elsewhere before rebooting the system again. A "normal" reboot will invalidate any dump information.

If the dump has been lost, perhaps you have console logs? These may be sufficient to identify the cause of the bugcheck.

Maybe this makes it a bit clearer:

BUGCHECK
BOOT
dump can be analyzed or COPYd
REBOOT
dump is now invalid

A crucible of informative mistakes
Wim Van den Wyngaert
Honored Contributor
Solution

Re: bad dumpfile

Willem,

Perhaps your dump file is too small (e.g. due to upgrades in which the dump file size wasn't enlarged). If you have a log of the console you can check if it dumped ...

Wim
Wim
Willem Grooters
Honored Contributor

Re: bad dumpfile

Hi John.

Quite obviously there was a boot after the crash! I was prompted to investigate by the contents of a CLUE logfile created during that boot.

The requirement to copy the dumpfile raises another question.
AFAIK, it's usual to check for a crash on (re)boot. I didn't add anything (yet) to do this, but there IS a CLUE logfile on my system, so I assume something is built in somewhere. This does raise the question of when SYS$SYSTEM:SYSDUMP.DMP is 'invalidated'. It must be AFTER the investigation in the startup procedure. Given the output in the logfile, this is not safe enough!
What would happen if the system stopped for another reason (power loss, for instance)? In that case no SYSDUMP will be written, and SYSDUMP.DMP may be invalid, as stated. (This could well have happened, which would explain the situation.)

Willem
Willem Grooters
OpenVMS Developer & System Manager
Wim Van den Wyngaert
Honored Contributor

Re: bad dumpfile

Willem & John,

Just a stupid question (don't have that many crashes over here).

Why is a dump invalidated during a normal boot? Couldn't it stay valid until the next crash?

Can I rename the dump file and re-create it?

Wim
Wim
Ian Miller.
Honored Contributor

Re: bad dumpfile

I think a crash dump may be invalidated on a normal shutdown, not by startup.
____________________
Purely Personal Opinion
Mobeen_1
Esteemed Contributor

Re: bad dumpfile

Willem,
It looks like you need to define a new dump device (try a different disk/place). The only way to generate a new dump is then to bring down the machine and, while at the boot prompt (>>>), issue a Ctrl+P.

rgds
Mobeen
Helmut Ammer
Occasional Advisor

Re: bad dumpfile

Hello Willem,

do you know from the console log what physical device the dump was written to?

Is your dump file located on the system disk?

Is the dump disk a shadowset?
If yes, check pp 3-6 in Alpha System Analysis Tools Manual (V7.3-1 AA-REZTC-TE).

Regards
Helmut
Volker Halle
Honored Contributor

Re: bad dumpfile

Willem,

starting with OpenVMS Alpha V7.1, system dumpfiles are NOT invalidated anymore during normal shutdowns. The errlog-buffers are written to SYS$SYSTEM:SYS$ERRLOG.DMP, so SYSDUMP.DMP is not written to during shutdown at all.

Try $ DUMP/BL=COU=1 SYS$SYSTEM:SYSDUMP.DMP to see if anything has been written at all.
If the system disk can be accessed via multiple paths, all of them may need to be added to the DUMP_DEV console environment variable.

You should also be able to at least find the bugcheck entry with $ ANAL/ERR/ELV TRANSLATE /SIN=.../BEFORE=... - the errlog entries are read from SYS$ERRLOG.DMP during startup and added to ERRLOG.SYS.

As of V7.3-2, SDA should be able to correctly find dump information on shadowed system disks.

Volker.
Willem Grooters
Honored Contributor

Re: bad dumpfile

Currently I'm far, far away from the server, so I am not able to try out the suggestions.
However:
When checking the status (luckily I can access it), I found it had restarted one night. When trying to see why, I got the very same message.
Today, while I was attempting to run a command procedure, the connection appeared hung - but yes: the machine had indeed rebooted, presumably because of a crash. ANALYZE/CRASH however, which is in the startup sequence somewhere, revealed exactly the same problem; a new CLUE logfile has been created stating the very same message.
So it seems I have no ability to find out what was wrong.
Willem Grooters
OpenVMS Developer & System Manager
Volker Halle
Honored Contributor

Re: bad dumpfile

Willem,

to confirm whether the system has crashed, try to look at ERRLOG.SYS first. If SYS$ERRLOG.DMP on the system disk could be written during the crash - it will be written BEFORE writing SYSDUMP.DMP - you'll find the crash entry in ERRLOG.SYS. SYS$ERRLOG.DMP is written via the boot path (BOOTED_DEV or via the entries in DUMP_DEV).
If you find no crash entry, consider connecting something to the console line to capture the console output when this reoccurs. Also consider a possible power failure causing a reboot without a crash entry!

You can look at the definitions of DUMP_DEV with F$GETENV("DUMP_DEV") in the running system.

Did this system EVER write something to SYSDUMP.DMP (try DUMP...) ?

Volker.
John Eerenberg
Valued Contributor

Re: bad dumpfile

Willem,

Let me know if you want the crash dump copied during boot. I have some details about when CLUE starts and when systartup_vms starts (they start almost at the same time and run in parallel). So I stall systartup_vms based on CLUE startup's activities and snag the crash dump as early in the boot as possible. I had to do that when one system kept crashing and it was (at that time) impossible to get a proper copy.

The details for setting up SYS$Errorlog.dmp, sysgen, etc. are not real obvious. Let me know if you have questions.

john
It is better to STQ then LDQ
Willem Grooters
Honored Contributor

Re: bad dumpfile

Back home from vacation so I could access my system directly.
I wanted to do some examination following your suggestions and, again, ran into a problem, but now I have at least a clue what's wrong.

I tried to log in, and that switched the monitor to the console showing a message (System Service Exception), and a partial dump was written (so the console tells me, including the expanding strip of dots, so I bet SYSDUMP.DMP is written...). However, the message is NOT written in OPERATOR.LOG.

Stay tuned for more information.
Willem Grooters
OpenVMS Developer & System Manager
Volker Halle
Honored Contributor

Re: bad dumpfile

Willem,

when the system crashes, output ONLY goes to the physical console terminal (OPA0:).

Now check SYS$MANAGER:CLUE$STARTUP.LOG when the system comes up after the crash for any error messages from reading the dump file. OR simply $ TYPE CLUE$HISTORY to see the most recent system crash information (1 line per crash - 132 columns wide).

Volker.
David B Sneddon
Honored Contributor

Re: bad dumpfile

Willem,

You said...<
I tried to login and that switched the monitor to console showing a message (System Service Exception), and a partial dump is written (so tells me the console - including expanding dot-strip, so I bet SYSDUMP.DMP is written...) >

Is it a "partial" dump i.e. incomplete?
If it is in fact incomplete then you will
have problems analyzing it.

Dave
Volker Halle
Honored Contributor

Re: bad dumpfile

Dave,

a partial (or selective) memory dump is controlled by parameter DUMPSTYLE bit 0 (0=full dump, 1=selective dump).
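
As a rough illustration of the bit test described above, here is a minimal Python sketch. It decodes only bit 0, which is documented in this post, and bit 3, which is assumed to mean "compressed" because the value 9 is described elsewhere in this thread as "selective and compressed". The function name `decode_dumpstyle` is made up for this example, and all other bits are deliberately left undecoded.

```python
def decode_dumpstyle(value: int) -> dict:
    """Decode the two DUMPSTYLE bits discussed in this thread."""
    return {
        # Bit 0: 0 = full memory dump, 1 = selective dump (stated in the post).
        "selective": bool(value & 0x1),
        # Bit 3: ASSUMED to mean "compressed", inferred from DUMPSTYLE=9
        # being described as "selective and compressed".
        "compressed": bool(value & 0x8),
    }


if __name__ == "__main__":
    for style in (0, 1, 9):
        print(style, decode_dumpstyle(style))
```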

The output for a selective dump looks like this on the console terminal:

**** OpenVMS I64 Operating System E8.2 - BUGCHECK ****

** Bugcheck code = 00000965: DEBUGCRASH, Debugger forced system crash
** Crash CPU: 00 Primary CPU: 00 Active CPUs: 00000001
** Current Process = NULL
** Current PSB ID = 00000001
** Image Name =

**** Starting compressed selective memory dump at 10-AUG-2004 07:54
.............................................................................................................
** System space, key processes, and key global pages have been dumped.
** Now dumping remaining processes and global pages...
.........................................................
...Complete ****


Volker.
David B Sneddon
Honored Contributor

Re: bad dumpfile

Volker,

I am aware of "selective" dumps and all the
documentation uses the word "selective" and
not "partial" -- I assumed that the use of
the word "partial" was indicative of an
incomplete (not selective) dump.

Dave
Willem Grooters
Honored Contributor

Re: bad dumpfile

Had a problem accessing yesterday....

Just some answers that were not yet given:

Wim - Too small: could be. It's 5000 blocks (by heart, will check) for a 512MB machine. There is no message in OPERATOR.LOG, but the console says it did write a dump, though not whether it was complete ("writing partially dump", from what I've seen). But the latest AUTOGEN did not suggest an increase, so I kept it as it was.

(Given above response on "partial dump" it might well be the problem)

Mobeen - this is an option, but I don't think the disk is the problem: 50% free space (about)

Helmut - Not from the console, but I checked all disks attached to the system and only DKA0 (the system disk) holds SYSDUMP.DMP. It's not a member of a shadowset.

Volker - One of the things to look at. In the early days, it DID write a system dump (7.3-1, before squeezing clustersize and upgrading to 7.3-2). I'll check the variables and logicals

John - In case of a system error, I definitely want the dumpfile, just to see what was wrong, be it hardware (a machine check is not really impossible given the current temperatures) or software. As far as I can see, CLUE is used to analyze the dumpfile (there is a CLUE logfile stating failure of analysis), but copying it to another location is a good idea: whenever I feel the need to analyze it, it will be present (good exercise ;-)).

Volker - CLUE logging just says the sysdump does not contain an OpenVMS ALPHA dump and that the blocksize is invalid.

Willem
Willem Grooters
OpenVMS Developer & System Manager
Volker Halle
Honored Contributor

Re: bad dumpfile

Willem,

5000. blocks is way too small! For a 128 MB system with selective, compressed dumps, we have a dumpfile of 83000 blocks. Remove any DUMPFILE=n or DUMPSTYLE=n statements from MODPARAMS.DAT and re-run @SYS$UPDATE:AUTOGEN GETDATA TESTFILES NOFEEDBACK and see what AUTOGEN would suggest. DUMPSTYLE=9 is the default for OpenVMS Alpha (selective and compressed).
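
To put those block counts in perspective, here is a small Python sketch of the arithmetic. It relies on one fact, that an OpenVMS disk block is 512 bytes; the helper names are invented for this illustration, and the 512 MB figure is the memory size mentioned earlier in the thread. It ignores dump header overhead and says nothing about how well a compressed or selective dump shrinks.

```python
BLOCK_BYTES = 512  # one OpenVMS disk block


def blocks_to_mib(blocks: int) -> float:
    """Convert a size in disk blocks to MiB."""
    return blocks * BLOCK_BYTES / (1024 * 1024)


def memory_mib_to_blocks(mib: int) -> int:
    """Blocks needed to hold `mib` MiB of raw memory (no header overhead)."""
    return mib * 1024 * 1024 // BLOCK_BYTES


if __name__ == "__main__":
    print(f"5000 blocks  = {blocks_to_mib(5000):.2f} MiB")
    print(f"83000 blocks = {blocks_to_mib(83000):.2f} MiB")
    print(f"512 MiB of memory = {memory_mib_to_blocks(512)} blocks")
```

A 5000-block file holds roughly 2.4 MiB, while a full, uncompressed dump of 512 MiB of memory would need over a million blocks.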

If you have DUMPSTYLE=0 and the dumpfile is too small, you get:

** Dumping memory to HBVS unit 300
**** Starting full memory dump at 10-AUG-2004 09:01...
................................................................................
..
**** Memory dump incomplete - dump file is 178565 blocks too small

Still I don't see the word 'partial' in there.
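
If the console output has been captured to a text file, the shortfall can be picked out of it mechanically. The following is only a sketch in Python: it assumes the message wording is exactly as in the sample above, and the function name `dump_shortfall` is hypothetical.

```python
import re

# Pattern taken from the console message quoted above; other OpenVMS
# versions may word it differently (assumption).
SHORTFALL = re.compile(r"dump file is (\d+) blocks too small")


def dump_shortfall(console_text: str):
    """Return the reported shortfall in blocks, or None if not reported."""
    match = SHORTFALL.search(console_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = "**** Memory dump incomplete - dump file is 178565 blocks too small"
    print(dump_shortfall(sample))
```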

Did you try a DUMP/BL=COUNT=1 SYSDUMP.DMP to see if anything had been written to the file at all ?

Volker.

Willem Grooters
Honored Contributor

Re: bad dumpfile

Surprise,surprise.

The last crash was last night and that gave me some valid data in SYSDUMP.DMP. However, it could not yet be analyzed. See attachment for details on CLUE logging and data on SYSDUMP.DMP. I also added the SYSGEN parameters DUMP*.

I looked at CLUE$HISTORY: there is 7.3 and 7.3-1 data in it, but no data exists for VMS 7.3-2, and indeed (as observed thanks to Volker's suggestion to look in ERRLOG.SYS) some crashes have occurred (machine checks, mainly, in the last (too hot) days).

I followed Volker's advice to run AUTOGEN (no DUMP parameters in modparams.dat) and that tells me:

Dump file calculations:

A 206329 block dump file should be created.

This is smaller than the dumpfile now on disk! That file has a tremendous amount of retrieval headers, I estimate one for each block. I stopped DUMP/HEADER on this file after the first block of data.

I have no DUMP_DEV environment variable, neither in VMS nor in SRM.

Anyway, how do I create a new SYSDUMP.DMP that fits my (and the system's) needs?

Willem
Willem Grooters
OpenVMS Developer & System Manager
Willem Grooters
Honored Contributor

Re: bad dumpfile

Sorry, forgot the attachment....

Willem
Willem Grooters
OpenVMS Developer & System Manager
Dale A. Marcy
Trusted Contributor

Re: bad dumpfile

To create a new dumpfile, use the following command procedure and follow directions:

@sys$update:swapfiles
Willem Grooters
Honored Contributor

Re: bad dumpfile

I hoped it would all help....
But no - but I may have made a mistake (though the system should handle it).

* I created a new dumpfile using @sys$update:swapfiles.
* rebooted
* during reboot, ^P and CRASH on prompt (this I should not have done?)
* boot
* examine CLUE output:

$ type clue*.LOG

SYS$SYSROOT:[SYSMGR]CLUE$STARTUP_DIANA.LOG;21

$ Set NoOn
$ VERIFY = F$VERIFY(F$TRNLNM("SYLOGIN_VERIFY"))

OpenVMS (TM) system dump analyzer
%SDA-E-NOTALPHADUMP, dump file does not contain an OpenVMS Alpha dump

OpenVMS (TM) system analyzer

%CLUE-I-CLEANUP, housekeeping started...
%CLUE-I-MAXBLOCK, maximum blocks allowed 5000 blocks
%CLUE-I-STAT, total of 8 CLUE files, 504 blocks.
SYSTEM job terminated at 10-AUG-2004 20:47:23.53

Accounting information:
Buffered I/O count: 136 Peak working set size: 11472
Direct I/O count: 229 Peak virtual size: 180416
Page faults: 803 Mounted volumes: 0
Charged CPU time: 0 00:00:01.11 Elapsed time: 0 00:00:07.77
$

ANA/CRASH didn't do much good either, the very same message.
Now ANA/CRASH/OVER shows:

$ ANA/crash SYS$SYSTEM:SYSDUMP.DMP/OVER

OpenVMS (TM) system dump analyzer
%SDA-W-NOTALPHADUMP, dump file does not contain an OpenVMS Alpha dump
%SDA-W-DUMPEMPTY, dump file contains no valid dump
%SDA-W-INCDUMPFORM, dump file format incompatible with this version of SDA
...analyzing an Alpha full memory dump in override mode...

%LIB-F-BADBLOSIZ, bad block size

I'll try to OPCCRASH from VMS (then there SHOULD be a valid dump, I guess...)

Willem
Willem Grooters
OpenVMS Developer & System Manager
Willem Grooters
Honored Contributor

Re: bad dumpfile

... and that did the trick. I now have a valid dumpfile, which can be analyzed.

To summarize:

In case of an invalid dumpfile like above, do this:

1. $ @SYS$UPDATE:AUTOGEN GETDATA TESTFILES NOFEEDBACK
2. Note the size calculated for SYSDUMP.DMP
3. if SYSDUMP.DMP is smaller than this calculated size:
3.a. $ @SYS$UPDATE:SWAPFILES
3.a.1 leave pagefile sizes unchanged
3.a.2 Enter new (calculated) size of SYSDUMP.DMP
3.b I assume (haven't tried) that if SYSDUMP.DMP is larger than calculated, you won't need to create a new dumpfile. But here I may be wrong.
4. Reboot the system
5. After system has come up:
6. $ MCR OPCCRASH
7. Boot the system
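
The size comparison in step 3 can be sketched in a few lines of Python. This is only an illustration of the decision, not part of AUTOGEN or SWAPFILES; the function name is made up, and the sample sizes are figures quoted at various points in this thread.

```python
def dumpfile_shortfall(current_blocks: int, calculated_blocks: int) -> int:
    """Blocks by which the existing dumpfile falls short of the calculated size.

    A result of 0 means the file is already at least as large as calculated,
    so (per step 3.b, an untested assumption) it need not be re-created.
    """
    return max(0, calculated_blocks - current_blocks)


if __name__ == "__main__":
    calculated = 206329  # size AUTOGEN reported in this thread
    for current in (5000, 443600):  # sizes quoted in this thread
        short = dumpfile_shortfall(current, calculated)
        action = "re-create with SWAPFILES" if short else "keep as is"
        print(f"{current} blocks: short by {short} -> {action}")
```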

If step 6 isn't done, I think the dumpfile will NOT be created properly. CRASH at the SRM prompt didn't work (but I may have issued it too early, during boot; I may have needed to wait until the system was up).

Anyway, so far so good. Now wait for the next (real) crash...

Thanx to all.

Willem
Willem Grooters
OpenVMS Developer & System Manager
Volker Halle
Honored Contributor

Re: bad dumpfile

Willem,

your attachment shows, that you had a good and big enough (443600. blocks) dumpfile on that system disk. It had been written to under V7.3-2 (you'll see the string V7.3-2 and the nodename, process name and system type in the first block). The file was also perfectly contiguous (just 1 retrieval pointer with the length of the file).

If you want to look just at the file header, use DUMP/HEADER/BLOCK=COUNT=0 !

In your first attempt to create a dump with CTRL-P CRASH, you may have been too early during boot. Once STARTUP executes, the system is ready to write a dump. Did you capture the console output ? It would have told you, if a dump has been written.

Note that you can have AUTOGEN create the dumpfile for you. Just let it run beyond the GENFILES phase. If AUTOGEN creates a new (smaller) dumpfile, don't forget to purge the old one AFTER the next reboot.

As you observed MACHINECHK crashes in ERRLOG.SYS, be aware that there may be situations (due to hardware errors), which prevent the dumpfile write to complete ! This may well explain the message from SDA:

%SDA-F-DUMPINCOMPL, the dump file write was not completed

There is no need to run OPCCRASH to 'create' the dumpfile. @AUTOGEN or @SWAPFILES creates the file. And if the system crashes, it writes the dump into SYSDUMP.DMP.

Now let's wait for the real crash...

Volker.