Operating System - OpenVMS

Re: Not creating Crash Dump file - what am I missing?

 
SOLVED
Rich Hearn
Regular Advisor

Not creating Crash Dump file - what am I missing?

Hi all,

I've got a 2-node extended cluster (nodes 6 miles apart) with separate system disks and a common disk between them for some items, working off a SAN.

After my last crash, Node1 had no crash dump but Node2 did have one. It used to work; I'm not sure what has changed or when. SYSDUMP.DMP does exist in each sys$specific:[sysexe].


CACHE1::DISK$INFSYS:[RJHEARN]_>
CACHE1::DISK$INFSYS:[RJHEARN]_>sear list_sysgen.cache1 dump

DUMPSTYLE 9 9 0 -1 Bitmask D
DUMPBUG 1 1 0 1 Boolean
SAVEDUMP 0 0 0 1 Boolean
DUMPSTYLE 9 9 0 -1 Bitmask D
DUMPBUG 1 1 0 1 Boolean
SAVEDUMP 0 0 0 1 Boolean

CACHE1::DISK$INFSYS:[RJHEARN]_>
CACHE1::DISK$INFSYS:[RJHEARN]_>
CACHE1::DISK$INFSYS:[RJHEARN]_>

CACHE2::DISK$INFSYS:[RJHEARN]_>
CACHE2::DISK$INFSYS:[RJHEARN]_>
CACHE2::DISK$INFSYS:[RJHEARN]_>sear list_sysgen.cache2 dump

DUMPSTYLE 9 9 0 -1 Bitmask D
DUMPBUG 1 1 0 1 Boolean
SAVEDUMP 0 0 0 1 Boolean
DUMPSTYLE 9 9 0 -1 Bitmask D
DUMPBUG 1 1 0 1 Boolean
SAVEDUMP 0 0 0 1 Boolean

CACHE2::DISK$INFSYS:[RJHEARN]_>
CACHE2::DISK$INFSYS:[RJHEARN]_>

Looking for thoughts as to what else I should be checking.

Thanks,
Rich
14 REPLIES 14
Ian Miller.
Honored Contributor

Re: Not creating Crash Dump file - what am I missing?

Is it dumping to the system disk or to a separate disk, and is the system disk local to each node or on the SAN?

When both nodes crashed, did they crash for the same reason? Is there a BUGCHECK entry in the ERRLOG for both?
____________________
Purely Personal Opinion
cnb
Honored Contributor

Re: Not creating Crash Dump file - what am I missing?


VAX, Alpha, Integrity?

Version of OVMS?

Any error or OPCOM messages before, during or after the crash on why it isn't creating the dump?

Check OPERATOR.LOG, the error log or console for clues.

Hardware failure...

Insufficient system disk space or DOSD device disk space to capture the dump...

System lost connection to dump device...

Configuration changes...

Check with @sys$update:swapfiles.com

There could be any number of reasons.


Some general configuration info is here:

http://h30266.www3.hp.com/odl/vax/opsys/vmsos73/vmsos73/6017/6017pro_070.html#und_dump

SDA:

http://h71000.www7.hp.com/doc/73final/documentation/pdf/ovms_73_alpha_sys_tools.pdf?jumpid=reg_R1002_USEN


HTH
Bill Hall
Honored Contributor

Re: Not creating Crash Dump file - what am I missing?

Rich,

You can probably find the answer in your console output written at the time of the crash.

But I'll speculate that the dump device, or the boot path to the system disk, was not available at the time of the crash. I'll assume you have multipath access to the system disk and that the console variable dump_dev is either not defined or not correct.

What does $write sys$output f$getenv("dump_dev") return on your systems?

Bill
Bill Hall
Rich Hearn
Regular Advisor

Re: Not creating Crash Dump file - what am I missing?


Ian,

thanks for responding. The dump goes to the system disk and each system disk is local.

Was the crash for the same reason? - I suspect so, but the "official" word is a CPUSPINWAIT timeout. The question I'm still trying to answer regarding that is - what caused it?

There is not a BUGCHECK entry per se, but there is a "VMS Crash Restart Event" in both nodes' errlog files at the appropriate times.

cnb,

I appreciate you taking the time for this. One of these days, I'll include everything needed the 1st time I write :^)

It's an Alpha ES47 running VMS 8.3 with 16GB of memory - you may have hit it, tho I only noticed it after reading Bill's post.

CACHE1::DISK$INFSYS:[RJHEARN]_>
CACHE1::DISK$INFSYS:[RJHEARN]_>show dev dsa

Device      Device     Error   Volume           Free     Trans  Mnt
 Name       Status     Count    Label          Blocks    Count  Cnt
DSA1:       Mounted        0   CACHE1_V83-0    9957114    2944    2
DSA2:       Mounted        0   CACHE2_V83-0   31794303       1    2
CACHE1::DISK$INFSYS:[RJHEARN]_>


Bill,

Thank you also for your thoughts... Re-re-re-visiting the console output, I see what I missed before: (disk size)

CACHE1::SYS$COMMON:[SYSMGR]_>

**** OpenVMS Alpha Operating System V8.3 - BUGCHECK ****

** Bugcheck code = 0000078C: CPUSPINWAIT, CPU spinwait timer expired
** Crash CPU: 00000001 Primary CPU: 00000000 Node Name: CACHE1
** Supported CPU count: 00000040
** Active CPUs: 00000000.0000000F
** Current Process: MMHUFF
** Current PSB ID: 00000001
** Image Name: $1$DGA51:[CACHE.CACHE1.BIN]CACHE.EXE

** Dumping error log buffers to HBVS unit 0

**** Unable to dump error log buffers to remaining shadow set members
** Error log buffers not dumped to HBVS unit 0 (master member)

** Dumping memory to HBVS unit 0
**** Starting compressed selective memory dump at 22-JUL-2009 09:03...
................................................................................
.
.
.
.....................................................................
**** Memory dump complete - not all key processes or global pages saved

halted CPU 0


I thought it was only the errlog buffers that were not being written, since it stated "Memory dump complete". I guess I need to get more disk space on the system disk, as you & cnb pointed out. I'm guessing I must've had enough room on DSA2 for the dump to be written and then compressed down to 2502032 blocks.

Rich

p.s.

both nodes are identical except for "booted_osflags" (as it should be)

CACHE1::DISK$INFSYS:[RJHEARN]_>@ f$getenv.com
$ write sys$output f$getenv("boot_dev")
SCSI 2 1 0 0 0 0 0
$ write sys$output f$getenv("bootdef_dev")
SCSI 2 1 0 0 0 0 0
$ write sys$output f$getenv("booted_dev")
SCSI 2 1 0 0 0 0 0
$ write sys$output f$getenv("boot_file")

$ write sys$output f$getenv("booted_file")

$ write sys$output f$getenv("boot_osflags")
1,0
$ write sys$output f$getenv("booted_osflags")
1,0
$ write sys$output f$getenv("boot_reset")
OFF
$ write sys$output f$getenv("dump_dev")

$ write sys$output f$getenv("enable_audit")
ON
$ write sys$output f$getenv("license")
MU
$ write sys$output f$getenv("char_set")

$ write sys$output f$getenv("language")
6
$ write sys$output f$getenv("tty_dev")
0
$ vfy = f$verify(0)
CACHE1::DISK$INFSYS:[RJHEARN]_>

cnb
Honored Contributor
Solution

Re: Not creating Crash Dump file - what am I missing?

Glad it helped locate the issue.

These appear to be the standard default settings (compressed & selective) for your environment.
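To see why DUMPSTYLE 9 reads as "compressed & selective", here is a minimal Python sketch that decodes the low four bits of the bitmask. The bit meanings are as I recall them from the SYSGEN help for DUMPSTYLE on Alpha/I64, so treat them as an assumption and check them against the help text on your own system. The function name decode_dumpstyle is just an illustrative choice.

```python
# Assumed bit meanings for the low four bits of DUMPSTYLE (Alpha/I64).
# Each entry maps a bit number to (meaning when 0, meaning when 1).
DUMPSTYLE_BITS = {
    0: ("full dump", "selective dump"),
    1: ("minimal console output", "full console output"),
    2: ("dump to system disk", "dump off system disk (DOSD)"),
    3: ("uncompressed", "compressed"),
}


def decode_dumpstyle(value):
    """Return a description of each decoded bit, lowest bit first."""
    return [names[(value >> bit) & 1] for bit, names in sorted(DUMPSTYLE_BITS.items())]


if __name__ == "__main__":
    # 9 is the value shown in the SYSGEN listings for both nodes.
    for setting in decode_dumpstyle(9):
        print(setting)
```

Under those assumptions, a value of 9 has bits 0 and 3 set, and a value of 1 has only bit 0 set (a raw selective dump).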

From SYSGEN help on SYS_PAR DUMPSTYLE:

.....

If you plan to enable the Volume Shadowing minimerge feature on
an Alpha or I64 system disk, be sure to specify DOSD to an
alternate disk.

NOTE

On Alpha and I64 systems, you can save space on the system
disk and, in the event of a crash, save time recording
the system memory, by using the OpenVMS Alpha and I64 dump
compression feature. Unless you override the default AUTOGEN
calculations (by setting DUMPSTYLE in MODPARAMS.DAT),
AUTOGEN uses the following algorithm:

o On a system with less than 128 MB of memory, the system
  sets the DUMPSTYLE to 1 (a raw selective dump) and sizes
  the dump file appropriately.

o On a system with 128 MB of memory or greater, the system
  sets the DUMPSTYLE to 9 (a compressed selective dump),
  and creates the dump file at two-thirds the value of the
  corresponding raw dump.
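
The rule quoted above can be sketched in Python as follows. This is only an illustration of the help text, not AUTOGEN's actual code. The function names and the raw_dump_blocks input are hypothetical, and returning the raw size unchanged for small-memory systems is my assumption, since the help only says the file is sized "appropriately".

```python
def autogen_default_dumpstyle(memory_mb):
    """DUMPSTYLE chosen when MODPARAMS.DAT does not override it (per the help text)."""
    return 1 if memory_mb < 128 else 9


def autogen_dump_file_blocks(memory_mb, raw_dump_blocks):
    """Dump file size in blocks, given a hypothetical raw selective dump size."""
    if memory_mb < 128:
        # Assumption: a raw selective dump is sized at the raw figure.
        return raw_dump_blocks
    # Compressed selective dump: two-thirds of the corresponding raw dump.
    return (2 * raw_dump_blocks) // 3


if __name__ == "__main__":
    # A 16 GB system, as described earlier in the thread.
    print(autogen_default_dumpstyle(16 * 1024))
```

A 16 GB system is well above the 128 MB threshold, which matches the DUMPSTYLE of 9 seen on both nodes.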


Rgds,

cnb
Rich Hearn
Regular Advisor

Re: Not creating Crash Dump file - what am I missing?


cnb,

Haven't had to think about the system disk space size in 5 yrs - guess it's time :^)

Tnx agn,
Rich
Bill Hall
Honored Contributor

Re: Not creating Crash Dump file - what am I missing?

Rich,

My suspicion that the console variable dump_dev was not defined correctly was based on your statement "working off a san", which I took to mean that your boot device was a multipathed SAN device. I wasn't thinking of a shadowed system disk, but the same requirement applies in this case: dump_dev must list all paths to the dump device and all shadow-set members of the device that holds the system dump file. The dump file must be written to all members of the shadow set if they are available.

This is not a space problem; it's a problem with your definition of dump_dev. As an example: >>>set dump_dev dka100,dkb100. Substitute all of the members of your shadowed boot device.

I also noticed that you only have one shadow-set member in the bootdef_dev environment variable. Bootdef_dev should also contain all members of the shadowed boot device. This allows you to boot from any member of the shadow set in the event one or more fail.
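
As a rough way to check that rule, the Python sketch below compares a comma-separated console value (as typed at the >>> prompt) against the list of shadow-set members and reports any that are missing. The device names dka100 and dkb100 come from the example above and are not the real members of this system disk. The helper names are hypothetical, and the sketch assumes console-style device names rather than the "SCSI 2 1 0 ..." form that f$getenv returns.

```python
def parse_device_list(value):
    """Split a console value such as 'dka100,dkb100' into lowercase device names."""
    return {name.strip().lower() for name in value.split(",") if name.strip()}


def missing_members(console_value, shadow_members):
    """Return the shadow-set members that the console variable does not list."""
    listed = parse_device_list(console_value)
    return sorted(m.lower() for m in shadow_members if m.lower() not in listed)


if __name__ == "__main__":
    members = ["dka100", "dkb100"]  # hypothetical shadow-set members
    # Only one member listed, as with bootdef_dev here.
    print(missing_members("dka100", members))
    # Nothing listed at all, as with the empty dump_dev here.
    print(missing_members("", members))
```

The same check applies to both dump_dev and bootdef_dev, since both are expected to list every member of the shadowed system disk.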

Bill

Bill Hall
Steve Reece_3
Trusted Contributor

Re: Not creating Crash Dump file - what am I missing?

Rich,

Not sure whether you're booting off internal disks or off the SAN(s). In either case, what is the shadow set for the system disk made up of? You need to make sure that different shadow-set members have different unit numbers/LUNs for VMS. You shouldn't mix (for example) DKA0 and DKB0, or you can end up writing over the crash dump. DKA0 and DKB100 would be OK as I understand it. So, if you are booting off the SAN(s), you could work with $1$DGA1 and $1$DGA10.
Steve Reece_3
Trusted Contributor

Re: Not creating Crash Dump file - what am I missing?

p.s. I don't see how the system disk being short of space, or having no space, would prevent the dump from being written. If the system has rebooted and mapped the dump file successfully and you're dumping onto the system disk, the system should write to the file that's already been mapped, shouldn't it? Wouldn't the time you run out of space for the dump be when you copy it sideways within SDA?