Operating System - OpenVMS

Not creating Crash Dump file - what am I missing?

 
SOLVED
Rich Hearn
Regular Advisor


Hi all,

I've got a 2-node extended cluster (nodes 6 miles apart) with separate system disks and a common disk between them for some items, working off a SAN.

After my last crash, Node1 had no crash dump but Node2 did have one. It used to work; I'm not sure what has changed or when. SYSDUMP.DMP does exist in each SYS$SPECIFIC:[SYSEXE].


CACHE1::DISK$INFSYS:[RJHEARN]_>
CACHE1::DISK$INFSYS:[RJHEARN]_>sear list_sysgen.cache1 dump

DUMPSTYLE 9 9 0 -1 Bitmask D
DUMPBUG 1 1 0 1 Boolean
SAVEDUMP 0 0 0 1 Boolean
DUMPSTYLE 9 9 0 -1 Bitmask D
DUMPBUG 1 1 0 1 Boolean
SAVEDUMP 0 0 0 1 Boolean

CACHE1::DISK$INFSYS:[RJHEARN]_>
CACHE1::DISK$INFSYS:[RJHEARN]_>
CACHE1::DISK$INFSYS:[RJHEARN]_>

CACHE2::DISK$INFSYS:[RJHEARN]_>
CACHE2::DISK$INFSYS:[RJHEARN]_>
CACHE2::DISK$INFSYS:[RJHEARN]_>sear list_sysgen.cache2 dump

DUMPSTYLE 9 9 0 -1 Bitmask D
DUMPBUG 1 1 0 1 Boolean
SAVEDUMP 0 0 0 1 Boolean
DUMPSTYLE 9 9 0 -1 Bitmask D
DUMPBUG 1 1 0 1 Boolean
SAVEDUMP 0 0 0 1 Boolean

CACHE2::DISK$INFSYS:[RJHEARN]_>
CACHE2::DISK$INFSYS:[RJHEARN]_>

Looking for thoughts as to what else I should be checking.

Thanks,
Rich
_
14 REPLIES
Ian Miller.
Honored Contributor

Re: Not creating Crash Dump file - what am I missing?

Is it dumping to the system disk or to a separate disk, and is the system disk local to each node or on the SAN?

When both nodes crashed, did they crash for the same reason? Is there a BUGCHECK entry in the ERRLOG for both?
____________________
Purely Personal Opinion
cnb
Honored Contributor

Re: Not creating Crash Dump file - what am I missing?


VAX, Alpha, Integrity?

Version of OVMS?

Any error or OPCOM messages before, during or after the crash on why it isn't creating the dump?

Check OPERATOR.LOG, the error log or console for clues.

Hardware failure...

Insufficient system disk space or DOSD device disk space to capture the dump...

System lost connection to dump device...

Configuration changes...

Check with @sys$update:swapfiles.com

There could be any number of reasons.


Some general configuration info is here:

http://h30266.www3.hp.com/odl/vax/opsys/vmsos73/vmsos73/6017/6017pro_070.html#und_dump

SDA:

http://h71000.www7.hp.com/doc/73final/documentation/pdf/ovms_73_alpha_sys_tools.pdf?jumpid=reg_R1002_USEN


HTH
Bill Hall
Honored Contributor

Re: Not creating Crash Dump file - what am I missing?

Rich,

You can probably find the answer in your console output written at the time of the crash.

But I'll speculate that the dump device, or the boot path to the system disk, was not available at the time of the crash. I'll assume you have multipath access to the system disk and that the console variable dump_dev is either not defined or not correct.

What does $write sys$output f$getenv("dump_dev") return on your systems?

Bill
Bill Hall
Rich Hearn
Regular Advisor

Re: Not creating Crash Dump file - what am I missing?


Ian,

thanks for responding. The dump goes to the system disk and each system disk is local.

Was the crash for the same reason? - I suspect so, but the "official" word is a CPUSPINWAIT timeout. The question I'm still trying to answer regarding that is - what caused it?

There is not a BUGCHECK entry per se, but there is a "VMS Crash Restart Event" in both nodes' errlog files at the appropriate times.

cnb,

I appreciate you taking the time for this. One of these days, I'll include everything needed the 1st time I write :^)

It's an Alpha ES47, VMS 8.3, 16GB mem - you may have hit it, tho I only noticed it after reading Bill's post.

CACHE1::DISK$INFSYS:[RJHEARN]_>
CACHE1::DISK$INFSYS:[RJHEARN]_>show dev dsa

Device    Device     Error   Volume            Free   Trans   Mnt
 Name     Status     Count    Label          Blocks   Count   Cnt
DSA1:     Mounted        0   CACHE1_V83-0   9957114    2944     2
DSA2:     Mounted        0   CACHE2_V83-0  31794303       1     2
CACHE1::DISK$INFSYS:[RJHEARN]_>


Bill,

Thank you also for your thoughts... Re-re-re-visiting the console output, I see what I missed before: (disk size)

CACHE1::SYS$COMMON:[SYSMGR]_>

**** OpenVMS Alpha Operating System V8.3 - BUGCHECK ****

** Bugcheck code = 0000078C: CPUSPINWAIT, CPU spinwait timer expired
** Crash CPU: 00000001 Primary CPU: 00000000 Node Name: CACHE1
** Supported CPU count: 00000040
** Active CPUs: 00000000.0000000F
** Current Process: MMHUFF
** Current PSB ID: 00000001
** Image Name: $1$DGA51:[CACHE.CACHE1.BIN]CACHE.EXE

** Dumping error log buffers to HBVS unit 0

**** Unable to dump error log buffers to remaining shadow set members
** Error log buffers not dumped to HBVS unit 0 (master member)

** Dumping memory to HBVS unit 0
**** Starting compressed selective memory dump at 22-JUL-2009 09:03...
................................................................................
.
.
.
.....................................................................
**** Memory dump complete - not all key processes or global pages saved

halted CPU 0


I thought only the errlog buffers had not been written, since the console reported "Memory dump complete". I guess I need to get more disk space on the system disk, as you & cnb pointed out. I'm guessing I must've had enough room on DSA2 for the dump to be written and then compressed down to 2502032 blocks.

Rich

p.s.

both nodes are identical except for "booted_osflags" (as it should be)

CACHE1::DISK$INFSYS:[RJHEARN]_>@ f$getenv.com
$ write sys$output f$getenv("boot_dev")
SCSI 2 1 0 0 0 0 0
$ write sys$output f$getenv("bootdef_dev")
SCSI 2 1 0 0 0 0 0
$ write sys$output f$getenv("booted_dev")
SCSI 2 1 0 0 0 0 0
$ write sys$output f$getenv("boot_file")

$ write sys$output f$getenv("booted_file")

$ write sys$output f$getenv("boot_osflags")
1,0
$ write sys$output f$getenv("booted_osflags")
1,0
$ write sys$output f$getenv("boot_reset")
OFF
$ write sys$output f$getenv("dump_dev")

$ write sys$output f$getenv("enable_audit")
ON
$ write sys$output f$getenv("license")
MU
$ write sys$output f$getenv("char_set")

$ write sys$output f$getenv("language")
6
$ write sys$output f$getenv("tty_dev")
0
$ vfy = f$verify(0)
CACHE1::DISK$INFSYS:[RJHEARN]_>

cnb
Honored Contributor
Solution

Re: Not creating Crash Dump file - what am I missing?

Glad it helped locate the issue.

These appear to be the standard default settings (compressed & selective) for your environment.

From SYSGEN help on SYS_PAR DUMPSTYLE:

.....

If you plan to enable the Volume Shadowing minimerge feature on
an Alpha or I64 system disk, be sure to specify DOSD to an
alternate disk.

NOTE

On Alpha and I64 systems, you can save space on the system
disk and, in the event of a crash, save time recording
the system memory, by using the OpenVMS Alpha and I64 dump
compression feature. Unless you override the default AUTOGEN
calculations (by setting DUMPSTYLE in MODPARAMS.DAT),
AUTOGEN uses the following algorithm:

o On a system with less than 128 MB of memory, the system
sets the DUMPSTYLE to 1 (a raw selective dump) and sizes
the dump file appropriately.

o On a system with 128 MB of memory or greater, the system
sets the DUMPSTYLE to 9 (a compressed selective dump),
and creates the dump file at two-thirds the value of the
corresponding raw dump.
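As a rough illustration of the help text quoted above, the sketch below decodes the low bits of a DUMPSTYLE value and applies the 128 MB rule. The function and constant names are made up for this example. The meanings of bit 0 and bit 3 follow from the values 1 and 9 described above; the DOSD bit (bit 2) is taken from the SYSGEN help as I understand it, so check HELP on your own system before relying on it.

```python
# Hypothetical helpers for illustration; none of these names exist in OpenVMS.
SELECTIVE = 0x1   # bit 0: selective dump (clear = full dump)
DOSD = 0x4        # bit 2: dump off system disk (assumption, see lead-in)
COMPRESSED = 0x8  # bit 3: compressed dump (clear = raw dump)


def describe_dumpstyle(value):
    """Return a short description of the low DUMPSTYLE bits."""
    parts = [
        "compressed" if value & COMPRESSED else "raw",
        "selective" if value & SELECTIVE else "full",
        "off system disk" if value & DOSD else "on system disk",
    ]
    return ", ".join(parts)


def autogen_default_dumpstyle(memory_mb):
    """Apply the 128 MB threshold from the quoted help text."""
    return 1 if memory_mb < 128 else 9


# The thread's systems report DUMPSTYLE 9 and have 16 GB of memory.
print(describe_dumpstyle(9))
print(autogen_default_dumpstyle(16 * 1024))
```

The other DUMPSTYLE bits are deliberately left out here; the sketch only covers the ones discussed in this thread.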


Rgds,

cnb
Rich Hearn
Regular Advisor

Re: Not creating Crash Dump file - what am I missing?


cnb,

Haven't had to think about the system disk space size in 5 yrs - guess it's time :^)

Tnx agn,
Rich
_
Bill Hall
Honored Contributor

Re: Not creating Crash Dump file - what am I missing?

Rich,

My suspicion that console variable dump_dev was not defined correctly was based on your statement "working off a san" that I assumed meant that your boot device was a multipathed san device. I wasn't thinking shadowed system, but the requirement to define dump_dev to all paths to a dump device and to all shadow-set members of the device that holds the system dump file applies in this case. The dump file must be written to all members of the shadow-set if they are available.

This is not a space problem; it's a problem with your definition of dump_dev. As an example: >>>set dump_dev dka100,dkb100. Substitute all of the members of your shadowed boot device.

I also noticed that you only have one shadow set member in the bootdef_dev environment variable. Bootdef_dev should also contain all members of the shadowed boot device. This allows you to boot from any member of the shadow set in the event one or more fail.

Bill

Bill Hall
Steve Reece_3
Trusted Contributor

Re: Not creating Crash Dump file - what am I missing?

Rich,

Not sure whether you're booting off internal disks or whether you're booting off the SAN(s). In either case, what's the shadow set for the system disk made up of? You need to make sure that different shadow set members have different unit numbers/LUNs for VMS - you shouldn't mix (for example) DKA0 and DKB0 or you can end up writing over the crash dump. DKA0 and DKB100 would be ok as I understand it. So, if you are booting off the SAN(s), you could work with $1$DGA1 and $1$DGA10.
Steve Reece_3
Trusted Contributor

Re: Not creating Crash Dump file - what am I missing?

p.s. I don't see that the system disk being short of or having no space would have the effect of not writing the dump. If the system has rebooted and mapped the dump file successfully and you're dumping onto the system disk, the system should write to the file that's already been mapped shouldn't it? The time that you'd run out of space for the dump is when you copied it sideways within SDA?
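To put the numbers posted earlier in the thread into one unit, here is a small arithmetic sketch. It assumes the usual 512-byte OpenVMS disk block; the variable names are made up for the example. Free space on the volume is not the same thing as the preallocated size of SYSDUMP.DMP, so this only compares the figures and says nothing about why the dump was not written.

```python
BLOCK_BYTES = 512  # size of an OpenVMS disk block in bytes


def blocks_to_gib(blocks):
    """Convert a count of 512-byte blocks to GiB."""
    return blocks * BLOCK_BYTES / 2**30


# Free blocks as shown by SHOW DEVICE in the thread.
free_blocks = {"DSA1": 9957114, "DSA2": 31794303}
# Size quoted for the compressed dump that was written.
compressed_dump_blocks = 2502032
# 16 GB of physical memory expressed in blocks.
memory_blocks = 16 * 2**30 // BLOCK_BYTES

for name, blocks in free_blocks.items():
    print(f"{name}: {blocks_to_gib(blocks):.2f} GiB free")
print(f"compressed dump: {blocks_to_gib(compressed_dump_blocks):.2f} GiB")
print(f"memory: {memory_blocks} blocks")
```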
Rich Hearn
Regular Advisor

Re: Not creating Crash Dump file - what am I missing?


Bill & Steve - apologies for my delay in getting back here.


Bill,

Thanks for the boot_dev info - it, obviously, needs to be corrected. It does confuse me tho' that the node Cache2 has the same set up and *does* write a crash dump.

Each boot disk is a shadowed local disk to each system, so I thought I was ok with it.
Your thoughts about the corrections will be applied...


Steve,

the boot device is local (internal) and shadowed using Dka0 and Dkb0 - the thinkin' being if a controller "died" we'd have the redundancy - maybe not so good in retrospect...
Volume Name is: CACHE1_V83-0 Shadow Set is: Dsa1: Disk order is: _$1$DKA0: _$1$DKB0: None
Volume Name is: CACHE2_V83-0 Shadow Set is: Dsa2: Disk order is: _$2$DKA0: _$2$DKB0: None

This *used* to work - that's what's got me befuddled. I've not made any *intentional* changes (we all know how that goes :^)

Thanks for your thoughts and time.
Rich
_
The Brit
Honored Contributor

Re: Not creating Crash Dump file - what am I missing?

Rich,
This is just an excerpt from an old Alpha management manual. Note item 3 below.

On Alpha systems, the requirements for writing the DOSD are the following:

The dump device directory structure must resemble the current system disk structure. The [SYSn.SYSEXE]SYSDUMP.DMP file will reside there, with the same boot time system root.
Use AUTOGEN to create this file. In the MODPARAMS.DAT file, the following symbol prompts AUTOGEN to create the file:
DUMPFILE_DEVICE = $nnn$ddcuuuu


You can enter a list of devices.

1. The dump disk must have an ODS-2 file structure.

2. The dump device cannot be part of a volume set.

3. The dump device cannot be part of a shadow set unless it is also the system device and the master member of the shadow set.

Use the following format to specify the dump device environment variable DUMP_DEV at the console prompt:

>>> SET DUMP_DEV device-name[...]

If I remember correctly, the DUMP_DEV definitions are only required if you are dumping OFF the system disk. (not true in your case)

Dave
Rich Hearn
Regular Advisor

Re: Not creating Crash Dump file - what am I missing?


Dave,

"the DUMP_DEV definitions are only required if you are dumping OFF the system disk"

Ah, that could explain why I never recall having set it before.

Tnx,
Rich
_
Steve Reece_3
Trusted Contributor

Re: Not creating Crash Dump file - what am I missing?

Hi Rich,

Having shadowed disks is just fine and dandy and will achieve resilience (as you suggest).
The thing you need to do is switch one of the disks to a different SCSI ID - make them, say, DKA0 and DKB100.

If I recall correctly, I think the name of the chap that described crashing and having different SCSI IDs on disks being essential was Richard Bishop. If you have the same SCSI ID on two disks in a shadow set, the driver that is being used for the crash dump writing isn't sure which is the master member, so it will sometimes get it right and other times get it wrong.
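The check described here can be sketched in a few lines: pull the unit number off each shadow set member name and flag any number that appears more than once. This is only an illustration of the rule as stated in this thread; the regular expression and the function names are assumptions made up for the example, and it only handles device names of the shapes seen above (DKA0, _$1$DKB0:, $1$DGA10).

```python
import re
from collections import defaultdict

# Matches names such as DKA0, _$1$DKB0: or $1$DGA10 (illustrative pattern only).
DEVICE_RE = re.compile(r"^_?(?:\$\d+\$)?([A-Z]{2})([A-Z])(\d+):?$", re.IGNORECASE)


def unit_number(device):
    """Return the unit number at the end of a device name."""
    match = DEVICE_RE.match(device.strip())
    if match is None:
        raise ValueError(f"unrecognized device name: {device!r}")
    return int(match.group(3))


def members_sharing_units(members):
    """Group members by unit number and keep only groups with more than one member."""
    by_unit = defaultdict(list)
    for member in members:
        by_unit[unit_number(member)].append(member)
    return {unit: names for unit, names in by_unit.items() if len(names) > 1}


# Members as posted for DSA1, then the suggested alternative numbering.
print(members_sharing_units(["_$1$DKA0:", "_$1$DKB0:"]))
print(members_sharing_units(["DKA0", "DKB100"]))
```

An empty result means no two members share a unit number; a non-empty one lists the members that would need renumbering under this rule.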

Steve
Rich Hearn
Regular Advisor

Re: Not creating Crash Dump file - what am I missing?



Steve,

Fascinating stuff. Faux pas par excellence on my part I'd say, what with the same drive #'s. I'll have to see what I can do to get them changed. Thank you for picking up on my missing your point - do 'preciate you clarifying.

Rich
_