Not creating Crash Dump file - what am I missing?
07-30-2009 07:30 AM
I've got a 2-node extended cluster (nodes 6 miles apart) with separate system disks and a common disk between them for some items, working off a SAN.
After my last crash, node 1 had no crash dump but node 2 did. It used to work; I'm not sure what has changed or when. SYSDUMP.DMP does exist in each SYS$SPECIFIC:[SYSEXE].
CACHE1::DISK$INFSYS:[RJHEARN]_>
CACHE1::DISK$INFSYS:[RJHEARN]_>sear list_sysgen.cache1 dump
DUMPSTYLE 9 9 0 -1 Bitmask D
DUMPBUG 1 1 0 1 Boolean
SAVEDUMP 0 0 0 1 Boolean
DUMPSTYLE 9 9 0 -1 Bitmask D
DUMPBUG 1 1 0 1 Boolean
SAVEDUMP 0 0 0 1 Boolean
CACHE1::DISK$INFSYS:[RJHEARN]_>
CACHE1::DISK$INFSYS:[RJHEARN]_>
CACHE1::DISK$INFSYS:[RJHEARN]_>
CACHE2::DISK$INFSYS:[RJHEARN]_>
CACHE2::DISK$INFSYS:[RJHEARN]_>
CACHE2::DISK$INFSYS:[RJHEARN]_>sear list_sysgen.cache2 dump
DUMPSTYLE 9 9 0 -1 Bitmask D
DUMPBUG 1 1 0 1 Boolean
SAVEDUMP 0 0 0 1 Boolean
DUMPSTYLE 9 9 0 -1 Bitmask D
DUMPBUG 1 1 0 1 Boolean
SAVEDUMP 0 0 0 1 Boolean
CACHE2::DISK$INFSYS:[RJHEARN]_>
CACHE2::DISK$INFSYS:[RJHEARN]_>
Looking for thoughts as to what else I want to be checking.
Thanks,
Rich
07-30-2009 07:51 AM
Re: Not creating Crash Dump file - what am I missing?
When both nodes crashed, did they crash for the same reason? Is there a BUGCHECK entry in the ERRLOG for both?
Purely Personal Opinion
07-30-2009 08:26 AM
Re: Not creating Crash Dump file - what am I missing?
VAX, Alpha, Integrity?
Version of OVMS?
Any error or OPCOM messages before, during or after the crash on why it isn't creating the dump?
Check OPERATOR.LOG, the error log or console for clues.
Hardware failure...
Insufficient system disk space or DOSD device disk space to capture the dump...
System lost connection to dump device...
Configuration changes...
Check with @sys$update:swapfiles.com
There could be any number of reasons.
Some general configuration info is here:
http://h30266.www3.hp.com/odl/vax/opsys/vmsos73/vmsos73/6017/6017pro_070.html#und_dump
SDA:
http://h71000.www7.hp.com/doc/73final/documentation/pdf/ovms_73_alpha_sys_tools.pdf?jumpid=reg_R1002_USEN
HTH
07-30-2009 08:31 AM
Re: Not creating Crash Dump file - what am I missing?
You can probably find the answer in your console output written at the time of the crash.
But I'll speculate that the dump device, or the boot path to the system disk, was not available at the time of the crash. I'm assuming you have multipath access to the system disk and that the console variable dump_dev is either not defined or not correct.
What does $write sys$output f$getenv("dump_dev") return on your systems?
Bill
07-30-2009 10:03 AM
Re: Not creating Crash Dump file - what am I missing?
Ian,
thanks for responding. The dump goes to the system disk and each system disk is local.
Was the crash for the same reason? - I suspect so, but the "official" word is a CPUSPINWAIT timeout. The question I'm still trying to answer regarding that is - what caused it?
There is not a BUGCHECK entry per se, but there is a "VMS Crash Restart Event" in both nodes' errlog files at the appropriate times.
cnb,
I appreciate you taking the time for this. One of these days, I'll include everything needed the 1st time I write :^)
It's an Alpha ES47, VMS 8.3, 16GB memory. You may have hit it, though I only noticed it after reading Bill's post.
CACHE1::DISK$INFSYS:[RJHEARN]_>
CACHE1::DISK$INFSYS:[RJHEARN]_>show dev dsa
Device   Device    Error   Volume            Free   Trans  Mnt
 Name    Status    Count   Label           Blocks   Count  Cnt
DSA1:    Mounted       0   CACHE1_V83-0   9957114    2944    2
DSA2:    Mounted       0   CACHE2_V83-0  31794303       1    2
CACHE1::DISK$INFSYS:[RJHEARN]_>
Bill,
Thank you also for your thoughts... Re-re-re-visiting the console output, I see what I missed before: (disk size)
CACHE1::SYS$COMMON:[SYSMGR]_>
**** OpenVMS Alpha Operating System V8.3 - BUGCHECK ****
** Bugcheck code = 0000078C: CPUSPINWAIT, CPU spinwait timer expired
** Crash CPU: 00000001 Primary CPU: 00000000 Node Name: CACHE1
** Supported CPU count: 00000040
** Active CPUs: 00000000.0000000F
** Current Process: MMHUFF
** Current PSB ID: 00000001
** Image Name: $1$DGA51:[CACHE.CACHE1.BIN]CACHE.EXE
** Dumping error log buffers to HBVS unit 0
**** Unable to dump error log buffers to remaining shadow set members
** Error log buffers not dumped to HBVS unit 0 (master member)
** Dumping memory to HBVS unit 0
**** Starting compressed selective memory dump at 22-JUL-2009 09:03...
................................................................................
.
.
.
.....................................................................
**** Memory dump complete - not all key processes or global pages saved
halted CPU 0
I thought it was only the errlog buffers that were not being written, since it stated "Memory dump complete". I guess I need to get more disk space on the system disk, as you & cnb pointed out. I'm guessing I must've had enough room on DSA2 for the dump to be written and then compressed down to 2502032 blocks.
Rich
p.s.
both nodes are identical except for "booted_osflags" (as it should be)
CACHE1::DISK$INFSYS:[RJHEARN]_>@ f$getenv.com
$ write sys$output f$getenv("boot_dev")
SCSI 2 1 0 0 0 0 0
$ write sys$output f$getenv("bootdef_dev")
SCSI 2 1 0 0 0 0 0
$ write sys$output f$getenv("booted_dev")
SCSI 2 1 0 0 0 0 0
$ write sys$output f$getenv("boot_file")
$ write sys$output f$getenv("booted_file")
$ write sys$output f$getenv("boot_osflags")
1,0
$ write sys$output f$getenv("booted_osflags")
1,0
$ write sys$output f$getenv("boot_reset")
OFF
$ write sys$output f$getenv("dump_dev")
$ write sys$output f$getenv("enable_audit")
ON
$ write sys$output f$getenv("license")
MU
$ write sys$output f$getenv("char_set")
$ write sys$output f$getenv("language")
6
$ write sys$output f$getenv("tty_dev")
0
$ vfy = f$verify(0)
CACHE1::DISK$INFSYS:[RJHEARN]_>
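To put the block counts quoted in this post side by side, here is a small Python sketch. It only converts the figures from the SHOW DEVICE output and the 2502032-block dump size into GiB, using the 512-byte OpenVMS disk block. It is a rough comparison under that assumption, not a diagnosis: the dump is normally written into the existing SYSDUMP.DMP, so free space on the volume may not be the deciding factor.

```python
# OpenVMS reports disk space in 512-byte blocks.
BLOCK_BYTES = 512


def blocks_to_gib(blocks: int) -> float:
    """Convert a count of 512-byte disk blocks to GiB."""
    return blocks * BLOCK_BYTES / 2**30


# Figures quoted in this thread.
free_blocks = {"DSA1": 9_957_114, "DSA2": 31_794_303}
cache2_dump_blocks = 2_502_032  # dump size reported for the node that did dump

for name, blocks in free_blocks.items():
    print(f"{name}: {blocks_to_gib(blocks):.2f} GiB free")
print(f"Existing dump: {blocks_to_gib(cache2_dump_blocks):.2f} GiB")
```

On these numbers the free space on DSA1 is several times the size of the dump that was written on the other node.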
07-30-2009 10:35 AM
Solution
This appears to be the standard default setting (compressed & selective) for your environment.
From SYSGEN help on SYS_PAR DUMPSTYLE:
.....
If you plan to enable the Volume Shadowing minimerge feature on
an Alpha or I64 system disk, be sure to specify DOSD to an
alternate disk.
NOTE
On Alpha and I64 systems, you can save space on the system
disk and, in the event of a crash, save time recording
the system memory, by using the OpenVMS Alpha and I64 dump
compression feature. Unless you override the default AUTOGEN
calculations (by setting DUMPSTYLE in MODPARAMS.DAT),
AUTOGEN uses the following algorithm:
o On a system with less than 128 MB of memory, the system
  sets the DUMPSTYLE to 1 (a raw selective dump) and sizes
  the dump file appropriately.
o On a system with 128 MB of memory or greater, the system
  sets the DUMPSTYLE to 9 (a compressed selective dump),
  and creates the dump file at two-thirds the value of the
  corresponding raw dump.
Rgds,
cnb
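The AUTOGEN rule quoted from the SYSGEN help can be sketched in Python as follows. The function names are hypothetical and exist only for illustration. The decoder looks at bits 0 and 3 only, which is what the help text implies (1 = raw selective, 9 = compressed selective); treating a clear bit 0 as a full dump is an assumption, and any other bits are ignored.

```python
def autogen_default_dumpstyle(memory_mb: int) -> int:
    """DUMPSTYLE AUTOGEN would pick, per the quoted help text."""
    return 1 if memory_mb < 128 else 9


def compressed_dump_file_blocks(raw_dump_blocks: int) -> int:
    """Two-thirds of the corresponding raw dump size, rounded down."""
    return raw_dump_blocks * 2 // 3


def describe_dumpstyle(value: int) -> str:
    """Decode bits 0 and 3 only; other bits are not interpreted here."""
    selective = bool(value & 0b0001)   # bit 0: selective (assumed: clear = full)
    compressed = bool(value & 0b1000)  # bit 3: compressed
    form = "compressed" if compressed else "raw"
    kind = "selective" if selective else "full"
    return f"{form} {kind}"


# The system in this thread has 16GB of memory and DUMPSTYLE 9.
style = autogen_default_dumpstyle(16 * 1024)
print(style, describe_dumpstyle(style))
```

For the 16GB ES47 in this thread the rule gives 9, which matches the DUMPSTYLE value shown in the SYSGEN listings for both nodes.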
07-30-2009 10:49 AM
Re: Not creating Crash Dump file - what am I missing?
cnb,
Haven't had to think about the system disk space size in 5 yrs - guess it's time :^)
Tnx agn,
Rich
07-30-2009 12:13 PM
Re: Not creating Crash Dump file - what am I missing?
My suspicion that the console variable dump_dev was not defined correctly was based on your statement "working off a san", which I took to mean that your boot device was a multipathed SAN device. I wasn't thinking of a shadowed system disk, but the same requirement applies here: dump_dev must list all paths to the dump device and all shadow-set members of the device that holds the system dump file. The dump file must be written to all members of the shadow set if they are available.
This is not a space problem; it's a problem with your definition of dump_dev. As an example: >>>set dump_dev dka100,dkb100. Substitute all of the members of your shadowed boot device.
I also noticed that you only have one shadow set member in the bootdef_dev environment variable. Bootdef_dev should also contain all members of the shadowed boot device. This allows you to boot from any member of the shadow set in the event one or more fail.
Bill
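As a small illustration of the advice above, this Python sketch builds the two console commands from a list of shadow-set members, in the same form as the `set dump_dev dka100,dkb100` example. The helper name is hypothetical and the member names must be substituted for your own. Whether dump_dev is needed at all when dumping to the system disk is questioned later in the thread, so treat the output as a template rather than a fix.

```python
from typing import List, Sequence


def console_set_commands(members: Sequence[str]) -> List[str]:
    """Return console commands listing every shadow-set member."""
    devices = ",".join(member.lower() for member in members)
    return [
        f"set dump_dev {devices}",
        f"set bootdef_dev {devices}",
    ]


# Member names taken from the example in the post above.
for command in console_set_commands(["dka100", "dkb100"]):
    print(">>>" + command)
```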
07-31-2009 08:25 AM
Re: Not creating Crash Dump file - what am I missing?
I'm not sure whether you're booting off internal disks or off the SAN(s). In either case, what is the shadow set for the system disk made up of? You need to make sure that the shadow set members have different unit numbers/LUNs as seen by VMS. You shouldn't mix (for example) DKA0 and DKB0, or you can end up writing over the crash dump. DKA0 and DKB100 would be OK as I understand it. So, if you are booting off the SAN(s), you could work with $1$DGA1 and $1$DGA10.
07-31-2009 08:29 AM
Re: Not creating Crash Dump file - what am I missing?
08-04-2009 10:46 AM
Re: Not creating Crash Dump file - what am I missing?
Bill & Steve - apologies for my delay in getting back here.
Bill,
Thanks for the boot_dev info - it, obviously, needs to be corrected. It does confuse me tho' that the node Cache2 has the same set up and *does* write a crash dump.
Each boot disk is a shadowed local disk to each system, so I thought I was ok with it.
Your thoughts about the corrections will be applied...
Steve,
the boot device is local (internal) and shadowed using Dka0 and Dkb0 - the thinkin' being if a controller "died" we'd have the redundancy - maybe not so good in retrospect...
Volume Name is: CACHE1_V83-0 Shadow Set is: Dsa1: Disk order is: _$1$DKA0: _$1$DKB0: None
Volume Name is: CACHE2_V83-0 Shadow Set is: Dsa2: Disk order is: _$2$DKA0: _$2$DKB0: None
This *used* to work - that's what's got me befuddled. I've not made any *intentional* changes (we all know how that goes :^)
Thanks for your thoughts and time.
Rich
08-04-2009 11:09 AM
Re: Not creating Crash Dump file - what am I missing?
This is just an excerpt from an old Alpha management manual. Note item 3 below.
On Alpha systems, the requirements for writing the DOSD are the following:
The dump device directory structure must resemble the current system disk structure. The [SYSn.SYSEXE]SYSDUMP.DMP file will reside there, with the same boot time system root.
Use AUTOGEN to create this file. In the MODPARAMS.DAT file, the following symbol prompts AUTOGEN to create the file:
DUMPFILE_DEVICE = $nnn$ddcuuuu
You can enter a list of devices.
1. The dump disk must have an ODS-2 file structure.
2. The dump device cannot be part of a volume set.
3. The dump device cannot be part of a shadow set unless it is also the system device and the master member of the shadow set.
Use the following format to specify the dump device environment variable DUMP_DEV at the console prompt:
>>> SET DUMP_DEV device-name[...]
If I remember correctly, the DUMP_DEV definitions are only required if you are dumping OFF the system disk. (not true in your case)
Dave
08-04-2009 11:52 AM
Re: Not creating Crash Dump file - what am I missing?
Dave,
"the DUMP_DEV definitions are only required if you are dumping OFF the system disk"
Ah, that could explain why I never recall having set it before.
Tnx,
Rich
08-05-2009 12:20 AM
Re: Not creating Crash Dump file - what am I missing?
Having shadowed disks is just fine and dandy and will achieve resilience (as you suggest).
The thing you need to do is switch one of the disks to a different SCSI ID - make them, say, DKA0 and DKB100.
If I recall correctly, the chap who described why different SCSI IDs on the disks are essential when crashing was Richard Bishop. If you have the same SCSI ID on two disks in a shadow set, the driver that writes the crash dump can't be sure which is the master member, so it will sometimes get it right and other times get it wrong.
Steve
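The check described above (no two shadow-set members sharing a unit number) can be sketched in Python. The device-name pattern is an assumption based only on the names that appear in this thread, such as `_$1$DKA0:`, `DKB100` and `$1$DGA10`; it is not a general OpenVMS device-name parser, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Optional "_" prefix, optional "$n$" allocation class, two-letter device
# type, one controller letter, unit number, optional trailing ":".
_DEVICE_RE = re.compile(r"^_?(?:\$\d+\$)?([A-Za-z]{2})([A-Za-z])(\d+):?$")


def unit_number(device: str) -> int:
    """Extract the unit number from a name like '_$1$DKA0:' or 'DKB100'."""
    match = _DEVICE_RE.match(device.strip())
    if match is None:
        raise ValueError(f"unrecognised device name: {device!r}")
    return int(match.group(3))


def shared_unit_numbers(members: Iterable[str]) -> Dict[int, List[str]]:
    """Return unit numbers used by more than one shadow-set member."""
    by_unit: Dict[int, List[str]] = defaultdict(list)
    for device in members:
        by_unit[unit_number(device)].append(device)
    return {unit: devs for unit, devs in by_unit.items() if len(devs) > 1}


# Members of DSA1 as listed earlier in the thread, then the suggested layout.
print(shared_unit_numbers(["_$1$DKA0:", "_$1$DKB0:"]))  # both are unit 0
print(shared_unit_numbers(["DKA0", "DKB100"]))          # no clash
```

An empty result means every member has its own unit number, which is the layout recommended here.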
08-05-2009 01:19 AM
Re: Not creating Crash Dump file - what am I missing?
Steve,
Fascinating stuff. Faux pas par excellence on my part I'd say, what with the same drive #'s. I'll have to see what I can do to get them changed. Thank you for picking up on my missing your point - do 'preciate you clarifying.
Rich