"crash dump complete"... but where?!?

Evert Jan van Ramselaar · ‎05-11-2004

Imagine this console output:

~~~~~ start output ~~~~~
Assertpanic (cpu 0): Assert Failed: tokp->tok_wantptr == NULL
file: svrtok.c:2071, caller: 0xfffffc0000f8394c

DUMP: blocks available: 60348287
DUMP: blocks wanted: 1884850 (partial compressed dump) [OKAY]
DUMP: Device Disk Blocks Available
DUMP: ------ ---------------------
DUMP: 0x13001d6 45265310 - 60348284 (of 60348285) [primary swap]
DUMP.prom: Open: dev 0x5100181, block 524288: SCSI3 3 1 0 0 0 0 0 PN: WWID:01000010:6000-1fe1-001e-4210-0009-0460-7093-0083
registering new portname - PN: WWID:01000010:6000-1fe1-001e-4210-0009-0460-7093-0083
wwid already registered at 0

cb_open: failed SCSI3 3 1 0 905 0 0 0 @wwid0 , dgb114.1905.0.1.3

DUMP.prom: Open Error: dev 0x5100181: open failed (SCSI3)
DUMP: Error: init header write on dev 0x13001d6: errno 5 (primary swap)

DUMP: first crash dump failed: attempting memory dump...
DUMP: compressing 1884848KB into 15279239KB memory...
DUMP: Starting Address Ending Address Size(MB)
DUMP: ------------------ ------------------ --------
DUMP: 0xfffffc03feec6000 - 0xfffffc03fffedfef 17.1 (indicator)
DUMP: Writing data.................................................. [50MB]
DUMP: Writing data.................................................. [100MB]
DUMP: Writing data.................................................. [150MB]
DUMP: Writing data.................................................. [200MB]
DUMP: Writing data.................................................. [250MB]
DUMP: Writing data.................................... [286MB]
DUMP: crash dump complete.
halted CPU 1
halted CPU 2
halted CPU 3

halted CPU 0

halt code = 5
HALT instruction executed
PC = fffffc0000baa870

P00>>>
~~~~~ end output ~~~~~

At the time of the crash there was a SAN problem, so the dump could not be saved to the swap disk. However, the memory dump attempted after the first failure seems to have worked, hence the "crash dump complete" message.

Anyway, after booting the server (the SAN problems were solved by that time), nothing was saved to /var/adm/crash. What would have been the right way to boot the server and preserve the crash dump for further investigation? Now we have nothing to investigate but the console output.
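
For anyone reading a captured console log later, a quick way to tell where the dump went is to look at the DUMP: lines. This is only a minimal sketch: it assumes the console output was saved to a file and that the message texts are exactly the ones quoted above. The function name dump_summary and the file name console.log are made up for illustration.

```shell
# Classify a captured console log (read from stdin) by its DUMP: messages.
# Assumption: the message texts match the ones quoted in this thread.
dump_summary() {
    awk '
        /attempting memory dump/    { to_memory = 1 }
        /DUMP: crash dump complete/ { complete = 1 }
        END {
            if (complete && to_memory) print "complete: memory fallback"
            else if (complete)         print "complete: no memory fallback seen"
            else                       print "no completed dump"
        }'
}
```

Usage would be something like: dump_summary < console.log. For the output above it reports the memory fallback, which means the dump only exists in RAM until savecore picks it up.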

EJ

Contrary to popular belief, Unix is userfriendly. It just happens to be selective about who it makes friends with.

Mobeen_1 · ‎05-11-2004

EJ,
On Alphas, if you want to be absolutely sure, you can do a "force dump" while you are at the boot prompt >>>.

This makes sure that everything in the cache is written to the configured dump device.

To carry out a force dump, issue Ctrl+P while you are at the boot prompt and wait for the dump to complete before booting the server.

regards
Mobeen

Michael Schulte zur Sur · ‎05-12-2004

Hi,

"compressing 1884848KB into 15279239KB memory..."

The dump was in memory.
It is good to have the SRM variable dump_dev set to a local disk to provide a place for the dump, since Tru64 tries to dump into the swap space. So if you reset the machine, the dump is gone.

greetings,

Michael

Johan Brusche · ‎05-12-2004

The opsys version is not mentioned, but for more recent versions the "dumpdev" kernel variable is obsolete. In this case the dump device was NOT available due to a SAN problem, and no option other than dumping into memory was left.

For kernel variables that influence dump behaviour, check the manpage for sys_attrs_generic, and in particular the variables found via:

sysconfig -q generic | grep dump

compressed_dump = 1
dump_exmem_addr = 0
dump_exmem_size = 0
dump_exmem_include = 0
dump_kernel_text = 0
dump_savecnt = 1
dump_sp_threshold = 16384
dump_to_memory = 0
dump_user_pte_pages = 0
expected_dump_compression = 500
live_dump_zero_suppress = 1
live_dump_dir_name = /var/adm/crash
partial_dump = 1
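
If you only want one of these values in a script, a small filter over that output is enough. A minimal sketch, assuming the "name = value" layout shown above; the function name attr_value is hypothetical:

```shell
# Print the value of one attribute from "name = value" lines on stdin,
# i.e. the layout of the sysconfig output shown above.
# $1 is the attribute name to look for.
attr_value() {
    awk -v want="$1" '$1 == want && $2 == "=" { print $3 }'
}
```

For example: sysconfig -q generic | attr_value dump_to_memory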

The manpage for "savecore" might also be worth reading, as might the startup script /sbin/init.d/savecore. From that reading you will learn that the variables SAVECORE_FLAGS and SAVECORE_DIR in /etc/rc.config can also influence the saving of memory dumps.

Another piece of info that is missing is what message was recorded on the console after the "Checking for crash dumps" string during the subsequent reboot.

In this case, booting into single-user mode with >>>boot -fl s, then running mountroot and mount /var, and then manually running /sbin/savecore -vP might have helped.
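
Written out as a script, that sequence might look like the sketch below. The three commands and the -vP flags are taken straight from the paragraph above and are not verified here; the helper names run and recover_dump and the DRY_RUN switch are made up for illustration. It can only help if the memory contents survived the boot.

```shell
# Dry-run by default: print each step, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# Steps as quoted in the post above, meant for single-user mode
# (reached with "boot -fl s" at the SRM prompt).
recover_dump() {
    run mountroot &&
    run mount /var &&
    run /sbin/savecore -vP
}
```

Calling recover_dump only prints the steps; DRY_RUN=0 recover_dump would actually run them and stop at the first one that fails.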

Rgds,
Johan.

_JB_

Evert Jan van Ramselaar · ‎05-12-2004

@Mobeen:
We will try the force dump when we get the same problem again (which I hope will not be in the near future).

@Michael:
We did not reset the machine after the crash, but just typed "boot" at the boot prompt.

@Johan:
Part of the console log at first boot:
~~~~~
The system is coming up. Please wait...
Checking for crash dumps
Initializing paging space
~~~~~

So it seems like savecore did not find anything. Maybe it only searched in swap and not in memory?

Also the command "grep SAVE /etc/rc.config" does not give any result. Could the fact that we do not have any SAVECORE settings in rc.config be a part of our problem?


Johan Brusche · ‎05-12-2004

On forcing a dump:
Ctrl+P is not an SRM-prompt command; it can only be used on the console of an OpenVMS system to enter SRM console mode.

On a running Tru64 system, the halt button or RCM>halt in must be used to get to the SRM >>> prompt (Ctrl+P is ignored).

Once at the >>> prompt, use the "crash" command to generate a dump.

The "dump_dev" mentioned by Michael is a console (>>> prompt) variable. (It seems to be undefined on our AS4100.)

Johan.

_JB_

Michael Schulte zur Sur · ‎05-12-2004

Johan,

I always set dump_dev at the console prompt to ensure that the machine has a place to write to in case it crashes before swap is defined.
It should have picked up the dump from memory during startup. I have seen this happen more than once.

Are you sure you did not reset the machine in one way or another?

hth,

Michael

Evert Jan van Ramselaar · ‎05-13-2004

Well actually, at first boot:

~~~~~
P00>>> boot
Initializing...
~~~~~

Does "Initializing" indicate that it is resetting itself? Or do resets always have to be performed manually?

If the init does mean reset, we are back to my original question: How do I save a dump that is in memory but not on (swap) disk?

In the past we have played around with the dump_dev settings at the SRM prompt, but this does not seem to produce the desired result.

Some extra info:
System: ES45
Firmware: V6.6
OS: Tru64 v5.1B pk3

EJ


Mobeen_1 · ‎05-13-2004

EJ,
Well, when you do a boot, the first part of the process is that it inits your CPUs. But I have seen lots of people init a CPU and then boot, even though the boot command will also do that. I am not sure what the reasons for that are. Maybe some of our colleagues here can point out any reasons for doing so.

From what I understand, it's pretty clear that the boot process has CPU init built in.

regards
Mobeen

Michael Schulte zur Sur · ‎05-13-2004

Hi EJ,

You raise a good point.
There is a console variable that forces a reset before a boot.
show boot*
will show you boot_reset or something similar. With it set, a boot behaves as if you had hit the reset button: the memory is wiped.

hth,

Michael

Evert Jan van Ramselaar · ‎05-13-2004

@Michael:

~~~~~
# consvar -g boot_reset
boot_reset = ON
~~~~~

This explains why the crashdump is lost at reboot. Thanks for pointing me to this item.

@all:

Thanks for your help!


Johan Brusche · ‎05-13-2004

consvar -g boot_reset

consvar -s boot_reset OFF
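
To check this from a script before a crash ever happens, the output of the first command can be tested. A minimal sketch, assuming the "boot_reset = ON" output format shown earlier in the thread; the function name boot_reset_is_on is hypothetical:

```shell
# Exit status 0 if stdin contains "boot_reset = ON" (case-insensitive value),
# non-zero otherwise. Input format assumed from the consvar -g output above.
boot_reset_is_on() {
    awk '$1 == "boot_reset" && $2 == "=" { v = toupper($3) }
         END { exit !(v == "ON") }'
}
```

For example: consvar -g boot_reset | boot_reset_is_on && echo "boot_reset is ON: a memory dump will not survive a boot"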

JB.

_JB_
