Re: Shortening Memory Dump Time

Wim Van den Wyngaert
Honored Contributor

Re: Shortening Memory Dump Time

Karl,

Multiply that by 24 (Jack has 48 GB) and you have 10 minutes.

But the clue process is running detached. So harm should be limited.

I once had the problem that the dump was on the system disk, a shadow merge had to be done because of the crash, and a lot of cluster stations without a system disk were trying to boot. The clue process was still active after 2 hours and the stations didn't boot correctly.

Wim
Uwe Zessin
Honored Contributor

Re: Shortening Memory Dump Time

Booting a lot of satellites from one system disk at the same time is not a good idea, because the disk is forced into hopeless thrashing. DEC had developed a staggered boot. I don't recall how they did it, but for a poor man's solution you could remove the satellite boot configurations from the volatile database.
.
Jack Trachtman
Super Advisor

Re: Shortening Memory Dump Time

Jeff,

Thanks for tracking down the internal info. This confirms what most of us have presumed - that the dump I/O is *synchronous* and (at least in my case) is getting written to a non-cached device.

As soon as I get a chance, I'm going to create a DOSD to a SAN disk on another test system here, disable compressed/selective dumping to get the maximum amount of memory dumped, and force a crash. With EVA RAIDed disks and a large controller cache I expect to see a noticeable reduction in dump time.

Thanks again to everyone
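
For reference, DUMPSTYLE is a bit mask, so "disable compressed/selective dumping" comes down to clearing two bits. Here is a minimal Python sketch that decodes a value. It assumes the bit meanings as I understand them from the OpenVMS System Manager's Manual (bit 0 selective, bit 1 full console output, bit 2 DOSD, bit 3 compressed on Alpha); please check the documentation for your version. The helper name decode_dumpstyle is made up for illustration.

```python
# Assumed DUMPSTYLE bit meanings (verify against your OpenVMS version's docs).
DUMPSTYLE_BITS = {
    1: "selective dump (0 = full dump)",
    2: "full console output",
    4: "dump off system disk (DOSD)",
    8: "compressed dump",
}


def decode_dumpstyle(value):
    """Return the descriptions of all assumed bits that are set in value."""
    return [meaning for bit, meaning in DUMPSTYLE_BITS.items() if value & bit]


if __name__ == "__main__":
    for style in (4, 13):
        print(style, decode_dumpstyle(style))
```

Under these assumptions, a full uncompressed dump to a DOSD disk would be DUMPSTYLE=4, while a value of 13 would mean selective, DOSD and compressed.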
Uwe Zessin
Honored Contributor

Re: Shortening Memory Dump Time

To get it fast on the EVA, you could also turn off the mirrored cache for that virtual disk. If I recall correctly, this is possible in recent versions of CV-EVA, but only while the virtual disk is not presented to a server.
.
Jan van den Ende
Honored Contributor

Re: Shortening Memory Dump Time

Jack,


I'm going to create a DOSD to a SAN disk


I have no docs at hand right now, but if my memory serves me well, a DOSD has to be 'directly connected' (somewhere in the chapter about setting it up).
Or does a SAN disk also fit that description?

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Uwe Zessin
Honored Contributor

Re: Shortening Memory Dump Time

Yes, that's possible, although it is hard to grasp from the documentation, because the examples I have seen use an old "DU" device.

http://h71000.www7.hp.com/DOC/722final/6650/6650pro_003.html#index_x_84

"2.14.1.1 System Dump to System Disk on Alpha

If there is more than one path to the system disk, the console environment variable DUMP_DEV must describe all paths to the system disk. This ensures that if the original boot path becomes unavailable because of failover, the system can still locate the system disk and write the system dump to it.

...

Certain configurations (for example, those using Fibre Channel disks) may contain more combinations of paths to the system disk than can be listed in DUMP_DEV."

Best is to read the whole section as it appears that there are some, well, let's be nice and call them "limitations".
.
Robert Brooks_1
Honored Contributor

Re: Shortening Memory Dump Time

Jan wrote . . .

I have no docs at hand right now, but if my memory serves me well, a DOSD has to be 'directly connected' (somewhere in the chapter about setting it up).
Or does a SAN disk also fit that description?


Yes, a SAN disk does count; just be sure to correctly set the DUMP_DEV environment variable with all the available paths to the device.
You'll most likely need to use the WWIDMGR to make all the paths visible at the console, as you would for a bootable device.
Jack Trachtman
Super Advisor

Re: Shortening Memory Dump Time

Question about DOSD:

I presently have a 1GB disk on our EVA as the cluster quorum disk, with basically nothing on it. I'm thinking of expanding that disk to hold the dump file.

Any potential problems with using a quorum disk to hold a dump file? thanks
Uwe Zessin
Honored Contributor

Re: Shortening Memory Dump Time

Good question. All I/Os would be shared within the same disk group anyway due to the virtualization. It looks like the dump code does small synchronous I/Os, so the chance is good that it does not interfere much with the quorum disk polling.
.
Wim Van den Wyngaert
Honored Contributor

Re: Shortening Memory Dump Time

Uwe,

I solved the boot problem by allowing only 1 node in the startup at a time, so the nodes are booted serially. It takes a long time but all nodes boot without any problem.
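
The idea of a staggered boot can be sketched in a few lines of Python: split the list of satellites into batches and let only one batch boot at a time. This is only an illustration of the grouping, not how DEC implemented it; the function name boot_batches and the node names are hypothetical.

```python
def boot_batches(satellites, batch_size):
    """Split the satellite list into consecutive groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        satellites[i:i + batch_size]
        for i in range(0, len(satellites), batch_size)
    ]


if __name__ == "__main__":
    nodes = ["SAT1", "SAT2", "SAT3", "SAT4", "SAT5"]  # hypothetical node names
    for number, batch in enumerate(boot_batches(nodes, 2), start=1):
        print(f"batch {number}: {batch}")
```

A batch size of 1 corresponds to the fully serial boot described here; larger batches trade some disk contention for a shorter total boot time.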

Wim
Volker Halle
Honored Contributor

Re: Shortening Memory Dump Time

Wim,

timing the operation for CLUE during startup (from CLUE$STARTUP.LOG) is not a valid measurement for the speed of writing the whole dumpfile. CLUE only needs to read a small part of the dump.

Why not try SDA> COPY/DECOMPRESS NLA0: ? If we assume that no special performance optimizations have been coded into the COPY command, this should take about the same time as writing the dump (or a little bit less). It will also perform the decompression - with the same algorithm and CPU load as the compression when writing the dump.

Volker.
Uwe Zessin
Honored Contributor

Re: Shortening Memory Dump Time

Wim,
sorry if I didn't make it clear, but that's what I meant by 'staggered boot':
only a subset of satellites is booting at the same time.

Volker,
NLA0: is a record-oriented device. Does this affect your experiment? I've heard about cases where a disk-to-disk COPY was supposed to be faster than a COPY to the null device due to a switch to record mode.
.
Volker Halle
Honored Contributor

Re: Shortening Memory Dump Time

Uwe,

the dump file is opened for Block I/O, so it should work fine when copying it to the null device.

Volker.
Volker Halle
Honored Contributor

Re: Shortening Memory Dump Time

Data from a real crash:

ES40 833 MHz, V7.3-2, DUMPSTYLE=13, DOSD dump to local 9 GB SCSI disk (BF00963643), 4 GB memory, Dumpfile size: 905852 blocks.

Time to dump: 7:29 min = 449 sec
Highest VBN written: 905852.
Uncompressed equivalent: 3142461.
Compression ratio: 3.47 : 1 (28.8%)

Dump write rate: 452 MB in 449 sec = 1 MB/sec

None of the SDA> COPY/DECOMPRESS etc. tests show an equivalent throughput rate (the same dump file as above was used for the tests):

SDA> COPY/DECOMPRESS NLA0: from HSG80 shared disk: 61 sec

SDA> COPY/DECOMPRESS from local SCSI disk: 23 sec

SDA> COPY/COMPRESS TO local SCSI disk: 98 sec

So the only real throughput data has to come from the real crash, no other similar operation provides meaningful data.
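
The arithmetic behind these figures can be re-derived with a short Python sketch. It assumes 512-byte disk blocks and decimal megabytes, so the size comes out near 464 MB rather than the 452 MB quoted above (which looks like blocks/2 taken as KB); either way the write rate is roughly 1 MB/sec. The function name dump_stats is made up for illustration.

```python
BLOCK_BYTES = 512  # assumed OpenVMS disk block size


def dump_stats(blocks_written, uncompressed_blocks, seconds):
    """Derive size, write rate and compression ratio from the crash figures."""
    megabytes = blocks_written * BLOCK_BYTES / 1_000_000
    return {
        "megabytes": megabytes,
        "mb_per_sec": megabytes / seconds,
        "compression_ratio": uncompressed_blocks / blocks_written,
    }


if __name__ == "__main__":
    # Figures from the real crash: 905852 blocks written in 7:29 min (449 sec).
    crash = dump_stats(905852, 3142461, 449)
    print(f"size:  {crash['megabytes']:.0f} MB")
    print(f"rate:  {crash['mb_per_sec']:.2f} MB/sec")
    print(f"ratio: {crash['compression_ratio']:.2f} : 1")
```

Dividing the same file size by the SDA COPY timings (23 to 98 sec) gives rates several times higher than the crash-time rate, which is the point being made: those tests do not reproduce the dump write path.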

Volker.
Jack Trachtman
Super Advisor

Re: Shortening Memory Dump Time

Thanks all