Operating System - OpenVMS

Shortening Memory Dump Time

 
SOLVED
Jack Trachtman
Super Advisor

Shortening Memory Dump Time

VMS V7.3-2
GS1280
48GB memory
shadowed system disk on local I/O shelves

We have recently grown our GS1280 to 48GB and last week had our first (hardware) crash on this configuration. I was flabbergasted to find that it took 30 minutes to dump memory to disk!

We have DUMPSTYLE set to the default of 9 (for a compressed, selective dump).

Is there any way to reduce the time the memory dump takes? (e.g., would creating a dedicated dump disk on our EVA5000 and enabling DOSD (Dump Off System Disk) appreciably reduce the dump time, or would this be a wasted effort?) Thanks
Bill Hall
Honored Contributor

Re: Shortening Memory Dump Time

Jack,

VMS V7.3-2
ES40 4-667Mhz
16GB Memory
system disk is 13GB LUN on XP1024
page and swapfiles located on a 4 member SW-RAID stripeset (LUNS on XP1024)
Dual 2GB fibre HBAs to SAN

DOSD to local 36GB SCSI
DUMPSTYLE 14

Last crash dump took 2 hours and 40 minutes. I recall being told by CSC that 10 minutes per GB of memory was average for large memory systems.
Bill Hall
Jack Trachtman
Super Advisor

Re: Shortening Memory Dump Time

Bill,

Did you really mean to say 10 minutes/GB? For us that would mean that we only dumped 3GB (compressed).
John Gillings
Honored Contributor

Re: Shortening Memory Dump Time

Jack,

How big is your dump file?

Enabling DOSD might not speed things up significantly (for an equivalent size dump file), but it won't be wasted effort as it gives you more flexibility in managing dumps.

Since you're using a selective dump style, you can speed things up by making your dump file smaller. Processes will be dumped until the file is full.

The tradeoff is if you make it too small, you may not get all the information you need to analyze the crash. See SYS$SYSTEM:SYS$DUMP_PRIORITY.DAT to control which processes get dumped first.

Unfortunately you can't really predict how long a particular configuration will take to dump. You now have one data point. Change the size of your dump file and note how long it takes next time. After a while you should gather enough information to make an informed choice as to the best size for your environment.
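As an illustration only (the size below is a made-up example, not a recommendation), checking and recreating the dump file would look something like:

  $ DIRECTORY/SIZE=ALL SYS$SYSTEM:SYSDUMP.DMP        ! current allocation in blocks
  $ RUN SYS$SYSTEM:SYSGEN
  SYSGEN> CREATE SYS$SYSTEM:SYSDUMP.DMP /SIZE=4000000   ! example size only
  SYSGEN> EXIT

Note that CREATE only extends an existing file; a smaller replacement means renaming the old file out of the way first, and the change takes effect after a reboot.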
A crucible of informative mistakes
Jack Trachtman
Super Advisor

Re: Shortening Memory Dump Time

John,

My dump file size is based on AUTOGEN recommendations (which I believe is total memory + room for errlog buffers).

The idea of limiting the dump file size is something I've never heard of before, and sounds like it would be in the "unsupported" realm. Are you actually doing this?

BTW - the file SYS$SYSTEM:SYS$DUMP_PRIORITY.DAT doesn't exist on my system disk.

Thanks
Volker Halle
Honored Contributor

Re: Shortening Memory Dump Time

Jack,

If you are using selective dumps (DUMPSTYLE bit 0 set), you don't need a dumpfile size equivalent to your physical memory size; AUTOGEN takes this into account, as well as the compression bit. Selective dumps are completely supported, so let AUTOGEN GETDATA GENFILES figure out the suggested dumpfile size for you.

In the case of selective dumps, the system space will be dumped first, followed by 'important' processes, followed by the other processes until the dumpfile is full. Which processes are dumped first can be fine-tuned by copying SYS$DUMP_PRIORITY.TEMPLATE to .DAT.

Depending on the type of crash, you seldom need many or all of the processes in the dump, except maybe for complicated system/process hangs.

See Chapter 2 Managing Page, Swap, and Dump Files of the System Manager's Manual Volume 2:

http://h71000.www7.hp.com/doc/732FINAL/aa-pv5nh-tk/aa-pv5nh-tk.HTMl

To get an idea of how long writing a dumpfile may take, you could note the elapsed time of a SYSGEN> CREATE dev:[dir]SYSDUMP.DMP/SIZE= command with highwater-marking turned on (SET VOLUME/HIGHWATER_MARKING dev:). Don't do this on the system disk, as it tends to block lots of file system activity. Maybe try it on a temporary disk of the same type first.
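As a sketch of that experiment (the device name and size are placeholders, not real values from this configuration):

  $ SET VOLUME/HIGHWATER_MARKING DKA100:     ! ensure blocks really get written
  $ SHOW TIME
  $ RUN SYS$SYSTEM:SYSGEN
  SYSGEN> CREATE DKA100:[TEST]SYSDUMP.DMP /SIZE=9668246
  SYSGEN> EXIT
  $ SHOW TIME     ! elapsed time approximates a sequential write of that many blocks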

Volker.
John Abbott_2
Esteemed Contributor

Re: Shortening Memory Dump Time

Hi Jack,

Look for SYS$SYSTEM:SYS$DUMP_PRIORITY.TEMPLATE as an example, copy it to .DAT and make the changes required. It works for us, although I must confess that I can't remember the last unplanned system dump in production; it must be a few years...
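i.e. something along these lines:

  $ COPY SYS$SYSTEM:SYS$DUMP_PRIORITY.TEMPLATE SYS$SYSTEM:SYS$DUMP_PRIORITY.DAT
  $ EDIT SYS$SYSTEM:SYS$DUMP_PRIORITY.DAT    ! the template's comments describe the format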

Kind Regards
John.
Don't do what Donny Dont does
Jan van den Ende
Honored Contributor

Re: Shortening Memory Dump Time

Jack,

System manager's Manual, Volume 2: Tuning, Monitoring and Complex Systems
contains an entire chapter on this titled:
Minimizing System Dump File Size When Disk Space Is Insufficient.

Sounds hardly "unsupported" to me!

I guess you may equally apply this "when available dump time is insufficient".

And if you have no SYS$SYSTEM:SYS$DUMP_PRIORITY.DAT, then I guess something is missing at your site.
If you really haven't got it, just ask, and I will post it. It is just a smallish ASCII file.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Bill Hall
Honored Contributor

Re: Shortening Memory Dump Time

Jack,

Yes, I did mean 10 minutes per GB of memory. Our DUMPSTYLE setting includes full console output. It really doesn't write a lot of text to the console, but it does give an indication of where it is in the process of going through memory and a relative indication of how fast it is processing.

From the last crash we had:
** Bugcheck code = 000007EC: MULDEALNPAG, Multiple deallocation of nonpaged pool
.
.
.
Memory dump complete, 18349659 blocks used of 33562445 blocks in dump file...

Bill Hall
Jeff Chisholm
Valued Contributor

Re: Shortening Memory Dump Time

To recalculate the size of your dumpfile, rename the existing one out of the way so that Autogen doesn't see it. Verify that dumpstyle is set to 9 in modparams.

@sys$update:autogen getdata testfiles nofeedback

Take Autogen's recommendation and add 10% standard slack. Compare that to the file that you renamed...

The correct way to remove the old file is: rename, reboot, delete. Or just create a new version, reboot, then delete the old version.
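In DCL terms, the recipe above is roughly this (file names as on a default system disk):

  $ RENAME SYS$SYSTEM:SYSDUMP.DMP SYS$SYSTEM:SYSDUMP_OLD.DMP
  $ @SYS$UPDATE:AUTOGEN GETDATA TESTFILES NOFEEDBACK
  $ TYPE SYS$SYSTEM:AGEN$PARAMS.REPORT               ! look for the recommended dumpfile size
  $ DIRECTORY/SIZE=ALL SYS$SYSTEM:SYSDUMP_OLD.DMP    ! compare with the old file's size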
le plus ca change...
Uwe Zessin
Honored Contributor

Re: Shortening Memory Dump Time

10 minutes per GigaByte means only 1.66 MegaBytes/second.

30 minutes per 48 GigaBytes means about 26.66 MegaBytes/second.

I wonder what else the first system is doing during the dump...
A 36 GigaByte disk should do better for a spiral transfer.
The dumpfile is contiguous, isn't it?
.
Jan van den Ende
Honored Contributor

Re: Shortening Memory Dump Time

Uwe wrote


The dumpfile is contiguous, isn't it?


I should hope so! AFAIK, the dumpfile is written without any filesystem knowledge (it has to be, since it must also be usable when the filesystem is in trouble). So, no way to handle fragmentation.

If you create the DUMPFILE in a supported way, it WILL be forced to be contiguous.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Uwe Zessin
Honored Contributor

Re: Shortening Memory Dump Time

Correct me if I'm wrong, but as far as I know, the dumpfile's extents must all fit _within_ a single file header. Just checked AUTOGEN.COM and it creates/extends files without the /CONTIGUOUS qualifier.

Hey, even better:
I attach the output from my system - I hope you trust me that this is not made up!
.
Jeff Chisholm
Valued Contributor

Re: Shortening Memory Dump Time

A primitive boot driver is used to write the data. Block transfers, not file access. Note that the 'modification date' never gets updated on the file.

Transfer speed will vary depending on your disk and controller configuration. Contention would possibly be a factor if you're in a cluster.

A dumpfile is 'best try contiguous' unless you specify the /CONTIGUOUS qualifier in the CREATE command from SYSGEN. This means you can have as many fragments on disk as can be described in a single file header. This was once 6 fragments; I haven't checked lately.

If you try to extend the file several times, or create it on a fragmented disk, you'll get a 'file header full' error.
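For example, forcing contiguity at creation time (the size is a placeholder):

  $ RUN SYS$SYSTEM:SYSGEN
  SYSGEN> CREATE SYS$SYSTEM:SYSDUMP.DMP /SIZE=9668246 /CONTIGUOUS
  SYSGEN> EXIT

If the disk can't supply that much contiguous space, the CREATE should fail rather than silently fragment the file.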
le plus ca change...
Travis Craig
Frequent Advisor

Re: Shortening Memory Dump Time

Volker,

When you listed the priority order of things to be dumped, do global pages fit in there somewhere? We have a system of applications that use tons of global pages for their shared databases.

--Travis
My head is cold.
Bill Hall
Honored Contributor

Re: Shortening Memory Dump Time

Uwe,

The disk is actually an 18GB RZ1EA-VW. It is on a local SCSI bus; I don't recall which controller. The only file on the disk is the dump file, and it has three extents, the same as yours.

I've been told the issue is as Jeff stated earlier: the very primitive driver used to write to the dump file. Supposedly it's the way it has to be, to be somewhat assured you can write something to the dump file when you don't know what may be broken.

You really want to do a full dump and have a dump file that is large enough for your current memory config. There's nothing more frustrating than a crash where, because of a selective dump or too small a dump file, you can't analyze the dump and get to a definitive cause of the crash. So you get to do it all over again.

Jack,
Have you had the dump analyzed yet? Did you by chance copy it to see exactly how much was written to the file? Your 1280 has got to be at least 2 and maybe 3 times faster than the ES40. You might have processed and written much more. Just curious.
Bill Hall
Volker Halle
Honored Contributor
Solution

Re: Shortening Memory Dump Time

Travis,

GLOBAL PAGES will only be dumped if they are in the working set of any KEY processes (current processes on all CPUs, followed by those in the internal or user-specified priority list, followed by those in RWxxx state) or, in the last step of writing a selective dump file, if they are in the working set of any other processes, provided there is still room in the dump file after dumping those.

Details are in the manual referred to earlier under the heading:

Understanding the Order of Information in a Selective System Dump

The SDA> SHOW DUMP command will show you things like, highest VBN written, compression ratio and information about the LMBs (Logical Memory Block) saved in the selective dump.
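For example:

  $ ANALYZE/CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
  SDA> SHOW DUMP/HEADER    ! count of blocks dumped, dump flags
  SDA> SHOW DUMP           ! highest VBN written, compression ratio, LMBs saved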

Volker.
Uwe Zessin
Honored Contributor

Re: Shortening Memory Dump Time

The DS-RZ1EA-VW disk drive spins at 7,200 RPM, but its internal data rate is specified as 240 MegaBits/second max. (whatever that means).
.
Jack Trachtman
Super Advisor

Re: Shortening Memory Dump Time

Some notes:

- the dump file at the time of the crash was 4,479,786 blocks
(- AUTOGEN is recommending a dump file of 9,668,246 blocks)
- ANA/CRASH SHOW DUMP/HEADER shows "Count of blocks dumped for memory 00443681", which I believe is hex for 4,470,401 decimal

So if I understand all of this, it took 30 minutes on our GS1280 to dump about 2.2GB of memory to a local 15K RPM disk, which is just over 1 MB/sec.
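Spelling out the arithmetic (assuming 512-byte blocks):

  4,470,401 blocks x 512 bytes = ~2,288,845,312 bytes = ~2.29GB
  2.29GB / 1800 seconds = ~1.27 MB/sec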

Am I looking at this correctly? Do I have some kind of configuration or hardware problem here? (Thanks for all the responses so far)
Uwe Zessin
Honored Contributor

Re: Shortening Memory Dump Time

Hello Jack,

I get the same numbers as you when doing the calculations. You might put a file of the same size on another disk and time a COPY/OVERLAY while the dumpfile is not in use.
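For example (disk and file names are placeholders):

  $ SHOW TIME
  $ COPY/OVERLAY DKA100:[TEST]DUMPSIZE.TMP DKA200:[TEST]DUMPSIZE.TMP
  $ SHOW TIME    ! compare elapsed time against the 30-minute dump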

It would be interesting to find out how the memory dump is written to the disk. I now suspect it is doing lots of small I/Os. Perhaps somebody with access to the source code can find out what is going on.

And somebody with a HSG or HSZ controller might want to induce a crash and watch the I/O numbers with VTDPY.
.
Volker Halle
Honored Contributor

Re: Shortening Memory Dump Time

Jack,

The data from both your GS1280 and Bill's ES40 show about the same throughput of roughly 1 MB/sec. Remember this is most probably synchronous, non-double-buffered I/O...

There is supposed to be an article about OpenVMS Bugchecks in the next technical journal, maybe it also talks about performance.

Volker.
Jeff Chisholm
Valued Contributor

Re: Shortening Memory Dump Time

Hi Folks,
I was seeing some information in here that I knew was just a bit off, so I went to VMS engineering for a definitive answer. My source doesn't want to post here and prefers to work formal customer cases when, and if, they get elevated. It's a touchy subject, really, being the model employee and replying here. This isn't meant to be a formal support channel, but this set of questions clearly deserves a real answer.

So here's your primer for dumpfile related stuff. I'll organize it a bit differently and put it in the knowledge base for posterity.
Regards, /jeff

le plus ca change...
Volker Halle
Honored Contributor

Re: Shortening Memory Dump Time

Jeff,

thanks for this information ;-)

So the only thing we can do is collect information about the data rate obtained when writing big dumpfiles on different configurations (this thread has all the info needed to collect this data) and provide feedback here. Then if someone's config provides much better (or worse) data rates, it's time to start checking again.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: Shortening Memory Dump Time

Also note that when you boot after a crash, the dump file will be analyzed by CLUE$STARTUP. It will read those X GB back in and delay the boot.

Just to get an idea of how much: can someone check CLUE$STARTUP.LOG on a system that crashed and had a lot of memory to dump?
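Something along these lines, assuming the log is reachable via a logical name or known path at your site (that part is an assumption):

  $ TYPE CLUE$STARTUP.LOG    ! path/logical name varies per site

and compare the timestamps around the CLUE analysis step.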

Wim
Karl Rohwedder
Honored Contributor

Re: Shortening Memory Dump Time

Maybe not really big :-):

A 1,000,000-block dumpfile on a DS20 (2GB memory) on a U320 SCSI disk takes 25 seconds elapsed for CLUE on boot.

regards Kalle