Re: Question on System Dumps and TC

Alzhy · ‎04-29-2004

On HP-UX Systems, should we always size primary swap (being the dump device) the same as system memory - so we can capture a kernel core image during TC's or system crashes?

Also, is it okay/advisable to have my primary swap on a different disk and outside of vg00?

What tool do I need to analyze crash dumps on HP-UX? Specifically, I want to see which processes were running when the system was TC'd or crashed.

Our systems are very large memory machines, with one machine projected to have close to 100GB of memory.

Hakuna Matata.

RAC_1 · ‎04-29-2004

When the system crashes or a TOC is done, two things take place.

1. Memory contents are dumped into the dump area.
(This is the dump area defined by lvlnboot/crashconf. The dump device can also be put in /etc/fstab, like
/dev/vg00/lvol2 / dump default 0 0)
2. When the system boots again, savecrash saves the dump as configured in /etc/rc.config.d/savecrash.

You can have separate dump and swap areas, so primary swap need not be equal to physical RAM. Total swap should be 1.5 to 2 times RAM.
And the dump area should be equal to RAM.

But as most sysadmins use the same lvol for swap and dump, it should be equal to RAM, and total swap should still be 1.5 to 2 times RAM. (You can define secondary swap for this.)
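
To make the rule of thumb above concrete, here is a minimal Python sketch of the arithmetic. The function name is hypothetical and the 100 GB figure is taken from the original question; the ratios (dump area equal to RAM, total swap 1.5 to 2 times RAM) are the ones quoted in this post, not an official sizing formula.

```python
def rule_of_thumb_sizes(ram_gb):
    """Return (dump_gb, swap_low_gb, swap_high_gb) for the rule quoted above.

    Hypothetical helper: dump area = RAM, total swap = 1.5x to 2x RAM.
    """
    dump_gb = ram_gb              # full dump: dump area as large as RAM
    swap_low_gb = 1.5 * ram_gb    # lower bound of the total-swap range
    swap_high_gb = 2.0 * ram_gb   # upper bound of the total-swap range
    return dump_gb, swap_low_gb, swap_high_gb


dump, low, high = rule_of_thumb_sizes(100)
print(f"dump area: {dump} GB, total swap: {low:.0f} to {high:.0f} GB")
```

For a 100 GB machine this works out to a 100 GB dump area and 150 to 200 GB of total swap, which shows how quickly the rule becomes expensive on large-memory systems.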

Analyze the crash
Use Q4 analysis. (Search the forums for q4 and you will get a lot of details.)
A simple one that I use is as follows.

adb -m /var/adm/crash/crash.0/vmunix /var/adm/crash/crash.0

Once at the adb prompt, do
msgbuf+8/s

There is also a tool called crashinfo.

Anil

There is no substitute to HARDWORK

Pete Randall · ‎04-29-2004

Nelson,

Rarely does one need the full crash dump and, obviously, in a situation like yours, you don't need to tie up 100GB of disk space. With large memory systems that are not likely to have significant swapping, swap areas are kept smaller - like half the size of memory or less - and dump usually uses the swap area. I think Clay pointed out recently that the RC engineer who gets to analyze your dump will probably thank you for keeping it small.

Pete

Todd McDaniel_1 · ‎04-29-2004

Nelson,

My primary swap /dev/vg00/swap is usually only 1024000 KB.

Secondary and tertiary swap are the ones that get my swap up to 30 to 50% of my memory. With the way the dumps are used nowadays, it is extremely rare to need a full dump. One of my boxes has 75GB of swap. It would be very impractical to have 75GB of swap for extremely rare crashes...

You rarely need more than 30% of memory for swap, but up to 50% is not unusual.

Regarding swap on vg00: you MUST have at least one swap area on vg00. It must be available at boot time, or your system won't boot. It MUST ALSO BE MIRRORED: if you have a disk failure and no mirror, your system won't boot either.

Tools have already been mentioned, so I won't bring that up.

Unix, the other white meat.

Sundar_7 · ‎04-29-2004

Hi Nelson,

No. You don't have to size the dump area the same as system memory.

From HP-UX 11.0 you have the option to selectively dump the kernel pages.

# /sbin/crashconf -v

Refer to the crashconf man page.

If you have 11i, you have the option of compressed dumps.
http://software.hp.com/portal/swdepot/displayProductInfo.do?productNumber=CDUMP11i

Click on the above link.

I believe there is a limitation that the swap/dump area should be within first 2 GB. I could be wrong here.

You can use Q4 to analyze the crash dumps.

ftp://contrib:9unsupp8@192.170.19.51/crash

Visit the above links for tools and documentation

-- Sundar

Learn What to do ,How to do and more importantly When to do ?

Mel Burslan · ‎04-29-2004

Correct me if I am wrong, but when you are running with memory configurations on the order of 100GB, swap is pretty much redundant. Your system had better not swap at all with this much memory. If it does, it usually means it is not configured right and the money spent on memory is wasted. You should have swap space, of course, but nowhere near the amount of physical memory you have. I would be surprised if any swapping happens on your server, so the purpose of your swap/dump space will mainly be dump space.

By no means am I an expert in kernel development and debugging, but for quite a while (since version 10.20, as far as I can remember) dumps have been stored in compressed form (.gz files) and do not take up as much space as physical memory, or anywhere close to it.

Assuming that with that much memory you will be running one or more large databases, the memory utilization will be what is called sparse, hence the compression ratio will be significantly to your advantage. But again, it is very hard to say how much dump space you will need to store the memory image without being a kernel developer.

________________________________
UNIX because I majored in cryptology...

Navin Bhat_2 · ‎04-29-2004

Hello Nelson,
Use crashconf -v to verify that you have enough space. You can use its flags to turn page classes on or off in the dump, depending on the problem.

e.g.
# crashconf -v
CLASS          PAGES  INCLUDED IN DUMP  DESCRIPTION
--------  ----------  ----------------  -------------------------------------
UNUSED        387390  no, by default    unused pages
USERPG         34339  no, by default    user process pages
BCACHE         90209  no, by default    buffer cache pages
KCODE           2565  no, by default    kernel code pages
USTACK           935  yes, by default   user process stacks
FSDATA           300  yes, by default   file system metadata
KDDATA         37045  yes, by default   kernel dynamic data
KSDATA         37041  yes, by default   kernel static data

Total pages on system:         589824
Total pages included in dump:   75321

DEVICE        OFFSET(kB)   SIZE (kB)  LOGICAL VOL.  NAME
------------  ----------  ----------  ------------  -------------------------
31:0x01a000       310368     1822720  1:0x000001    /dev/vx/dsk/rootdg/swapvol
                          ----------
                             1822720

You can run crashinfo to get the per-process stacks, and p4/q4 to get into the nitty-gritty of structures etc...
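
As a rough illustration of how to read the crashconf -v listing above, the following Python sketch sums the pages of the classes marked "yes" and compares the result with the dump device size. The sample text is copied from the listing; the helper names are hypothetical, and the 4 kB page size is an assumption that should be checked on your own system.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base page size; verify on your system

SAMPLE = """\
UNUSED 387390 no, by default unused pages
USERPG 34339 no, by default user process pages
BCACHE 90209 no, by default buffer cache pages
KCODE 2565 no, by default kernel code pages
USTACK 935 yes, by default user process stacks
FSDATA 300 yes, by default file system metadata
KDDATA 37045 yes, by default kernel dynamic data
KSDATA 37041 yes, by default kernel static data
Total pages on system: 589824
Total pages included in dump: 75321
"""


def included_pages(crashconf_text):
    """Sum PAGES for every class whose INCLUDED IN DUMP column says 'yes'."""
    total = 0
    for line in crashconf_text.splitlines():
        # Class lines look like: NAME <pages> yes|no, by default <description>
        m = re.match(r"\s*([A-Z]+)\s+(\d+)\s+(yes|no),", line)
        if m and m.group(3) == "yes":
            total += int(m.group(2))
    return total


def dump_fits(crashconf_text, dump_device_kb):
    """Return (kB needed for the selected classes, True if the device is large enough)."""
    needed_kb = included_pages(crashconf_text) * PAGE_KB
    return needed_kb, needed_kb <= dump_device_kb


needed_kb, ok = dump_fits(SAMPLE, 1822720)  # device size from the listing
print(f"dump needs about {needed_kb} kB, fits: {ok}")
```

With the figures above, the four "yes" classes add up to the 75321 pages that crashconf reports, or about 301284 kB under the 4 kB assumption, well below the 1822720 kB dump device.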

Todd McDaniel_1 · ‎04-29-2004

one more thing...

remember that your /var/adm/crash should be as big as your swap/dump space...

# swapinfo
            Kb       Kb        Kb   PCT  START/       Kb
TYPE     AVAIL     USED      FREE  USED   LIMIT  RESERVE  PRI  NAME
dev    1024000        0   1024000    0%       0        -    1  /dev/vg00/swap
dev   16343040        0  16343040    0%       0        -    0  /dev/vg01/swap1
dev   10240000        0  10240000    0%       0        -    0  /dev/vg01/swap2

# bdf /var/adm/crash
Filesystem            kbytes    used     avail  %used  Mounted on
/dev/vgcrash/crash  26624000    6788  26204272     0%  /var/adm/crash
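
The comparison behind that advice can be sketched in Python as follows. The helper names are hypothetical, the sample text is copied from the swapinfo and bdf output above, and treating /dev/vg00/swap as the only dump device is an assumption (crashconf -v shows which devices are actually configured). The bdf parser also assumes each filesystem fits on one line, which is not the case when bdf wraps long device names.

```python
SWAPINFO = """\
Kb Kb Kb PCT START/ Kb
TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
dev 1024000 0 1024000 0% 0 - 1 /dev/vg00/swap
dev 16343040 0 16343040 0% 0 - 0 /dev/vg01/swap1
dev 10240000 0 10240000 0% 0 - 0 /dev/vg01/swap2
"""

BDF = """\
Filesystem kbytes used avail %used Mounted on
/dev/vgcrash/crash 26624000 6788 26204272 0% /var/adm/crash
"""


def swapinfo_dev_kb(swapinfo_text):
    """Map each 'dev' swap area name to its AVAIL size in kB."""
    sizes = {}
    for line in swapinfo_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "dev":
            sizes[fields[-1]] = int(fields[1])
    return sizes


def bdf_avail_kb(bdf_text, mount_point):
    """Return the 'avail' column (kB) for mount_point from bdf output."""
    for line in bdf_text.splitlines():
        fields = line.split()
        # Data lines have six fields; the header has seven ("Mounted on").
        if len(fields) == 6 and fields[5] == mount_point:
            return int(fields[3])
    raise ValueError(f"{mount_point} not found in bdf output")


sizes = swapinfo_dev_kb(SWAPINFO)
avail_kb = bdf_avail_kb(BDF, "/var/adm/crash")
dump_kb = sizes["/dev/vg00/swap"]  # assumption: primary swap is the only dump device
print("primary swap fits in /var/adm/crash:", avail_kb >= dump_kb)
print("all swap areas fit in /var/adm/crash:", avail_kb >= sum(sizes.values()))
```

With these numbers the primary swap area fits easily, while the three areas together (27607040 kB) are slightly larger than the free space shown (26204272 kB), so the answer depends on which devices are configured for dump.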

Unix, the other white meat.

Sanjay_6 · ‎04-29-2004

Hi,

Try this link for info on dump,

http://docs.hp.com/hpux/onlinedocs/os/syscrash.html
and this link from ITRC,

http://www1.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000072951514

The ITRC doc id is KBRC00012922.

I don't think it is advisable to put the primary swap space on any VG other than vg00. What you can do is make a small primary swap "lvol2" under vg00 and add additional secondary swaps under other VGs.

You can use Q4 to analyse dumps.

Hope this helps.

Regds

Alzhy · ‎04-29-2004

On a server that we need to understand fully what is going on, we have the following swap config (a 32 GB rp8420):

            Kb       Kb        Kb   PCT  START/       Kb
TYPE     AVAIL     USED      FREE  USED   LIMIT  RESERVE  PRI  NAME
dev    4194304        0   4194304    0%       0        -    1  /dev/vg00/swap
dev   35520512        0  35520512    0%       0        -    0  /dev/swap1/swap
dev   35520512        0  35520512    0%       0        -    0  /dev/swap2/swap
dev   35520512        0  35520512    0%       0        -    0  /dev/swap3/swap

I do not have a separate /var/adm/crash -- as we have /var sized 8-16GB.

Primary swap on all our servers is sized at just 4GB regardless of memory, and since it is the default dump device, it should only be able to hold 4GB of kernel image...? The reason we have so much swap space is the very nature of our processes/apps, which are very heavy on memory reservations.

In my last TC on this system, dump reported that it could not do a full dump and would do a partial dump instead... Considering the sizes of the processes, would our "partial crash dump" have captured relevant data about what was going on with the server when we TC'd it?
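
For scale, a small Python sketch of how much of this machine's memory a 4GB dump device can hold uncompressed. The sizes are the ones quoted above (4194304 kB of primary swap, 32 GB of RAM); the function name is hypothetical and the 4 kB page size is an assumption.

```python
KB_PER_GB = 1024 * 1024
PAGE_KB = 4  # assumption: 4 kB base page size


def dump_capacity(dump_device_kb, ram_gb):
    """Return (pages the device can hold, pages of RAM, fraction of RAM covered)."""
    device_pages = dump_device_kb // PAGE_KB
    ram_pages = ram_gb * KB_PER_GB // PAGE_KB
    return device_pages, ram_pages, device_pages / ram_pages


pages, ram_pages, fraction = dump_capacity(4194304, 32)
print(f"{pages} of {ram_pages} pages ({fraction:.1%})")
```

Under these assumptions the device covers one eighth of physical memory, so whether a partial dump is useful depends on which page classes crashconf selects for it.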

Hakuna Matata.

Sanjay_6 · ‎04-29-2004

Hi,

A partial dump is not good enough. You need a full dump to do an analysis and know what really happened.

Hope this helps.

Regds

Sundar_7 · ‎04-29-2004

Nelson,

/sbin/crashconf -v has all the answers for you

If you haven't fiddled with crashconf then try this

# /sbin/crashconf -v | grep "Total pages included in dump:"

If your dump device is as big as the above output then you are good to go :-)

Sund

Learn What to do ,How to do and more importantly When to do ?

Sanjay_6 · ‎04-29-2004

Hi,

you can use Q4 to pre-process a dump,

http://www1.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000072399670

The itrc doc id is OZBEKBRC00000611.

Hope this helps.

Regds

Alzhy · ‎04-29-2004

Thank you & Shukria once more to all ye Lords of HP-UX!

I just gathered that, since we will now have a fully validated capability to migrate our boot disks and swap/dump devices to our SAN disks (note: SecurePath 3.0D), we should be able to have larger contiguous dump space, so we can do full load image analysis...

We are about to deploy a massive application that is a bread and butter division of a large Fortune 100 Shop...

Hakuna Matata.

Todd McDaniel_1 · ‎04-29-2004

From your last statement, I still have major trepidation regarding moving OS and Swap off your local devices.

Just because it is possible doesn't make it feasible or practical, in my mind, but that is up to you and your team.

Just playing a bit of the Devil's advocate...

Unix, the other white meat.
