Operating System - OpenVMS

SOLVED
Dominique_11
Frequent Advisor

Intermittent Hang or Crash on OpenVMS Cluster V7.3

Hi,

Cluster configuration: two DS20E systems running OpenVMS V7.3, connected to one RA7000.
All disks are shared on the RA7000.

Sometimes the cluster members hang or crash. I plan to install the latest mandatory patches, and I would like to take this opportunity to set the parameters of both systems correctly.

Could you help me set the correct MIN_ parameter values in MODPARAMS.DAT, based on the AUTOGEN results (GETDATA TESTFILES)?
I have attached the information for the second node.

Thank you in advance.

Best regards,
dominique
Karl Rohwedder
Honored Contributor

Re: Intermittent Hang or Crash on OpenVMS Cluster V7.3

Dominique,

If you run AUTOGEN from SAVPARAMS to SETPARAMS, I doubt that incorrect system parameters will lead to system crashes. AUTOGEN normally provides sufficient settings for the parameters.

If your systems are crashing, it would help if you could attach, e.g., the CLUE$HISTORY file or the CLUE...LIS files from one of these crashes.
If a system hangs and you are no longer able to log in, you should force a crash to get a dump and provide information from that.

I suppose Volker is back from Nashua and eager to analyze some crashes :-)

regards Kalle
Jefferson Humber
Honored Contributor

Re: Intermittent Hang or Crash on OpenVMS Cluster V7.3

Hi Dominique,

IMHO, I doubt that a MODPARAMS.DAT file that is not set 100% correctly, or a few undercooked parameters, will lead to a crash.

The first thing I'd do is get up to the latest ECO levels. From looking at your configuration, the following kits are available but not installed:

VMS73_F11X-V0400, VMS73_AUDSRV-V0200, VMS73_SYS-V0800, VMS73_XFC-V0500, VMS73_DRIVER-V0700, VMS73_RPC-V0400, VMS73_SHADOWING-V0400, VMS73_FIBRE_SCSI-V0700, VMS73_LAN-V0600, VMS73_SYSLOA-V0400, VMS73_PCSI-V0200, VMS73_RADEON-V0100, VMS73_ACRTL-V0600, VMS73_LMF-V0100.

Have a read through the release notes here -> ftp://ftp.itrc.hp.com/openvms_patches/alpha/V7.3/ and see which ones apply to your environment; not all of them will.
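To see at a glance which kits in a list like this are newer than what is already installed, a small comparison can help. The sketch below is a minimal illustration only, assuming kit names follow the VMS73_<facility>-V<nnnn> pattern seen in this thread; parse_kit and missing_kits are hypothetical names, and both lists would have to be typed in by hand.

```python
import re

# Assumed naming convention, inferred from the kit names in this thread:
# VMS73_<FACILITY>-V<4 digits>, e.g. VMS73_F11X-V0400.
KIT_RE = re.compile(r"^(VMS73_[A-Z0-9_]+)-V(\d{4})$")

def parse_kit(name):
    """Split a kit name into (facility, numeric version)."""
    m = KIT_RE.match(name.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised kit name: {name!r}")
    return m.group(1), int(m.group(2))

def missing_kits(available, installed):
    """Return available kits that are newer than the installed kit
    for the same facility (or whose facility is not installed at all)."""
    have = {}
    for name in installed:
        fac, ver = parse_kit(name)
        have[fac] = max(ver, have.get(fac, 0))
    return [name for name in available
            if parse_kit(name)[1] > have.get(parse_kit(name)[0], 0)]

installed = ["VMS73_F11X-V0200", "VMS73_LMF-V0100"]
available = ["VMS73_F11X-V0300", "VMS73_F11X-V0400", "VMS73_LMF-V0100"]
print(missing_kits(available, installed))
```

This only compares version numbers per facility. It knows nothing about rollup kits (an UPDATE kit can include an earlier F11X kit), so the release notes remain the authority on what is actually superseded.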

Do you get any kind of dump when the cluster crashes? If so, the crash may be mentioned specifically in one of the release notes as fixed.

I'd be interested to see a passive AUTOGEN report for your system too, this would help to sort your MODPARAMS.DAT file out.

Hope this helps,

Jeff
I like a clean bowl & Never go with the zero
Volker Halle
Honored Contributor

Re: Intermittent Hang or Crash on OpenVMS Cluster V7.3

Dominique,

OpenVMS systems typically do not crash due to incorrectly configured system parameters, although it may depend on how 'bad' the system parameters have actually been set.

If you are having system crashes in this cluster, consider providing the CLUE$HISTORY text file as a start; then we might ask for the CLUE$COLLECT:CLUE$node_ddmmyy_hhmm.LIS files from any of the interesting crashes.

To tune your system, I would suggest that you just run @SYS$UPDATE:AUTOGEN SAVPARAMS SETPARAMS FEEDBACK and then reboot the node.
You should look at the SYS$SYSTEM:AGEN$PARAMS.REPORT file after running AUTOGEN, rather than at the AGEN$FEEDBACK.DAT file.

It looks like nonpaged pool is a little too small, but there were no allocation failures, so no problems are to be expected.

You might also want to set AUTO_ACTION to RESTART to prevent the system from remaining at the console prompt if it should ever issue a HALT in kernel mode (some hardware errors could cause this). With AUTO_ACTION RESTART, the system would write a crash dump and automatically reboot in such a case.

Volker.
Dominique_11
Frequent Advisor

Re: Intermittent Hang or Crash on OpenVMS Cluster V7.3

Thank you for all your answers. I have attached the clue$xxxxxxxxx.list and AGEN$PARAMS.REPORT files.

For information, I see the message below in the AGEN$PARAMS.REPORT file:

GBLPAGES parameter information:
Feedback information.
Old value was 1616538, New value is 25381910
Maximum used GBLPAGES: 64832
Global buffer requirements: 1572864
Pagelets reserved for memory resident sections: 0
- AUTOGEN parameter calculation has been overridden.
The calculated value was 25243622. The value 25381910
will be used in accordance with the following requirements:
GBLPAGES has been increased by 138288.
GBLPAGES minimum value is 1616538.


What is the best value (MIN_GBLPAGES) to ADD in MODPARAMS.DAT ?

Thank you in advance,

Regards,
Dominique
Steven Schweda
Honored Contributor

Re: Intermittent Hang or Crash on OpenVMS Cluster V7.3

> What is the best value (MIN_GBLPAGES) to
> ADD in MODPARAMS.DAT ?

What difference will it make? The feedback
calculated value (25243622) or the calc'd
plus ADD_ value (25381910) is far above
(15X) the current MIN_ value (1616538).

If the calculated value(s) work, and if
they're expected to be calculated properly in
the future, then why worry about the MIN_
value?

MIN_ and/or ADD_ values are useful when the
calculated values are too low, and AUTOGEN
needs some help to get reasonable values.
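Steven's point can be checked against the numbers in the report excerpt above. The sketch below is a simplified model, assuming AUTOGEN adds the ADD_ amount to its calculated value and only then applies any MIN_ (and MAX_) bound; it is not AUTOGEN's actual code, and autogen_value is a hypothetical name. The increase of 138288 presumably comes from an ADD_GBLPAGES entry.

```python
def autogen_value(calculated, add=0, minimum=None, maximum=None):
    """Simplified (assumed) model: ADD_ is applied to the calculated value,
    then the result is clamped by MIN_ and MAX_ if they are set."""
    value = calculated + add
    if minimum is not None and value < minimum:
        value = minimum
    if maximum is not None and value > maximum:
        value = maximum
    return value

# Figures copied from the AGEN$PARAMS.REPORT excerpt.
CALCULATED = 25243622   # "The calculated value was 25243622"
INCREASE = 138288       # "GBLPAGES has been increased by 138288"
MIN_GBLPAGES = 1616538  # "GBLPAGES minimum value is 1616538"

final = autogen_value(CALCULATED, add=INCREASE, minimum=MIN_GBLPAGES)
print(final)                                # 25381910
print(round(CALCULATED / MIN_GBLPAGES, 1))  # 15.6

# MIN_ only takes effect if the calculated value ever drops below it:
print(autogen_value(1000000, add=INCREASE, minimum=MIN_GBLPAGES))  # 1616538
```

Under this model the MIN_ value is inert as long as the feedback calculation stays about 15 times above it, which is why tuning it changes nothing here.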
Volker Halle
Honored Contributor

Re: Intermittent Hang or Crash on OpenVMS Cluster V7.3

Dominique,

The value SWAPFILE1_SIZE = 2147139636 looks a little bit 'suspicious'. It looks like a hex P1 space address, so something might have gone wrong when running AUTOGEN. Other than this, AGEN$PARAMS.REPORT looks OK.
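Volker's observation is easy to reproduce: converting the decimal value to hex shows why it looks like an address. This minimal sketch assumes the classic 32-bit OpenVMS layout (P0 from 0x00000000 to 0x3FFFFFFF, P1 from 0x40000000 to 0x7FFFFFFF, system space from 0x80000000) and that the value would normally be a size in 512-byte blocks; region() is a hypothetical helper.

```python
# Classic 32-bit OpenVMS address-space boundaries (assumed for this check).
P1_START = 0x40000000
S0_START = 0x80000000

def region(addr):
    """Classify a 32-bit address as P0, P1 or system space."""
    if 0 <= addr < P1_START:
        return "P0"
    if addr < S0_START:
        return "P1"
    return "system space"

swapfile1_size = 2147139636      # value reported for SWAPFILE1_SIZE
print(hex(swapfile1_size))       # 0x7ffac034
print(region(swapfile1_size))    # P1

# Read as a size in 512-byte blocks, it would be a swap file of about 1 TB.
print(round(swapfile1_size * 512 / 1e12, 2))  # 1.1
```

A value just below 0x80000000 that lands in P1 space is far more plausible as a stray address than as a real swap file size.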

The XQPERR crash is a known footprint, but it can be caused by different problems, and analysis could be quite difficult. I would suggest first installing VMS73_F11X-V0400 (and its prerequisite patches) and continuing troubleshooting if the crash happens again afterwards.

Volker.
Dominique_11
Frequent Advisor

Re: Intermittent Hang or Crash on OpenVMS Cluster V7.3

Thank you very much for your answers.

So I must plan an intervention with my customer to carry out the following actions:

Update the DS20E firmware
Install the latest patches
Run AUTOGEN on both systems

I will let you know when these operations have been done.

Best regards,
Dominique
Jefferson Humber
Honored Contributor

Re: Intermittent Hang or Crash on OpenVMS Cluster V7.3

Hi Dominique,

I'd agree with Volker, go with the VMS73_F11X-V0400 as a first pass.

It's interesting to note, though, that the VMS73_UPDATE-V0300 release notes describe a near-identical issue that was corrected (XQPERR @ SEARCH_FCB_C+00E70), but this kit is already installed on your system.

Jeff
Volker Halle
Honored Contributor
Solution

Re: Intermittent Hang or Crash on OpenVMS Cluster V7.3

Jeff,

Both the more recent patches VMS73_F11X-V0300 and VMS73_F11X-V0400, as well as VMS73_F11X-V0200 (the patch currently installed), describe fixes for possible XQPERR crashes in SEARCH_FCB_C in conjunction with a FID_TO_SPEC operation. Sometimes it takes a couple of attempts to solve a problem, or different problems may lead to the same or a similar crash footprint.

Dominique,

Please be aware that you might get stuck if the problem is not solved in an existing V7.3 patch, as this version has long been out of support. You may want to consider upgrading to V7.3-2.

Volker.
Jefferson Humber
Honored Contributor

Re: Intermittent Hang or Crash on OpenVMS Cluster V7.3

Volker,

Sorry, my mistake. For some reason I thought the V0300 Update kit included the VMS73_F11X-V0300 kit, but you are quite correct: it only rolls up to the V0200 patch.

Jeff