Operating System - OpenVMS

Re: A COM process on OpenVMS guests

 


Yes, Hoff,

 

These cases were reported to HP some time ago and, according to HP, have been investigated by the HP-UX team and the Lab. The OpenVMS teams are now involved as well.

 

We installed a third UNOF patch for HPVM after these crashes. I felt a bit embarrassed when I opened this post, because I think the communities can also offer UNOF support, or at least valuable input. "Two heads are better than one."

 

Volker,

 

The host blade has only one CPU with 4 cores. I found that the guest runs, and crashes, on whichever core it happens to start on; the core is random.

 

#top
System: ITSEELM-                                      Fri Mar 23 05:07:07 2012
Load averages: 0.50, 0.45, 0.36
203 processes: 127 sleeping, 76 running
Cpu states:
CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
 0    0.64   0.0%   0.0%  59.8%  40.2%   0.0%   0.0%   0.0%   0.0%
 1    0.21   0.0%   0.0%   3.2%  96.8%   0.0%   0.0%   0.0%   0.0%
 2    0.52   0.0%   0.0%  16.2%  83.8%   0.0%   0.0%   0.0%   0.0%
 3    0.64   0.2%   0.0%  85.6%  14.2%   0.0%   0.0%   0.0%   0.0%
---   ----  -----  -----  -----  -----  -----  -----  -----  -----
avg   0.50   0.0%   0.0%  41.2%  58.8%   0.0%   0.0%   0.0%   0.0%

System Page Size: 4Kbytes
Memory: 32473240K (32211056K) real, 34537852K (33776996K) virtual, 55907024K free  Page# 1/23

CPU TTY  PID USERNAME PRI NI   SIZE    RES STATE    TIME %WCPU  %CPU COMMAND
 3   ?  3949 root     152 20  3253M  3151M run    105:52 82.10 81.95 hpvmapp   <-guest process
 3   ?  3829 root     152 20  3253M  3151M run     96:28 20.33 20.30 hpvmapp
 2   ?  3863 root     152 20  3253M  3151M run     85:52  8.32  8.31 hpvmapp
 2   ?  4039 root     152 20  3253M  3151M run    150:12  7.63  7.61 hpvmapp
 1   ?  3893 root     152 20  3253M  3151M run    121:31  7.03  7.02 hpvmapp
 0   ?  3930 root     152 20  3253M  3151M run     67:21  4.04  4.03 hpvmapp
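
To keep track of which core each hpvmapp process sits on from one sample to the next, the process lines of a top listing like the one above could be parsed with something along these lines. This is only a sketch: the function names are hypothetical, the column positions are assumed from this single sample, and the input is simply the six process lines pasted above.

```python
from collections import defaultdict

# Process lines copied from the top output above (assumed column order:
# CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND).
TOP_LINES = """\
CPU TTY  PID USERNAME PRI NI   SIZE    RES STATE    TIME %WCPU  %CPU COMMAND
 3   ?  3949 root     152 20  3253M  3151M run    105:52 82.10 81.95 hpvmapp   <-guest process
 3   ?  3829 root     152 20  3253M  3151M run     96:28 20.33 20.30 hpvmapp
 2   ?  3863 root     152 20  3253M  3151M run     85:52  8.32  8.31 hpvmapp
 2   ?  4039 root     152 20  3253M  3151M run    150:12  7.63  7.61 hpvmapp
 1   ?  3893 root     152 20  3253M  3151M run    121:31  7.03  7.02 hpvmapp
 0   ?  3930 root     152 20  3253M  3151M run     67:21  4.04  4.03 hpvmapp
"""


def parse_top_processes(text):
    """Return one dict per process line; header and short lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A process line has at least 13 columns and starts with a CPU number.
        if len(fields) < 13 or not fields[0].isdigit():
            continue
        rows.append({
            "cpu": int(fields[0]),
            "pid": int(fields[2]),
            "wcpu": float(fields[10]),
            "command": fields[12],  # trailing annotations are ignored
        })
    return rows


def hpvmapp_by_cpu(rows):
    """Group hpvmapp processes by the core top reported them on."""
    by_cpu = defaultdict(list)
    for row in rows:
        if row["command"] == "hpvmapp":
            by_cpu[row["cpu"]].append((row["pid"], row["wcpu"]))
    return dict(by_cpu)


rows = parse_top_processes(TOP_LINES)
for cpu, procs in sorted(hpvmapp_by_cpu(rows).items()):
    print(f"core {cpu}: {procs}")
```

Run against samples taken before each crash, a grouping like this would show whether the crashing guest really lands on a different core each time.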


#machinfo

CPU info:
  1 Intel(R)  Itanium(R)  Processor 9340 (1.6 GHz, 20 MB)
          4.79 GT/s QPI, CPU version E0
          4 logical processors (4 per socket)

Memory: 98198 MB (95.9 GB)

Firmware info:
   Firmware revision:  01.24
   FP SWA driver revision: 1.18
   IPMI is supported on this system.
   BMC firmware revision: 1.30

Platform info:
   Model:                  "ia64 hp Integrity BL860c i2"
   Machine ID number:      97be7d8e-6105-11e0-8a68-294e7ff67a42
   Machine serial number:  CZ31099TF4

OS info:
   Nodename:  ITSEELM-
   Release:   HP-UX B.11.31
   Version:   U (unlimited-user license)
   Machine:   ia64
   ID Number: 2545843598
   vmunix _release_version:
@(#) $Revision: vmunix:    B.11.31_LR FLAVOR=perf

I have also uploaded the list files from CLUE$COLLECT.

 

Best regards,

Richard

 

 

Volker Halle
Honored Contributor

Re: A COM process on OpenVMS guests

Richard,

 

4 unusual crashes within 4 days. Something is really wrong here:

 

PGFIPLHI in SWP$SHELL_INIT_C+00DA1 trying to execute st8 [r31] = r20 with R31=40000000.00000000

 

KRNLSTAKNV in SCH$INTERRUPT_C+00B90

 

and 2 MACHINECHK crashes on an HP VM virtual machine?
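
As a side note on the R31 value in the PGFIPLHI crash: on Itanium the top three bits of a 64-bit virtual address select the virtual region, so the address can be split as in the sketch below. The helper names are hypothetical, and the sketch only decodes the number. What OpenVMS keeps in that region, and whether the address was ever meant to be valid, cannot be told from the crash summary alone.

```python
def parse_vms_quadword(text):
    """Convert a dotted quadword such as '40000000.00000000' to an int."""
    return int(text.replace(".", ""), 16)


def itanium_region(va):
    """Split a 64-bit virtual address into (region number, offset).

    Bits 63:61 are the virtual region number; bits 60:0 are the offset
    within that region.
    """
    return va >> 61, va & ((1 << 61) - 1)


r31 = parse_vms_quadword("40000000.00000000")
vrn, offset = itanium_region(r31)
print(f"R31 = 0x{r31:016X}: region {vrn}, offset 0x{offset:X}")
```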

 

Only HP will be able to help you here ...

 

Do you have other guests running on this HP VM node? Are any of them also seeing unusual problems?

 

Volker.

Hoff
Honored Contributor

Re: A COM process on OpenVMS guests

Having looked at these OpenVMS crashes for years and years, I can say that making any headway without access to the crashdump files and without source code listings for OpenVMS is a whole lot of work; what's tricky at best becomes intractable.

 

And that's for VMS running native-booted.

 

This particular configuration is exceptionally complex, and I've found each of the layers here (HP-VM, HP-UX, Integrity, Itanium) can and variously does introduce errors.

 

CLUE CRASH is good for the "saw that already" crashes that can be automatically scanned and identified, if HP is using those sorts of crash-scanning tools.  For figuring out why VMS face-planted, the dump file is (for me) far more interesting. 

 

Given that you've undoubtedly already gone through the error logs and related data looking for hardware glitches, and that HP has run the CLUE CRASH output past whatever they're using these days, in your situation I'd next get rid of HP-UX and HP-VM here, and boot OpenVMS native on the Tukwila hardware.  That'll either change the footprint substantially, eliminate the crashes entirely, or (because VMS has a better view into the hardware, if that's the trigger here) potentially identify the error.

 

And FWIW, I continue to be surprised that folks are willing to do this with production environments and don't choose native boot, either directly or via EFI-level partitioning.  While I do grok the "cool factor" and the "power and cooling" decisions of VMs, the particular nature of HP's VM implementation for Itanium (you can't boot a VM on the VM here, for testing or debug) and the "moving target" that is the VMS error-decoding tools make for a very hairy stack here.  You're basically beholden to HP Support with this and similar cases, and across four HP entities (VMS, VM, UX, Integrity).

 

Volker Halle
Honored Contributor

Re: A COM process on OpenVMS guests

Richard,

 

Are you aware of this article from the OpenVMS Technical Journal V16?

 

OpenVMS Guest Troubleshooting

 

http://h71000.www7.hp.com/openvms/journal/v16/troubleshooting.html

 

Maybe you can use some of the troubleshooting guidelines in that article to obtain more and better information.

 

Volker.