03-17-2012 12:29 PM
A COM process on OpenVMS guests
Hi HP Community,
Could you please advise on how to troubleshoot a process in a COM state?
OS: OpenVMS 8.4 guest on HPVM
Process: ixiikc
We run the same program on hundreds of servers, and I haven't seen this problem on a physical server. But we have seen it on the virtual guests.
I have read some docs:
http://labs.hoffmanlabs.com/node/231
BA554-90017 HP OpenVMS System Analysis Tools Manual
Best regards,
Richard
IKEA IT
03-17-2012 01:17 PM - edited 03-17-2012 01:44 PM
Re: A COM process on OpenVMS guests
A computable process is just that; a process in a computable state.
A computable process can represent a very well-tuned system, or it can indicate an application-level error: an infinite loop or one of its variations, an algorithm that doesn't scale well to the data (e.g. polynomial time), and so on.
It's all about the context for the application processing.
Determining the context involves figuring out whether the process is doing useful work, whether the application code is operating with sufficient efficiency, and where the process is spending its time.
Source code that runs on hundreds of servers does not confirm that the source code is bug free, nor that it's stable, nor well-written. Bugs in application code can be latent for years, and bugs can be triggered by differences in timing. And HP-VM can most certainly trigger differences in timing. And bugs can arise in even some of the best-written code, too; whether application code, or OpenVMS, or HP-VM.
Looking at this from my perspective, that you are asking this question in this particular fashion (and comparative lack of detail), and that you're citing a topic that's probably not germane to a computable process (here's one that's closer to the target), and that you have no program counter traces and have posted no details of the errant code, can all be inferred to strengthen the circumstantial case against this particular source code, too. That there's a bug somewhere in this application code.
Do your due diligence here with the application source code, and either rule in your code as the trigger for the looping, or rule it out. Start by sampling the PCs in the loop; that'll tend to reduce the scope of the error. Use integrated debugging and integrated logging where that's available, and add that where it's not.
OpenVMS is certainly not error free. It does see a whole lot of use. And this case may well prove to be an OpenVMS bug or an HP-VM bug. But given the error is arising in your application code, you own figuring out if this is your bug, or if it's a bug in some supporting code.
See this recent thread for somebody that learned something about latent bugs in existing code.
And this old topic has a list of common source code bugs to look for in existing code.
03-18-2012 01:17 AM - edited 03-18-2012 10:25 PM
Re: A COM process on OpenVMS guests
Richard,
an OpenVMS process that has used 10:50 hours of CPU time in 11:57 hours of existence, while doing only minimal buffered I/O, direct I/O and page faults, is certainly suspected of 'looping', unless it is supposed to do complex CPU-intensive work, which you would know about, right?
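To put a number on that suspicion, here is a minimal Python sketch that turns the two hh:mm figures quoted above into a CPU share. The helper name `hhmm_to_minutes` is just for illustration; only the two time values come from this thread.

```python
def hhmm_to_minutes(text):
    """Convert an 'hh:mm' string into a whole number of minutes."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)

# Figures quoted in this thread: CPU time used vs. time since process creation.
cpu_minutes = hhmm_to_minutes("10:50")
elapsed_minutes = hhmm_to_minutes("11:57")

# Fraction of its lifetime the process spent consuming CPU.
cpu_share = cpu_minutes / elapsed_minutes
print(f"CPU share: {cpu_share:.1%}")
```

A share this close to 100% with almost no I/O is what makes the process look like it is looping rather than waiting on anything.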
The process only appears to be in COM state all the time. In reality, it is in CUR state (i.e. owning the CPU) most of the time, but you only see it in COM state if this is a system with just one CPU, because that CPU is executing your SHOW SYSTEM or SHOW PROC/CONT/ID=xx code at that moment.
The best tool to obtain information about this process is PCS (the SDA extension PCS$SDA, PCS = PC sampling). This tool can be started in the running system and will collect PC values from the overall system or just that process.
$ ANALYZE/SYSTEM
SDA> PCS ! to get some help
SDA> PCS LOAD
SDA> PCS START TRACE ! while your process is running/looping
SDA> PCS STOP TRACE ! stop trace after a couple of seconds
SDA> SET PROC ixiikc ! set context to the looping process to allow SDA to symbolize addresses
SDA> PCS SHOW TRACE/STAT ! shows how often each PC trace value has been captured
SDA> PCS SHOW TRACE/PID=<pid-of-your-process> ! show collected PC values in your process
SDA> PCS UNLOAD
SDA> EXIT
You will need to find the PC values collected from your process and map them to the source code. This requires access to the current linker map and the source listings (with machine code), if the PC values are actually in P0 space.
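As a rough illustration of that mapping step, the Python sketch below looks up each sampled PC in a sorted table of routine start addresses and counts hits per routine. Everything in it is hypothetical: the addresses, the routine names and the sample PCs are invented, and a real linker map would have to be parsed into this form first.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical linker-map excerpt: (start address, routine name), sorted by address.
MAP_ENTRIES = [
    (0x00010000, "MAIN"),
    (0x00010400, "READ_QUEUE"),
    (0x00010A00, "PROCESS_RECORD"),
]

def routine_for_pc(pc, entries=MAP_ENTRIES):
    """Return (routine name, offset) for the last routine starting at or below pc.

    Returns None if pc lies below the first routine. There is no end address
    in this table, so any pc above the last start is attributed to the last routine.
    """
    starts = [start for start, _ in entries]
    index = bisect_right(starts, pc) - 1
    if index < 0:
        return None
    start, name = entries[index]
    return name, pc - start

# Hypothetical PC samples, standing in for values copied from the PCS trace output.
samples = [0x00010420, 0x00010424, 0x00010A10, 0x00010420]

# Count how many samples fall into each routine.
hits = Counter(routine_for_pc(pc)[0] for pc in samples)
for name, count in hits.most_common():
    print(name, count)
```

The routine with the most hits is where to start reading the listing; the offset returned by `routine_for_pc` narrows it down to specific instructions.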
If you look carefully at the data posted from the PCB (Process Control Block), you'll spot:
PCB$L_EXEC_COUNTER 003BF532
This is equivalent to about 10.9 hours spent in EXECUTIVE mode in the context of this process! Let me guess: either RMS or Oracle RDB (or some other code running in EXEC mode).
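The conversion behind that figure can be sketched in a few lines of Python. The 10 ms tick length is an assumption here; it is the value that makes the hex counter come out at the "about 10.9 hours" stated above. The function name is illustrative only.

```python
# Assumption: the per-mode counter is kept in 10 ms ticks.
TICK_SECONDS = 0.010

def ticks_to_hours(hex_ticks, tick_seconds=TICK_SECONDS):
    """Convert a hexadecimal tick counter into hours."""
    return int(hex_ticks, 16) * tick_seconds / 3600

# PCB$L_EXEC_COUNTER value quoted in this thread.
exec_hours = ticks_to_hours("003BF532")
print(f"EXEC mode time: {exec_hours:.1f} hours")
```

The same helper can be applied to the other per-mode counters in the PCB display to see how the CPU time splits across access modes.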
Volker.
03-19-2012 08:52 AM
Re: A COM process on OpenVMS guests
Hi,
Please accept my greetings from China.
I have read the docs in your reply and, from those, some more related docs found via Google.
I am interested in these OpenVMS internals, although they are a lot for a system manager.
I don't have access to the code, but I could check with development if it becomes necessary.
And I will try to write in English as well as I can.
The looping process ixiikc was gone when the guest crashed yesterday morning.
We saw some errors in operator.log and the application logs before the crash, besides the ixiikc looping. After the crash, both the system and the application look normal and calm.
Before the crash->
M00029/SYS0.SYSMGR> sear operator.log.-1 "-e-","-f-","-w-"
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000035, PC=FFFFFFFF922809A0, PS=0000000B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000035, PC=FFFFFFFF922809A0, PS=0000000B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000035, PC=FFFFFFFF92280940, PS=0000000B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000035, PC=FFFFFFFF922809A0, PS=0000000B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000035, PC=FFFFFFFF922809A0, PS=0000000B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000035, PC=FFFFFFFF922809A0, PS=0000000B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000035, PC=FFFFFFFF92280A00, PS=0000000B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000035, PC=FFFFFFFF922809A0, PS=0000000B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000035, PC=FFFFFFFF922809A0, PS=0000000B
%COSI-F-BUGCHECK, internal consistency failure
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000035, PC=FFFFFFFF922809A0, PS=0000000B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000035, PC=FFFFFFFF922809A0, PS=0000000B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000035, PC=FFFFFFFF922809A0, PS=0000000B
%SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual address=000000008081E000, PC=FFFFFFFF91D33040, PS=0000000B
M00029/MHS.LOGFILES> sear IXLC2_2.LOG.-1 "++++"
++++++++++ Error in batch!!!!!!!!!! ICQA2 dumped.
++++++++++ Error in batch!!!!!!!!!! ICQA4 dumped.
++++++++++ Error in batch!!!!!!!!!! IXID7 dumped.
After the crash->
M00029/SYS0.SYSMGR> sear operator.log "-e-","-f-","-w-"
%SEARCH-I-NOMATCHES, no strings matched
M00029/MHS.LOGFILES> sear IXLC2_2.LOG "++++"
%SEARCH-I-NOMATCHES, no strings matched
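Since most of the ACCVIO lines above repeat the same PC, a quick tally by PC shows how few distinct failure points there are. The Python sketch below does that for a handful of lines copied from the excerpt above; the function name and the regular expression are illustrative, and the full operator.log would be read from a file instead.

```python
import re
from collections import Counter

# A few lines copied from the operator.log excerpt in this post.
LOG_LINES = [
    "%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000035, PC=FFFFFFFF922809A0, PS=0000000B",
    "%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000035, PC=FFFFFFFF92280940, PS=0000000B",
    "%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000035, PC=FFFFFFFF922809A0, PS=0000000B",
    "%COSI-F-BUGCHECK, internal consistency failure",
    "%SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual address=000000008081E000, PC=FFFFFFFF91D33040, PS=0000000B",
]

# Capture the 16-digit hexadecimal PC from ACCVIO messages only.
PC_PATTERN = re.compile(r"%SYSTEM-F-ACCVIO.*?PC=([0-9A-F]{16})")

def tally_accvio_pcs(lines):
    """Count ACCVIO messages per faulting PC."""
    counts = Counter()
    for line in lines:
        match = PC_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

pc_counts = tally_accvio_pcs(LOG_LINES)
for pc, count in pc_counts.most_common():
    print(pc, count)
```

The PCs that dominate the tally are the ones worth symbolizing first in SDA.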
I have run analyze/crash_dump and attached output here.
M00029> ana/crash SAVEDUMP.DMP
OpenVMS system dump analyzer
...analyzing an I64 compressed full memory dump...
Dump taken on 18-MAR-2012 06:12:08.67 using version V8.4
MACHINECHK, Machine check while in kernel mode
SDA> set output /nohead sysdump.lis
SDA> read/exec
SDA> show crash
%SDA-W-NOREAD, unable to access location 00000000.00000088
SDA> show stack
SDA> show summary
SDA> show process/pcb/phd/reg
SDA> show symbol/all
SDA> exit
Could you find anything in the crash related to the looping process and the ACCVIO errors, and to the bad code behind them?
Best regards,
Richard
03-19-2012 09:19 AM
Re: A COM process on OpenVMS guests
Richard,
a MACHINECHK crash is typically caused by a hardware problem, so seeing this kind of crash on an HPVM guest seems unusual. You need to extract the ERRLOG entries from the dump and process them with WEBES/SEA/'whatever OpenVMS errlog analysis tool of the day' to find out about the underlying HW error:
$ ANALYZE/CRASH SAVEDUMP.DMP
SDA> CLUE ERRLOG
...
SDA> EXIT
This will generate SYS$SCRATCH:CLUE$ERRLOG.SYS. You then need to decode the entries in that errorlog file.
The %COSI-F-BUGCHECK message found in OPERATOR.LOG definitely points to Oracle RDB.
You will have to wait for the problem to reappear; the looping ixiikc process no longer existed at the time of the system crash.
Volker.
03-19-2012 11:28 PM
Re: A COM process on OpenVMS guests
Hi Volker,
SDA> clue errlog
Dumpfile Errorlog Entry Information:
------------------------------------
Sequence Date Time Error Message Type
-------- ----------- ----------- --------------------------------
396 18-MAR-2012 06:12:08.67 Machine Check 670
397 18-MAR-2012 06:12:08.67 * Crash Entry
Config Entry and Errlog Entries written to CLUE$ERRLOG.SYS file.
Use System Event Analyzer or Error Log Viewer to analyze.
I have analyzed the generated file with SEA and attached the output below.
Best regards,
Richard
03-19-2012 11:44 PM
Re: A COM process on OpenVMS guests
Richard,
given the multiple question marks in the SEA output, I would conclude that this version of SEA probably does not know about HPVM at all. I cannot see any error except that this HPVM guest system incurred a 670 UnCorrectable Processor Event, whatever this means in the context of an HPVM guest system.
Maybe check with the system managers of the HPVM host whether anything has been reported on the underlying hardware system at that time, and/or whether they have done anything to your 'guest' system at the time of this MACHINECHK crash.
Volker.
03-20-2012 03:19 AM
Re: A COM process on OpenVMS guests
Volker,
We manage the host as well, and we haven't changed the hardware or the HPVM software, or installed any patches.
We have seen the machine check on another guest m00238 as well.
M00238> ana/crash SAVEDUMP_238_120319.DMP
OpenVMS system dump analyzer
...analyzing an I64 compressed full memory dump...
Dump taken on 19-MAR-2012 23:53:54.40 using version V8.4
MACHINECHK, Machine check while in kernel mode
SDA> clue errlog
Dumpfile Errorlog Entry Information:
------------------------------------
Sequence Date Time Error Message Type
-------- ----------- ----------- --------------------------------
7221 19-MAR-2012 23:53:54.40 Machine Check 670
7222 19-MAR-2012 23:53:54.40 * Crash Entry
Config Entry and Errlog Entries written to CLUE$ERRLOG.SYS file.
Use System Event Analyzer or Error Log Viewer to analyze.
SDA> exit
We also see other crashes, although I guess they might have the same root cause.
M00238> ana/crash SAVEDUMP_238_120317.DMP
OpenVMS system dump analyzer
...analyzing an I64 compressed full memory dump...
Dump taken on 17-MAR-2012 12:34:59.46 using version V8.4
PGFIPLHI, Pagefault with IPL too high
SDA> exit
Best regards,
Richard
03-20-2012 03:27 AM
Re: A COM process on OpenVMS guests
Richard,
are those HPVM guests using dedicated physical Itanium CPUs/cores? Otherwise, how would you know which physical processor has caused the problem if you're running as a virtual machine guest?
A PGFIPLHI crash is most likely a software problem in OpenVMS. To get more information about that type of crash, please consider providing the CLUE file (see CLUE$COLLECT:CLUE$node_ddmmyy_hhmm.LIS from that crash), or at least provide the output from SDA> CLUE CRASH (which shows the failing module and offset).
Volker.
03-20-2012 05:38 AM
Re: A COM process on OpenVMS guests
Presuming this code does not incorporate kernel-mode software of its own, this looks like flaky hardware or flaky "hardware" (HP-VM) or flaky OpenVMS or kernel-mode software, and -- all the discussions of the RAS features aside -- these glitches can and do arise with some Itanium boxes.
You're arguably extending the time to resolution here by pursuing and debugging this here in HPSC. Call HP support. You've paid for those access privileges, after all. Pass along the CLUE CRASH data or potentially the full carcasses from the crashes to HP, and have them sort this out.
03-22-2012 09:18 PM - edited 03-22-2012 09:19 PM
Re: A COM process on OpenVMS guests
Yes, Hoff,
The cases were reported to HP some time ago and, according to HP, have been investigated by the HP-UX team and the Lab. Now the OpenVMS teams are involved as well.
We have installed a third UNOF patch for HPVM after these crashes. I felt a bit embarrassed when I opened this post, because I thought there could also be UNOF support in the communities, or at least valuable input. "Two heads are better than one."
Volker,
The host blade has only one CPU with 4 cores. I found that the guest runs and crashes on whichever core it happens to start on.
#top
System: ITSEELM- Fri Mar 23 05:07:07 2012
Load averages: 0.50, 0.45, 0.36
203 processes: 127 sleeping, 76 running
Cpu states:
CPU LOAD USER NICE SYS IDLE BLOCK SWAIT INTR SSYS
0 0.64 0.0% 0.0% 59.8% 40.2% 0.0% 0.0% 0.0% 0.0%
1 0.21 0.0% 0.0% 3.2% 96.8% 0.0% 0.0% 0.0% 0.0%
2 0.52 0.0% 0.0% 16.2% 83.8% 0.0% 0.0% 0.0% 0.0%
3 0.64 0.2% 0.0% 85.6% 14.2% 0.0% 0.0% 0.0% 0.0%
--- ---- ----- ----- ----- ----- ----- ----- ----- -----
avg 0.50 0.0% 0.0% 41.2% 58.8% 0.0% 0.0% 0.0% 0.0%
System Page Size: 4Kbytes
Memory: 32473240K (32211056K) real, 34537852K (33776996K) virtual, 55907024K free Page# 1/23
CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND
3 ? 3949 root 152 20 3253M 3151M run 105:52 82.10 81.95 hpvmapp <-guest process
3 ? 3829 root 152 20 3253M 3151M run 96:28 20.33 20.30 hpvmapp
2 ? 3863 root 152 20 3253M 3151M run 85:52 8.32 8.31 hpvmapp
2 ? 4039 root 152 20 3253M 3151M run 150:12 7.63 7.61 hpvmapp
1 ? 3893 root 152 20 3253M 3151M run 121:31 7.03 7.02 hpvmapp
0 ? 3930 root 152 20 3253M 3151M run 67:21 4.04 4.03 hpvmapp
#machinfo
CPU info:
1 Intel(R) Itanium(R) Processor 9340 (1.6 GHz, 20 MB)
4.79 GT/s QPI, CPU version E0
4 logical processors (4 per socket)
Memory: 98198 MB (95.9 GB)
Firmware info:
Firmware revision: 01.24
FP SWA driver revision: 1.18
IPMI is supported on this system.
BMC firmware revision: 1.30
Platform info:
Model: "ia64 hp Integrity BL860c i2"
Machine ID number: 97be7d8e-6105-11e0-8a68-294e7ff67a42
Machine serial number: CZ31099TF4
OS info:
Nodename: ITSEELM-
Release: HP-UX B.11.31
Version: U (unlimited-user license)
Machine: ia64
ID Number: 2545843598
vmunix _release_version:
@(#) $Revision: vmunix: B.11.31_LR FLAVOR=perf
And I also uploaded the list files in clue$collect.
Best regards,
Richard
03-22-2012 09:55 PM
Re: A COM process on OpenVMS guests
Richard,
4 unusual crashes within 4 days. Something is really wrong here:
PGFIPLHI in SWP$SHELL_INIT_C+00DA1 trying to execute st8 [r31] = r20 with R31=40000000.00000000
KRNLSTAKNV in SCH$INTERRUPT_C+00B90
and 2 MACHINECHK crashes on a HP VM virtual machine ?
Only HP will be able to help you here ...
Do you have other guests running on this HP VM node ? Any of them also seeing unusual problems ?
Volker.
03-23-2012 05:53 AM
Re: A COM process on OpenVMS guests
Having looked at these OpenVMS crashes for years and years, making any headway without access to the crashdump files and without source code listings for OpenVMS is a whole lot of work; what's tricky at best becomes intractable.
And that's for VMS running native-booted.
This particular configuration is exceptionally complex, and I've found each of the layers here (HP-VM, HP-UX, Integrity, Itanium) can and variously does introduce errors.
CLUE CRASH is good for the "saw that already" crashes that can be automatically scanned and identified, if HP is using those sorts of crash-scanning tools. For figuring out why VMS face-planted, the dump file is (for me) far more interesting.
Given you've already undoubtedly run the error logs and related and looked for hardware glitches and HP has run the CLUE CRASH past whatever they're using these days, if in your situation, I'd next get rid of HP-UX and HP-VM here, and boot OpenVMS native onto the Tukwila hardware. That'll either change the footprint substantially, eliminate the crashes entirely, or (because VMS has a better view into the hardware, if that's the trigger here) potentially identify the error.
And FWIW, I continue to be surprised that folks are willing to do this with production environments and don't choose native boot either directly or via EFI-level partitioning. While I do grok the "cool factor" and the "power and cooling" decisions of VMs, the particular nature of HP's VM implementation for Itanium (you can't boot a VM on the VM here, for testing or debug) and the "moving target" that is the VMS error-decoding tools make for a very hairy stack here. You're basically beholden to HP Support with this and similar cases, and across four (VMS, VM, UX, Integrity) HP entities.
03-23-2012 09:52 PM - edited 03-23-2012 09:53 PM
Re: A COM process on OpenVMS guests
Richard,
are you aware of this article from the OpenVMS technical journal V16 ?
OpenVMS Guest Troubleshooting
http://h71000.www7.hp.com/openvms/journal/v16/troubleshooting.html
Maybe you can use some of the troubleshooting guidelines in that article to obtain more and better information.
Volker.