03-31-2005 04:54 AM
Re: System crashes every 3 weeks.
Bad luck: OpenVMS V7.1-1H2 did NOT log any machine check entry.
This is the SAME machine/problem as already discussed in previous thread:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=808549
I keep a database of all crashes, that's why I know ;-)
Could you please try to provide the stack data as requested in the previous thread:
$ ANAL/CRASH SYS$SYSTEM:SYSDUMP.DMP
SDA> READ/EXEC
SDA> SHOW STACK/QUAD 7FFA1FC0;40
It may also be possible to find the machine check logout frame in the dump.
Volker.

03-31-2005 05:17 AM
You're absolutely right. Adrian is my hardware support contact, and I'm the "sysadmin is in west coast Canada" he referred to.
In any case, I was not aware that they were using this forum to troubleshoot the problem. I thought I'd try it myself, as I'm not getting anywhere through the official channels.
Here's the output from the SHOW STACK/QUAD 7FFA1FC0;40 command:
Specified Stack Range
---------------------
00000000.7FFA1FC0 00000000.0002F030
00000000.7FFA1FC8 00000000.010E0019
00000000.7FFA1FD0 00000000.7AF77A5C
00000000.7FFA1FD8 00000000.7AF78AA0
00000000.7FFA1FE0 00000000.00000001
00000000.7FFA1FE8 00000000.00000003
00000000.7FFA1FF0 00000000.0030F080
00000000.7FFA1FF8 00000000.0000001B
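To read this range, I'm assuming the standard Alpha OpenVMS interrupt/exception frame: saved R2 through R7, then PC, then PS. That is eight quadwords, which would explain the ;40 byte count. The Python sketch below pairs each quadword with its field and classifies the PC by address region. The field names and the region boundaries are assumptions, not SDA output: P0 below 40000000, P1 up to 7FFFFFFF, and S0/S1 at FFFFFFFF.80000000 and above.

```python
# Sketch: label the eight quadwords from SHOW STACK/QUAD 7FFA1FC0;40,
# ASSUMING the Alpha OpenVMS interrupt/exception frame layout
# (R2..R7 at the lowest addresses, then PC, then PS).
FRAME_FIELDS = ["R2", "R3", "R4", "R5", "R6", "R7", "PC", "PS"]

# The stack data posted above, copied verbatim.
STACK_DUMP = """\
00000000.7FFA1FC0 00000000.0002F030
00000000.7FFA1FC8 00000000.010E0019
00000000.7FFA1FD0 00000000.7AF77A5C
00000000.7FFA1FD8 00000000.7AF78AA0
00000000.7FFA1FE0 00000000.00000001
00000000.7FFA1FE8 00000000.00000003
00000000.7FFA1FF0 00000000.0030F080
00000000.7FFA1FF8 00000000.0000001B
"""


def parse_quad(text: str) -> int:
    """Convert an SDA-style 'hhhhhhhh.llllllll' quadword into an integer."""
    return int(text.replace(".", ""), 16)


def label_frame(dump: str) -> dict:
    """Pair each stack quadword (second column) with its assumed frame field name."""
    values = [parse_quad(line.split()[1]) for line in dump.splitlines() if line.strip()]
    if len(values) != len(FRAME_FIELDS):
        raise ValueError(f"expected {len(FRAME_FIELDS)} quadwords, got {len(values)}")
    return dict(zip(FRAME_FIELDS, values))


def region(va: int) -> str:
    """Rough OpenVMS Alpha address-region classification (assumed boundaries)."""
    if va < 0x4000_0000:
        return "P0"
    if va < 0x8000_0000:
        return "P1"
    if va >= 0xFFFF_FFFF_8000_0000:
        return "S0/S1"
    return "other 64-bit region (P2/S2/...)"


if __name__ == "__main__":
    frame = label_frame(STACK_DUMP)
    for name, value in frame.items():
        print(f"{name:>2} = {value:016X}")
    print("PC region:", region(frame["PC"]))
```

Under this layout, the last two quadwords are PC = 0030F080, which falls in P0, and PS = 0000001B.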

03-31-2005 05:44 AM
Just curious: how precisely do you mean "every 3 weeks"?
1) Every 3 weeks, within a few milliseconds
2) Every 3 weeks, within a couple of hours
3) Every 3 weeks, within a few days
I'll bet your answer is 3. :-)
To hazard a little speculation around each possibility:
1) would be pretty strange, to me at least. Perhaps a flaw in the fabric of space-time. :-)
2) might suggest a link to some calendar-related activity. Perhaps a procedure or device that is used every couple of weeks? But you'd probably have noticed that.
3) suggests something a lot more random or at least aperiodic, which is why I guessed you'd pick this answer.
Just a few ideas that may at least stimulate some thought, if they're of any use at all...
Galen

03-31-2005 07:11 AM
"I keep a database of all crashes, that's why I know"
and I thought you just remembered them all rather than having a private copy of canasta :-)
Purely Personal Opinion

03-31-2005 06:13 PM
The interrupt/exception stack frame shows that the PC at the time of the MACHINECHK is in P0 space, and the PS shows user mode at IPL 0:
00000000.7FFA1FF0 00000000.0030F080 <<< PC
00000000.7FFA1FF8 00000000.0000001B <<< PS
SDA> eva/ps 0000001B
MBZ SPAL MBZ IPL VMM MBZ CURMOD INT PRVMOD
0 00 00000000000 00 0 0 USER 0 USER
So whatever the instruction at that PC is
SDA> EXA/INS 30F080
it CANNOT have caused a MACHINECHK through a programming error (i.e. an access to I/O space), because you can't do that in USER mode. It could have caused an access to a bad memory page, but that would be pure speculation!!
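You can reproduce the SDA EVALUATE/PS display above by masking bit fields. This Python sketch takes its field positions from the order SDA prints the fields in, with the low-order bits on the right. That gives PRVMOD in bits 0-1, INT in bit 2, CURMOD in bits 3-4, VMM in bit 7, IPL in bits 8-12 and SPAL in bits 56-61; everything else is must-be-zero. The sketch also assumes the usual mode encoding 0=KERNEL, 1=EXEC, 2=SUPER, 3=USER. Treat these positions as assumptions rather than a reference.

```python
# Sketch: decode an Alpha OpenVMS PS value. Bit positions are ASSUMED
# from the field order shown by SDA> EVALUATE/PS (low-order bits on the right).
MODES = ["KERNEL", "EXEC", "SUPER", "USER"]

# (field name, lowest bit, width in bits)
PS_FIELDS = [
    ("PRVMOD", 0, 2),
    ("INT", 2, 1),
    ("CURMOD", 3, 2),
    ("VMM", 7, 1),
    ("IPL", 8, 5),
    ("SPAL", 56, 6),
]

QUAD_MASK = (1 << 64) - 1


def field_mask(lo: int, width: int) -> int:
    """Bit mask covering `width` bits starting at bit `lo`."""
    return ((1 << width) - 1) << lo


def _defined_bits() -> int:
    """OR together the masks of all named fields."""
    mask = 0
    for _, lo, width in PS_FIELDS:
        mask |= field_mask(lo, width)
    return mask


# Every bit not covered by a named field is treated as must-be-zero (MBZ).
MBZ_MASK = QUAD_MASK & ~_defined_bits()


def decode_ps(ps: int) -> dict:
    """Split a PS quadword into named fields; the two mode fields are shown by name."""
    decoded = {}
    for name, lo, width in PS_FIELDS:
        value = (ps >> lo) & ((1 << width) - 1)
        decoded[name] = MODES[value] if name.endswith("MOD") else value
    return decoded


def mbz_bits(ps: int) -> int:
    """Return any set bits that should be zero (nonzero would mean a suspicious PS)."""
    return ps & MBZ_MASK


if __name__ == "__main__":
    print(decode_ps(0x1B), "MBZ bits:", hex(mbz_bits(0x1B)))
```

For 1B (binary 11011), the low two bits give PRVMOD=USER, bits 3-4 give CURMOD=USER, and INT, VMM and IPL are all 0. This matches the SDA display.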
Please issue the following commands in SDA:
SDA> EXA/INS 30F080-30;40
to examine the instruction stream. If the current instruction includes a memory access and you're able to figure out the address, also do
SDA> SHOW PROC/PAGE address;1000
Otherwise, I'll help you to figure out the page number...
To get an overview of the last few crashes on this node, just try TYPE CLUE$HISTORY; if there is something timing-related, you might be able to spot a pattern.
Volker.

03-31-2005 06:28 PM
If you really suspect the memory, shut down the machine and bring it to the SRM console. Then start 2 memexers per CPU and let them run for a few hours. If there really is bad RAM, it should show up on the console. To stop memexer, give the kill_diag command (or init the system). To show the status of memexer, type show_diag.
(I could be a little off with the commands; look in the manual, or try help or man at the console, for the exact commands.)
It is possible that the RAM has gone bad. At my current site we have had several issues with bad RAM.

04-01-2005 06:24 AM
SDA> EXA/INS 30F080
00000000.0030F080: BIS R31,#X1D,R7
SDA> EXA/INS 30F080-30;40
00000000.0030F050: CVTDG F3,F3
00000000.0030F054: ADDG F4,F3,F3
00000000.0030F058: CVTGD F3,F3
00000000.0030F05C: STD F3,#X0CF8(FP)
00000000.0030F060: TRAPB
00000000.0030F064: LDA R16,#X0008(FP)
00000000.0030F068: BIS R31,#X01,R25
00000000.0030F06C: LDQ R26,#XFF60(R2)
00000000.0030F070: LDQ R27,#XFF68(R2)
00000000.0030F074: JSR R26,(R26)
00000000.0030F078: JMP R31,(R0)
00000000.0030F07C: TRAPB
00000000.0030F080: BIS R31,#X1D,R7
00000000.0030F084: STL R7,#X0020(FP)
00000000.0030F088: LDL R3,#X0CE0(FP)
00000000.0030F08C: ADDL/V R3,#X01,R3
00000000.0030F090: LDA R16,#X8000(R31)
I looked at the CLUE$HISTORY file, and there doesn't appear to be any pattern other than approximately every 3 weeks.
For example, the previous 4 crashes were:
Date      Uptime
========  ==========
Dec 29    22 days
Jan 20    25 days
Feb 14    25 days
Mar 29    23 days
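A little arithmetic on the four uptimes listed above answers Galen's question about precision. This Python sketch uses only those four values; the dictionary and function names are illustrative.

```python
# Sketch: quantify how regular "every 3 weeks" is, using only the four
# CLUE$HISTORY entries posted above (crash date -> uptime in days).
from statistics import mean

UPTIMES_DAYS = {"Dec 29": 22, "Jan 20": 25, "Feb 14": 25, "Mar 29": 23}


def summarize(uptimes: dict) -> tuple:
    """Return (mean, minimum, maximum) uptime in days."""
    values = list(uptimes.values())
    return mean(values), min(values), max(values)


if __name__ == "__main__":
    avg, lo, hi = summarize(UPTIMES_DAYS)
    print(f"mean {avg:.2f} days, range {lo}-{hi} days, spread {hi - lo} days")
    # prints: mean 23.75 days, range 22-25 days, spread 3 days
```

With only four samples this proves little statistically. It does show the crashes fall within a few days of each other, which is Galen's case 3.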
Sorry, I don't know what address to put in the SHOW PROC/PAGE address;1000 command.

04-01-2005 05:12 PM
The exception PC points to a BIS R31,#X1D,R7 instruction, so executing this instruction involves no memory access, except for fetching the instruction itself from the page where it is stored. Please remember to repeat these steps for the next crash(es).
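Here is why this instruction cannot touch data memory. On Alpha, R31 always reads as zero, so BIS R31,#X1D,R7 (the usual way a move of a small literal is encoded) simply sets R7 to 1D. The Python sketch below emulates only that operate-format instruction to show it is a pure register operation. The register-file class and function names are hypothetical.

```python
# Sketch: emulate only the Alpha operate-format instruction BIS Ra,#lit,Rc
# to show it is a pure register operation. Names here are hypothetical.
QUAD_MASK = (1 << 64) - 1


class IntRegisters:
    """Minimal Alpha integer register file: R31 reads as zero, writes to it are discarded."""

    def __init__(self) -> None:
        self._regs = [0] * 32

    def read(self, n: int) -> int:
        return 0 if n == 31 else self._regs[n]

    def write(self, n: int, value: int) -> None:
        if n != 31:
            self._regs[n] = value & QUAD_MASK


def bis_literal(regs: IntRegisters, ra: int, literal: int, rc: int) -> None:
    """Rc <- Ra OR zero-extended 8-bit literal; no load or store is performed."""
    if not 0 <= literal <= 0xFF:
        raise ValueError("operate-format literals are 8 bits wide")
    regs.write(rc, regs.read(ra) | literal)


if __name__ == "__main__":
    regs = IntRegisters()
    regs.write(7, 3)                 # arbitrary prior value in R7
    bis_literal(regs, 31, 0x1D, 7)   # BIS R31,#X1D,R7
    print(f"R7 = {regs.read(7):X}")  # prints: R7 = 1D
```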
Now let's try to find the machinecheck logout frame in the dump:
SDA> READ SYSDEF
SDA> SHOW STACK @(@smp$gl_cpu_data+CPU$L_PROC_MCHK_ABORT_SVAPTE+4);2F0
You have to enter the command on one line.
(The above command only applies to a single-CPU system, which this node is.)
Try to include the output as a text file attachment in your next reply (or mail it to me - see my forum profile).
Volker.

04-04-2005 03:30 AM
I've attached a text file with the output.

04-05-2005 03:39 AM
Thanks for the data:
8A0E0058 00000001.00000205 = mchk code
Could you please compare this data with the output of the same SDA command on the running system? Sometimes mchk data is left in this buffer from 'expected' machine checks (e.g. during SYSMAN IO AUTOCONFIGURE when scanning the device configuration).
If the same data exists on the running system, we know that no machine check frame was logged for this crash, and we need to find out why OpenVMS crashed with a MACHINECHK bugcheck.
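This comparison can be done with a small script. It assumes you capture the SHOW STACK output from the dump and from the running system (ANALYZE/SYSTEM) into two text files. The Python sketch below lines up the quadwords by offset from the start of each listing, since the buffer address may differ between boots, and prints only the offsets whose values differ. The script and file names are hypothetical. The parser assumes each data line starts with an address and a value in the "xxxxxxxx.xxxxxxxx" form seen earlier in this thread.

```python
# Sketch: compare two SDA "address value" quadword listings (e.g. the machine
# check buffer as seen in the crash dump vs. on the running system).
import sys


def parse_quad(text: str) -> int:
    """Convert an SDA-style 'hhhhhhhh.llllllll' quadword into an integer."""
    return int(text.replace(".", ""), 16)


def parse_listing(text: str) -> dict:
    """Map address -> value, skipping headers and separator lines."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank lines and '-----' separators
        try:
            result[parse_quad(parts[0])] = parse_quad(parts[1])
        except ValueError:
            continue  # headers such as 'Specified Stack Range'
    return result


def by_offset(listing: dict) -> dict:
    """Re-key a listing by byte offset from its lowest address."""
    if not listing:
        return {}
    base = min(listing)
    return {addr - base: value for addr, value in listing.items()}


def diff_listings(dump_text: str, live_text: str) -> list:
    """Return (offset, dump_value, live_value) for every offset that differs."""
    dump = by_offset(parse_listing(dump_text))
    live = by_offset(parse_listing(live_text))
    return [
        (off, dump.get(off), live.get(off))
        for off in sorted(dump.keys() | live.keys())
        if dump.get(off) != live.get(off)
    ]


def fmt(value) -> str:
    return "(missing)" if value is None else f"{value:016X}"


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python mchk_diff.py DUMP_LISTING.TXT LIVE_LISTING.TXT")
    else:
        with open(sys.argv[1]) as f_dump, open(sys.argv[2]) as f_live:
            differences = diff_listings(f_dump.read(), f_live.read())
        for off, d, l in differences:
            print(f"+{off:04X}: dump={fmt(d)} live={fmt(l)}")
        print("identical" if not differences else f"{len(differences)} quadword(s) differ")
```

An "identical" result would correspond to the case described above: the buffer holds leftover data, and no machine check frame was logged for this crash.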
Volker.