07-11-2005 07:20 PM
system crash
Please suggest.
07-12-2005 04:26 AM
Re: system crash
It would appear that this Halt_Restart is very similar to the crash submitted by Rajarshi Gupta, back on 01-Jul-2005. In fact, in both Clue-Listings, the node name is the same (TGEV01).
The K-Stk footprint is similar (but not exactly the same) in both of these crashes. My suspicion, based on the same node name and your statement "This is the 2nd time this machine has gone down in similar situation", is that both you and Rajarshi are trying to troubleshoot and isolate this problem.
If my previous two paragraphs are correct, and this IS the same system (VAX 4100A), then we may have to lean towards a hardware failure. I say this because the first Halt, reported by Rajarshi, occurred in the SYSTSG image at approximately PC=7E07 or 7E08 (updated PC reflected?), while your second Halt occurred in the SYSDSK image at PC=891EB or 891EA (again, I am not sure whether the Halt-Restart bugcheck displays the failing PC or the updated PC).
In other words, I would find it hard to believe that you have two (2) different executable images with similar code-threads that both execute Halt instructions while in Kernel-Mode. If the same system has crashed more than once, in different code-streams, it makes more sense that the cause is an internal IC-chip failure (ALU/Mux/Shift-Reg) on the VAX processor module.
But if the crashes are occurring on two systems, then it is likely to be a problem that is common, but independent of the actual system-boxes. For example, you mention that this system checks to see if there is a "duty-machine", and if not, then this system tries to become the "new duty machine". If there are multiple systems that each check for "duty-machine" (via a keep-alive-broadcast over the network?) and the network-concentrator/switch/hub does not forward the broadcast/multicast, you may have a network-filtering problem...
Just a couple of thoughts, not sure if they help or not...
Thanx,
whynot3k
07-12-2005 04:50 AM
Re: system crash
Would this suggest that you had a memory error sometime prior to the crash?
At least worth checking.
_veli
07-12-2005 06:00 AM
Re: system crash
Please try to report the instruction at PC = 891EB (or 891EA):
$ ANAL/SYS SYS$SYSTEM:SYSDUMP.DMP
SDA> EXA/INS 891EB
SDA> EXA/INS 891EA
CLUE reported the failing instruction at PC=000C09D4; could you please also examine:
SDA> EXA/INS C09D4
SDA> EXA/INS C09D3
Volker.
07-12-2005 06:08 AM
Re: system crash
Thanks for providing the CLUE file, this allows at least some educated guesses on what might have happened.
Please also provide the data from SDA, as this may help make a decision between a software or hardware problem...
Volker.
07-12-2005 06:12 PM
Re: system crash
Volker,
could you please let me know how to get the SDA output you require, so that I can attach it as well?
Richard,
you are right, it is the same machine, crashing with two different image names in the halt crashes. Generally, when a system crashes, it writes some application error log pointing to a probable reason for the crash. But in these two crashes, the machine has not given any application-level reason.
Please let me know if you need any other log that I can attach.
07-12-2005 07:08 PM
Re: system crash
Could it be that the console terminal is broken or has a failing connection? Can it be that it is switched off by the application (due to the crash), and that this is what crashes VMS?
Willem
OpenVMS Developer & System Manager
07-12-2005 09:28 PM
Re: system crash
Purely Personal Opinion
07-13-2005 01:39 AM
Re: system crash
SDA>EXA/INS 891EB
000891EB: XFC
SDA>EXA/INS 891EA
000891EA: NOP
SDA>EXA/INS C09D4
000C09D4 : RET
SDA>EXA/INS C09D3
000C09D4 : HALT
Please suggest.
07-13-2005 02:20 AM
Re: system crash
000C09D4 : HALT
^^^^^^^^ this should be 000C09D3, right ?!
SDA>EXA/INS C09D4
000C09D4 : RET
If this is the real HALT PC, it makes sense. The other 2 instructions could not have halted the system.
Could you now also please examine the instruction stream leading up to 000C09D3 and 000891EA?
Start with SDA> EXA/INS C09D4-10;10
If this provides a valid instruction stream up to address C09D4, please post it. Otherwise try -11;11 or -A;A. VAX instructions are variable length, so you need to find the beginning of a valid instruction to be able to decode the whole instruction stream.
Then please do the same with 891EA-10;10 and so on.
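As a rough sketch of the trial-and-error described above, the candidate SDA commands could be generated like this. The helper name `sda_examine_commands` and its default offsets are only illustrative; the `address-offset;length` form and the hex values are copied from the commands in this post, and nothing here talks to SDA itself.

```python
def sda_examine_commands(target, offsets=(0x10, 0x11, 0xA)):
    """Build one EXA/INS command per candidate back-off from `target`.

    Hypothetical helper: each command starts `off` bytes before the target
    and decodes `off` bytes, so the decoded range stops just short of the
    target address. All numbers are printed in hex, as in the post above.
    """
    return [f"EXA/INS {target:X}-{off:X};{off:X}" for off in offsets]


if __name__ == "__main__":
    # The two addresses of interest in this thread.
    for pc in (0xC09D4, 0x891EA):
        for cmd in sda_examine_commands(pc):
            print("SDA>", cmd)
```

Each printed line can be typed at the SDA> prompt; the first offset that decodes into a sensible instruction stream is the one worth posting.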
Volker.
07-13-2005 02:45 AM
Re: system crash
You've reminded me of a similar situation. The VAX (don't remember what kind, but it wasn't a VAXstation) would just sometimes crash for no reason.
Finally realized that they had a PC as the console, and were using the console to run a data entry application. The Terminal Emulator had F5 mapped to send
I don't remember if there was also a console command that set the break condition, but I moved that operator's PC off the console and put in a dedicated console with break disabled. It never crashed like that again.
07-13-2005 04:27 AM
Re: system crash
I'm pretty sure that the system will just HALT and display the console prompt >>>, but it will not crash with a HALT restart bugcheck. The HALT restart crash should only happen if the console detects that the operating system has issued a HALT instruction in kernel mode (thus halting the CPU) and the HALT console parameter is set to RESTART. The console would then try to restart OpenVMS via the restart entry point.
Volker.
07-13-2005 05:02 AM
Re: system crash
07-14-2005 06:47 AM
Re: system crash
Volker,
I will get back after trying your suggestions.
07-18-2005 01:12 AM
Re: system crash
Can you perhaps give us a brief overview of the method you are using to trigger the change in system status? As I recall, there is a possible situation in which you could run afoul of one of the Goedel theorems on computability that would prevent this from being a truly reliable process, depending on exactly how you approach it.
07-18-2005 01:31 AM
Re: system crash
http://forums2.itrc.hp.com/service/forums/questionanswer.do?threadId=929654
It may not be possible to obtain the HALT PC from the dump, but ONLY from the halt message on the console.
Volker.
07-19-2005 06:22 PM
Re: system crash
For a valid restart crash (see [SYS]POWERFAIL routine EXE$RESTART_ATT), the following input values are expected:
AP - Halt reason code (a value between 3 and 31)
R10 - HALT PC
R11 - HALT PSL
These values do not match in this crash, which makes a hardware problem even more likely.
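The range check on AP can be written down as a small sketch. The function name and the sample values in the usage lines are made up for illustration; only the 3 to 31 range comes from the description above, and treating both bounds as inclusive is an assumption.

```python
HALT_CODE_MIN = 3    # lower bound quoted above
HALT_CODE_MAX = 31   # upper bound quoted above


def halt_code_plausible(ap):
    """Return True if AP holds a halt reason code in the quoted range.

    Assumption: both bounds are treated as inclusive.
    """
    return HALT_CODE_MIN <= ap <= HALT_CODE_MAX


if __name__ == "__main__":
    # Illustrative values only, not taken from the dump in this thread.
    for ap in (6, 0, 0x7FFE0000):
        print(f"AP={ap:08X} plausible={halt_code_plausible(ap)}")
```

A value outside this range in AP would be one sign that the restart entry point was not reached through a normal console restart.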
Volker.