07-12-2005 04:26 AM
Re: system crash
Good Morning Sk...
It would appear that this Halt_Restart is very similar to the crash submitted by Rajarshi Gupta, back on 01-Jul-2005. In fact, the node name is the same (TGEV01) in both Clue-Listings.
The K-Stk footprint is similar (but not exactly the same) in both of these crashes. My suspicion, based on the same node name and your statement "This is the 2nd time this machine has gone down in similar situation", is that you and Rajarshi are both trying to troubleshoot and isolate the same problem.
If my previous two paragraphs are correct, and this "IS" the same system/VAX-4100A, then we may have to lean towards a hardware failure. I say this because the first Halt, reported by Rajarshi, occurred in the SYSTSG image at approximately PC=7E07 or 7E08 (updated PC reflected?), while your second Halt occurred in the SYSDSK image at PC=891EB or 891EA (again, I am not sure whether the Halt-Restart-Bugcheck displays the Failing-PC or the Updated-PC).
In other words, I would find it hard to believe that you have two (2) different executable images with similar code-threads that both execute Halt instructions while in Kernel-Mode. If the "same" system has crashed more than once, in different code-streams, it makes more sense that the cause is an internal IC-Chip failure (ALU/Mux/Shift-Reg) on the VAX Processor module.
But if the crashes are occurring on two systems, then the problem is likely to be something common to both, but independent of the actual system-boxes. For example, you mention that this system checks to see if there is a "duty-machine" and, if not, tries to become the "new duty machine". If there are multiple systems that each check for a "duty-machine" (via a keep-alive-broadcast over the network?) and the network-concentrator/switch/hub does not forward the broadcast/multicast, then each system could conclude that no duty-machine exists, and you may have a network-filtering problem...
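To illustrate that last scenario, here is a rough Python sketch. It is only a toy model and not your actual application: I do not know how your duty-machine check is really implemented, so the function name `elect`, the staggered start-up order, and the node names NODE_B and NODE_C are all my own assumptions. It just shows how a dropped keep-alive broadcast could leave every system believing it should be the duty-machine.

```python
def elect(node_names, forwards_broadcast):
    """Toy model of a duty-machine check (hypothetical, not the real application).

    Nodes start up one after another (an assumption). Each node becomes the
    duty machine only if it has not heard a keep-alive from another node.
    """
    heard_keepalive = {name: False for name in node_names}
    duty_machines = []
    for name in node_names:
        if not heard_keepalive[name]:
            # No keep-alive seen, so this node claims the duty role...
            duty_machines.append(name)
            # ...and broadcasts a keep-alive, which only reaches the other
            # nodes if the switch/hub actually forwards broadcasts.
            if forwards_broadcast:
                for other in node_names:
                    if other != name:
                        heard_keepalive[other] = True
    return duty_machines


# NODE_B and NODE_C are made-up names for illustration only.
nodes = ["TGEV01", "NODE_B", "NODE_C"]
print(elect(nodes, forwards_broadcast=True))   # broadcasts forwarded
print(elect(nodes, forwards_broadcast=False))  # broadcasts filtered
```

With broadcasts forwarded, only the first node to start takes the duty role; with broadcasts filtered, all of them do. That sort of conflict would show up on more than one box, which is why it would point away from a single failing processor module.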
Just a couple of thoughts, not sure if they help or not...
Thanx,
whynot3k