- System Crash: ES40
07-11-2011 10:22 AM
An unusual event occurred this weekend with the XFC. At first glance, one could suspect a bug for which a patch has already been created; however, this is the first such event since the OS was installed in 2002. Should I bet that the cause is memory/hardware related and not the XFC image? I ran Diag and only the crash event was posted.
SYSGEN> VCC_FLAGS = 2
Bugcheck Type: SSRVEXCEPT, Unexpected system service exception
Node: ES40 (Standalone)
CPU Type: AlphaServer ES40
VMS Version: V7.3-1
Failing PC: FFFFFFFF.80330EBC SYS$XFCACHE+1CEBC
CPU ID 00 CPU State rc,pa,pp,cv,pv,pmv,pl
CPU Type EV67 Pass 2.2.3/.5 (21264A)
PAL Code 1.98-104
w/ 9 GB memory
07-11-2011 11:14 AM
Re: System Crash: ES40
Tim,
Before considering swapping hardware, ALWAYS try to find out what exactly caused a crash. Assume it's software first; only if the crash cannot be explained from a software point of view should you assume it may be hardware.
Could you post the CLUE file of this crash as an attachment in your next reply? Or mail it to me privately.
You'll find the CLUE file in CLUE$COLLECT:CLUE$node_ddmmyy_hhmm.LIS
Volker.
-----
An OpenVMS crashdump analysis a day
makes the Windows headaches go away.
07-12-2011 04:10 AM
Re: System Crash: ES40
Hi Tim,
If it's an inline BUGCHECK from XFC, then you would have a crash type of
>> ASSERTFAIL, System ASSERT failure detected
In such cases, you can get some additional data related to the crash from XFC trace buffers
SDA>XFC SHOW TRACE/RAW
In this case, it's not an inline BUGCHECK
>>Bugcheck Type: SSRVEXCEPT, Unexpected system service exception
Please check whether the XFC trace buffers have any information that can give any clues. Execute the following command in the crash dump:
SDA>XFC SHOW TRACE/RAW
Also,
>> AlphaServer ES40VMS Version: V7.3-1
Looks like you are running an unsupported version of OpenVMS. The OpenVMS versions supported on Alpha are V7.3-2 onwards.
Regards,
Murali
07-12-2011 04:28 AM
Tim,
I've had a short look at your CLUE file. Here is the most important footprint information:
Bugcheck Type: SSRVEXCEPT, Unexpected system service exception
Node: ES40 (Standalone)
CPU Type: AlphaServer ES40
VMS Version: V7.3-1
Failing PC: FFFFFFFF.80330EBC XFCREADSTART_C+00ACC
Failing PS: 00000000.00000201
Module: SYS$XFCACHE (Link Date/Time: 18-JUL-2002 19:55:19.70)
Offset: 0001CEBC
Signal Array:             64-bit Signal Array:
Arg Count   = 00000005    Arg Count   = 00000005
Condition   = 0000000C    Condition   = 00000000.0000000C
Argument #2 = 00000000    Argument #2 = 00000000.00000000
Argument #3 = 00000054    Argument #3 = 00000000.00000054
Argument #4 = 80330EBC    Argument #4 = FFFFFFFF.80330EBC
Argument #5 = 00000201    Argument #5 = 00000000.00000201
Failing Instruction:
XFCREADSTART_C+00ACC: LDL R16,#X0054(R0)
Instruction Stream (last 20 instructions):
XFCREADSTART_C+00A7C: BEQ R1,#X000008
XFCREADSTART_C+00A80: BIS R31,R9,R16
XFCREADSTART_C+00A84: BIS R31,R4,R17
XFCREADSTART_C+00A88: JSR R26,(R26)
XFCREADSTART_C+00A8C: LDL R21,#X0108(R6)
XFCREADSTART_C+00A90: STL R31,#X010C(R6)
XFCREADSTART_C+00A94: ADDL R21,R4,R4
XFCREADSTART_C+00A98: STL R4,#X0108(R6)
XFCREADSTART_C+00A9C: BR R31,#X000027
XFCREADSTART_C+00AA0: CMPULE R5,#X00,R22
XFCREADSTART_C+00AA4: CMPULE R4,#X00,R23
XFCREADSTART_C+00AA8: BIS R31,R31,R24
XFCREADSTART_C+00AAC: BIS R22,R23,R22
XFCREADSTART_C+00AB0: BLBS R22,#X000022
XFCREADSTART_C+00AB4: S8ADDL R24,R31,R25 <<< sets up R25...
XFCREADSTART_C+00AB8: LDAH R19,#X0001(R31)
XFCREADSTART_C+00ABC: ZAPNOT R25,#X0F,R25 <<< ...
XFCREADSTART_C+00AC0: ADDQ R6,R25,R25 <<< R25 set up
XFCREADSTART_C+00AC4: LDA R19,#X8000(R19)
XFCREADSTART_C+00AC8: LDQ R0,#X0190(R25) <<< load R0 here
XFCREADSTART_C+00ACC: LDL R16,#X0054(R0) <<< ACCVIO here due to R0=0
XFCREADSTART_C+00AD0: LDQ R1,#X0068(R0)
XFCREADSTART_C+00AD4: SLL R16,#X09,R17
XFCREADSTART_C+00AD8: AND R1,#X10,R1
XFCREADSTART_C+00ADC: ADDL R31,R17,R17
R0 = 00000000.00000000
R6 = FFFFFFFF.817F3D50 VCC_CTX
R24 = 00000000.00000001
AI = FFFFFFFF.817F3D58
RA = FFFFFFFF.80330F08 XFCREADSTART_C+00B18
PV = 00000000.00000015
R28 = 0000029B.8FC90000
FP = 00000000.7FFA1820
PC = FFFFFFFF.80330EBC XFCREADSTART_C+00ACC
PS = 00000000.00000201 Kernel Mode, IPL 2
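The symbol+offset pairs in the footprint above can be cross-checked with plain address arithmetic. The following Python lines are only an illustrative sketch: the helper name `symbol_base` is made up here, and all values are copied from the CLUE output.

```python
# Cross-check the symbol+offset pairs reported by CLUE.
# All addresses and offsets are copied from the listing; symbol_base is a
# hypothetical helper, not an SDA or CLUE function.
MASK64 = (1 << 64) - 1

def symbol_base(address: int, offset: int) -> int:
    """Return the base address implied by 'address = symbol + offset'."""
    return (address - offset) & MASK64

failing_pc = 0xFFFFFFFF80330EBC    # PC = XFCREADSTART_C+00ACC
return_addr = 0xFFFFFFFF80330F08   # RA = XFCREADSTART_C+00B18

# Start of XFCREADSTART_C, derived independently from PC and from RA.
base_from_pc = symbol_base(failing_pc, 0x00ACC)
base_from_ra = symbol_base(return_addr, 0x00B18)

# Address implied by the module offset (SYS$XFCACHE+1CEBC).
module_base = symbol_base(failing_pc, 0x1CEBC)

print(f"XFCREADSTART_C (from PC): {base_from_pc:016X}")
print(f"XFCREADSTART_C (from RA): {base_from_ra:016X}")
print(f"SYS$XFCACHE base:         {module_base:016X}")
```

Both derivations give the same routine start, so the PC and RA symbolizations are consistent with each other.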
The crash happened in the [XFC]XFC_READ routine XfcReadCopyToUserBuffer when trying to obtain ulSize.
The problem is that R0=0 as loaded from #X0190(R25). I still need to verify the code stream that loads the current value into R25...
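For readers less familiar with the footprint, here is a small Python sketch of how the signal array ties in with R0=0. It is illustrative only: the function name `decode_accvio` is hypothetical, the argument order (reason mask, virtual address, PC, PS) is the usual ACCVIO layout, and the PS field positions (current mode in bits 4:3, IPL in bits 12:8) are assumed from the Alpha OpenVMS PS format.

```python
# Decode the 5-argument ACCVIO signal array from the CLUE output.
# decode_accvio is a hypothetical helper; the PS bit positions are assumptions.
SS_ACCVIO = 0x0C  # condition value of SS$_ACCVIO (access violation)
MODES = ("Kernel", "Executive", "Supervisor", "User")

def decode_accvio(signal_array):
    count, condition, reason_mask, fault_va, pc, ps = signal_array
    if count != len(signal_array) - 1:
        raise ValueError("argument count does not match array length")
    if condition != SS_ACCVIO:
        raise ValueError("not an access violation")
    return {
        "reason_mask": reason_mask,        # left undecoded here
        "fault_va": fault_va,              # address that could not be accessed
        "pc": pc,                          # PC of the failing instruction
        "mode": MODES[(ps >> 3) & 0x3],    # assumed PS<4:3>
        "ipl": (ps >> 8) & 0x1F,           # assumed PS<12:8>
    }

signal = [0x5, 0x0C, 0x0, 0x54, 0xFFFFFFFF80330EBC, 0x201]
info = decode_accvio(signal)

# Failing instruction: LDL R16,#X0054(R0) with R0 = 0
r0, displacement = 0x0, 0x54
print(f"fault VA {info['fault_va']:#x}, R0+disp {r0 + displacement:#x}")
print(f"{info['mode']} mode, IPL {info['ipl']}")
```

The faulting virtual address 54 is exactly the displacement of the LDL, which is what you would expect if the base register R0 was zero.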
From a software point of view, this does NOT look like a hardware problem at first glance.
Note that you're running XFC from the V7.3-1 SSB version with no XFC patches installed. There are some XFC patches, but none of them directly describes this crash footprint.
I would not consider this crash footprint a 'known' problem. I've not seen that exact footprint before, and I've seen thousands of crashes...
If this system has been running for 9 years without ever seeing this bugcheck (you should check $ TYPE CLUE$HISTORY for the exact crash history of this node), you might just get lucky for the next 9 years...
Volker.
07-12-2011 06:48 AM
Re: System Crash: ES40
Tim,
Based on the replies you've gotten so far and some of the comments, it sounds like you really need, at the least, to bring up the patch level on your system. Given that you're using V7.3-1, the best we can suggest is to cautiously get your patches updated and to seriously consider what might keep you from bringing your system up to at least V7.3-2. There are still many, many patches for V7.3-2, but the version itself is still seeing some maintenance. If you can't move from V7.3-1, you're really hurting yourself if your patches aren't current, even if your crash isn't an exact match to the footprints published as solved in the patch release notes. I've always held the opinion that a mix-and-match set of patches might not show the same footprints as the latest combination, and any customer's situation might trigger (Volker hinted at this) a condition that hadn't been reported to engineering. It just takes the right combination of events, software, configuration, etc.
To add to what Volker said: IF a hardware problem causes a crash, you'll *usually* see some entry in the errorlog before the crash itself. You might also receive information on the console shortly before the crash. A hardware failure can be so catastrophic that you'll have little in the errorlog, but in those cases you usually would have to glean the gory details from another member of the cluster sharing the system disk or from the stored registers the console saves when the event occurs. Capturing this information requires special steps immediately after the machine halts and, in many cases, there isn't a lot you can do with the system anyway. Power cycling can clear the details of what the console saves, so checking with HP is recommended when a system halts like that.
And while the most current bit-to-text errorlog translation tools are recommended, you can still get a very good "in the ballpark" read from DECevent, so you don't absolutely positively HAVE to use SEA or whatever it is today.
bob
07-12-2011 09:29 AM
Re: System Crash: ES40
Many thanks to all who took time to review and comment on this issue.
Since this is the first such event to occur in 10 years or so, I will make no further changes other than upgrading OpenVMS to a more recent release.
Very best regards,
Tim Peer
07-12-2011 09:55 AM - edited 07-12-2011 09:57 AM
Re: System Crash: ES40
Tim,
the instructions executed immediately preceding the ACCVIO seem to have been executed correctly:
here_is_a_label: (R24 is ulIndex used in a for loop)
XFCREADSTART_C+00AB4: S8ADDL R24,R31,R25 <<< sets up R25 with 00000008
XFCREADSTART_C+00AB8: LDAH R19,#X0001(R31)
XFCREADSTART_C+00ABC: ZAPNOT R25,#X0F,R25 <<< clear high order 4 bytes in R25 (low order 4 bytes are kept)
XFCREADSTART_C+00AC0: ADDQ R6,R25,R25 <<< R25 set up finished here
XFCREADSTART_C+00AC4: LDA R19,#X8000(R19)
XFCREADSTART_C+00AC8: LDQ R0,#X0190(R25) <<< load R0 here
XFCREADSTART_C+00ACC: LDL R16,#X0054(R0) <<< ACCVIO here due to R0=0
...
code branches back to beginning of for loop (see above) later.
Combined with the values of the registers as reported by SDA> CLUE REGISTER
R0 = 00000000.00000000
R6 = FFFFFFFF.817F3D50 VCC_CTX
R24 = 00000000.00000001
AI = FFFFFFFF.817F3D58 (=R25)
everything adds up correctly.
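The register arithmetic can be replayed in a few lines of Python. This is only a sketch under stated assumptions: the helper names (`s8addl`, `zapnot`, `addq`, `replay`) are made up for illustration and model just the three instructions that build R25, using the register values from the dump.

```python
# Replay the instructions that set up R25, using values from SDA> CLUE REGISTER.
# The helpers are hypothetical models of the Alpha instructions, 64-bit unsigned.
MASK64 = (1 << 64) - 1

def sext32(value: int) -> int:
    """Sign-extend the low 32 bits of value to 64 bits."""
    value &= 0xFFFFFFFF
    return (value | 0xFFFFFFFF00000000) if value & 0x80000000 else value

def s8addl(ra: int, rb: int) -> int:
    """S8ADDL: (ra * 8 + rb), truncated to a longword and sign-extended."""
    return sext32(ra * 8 + rb)

def zapnot(ra: int, byte_mask: int) -> int:
    """ZAPNOT: keep the bytes whose mask bit is set, zero the others."""
    result = 0
    for byte in range(8):
        if byte_mask & (1 << byte):
            result |= ra & (0xFF << (8 * byte))
    return result

def addq(ra: int, rb: int) -> int:
    """ADDQ: 64-bit add with wraparound."""
    return (ra + rb) & MASK64

def replay(r6: int, r24: int):
    r25 = s8addl(r24, 0)          # S8ADDL R24,R31,R25 (R31 reads as zero)
    r25 = zapnot(r25, 0x0F)       # ZAPNOT R25,#X0F,R25
    r25 = addq(r6, r25)           # ADDQ   R6,R25,R25
    ldq_address = addq(r25, 0x190)  # effective address of LDQ R0,#X0190(R25)
    return r25, ldq_address

r25, ldq_address = replay(r6=0xFFFFFFFF817F3D50, r24=0x1)
print(f"R25         = {r25:016X}")
print(f"LDQ address = {ldq_address:016X}")
```

With R6 and R24 as dumped, this reproduces the reported AI value FFFFFFFF.817F3D58, and the quadword loaded into R0 comes from that address plus 190 (hex).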
If you do a SDA> EXA 817F3D58+190 in the dump, you should see a quadword of zeroes.
More detailed analysis would require access to the dump file itself and following the for loop in the dump.
There probably is some corruption or an invalid link in some XFC data structure, which has caused this crash to happen.
Volker.
07-20-2011 05:27 AM
Re: System Crash: ES40
There is a cost to running old versions of VMS; sometimes it takes a while to notice this cost.
Purely Personal Opinion
07-20-2011 05:43 AM
Re: System Crash: ES40
Ian,
this comment of yours is based purely on speculation that whatever problem caused this crash would not have caused it in 'more recent versions of OpenVMS'.
Without knowing the real cause of this crash, no one can tell whether it could have been prevented by a patch or an upgrade.
And think about the downtime and effort that would have been needed during the past 9 years to keep this system on current patch levels and versions.
Volker.
07-20-2011 08:02 AM - edited 07-20-2011 08:03 AM
Re: System Crash: ES40
Although it is not relevant to this issue, Tim mentioned upgrading, so I thought I would mention that there is a cost in not upgrading, which has to be balanced against the cost of upgrading.
Purely Personal Opinion