System Crash of DS 25 after CPU Upgrade

 
C.Eichert
Advisor

System Crash of DS 25 after CPU Upgrade

Since installing a second CPU in a DS25, I have been getting severe system crashes with the bugcheck "INVEXCEPTN, Exception while above ASTDEL".
The system was originally supplied with one CPU only, and VMS 7.3-2 was installed in that configuration. Because of the heavy CPU load from our application, we decided to upgrade. I ran AUTOGEN after installing the second CPU. The CPU load has been OK since then, but crashes have occurred occasionally. Load tests with both CPUs ran OK for 10 hours. HP Support has already replaced the main board and the memory without success. Several patches have been installed. The eventlog.sys does not contain any helpful information. Sysdump.dmp was sent to HP for evaluation, but so far there has been no answer.
Does anybody have an idea how to fix the problem?
Kris Clippeleyr
Honored Contributor

Re: System Crash of DS 25 after CPU Upgrade

Hi,

What image is causing the crash?

Kris (aka Qkcl)
I'm gonna hit the highway like a battering ram on a silver-black phantom bike...
Ian Miller.
Honored Contributor

Re: System Crash of DS 25 after CPU Upgrade

There are a couple of people here who are experts on crashes. Can you post the CLUE file (from SYS$ERRORLOG) and the output of PRODUCT SHOW HISTORY as text file attachments here?
____________________
Purely Personal Opinion
C.Eichert
Advisor

Re: System Crash of DS 25 after CPU Upgrade

Attached are the latest CLUE file and the product overview. I forgot to mention that SSRVEXCEPT and XQPERR were also reported as bugcheck codes, but mostly it is INVEXCEPTN.
The current process at the time of the crash is different every time.

Thanks for your reply

Christoph
Karl Rohwedder
Honored Contributor

Re: System Crash of DS 25 after CPU Upgrade

Is the newest firmware (V6.9) installed, and are
the CPUs identical in speed and revision?

mfg Kalle
C.Eichert
Advisor

Re: System Crash of DS 25 after CPU Upgrade

The two CPUs are identical in speed and revision. The firmware was upgraded to V6.9
at the end of December.

Christoph
Mobeen_1
Esteemed Contributor

Re: System Crash of DS 25 after CPU Upgrade

Christoph,
Your logs show that the INVEXCEPTN is in TCPIP (module: TCPIP$INTERNET_SERVICES).
I was looking through the TCPIP patch site and thought the following URL could be of help to you:

http://ftp.support.compaq.com.au/pub/patches/vms/vax/v7.3/tcpip/5.3/dec-vaxvms-tcpip_eco-v0503-181-4.README

Since you say you have had this issue since the CPU upgrade, this thread is of interest to me now :).

I would be as interested as you to follow this thread and see how people arrive at a solution to your problem...

regards
Mobeen
Ian Miller.
Honored Contributor

Re: System Crash of DS 25 after CPU Upgrade

There is a VMS732_UPDATE consolidated patch kit, which is recommended.
____________________
Purely Personal Opinion
C.Eichert
Advisor

Re: System Crash of DS 25 after CPU Upgrade

Mobeen,

I have already installed the update DEC-AXPVMS-TCPIP_ECO-V0504-154-4. Does it contain the patches from VAXVMS-TCPIP_ECO-V0503-181-5 that you recommend? You can find the product details in the CLUE file attached to my reply from 11:39.

Regards
Christoph
Mobeen_1
Esteemed Contributor

Re: System Crash of DS 25 after CPU Upgrade

Christoph,
The one that you have applied should have all of those 5.3 ... and that should not be an issue

rgds
Mobeen
C.Eichert
Advisor

Re: System Crash of DS 25 after CPU Upgrade

Ian,

I have already installed the update VMS732_UPDATE-V0300.
Please have a look at the CLUE file attached to my reply from 11:39.

Regards
Christoph
Mobeen_1
Esteemed Contributor

Re: System Crash of DS 25 after CPU Upgrade

Christoph,
Have all of your crashes so far had a PC in TCPIP? If so, I would suggest that you first rule that out by moving to the latest version or by seeking help from the vendor. I understand that in this case the vendor is HP :)

regards
Mobeen
Ian Miller.
Honored Contributor
Solution

Re: System Crash of DS 25 after CPU Upgrade

I had missed that the UPDATE patch was already installed. What is the value of the system parameter MULTITHREAD? If it is now > 1, you could try setting it to 1, which would prevent more than one kernel thread from being active in any process.
____________________
Purely Personal Opinion
C.Eichert
Advisor

Re: System Crash of DS 25 after CPU Upgrade

I am no longer connected to the DS25 at the moment. I have finished work for today (I am in Thailand). I will continue with my reply tomorrow. Thank you so far.

Christoph
John Travell
Valued Contributor

Re: System Crash of DS 25 after CPU Upgrade

Please excuse the lengthy reply. I have included extracts from the CLUE file to support my conclusion.

Crash Time: 10-FEB-2005 00:51:27.29
Bugcheck Type: INVEXCEPTN,
CPU Type: AlphaServer DS25
VMS Version: V7.3-2

Crash/Primary CPU: 01/00 <<< crash on CPU #1

Signal Array:
Arg Count = 00000005
Condition = 0000000C <<< access violation
Argument #2 = 00000000
Argument #3 = 00000020 <<< failed Virtual address
Argument #4 = 8062BFE4 <<< Error PC.
Argument #5 = 00000800


Failing Instruction:
TCPIP$INTERNET_SERVICES+97FE4: SUBL R7,R17,R7

This instruction does not involve ANY memory access at all. It is NOT POSSIBLE for an ACCESS VIOLATION to occur if this code is executed correctly.
(Other types of exceptions, maybe, but not an ACCVIO!)

Current Registers: PCB: 8164E480 (CPU 1)

R7 = 00000000.00000028
R17 = 00000000.00000014

Nothing exceptional in the registers; no reason for an exception.

CPU #1 did not correctly execute the code present in its I-stream.
Looks like hardware to me.
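
For anyone who wants to read the signal array themselves, here is a minimal Python sketch that turns the values quoted above into named fields. The field names are my own. Only the condition, the failing virtual address and the error PC are annotated in the extract; treating Argument #2 as the reason mask and Argument #5 as the PS is an assumption based on the usual access-violation layout.

```python
# Values copied from the CLUE extract quoted above (hexadecimal strings).
CLUE_SIGNAL_ARRAY = {
    "Arg Count": "00000005",
    "Condition": "0000000C",
    "Argument #2": "00000000",
    "Argument #3": "00000020",
    "Argument #4": "8062BFE4",
    "Argument #5": "00000800",
}

ACCVIO = 0x0C  # annotated as "access violation" in the extract


def decode_signal_array(raw):
    """Convert the CLUE hex strings into named integer fields."""
    values = {name: int(text, 16) for name, text in raw.items()}
    return {
        "arg_count": values["Arg Count"],
        "condition": values["Condition"],
        "reason_mask": values["Argument #2"],  # assumed meaning
        "failing_va": values["Argument #3"],   # annotated in the extract
        "error_pc": values["Argument #4"],     # annotated in the extract
        "ps": values["Argument #5"],           # assumed meaning
    }


decoded = decode_signal_array(CLUE_SIGNAL_ARRAY)
print("access violation:", decoded["condition"] == ACCVIO)
print(f"failing VA: {decoded['failing_va']:#x}")
print(f"error PC:   {decoded['error_pc']:#x}")
```

The point of laying it out this way is that the failing address (0x20) and the PC are two separate values, so the question becomes whether the instruction at that PC could have touched address 0x20 at all.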

System Information:
System Type AlphaServer DS25
Cycle Time 1.0 nsec (1000 MHz)
CPU ID 01
CPU Type EV68CB Pass 2.4 (21264C)
PAL Code 1.98-42
CPU Revision ....
Serial Number JA40701068
Console Vers V6.9-2

Based on this crash I would place CPU #1 under suspicion.
However, although it is unlikely, please bear in mind that the problem could also be on the main system board, in the slot support logic.
Try swapping the CPUs over. If the crashes then occur on CPU #0, you have a diagnosis.
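
The reasoning behind the swap test can be written down as a small Python sketch. The function name and the result strings are hypothetical, and it assumes a two-CPU system where the CPU number follows the slot, not the module.

```python
def interpret_swap(cpu_before, cpu_after):
    """Interpret a CPU module swap.

    cpu_before: CPU number on which the suspicious crashes occurred before the swap.
    cpu_after:  CPU number on which they occur after the swap,
                or None if no crash has been seen yet.
    """
    if cpu_after is None:
        return "no crashes since the swap: keep observing"
    if cpu_after != cpu_before:
        # The crashes moved together with the module to the other slot.
        return "fault follows the module: suspect the CPU module"
    # The crashes stayed on the same CPU number although the module changed.
    return "fault stays with the slot: suspect the system board slot logic"


print(interpret_swap(1, 0))
print(interpret_swap(1, 1))
```

A single crash after the swap is weak evidence either way, so in practice you would want to see the pattern over several crashes before drawing a conclusion.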

Alternatively, show us some more clue files.

As always, if you cannot get HP to investigate the crashes and are willing to pay a moderate sum, send them to me.

John Travell, john@jomatech.com, http://www.jomatech.com
Volker Halle
Honored Contributor

Re: System Crash of DS 25 after CPU Upgrade

I fully agree with John!

Before you try any wild guesses about patches etc., just have a look at the crash information in the CLUE file.

If it's an exception-related crash and the failing instruction, the register contents and the data in the signal array do NOT MATCH, it must be hardware or firmware.

As John pointed out, a SUBL instruction only referencing registers CANNOT generate any access violation.

Look at the crash information in the CLUE files from the other crashes with the same focus. Does the failing instruction make sense, given the register contents etc.? If not, which CPU was the crashing one?
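
One part of this check can be sketched in Python: is an access violation plausible for the failing instruction at all? This is only a rough illustration, not a crash analysis tool. The mnemonic list is a partial set of Alpha load/store instructions chosen for this sketch, the function names are hypothetical, and the sketch does not look at register contents.

```python
ACCVIO = 0x0C  # condition value annotated as "access violation" in the CLUE extract

# Partial list of Alpha load/store mnemonics (an assumption of this sketch,
# not a complete instruction decoder).
MEMORY_MNEMONICS = {
    "LDL", "LDQ", "LDBU", "LDWU", "LDQ_U", "LDL_L", "LDQ_L",
    "STL", "STQ", "STB", "STW", "STQ_U", "STL_C", "STQ_C",
    "LDF", "LDG", "LDS", "LDT", "STF", "STG", "STS", "STT",
}


def references_memory(instruction):
    """True if the instruction's mnemonic is a known load or store."""
    mnemonic = instruction.split()[0].split("/")[0].upper()
    return mnemonic in MEMORY_MNEMONICS


def is_consistent(condition, instruction, failing_va, error_pc):
    """False when an access violation is reported for an instruction
    that could not have referenced the failing address."""
    if condition != ACCVIO:
        return True   # this sketch only judges access violations
    if failing_va == error_pc:
        return True   # fetching the instruction itself may have faulted
    return references_memory(instruction)


# Values from the crash John analysed: a register-only SUBL.
print(is_consistent(0x0C, "SUBL R7,R17,R7", 0x20, 0x8062BFE4))
# Hypothetical load instruction at the same PC, for comparison.
print(is_consistent(0x0C, "LDL R1,(R2)", 0x20, 0x8062BFE4))
```

A crash that fails this check is a candidate for the hardware or firmware explanation; one that passes still needs the registers compared against the failing address by hand.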

Volker.
C.Eichert
Advisor

Re: System Crash of DS 25 after CPU Upgrade

Mobeen,

Some of the crashes show the module TCPIP, but not all. Please have a look at the attached CLUE$HISTORY.


Ian,

The system parameter MULTITHREAD was set to 2. Following your recommendation, I changed it to 1. Does this affect our application? Our Fortran linker flags are /THREADS_ENABLE=(MULTIPLE_KERNEL_THREADS,UPCALLS)/noinfo.


John,
Volker,

The crashes happened mostly on CPU 01, but sometimes on CPU 00. Following your proposal, I will swap the CPUs today. The complete history is in the attached file. If you need particular CLUE files, please let me know.


To all,

HP recommends installing the following patches:
VMS732_BACKUP-V0300
VMS732_CPU270F-V0100
VMS732_MQ-V0100
VMS732_PTHREAD-V0200
VMS732_TRACE-V0200

What are your opinions?


Regards
Christoph

Volker Halle
Honored Contributor

Re: System Crash of DS 25 after CPU Upgrade

Christoph,

I have a tool to collect and evaluate CLUE files. If you don't mind, you can mail me all your CLUE$COLLECT:CLUE*.LIS files from this server (just put them in a ZIP archive).

You can find my mail address if you look carefully at my forum profile ;-)

Volker.
Ian Miller.
Honored Contributor

Re: System Crash of DS 25 after CPU Upgrade

MULTITHREAD=1 may affect your application's performance; however, so does a system crash.

Most of the crashes are on CPU 1. I would go with John's plan of swapping the CPU modules around, or even just replace that module.
____________________
Purely Personal Opinion
C.Eichert
Advisor

Re: System Crash of DS 25 after CPU Upgrade

Ian,

MULTITHREAD has been changed to 1, and the CPU boards have just been swapped. I will keep you informed of any news. Replacing the CPU is quite difficult. Spares are not available here in Thailand, so I have to get them from overseas.
May I have your opinion on whether it makes sense to do a fresh installation of VMS on a spare disk?

Thanks
Christoph
Volker Halle
Honored Contributor

Re: System Crash of DS 25 after CPU Upgrade

Christoph,

the idea of 'try a new installation' is NOT the way to troubleshoot OpenVMS problems...

Volker.
Ian Miller.
Honored Contributor

Re: System Crash of DS 25 after CPU Upgrade

Christoph, wait and see the results of what you have done so far. Changing too many things at once only causes confusion.

If the system is covered by warranty or a hardware service contract, then the lack of parts is HP's problem, not yours, so don't let them forget that.
____________________
Purely Personal Opinion
Volker Halle
Honored Contributor

Re: System Crash of DS 25 after CPU Upgrade

Christoph,

I have looked at all 25 CLUE files. Twelve of them clearly show inconsistencies between the failing instruction, the register contents and the failing virtual address (in the signal array). Those 'inconsistent' crashes all happened on CPU 1, going back as far as 5-OCT-2004, the first crash after the new CPU was added.

The inline crashes (XQPERR, NETDLLERR, NOTFCBFCB) cannot be diagnosed from a CLUE file.

As a general rule, log a call for every system crash if you can. OpenVMS does not crash that often, and you (or your service provider) can learn something from every crash. This will also allow repeat problems to be identified as early as possible.

And if you can't log a call, post the CLUE information in ITRC; there are a few people around who can help with the first steps of crash analysis.

Volker.
C.Eichert
Advisor

Re: System Crash of DS 25 after CPU Upgrade

Ian,

The DS25 has now been running for 10 days without a crash. It looks like "MULTITHREAD=1" solved the problem. The default is 1, and I did not change the setting to 2 myself. Did AUTOGEN change it to 2? (I ran it after installing the second CPU.) Is there a relationship between MULTITHREAD and MULTIPROCESSING?

Christoph
Ian Miller.
Honored Contributor

Re: System Crash of DS 25 after CPU Upgrade

Setting MULTITHREAD to 1 is a workaround.
I expect AUTOGEN set it to 2 when you added the second CPU. MULTITHREAD > 1 allows multiple threads in the same process to execute concurrently on multiple CPUs.

____________________
Purely Personal Opinion