Operating System - OpenVMS

Latest version of netacp on vms 7.1-2 ?


Hello,

 

One of my customers has regular crashes on a DS20 running OpenVMS V7.1-2 (I don't know since when).

 

I would like to know the version of the latest NETACP available for V7.1-2.

 

For those interested, here is some information about the crash, the currently installed NETACP version and the output of PRODUCT SHOW HISTORY.

 

Thanks and regards,

Rudy

 

Here is the crash information:

 

System crash information
------------------------
Time of system crash:  7-FEB-2013 18:15:00.60


Version of system: OpenVMS (TM) Alpha Operating System, Version V7.1-2 

System Version Major ID/Minor ID: 3/0


VMScluster node: LUNA, a AlphaServer DS20

Crash CPU ID/Primary CPU ID:  00/00

Bitmask of CPUs active/available:  00000003/00000003


CPU bugcheck codes:
 CPU 00 -- BADQHDR, Interlocked queue header corrupted
 1 other -- CPUEXIT, Shutdown requested by another CPU


CPU 00 Processor crash information
----------------------------------


CPU 00 reason for Bugcheck: BADQHDR, Interlocked queue header corrupted


Process currently executing on this CPU: NETACP


Current image file: LUNA$DKC0:[SYS0.SYSCOMMON.][SYSEXE]NETACP.EXE;1


Current IPL: 4  (decimal)


CPU database address:  80C26000

 
CPU 00 Processor crash information
----------------------------------

CPUs Capabilities:    PRIMARY,QUORUM,RUN

General registers:

R0   = FFFFFFFF.FFFFFFFF  R1   = 00000000.03250200  R2   = 00000000.00950540
R3   = FFFFFFFF.83542E20  R4   = 00000000.00000024  R5   = 00000000.00000010
R6   = 00000000.000102D0  R7   = FFFFFFFF.FFFFFFFF  R8   = 00000000.038C0204
R9   = 00000000.12010010  R10  = 00000000.000965A0  R11  = 00000000.00059124
R12  = 00000000.00000000  R13  = FFFFFFFF.83542E20  R14  = 00000000.00000000
R15  = FFFFFFFF.83505000  R16  = 00000000.0000047C  R17  = 00000000.00000003
R18  = 00000000.00000000  R19  = 00000000.0000BE44  R20  = 00000000.0000000E
R21  = 00000000.00000000  R22  = 00000000.00000000  R23  = 00000000.03250200
R24  = FFFFFFFF.83505828  AI   = FFFFFFFF.FDC2BB19  RA   = 00000000.00000000
PV   = 00000000.00000000  R28  = 00000000.0001C9C4  FP   = 00000000.7FFA1E50
PC   = FFFFFFFF.80088E24  PS   = 18000000.00000404

 


CPU 00 Processor crash information
----------------------------------

Processor Internal Registers:


ASN  = 00000000.000000FD                     ASTSR/ASTEN =          0000000F
IPL  =          00000004  PCBB = 00000000.03F2E080  PRBR = FFFFFFFF.80C26000
PTBR = 00000000.00001FB1  SCBB = 00000000.000004A8  SISR = 00000000.00000000
VPTB = FFFFFFFC.00000000  FPCR = 00000000.00000000  MCES = 00000000.00000008


 KSP    = 00000000.7FFA1C98
 ESP    = 00000000.7FFA6000
 SSP    = 00000000.7FFAE000
 USP    = 00000000.7AFE7B50


CPU 00 Processor crash information
----------------------------------

                No spinlocks currently owned by CPU 00


CPU 01 Processor crash information
----------------------------------


CPU 01 reason for Bugcheck: CPUEXIT, Shutdown requested by another CPU


Process currently executing on this CPU: SYSTEM_12


Current IPL: 31  (decimal)


CPU database address:  80C53B00


CPUs Capabilities:    QUORUM,RUN


CPU 01 Processor crash information
----------------------------------

General registers:

R0   = FFFFFFFF.80C53B00  R1   = FFFFFFFF.80C53B00  R2   = 00000000.00950610
R3   = FFFFFFFF.835379A0  R4   = 00000000.005D0001  R5   = FFFFFFFF.81106940
R6   = 00000000.005D0001  R7   = FFFFFFFF.80FFF140  R8   = FFFFFFFF.81131340
R9   = FFFFFFFF.80FFF140  R10  = FFFFFFFF.80FFF140  R11  = FFFFFFFF.81106940
R12  = FFFFFFFF.81119100  R13  = FFFFFFFF.835379A0  R14  = FFFFFFFF.80C53B18
R15  = 00000000.00000204  R16  = 00000000.000006AC  R17  = 00000000.00000047
R18  = 00000000.00000000  R19  = 00000000.00000001  R20  = 00000000.00000001
R21  = FFFFFFFF.81119100  R22  = 00000000.00000081  R23  = FFFFFFFF.83505C40
R24  = FFFFFFFF.80C53B00  AI   = FFFFFFFF.80C53B00  RA   = 00000000.00000000
PV   = FFFFFFFF.83505B90  R28  = FFFFFFFF.83505000  FP   = 00000000.7FFA1CD0
PC   = FFFFFFFF.80068774  PS   = 18000000.00001F04

 


CPU 01 Processor crash information
----------------------------------
Processor Internal Registers:


ASN  = 00000000.000000B6                     ASTSR/ASTEN =          0000000F
IPL  =          0000001F  PCBB = 00000000.06684080  PRBR = FFFFFFFF.80C53B00
PTBR = 00000000.00016D8A  SCBB = 00000000.000004A8  SISR = 00000000.00000000
VPTB = FFFFFFFC.00000000  FPCR = 00000000.00000000  MCES = 00000000.00000008


 KSP    = 00000000.7FFA1BD8
 ESP    = 00000000.7B0230F8
 SSP    = 00000000.7FFAB980
 USP    = 00000000.7AEE1AF0


CPU 01 Processor crash information
----------------------------------

                Spinlocks currently owned by CPU 01

MAILBOX                            Address   83538400
Owner CPU ID       00000001        IPL       0000000B
Ownership Depth    00000001        Rank      0000000C
CPUs Waiting       00000000        Index     0000002C
Timeout Interval   000186A0

 

Version of netacp installed:

 

Image Identification Information

  image name: "NETACP"
  image file identification: "X-A18"
  image file build identification: "X6PE-0040100000"
  link date/time: 19-NOV-1998 12:25:46.36
  linker identification: "A11-39"

 

$ Prod sh hist

 

PRODUCT                             KIT TYPE    OPERATION   DATE AND TIME
----------------------------------- ----------- ----------- --------------------
DEC AXPVMS VMS712_LAN V3.0          Patch       Install     26-MAR-2011 13:14:16
DEC AXPVMS H22AGENT V2.0-5          Full LP     Install     18-JAN-2005 14:22:37
DEC AXPVMS H22AGENT V2.0-5          Full LP     Install     28-DEC-2004 09:46:11
DEC AXPVMS TCPIP_ECO V5.1-153       Patch       Install     03-MAY-2002 07:27:02
DEC AXPVMS VMS62TO71U2_PCSI V2.0    Patch       Install     03-MAY-2002 07:14:20
DEC AXPVMS VMS712_UPDATE V3.0       Patch       Install     03-MAY-2002 07:13:22
DEC AXPVMS TCPIP V5.1-15            Full LP     Install     12-APR-2002 07:06:43
DEC AXPVMS UCX V4.2-21              Full LP     Remove      12-APR-2002 07:05:05
DEC AXPVMS VMS712_SCSI V1.0         Patch       Install     17-AUG-1999 20:12:34
DEC AXPVMS VMS712_UPDATE02 V1.0     Patch       Install     17-AUG-1999 20:11:57
DEC AXPVMS DECNET_PHASE_IV V7.1-2   Full LP     Install     17-AUG-1999 20:03:16
DEC AXPVMS OPENVMS V7.1-2           Platform    Install     17-AUG-1999 20:03:16
DEC AXPVMS UCX V4.2-21              Full LP     Install     17-AUG-1999 20:03:16
DEC AXPVMS VMS V7.1-2               Oper System Install     17-AUG-1999 20:03:16

Volker Halle
Honored Contributor

Re: Latest version of netacp on vms 7.1-2 ?

Rudy,

 

for OpenVMS crashes, please always report the output of the SDA> CLUE CRASH command (or the contents of the CLUE$COLLECT:CLUE$node_ddmmyy_hhmm.LIS file). I need the instruction stream in which the BADQHDR crash is happening.

 

To literally answer your question: what is the latest NETACP on OpenVMS V7.1-2?

 

The old CSC website from Australia keeps an image list for the older OpenVMS versions:

 

ftp://ftp.hp.com.au/pub/ecoinfo/ecoinfo/a712i.htm

 

But, according to this list, it looks like a new NETACP.EXE image was NEVER provided for OpenVMS Alpha V7.1-2.

 

Volker.

 

 

abrsvc
Respected Contributor

Re: Latest version of netacp on vms 7.1-2 ?

I have a system running V7.1-2, and the image info posted here is the standard, unpatched one.

Dan

Re: Latest version of netacp on vms 7.1-2 ?

Hello Volker.

 

I forgot about the CLUE CRASH. Anyway, attached you will find some CLUE files, in case you find anything.

 

According to the customer, the crashes started about a month ago, and since then there has been an increase in DECnet connections.

 

This is what i was afraid of concerning the netacp version :-(

 

Thanks and regards,

Rudy

 

 

Volker Halle
Honored Contributor

Re: Latest version of netacp on vms 7.1-2 ?

Rudy,

 

these BADQHDR crashes have nothing to do with the currently executing process/image. These crashes have happened in system context.

 

The crash footprints show BADQHDR crashes in various different parts of the OpenVMS executive. This type of crash can be caused by pool corruption or by hardware/firmware problems.

 

To find possible common symptoms, each crash would need to be examined more closely to find out exactly why the system crashed with BADQHDR.
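A minimal Python sketch of that bookkeeping, assuming each crash summary has been saved as plain text in the layout of the crash information posted at the top of this thread. The helper names footprint() and tabulate() are hypothetical, and other CLUE versions may format these lines differently. Since the currently executing process does not matter for these crashes, the footprint keys on the bugcheck code and the PC of the CPU that initiated the crash, so crashes at the same place in the executive group together. It does not replace reading each instruction stream.

```python
import re
from collections import Counter

# Patterns match the crash-information layout posted in this thread (assumption).
BUGCHECK_RE = re.compile(r"CPU (\d+) reason for Bugcheck: (\w+)")
PROCESS_RE = re.compile(r"Process currently executing on this CPU: (\S+)")
PC_RE = re.compile(r"\bPC\s*=\s*([0-9A-F]{8}\.[0-9A-F]{8})")


def footprint(summary):
    """Return (bugcheck, process, pc) for the CPU that initiated the crash.

    CPUs reporting CPUEXIT only acknowledged a shutdown requested by another
    CPU, so they are skipped. Returns None if no initiating CPU is found.
    """
    matches = list(BUGCHECK_RE.finditer(summary))
    for i, m in enumerate(matches):
        if m.group(2) == "CPUEXIT":
            continue
        # This CPU's section runs until the next "reason for Bugcheck" line.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(summary)
        section = summary[m.end():end]
        process = PROCESS_RE.search(section)
        pc = PC_RE.search(section)
        return (m.group(2),
                process.group(1) if process else None,
                pc.group(1) if pc else None)
    return None


def tabulate(summaries):
    """Count how often each footprint occurs across several saved crashes."""
    return Counter(footprint(s) for s in summaries)


# Abridged excerpt of the crash information posted above.
SAMPLE = """\
CPU 00 reason for Bugcheck: BADQHDR, Interlocked queue header corrupted
Process currently executing on this CPU: NETACP
PC   = FFFFFFFF.80088E24  PS   = 18000000.00000404
CPU 01 reason for Bugcheck: CPUEXIT, Shutdown requested by another CPU
Process currently executing on this CPU: SYSTEM_12
PC   = FFFFFFFF.80068774  PS   = 18000000.00001F04
"""

print(footprint(SAMPLE))  # ('BADQHDR', 'NETACP', 'FFFFFFFF.80088E24')
```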

 

Could you run with just ONE CPU? This may at least rule out SMP synchronization problems.

 

Volker.

Re: Latest version of netacp on vms 7.1-2 ?

Hello Volker,

 

After I replied, I looked at the CLUE files and indeed the crashes are not image-specific.

 

FYI, this DS20 is running as a CHARON instance. I can ask them to stop one CPU and see what happens.

They have another DS20 running on the same host without problems (same VMS version).

 

Regards,

Rudy

Volker Halle
Honored Contributor

Re: Latest version of netacp on vms 7.1-2 ?

Rudy,

 

then it may be time to start saving ALL crash dumps on this system, and to carefully examine EVERY crash to find out exactly WHY the BADQHDR crash was reported. There are a couple of different scenarios for this type of crash.

 

If pool corruption is diagnosed as the underlying problem, consider setting SYSTEM_CHECK=1. This enables additional checks inside the OpenVMS executive and may lead to more crashes, but may also provide clearer symptoms of the problem.

 

And 'hardware/firmware' problems in this configuration may mean CHARON-AXP emulator problems. Look at the CHARON log files as well. Are the CHARON systems running on physical servers (Windows or Linux)? Or on VMware? Which version of CHARON-AXP?

 

Volker.

 

PS: The company I'm working for (Invenate) is a Stromasys VAR and I may help you with CHARON as well.

Re: Latest version of netacp on vms 7.1-2 ?

Volker,

 

I'm working for Avitor Belgium, a VAR for Stromasys as well :-)

 

I set up the CHARON environment at this customer. The CHARON log file didn't show any errors except the tape mounts and dismounts. See attachment.

 

I'll check with the customer when I can set up the system with SYSTEM_CHECK=1.

 

Thanks for your help and advice.

 

Rudy

Volker Halle
Honored Contributor

Re: Latest version of netacp on vms 7.1-2 ?

Rudy,

 

if you look carefully at the instruction streams of those 4 BADQHDR crashes, you'll notice that the code preceding the bugcheck always decrements some register (either R22 or R0) and branches 'back' if the resulting value is greater than 0 (e.g. BGT R22,#XFFFFDA). In all those crashes, the value of the register being DECREMENTED is ZERO at the time of the crash!

 

This code sequence may be checking whether the 'interlocked' instructions succeeded. These BADQHDR bugchecks are typically used with self-relative queues to prevent an endless loop if the conditional load or store keeps failing after many attempts! As an example, look at the $REMQHI_R macro in LIB.MLB.
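The retry pattern described here can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the $REMQHI_R macro itself: try_remove() is a hypothetical stand-in for one attempt at the interlocked queue operation, returning ok=False when the attempt failed because the queue was busy, and the retry budget is an illustrative number, not the value OpenVMS uses. It shows why the decremented register is found to be zero in every dump: the bugcheck is only reached once the counter has run out.

```python
class Bugcheck(Exception):
    """Stands in for the system crash; records the retry counter at that moment."""

    def __init__(self, code, counter):
        super().__init__(code)
        self.code = code
        self.counter = counter


def remove_head(try_remove, retries=1000):
    """Bounded retry loop around an interlocked queue removal (model only).

    try_remove() -> (ok, entry). ok is False when the attempt failed because
    the queue was busy; retries is a hypothetical budget, not the OpenVMS value.
    """
    counter = retries
    while True:
        ok, entry = try_remove()
        if ok:
            return entry
        counter -= 1        # cf. the decrement of R22/R0 in the crash dumps
        if counter > 0:     # cf. BGT R22,... branching back for another try
            continue
        raise Bugcheck("BADQHDR", counter)  # counter is 0 here, as in the dumps


# Uncontended: the first attempt succeeds.
print(remove_head(lambda: (True, "entry")))  # entry

# Something keeps the queue busy for longer than the budget allows.
try:
    remove_head(lambda: (False, None), retries=5)
except Bugcheck as crash:
    print(crash.code, crash.counter)  # BADQHDR 0
```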

 

One might want to look at what the 'other' CPU in those crashes is doing at the time of the crash. This may really be a synchronization issue here !

 

And you're not using the latest and greatest version of CHARON-AXP/DS20, which is now V4.4.148-02.

 

Volker.

Re: Latest version of netacp on vms 7.1-2 ?

Volker,

 

OK, I didn't see that (though I have the kit here, shame on me). I'll plan an upgrade of CHARON.

 

I'll let you know how it turns out.

 

Thanks for your help.

 

Rudy

Volker Halle
Honored Contributor

Re: Latest version of netacp on vms 7.1-2 ?

Rudy,

 

could you also provide the 'crash history' of this node ($ TYPE CLUE$HISTORY) ?

 

Does the 'other' CHARON instance on this Windows Server also have 2 CPUs ?

 

Volker.

Re: Latest version of netacp on vms 7.1-2 ?

Volker,

 

My mistake, my memory was failing: the other system is a DS10, not a DS20, so it has only one CPU. So indeed, a big difference.

 

Looking at the CHARON config file of the DS20, I set the number of IO CPUs to 2. I might set it to 1 and see if it behaves better.

 

CLUE$HISTORY is attached.

 

Rudy

 

Volker Halle
Honored Contributor

Re: Latest version of netacp on vms 7.1-2 ?

Rudy,

 

thanks for the CLUE$HISTORY file. Let me guess, CHARON-AXP/DS20 has been implemented shortly before 13-MAY-2012. What happened on this system between 7-JUN-2012 and 22-DEC-2012 ? Why were there no BADQHDR crashes ?
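Spotting quiet periods like that is easy by eye, but here is a small Python sketch that finds the longest gap between consecutive crashes. It assumes the crash timestamps have already been pulled out of CLUE$HISTORY as OpenVMS-style date strings (the layout of that file is not assumed here), and only the date part is used. The function names are hypothetical, and the three dates in the example are just the ones mentioned in this thread, not the full history.

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def vms_date(text):
    """Parse the date part of an OpenVMS time such as '7-FEB-2013 18:15:00.60'."""
    day, month, year = text.split()[0].split("-")
    return date(int(year), MONTHS[month.upper()], int(day))


def longest_quiet_period(crash_times):
    """Return (last crash before the gap, first crash after it, gap in days)."""
    dates = sorted(vms_date(t) for t in crash_times)
    if len(dates) < 2:
        raise ValueError("need at least two crashes")
    start, end = max(zip(dates, dates[1:]), key=lambda pair: pair[1] - pair[0])
    return start, end, (end - start).days


print(longest_quiet_period(["7-JUN-2012", "22-DEC-2012", "7-FEB-2013 18:15:00.60"]))
# (datetime.date(2012, 6, 7), datetime.date(2012, 12, 22), 198)
```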

 

If the customer agrees, change 'set session n_of_cpus=2' to 1 in the CHARON config file or just issue a STOP/CPU 01 under OpenVMS and see if the frequency of crashes drops.

 

Changing the number of 'IO CPUs' won't help. Those are the additional Windows CPUs/cores reserved by the CHARON emulator to support IO.

 

Note that these BADQHDR crashes can ONLY happen on SMP-systems, i.e. systems with MORE THAN 1 active CPU ! This specific crash needs 2 CPUs accessing the same memory location simultaneously.

 

Please consider escalating this problem to Stromasys for additional analysis. The underlying reason may still be pool corruption, but why should that start after the migration to CHARON-AXP?

 

Volker.