Operating System - OpenVMS
SYSINIT SSRVEXCEPT, Unexpected system service exception

 
SOLVED
Mario Dhaenens
Frequent Advisor

SYSINIT SSRVEXCEPT, Unexpected system service exception

Hi,

I created a new root on an I64 system disk.
I joined a new I64 server to an existing VMS Cluster, but after a reboot it took a bugcheck dump (SYSINIT) during startup.
(I use LAN as the cluster interconnect, PE driver.)

After the second reboot it started OK.

I thought it was hardware related. I replaced the server, but again got the same result (first reboot with a bugcheck dump; after the second reboot it was OK).

Is this a known issue on I64?

Below you see some details.

HP OpenVMS Industry Standard 64 Operating System, Version V8.3-1H1

%CNXMAN, Sending VMScluster membership request to system NVS
%CNXMAN, Now a VMScluster member -- system NVO
%MSCPLOAD-I-CONFIGSCAN, enabled automatic disk serving

**** OpenVMS I64 Operating System V8.3-1H1 - BUGCHECK ****

** Bugcheck code = 000003C4: SSRVEXCEPT, Unexpected system service exception
** Crash CPU: 00000000 Primary CPU: 00000000 Node Name: NVO
** Highest CPU number: 00000001
** Active CPUs: 00000000.00000003
** Current Process: SYSINIT
** Current PSB ID: 00000001
** Image Name: SYSINIT.EXE

PGQBT-I-INIT-UNIT, boot driver, PCI device ID 0x2422, FW 4.00.90
PGQBT-I-BUILT, version X-30, built on Oct 31 2007 @ 16:57:02
PGQBT-I-LINK_WAIT, waiting for link to come up
PGQBT-I-TOPO_WAIT, waiting for topology ID

**** Starting compressed selective memory dump at 28-APR-2009 13:13...
................................................................................
................................................................................
.................................................................
** System space, key processes, and key global pages have been dumped.
** Now dumping remaining processes and global pages...
.
...Complete ****

Installed patches on OpenVMS 8.3-1H1.

SYS_NVO$ product show history
------------------------------------
PRODUCT
------------------------------------
HP I64VMS VMS831H1I_ACRTL V3.0
HP I64VMS VMS831H1I_DEBUG V2.0
HP I64VMS VMS831H1I_ICXXL V2.0
HP I64VMS VMS831H1I_JOBCTL V2.0
HP I64VMS VMS831H1I_PPPD V1.0
HP I64VMS VMS831H1I_SHADOWING V1.0
HP I64VMS VMS831H1I_SYS V3.0
HP I64VMS VMS831H1I_UPDATE V4.0
HP I64VMS DPLUSECO01 V8.3-1H1
13 REPLIES
John Gillings
Honored Contributor

Re: SYSINIT SSRVEXCEPT, Unexpected system service exception

Mario,

There's not enough information in your post to identify a crash footprint, and most ordinary users don't see enough crashes to be able to identify footprints anyway (that's why we run OpenVMS...)

The people who collect crash footprints are inside HP, so please log a case with HP Customer Support. Make sure you have saved your system dump file, and send your CLUE crash summary with the problem report.
A crucible of informative mistakes
Mario Dhaenens
Frequent Advisor

Re: SYSINIT SSRVEXCEPT, Unexpected system service exception

I have added clue crash information.

Regards,

/Toine
Volker Halle
Honored Contributor
Solution

Re: SYSINIT SSRVEXCEPT, Unexpected system service exception

Toine,

the immediate reason for the crash is an access violation, as R4 contains an invalid address FFFFFFFF.F55BE8C4, which causes the instruction st4 [r4] = r8 to try to write to an inaccessible address.

Try the SDA> CLUE REGISTER command. It may provide clues about the context, as it tries to decode the data structure types pointed to by the various registers.
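
A minimal sketch of how to get to that command, assuming the dump was written to the default SYS$SYSTEM:SYSDUMP.DMP (that location is an assumption; your dump file may live elsewhere):

```
$ ! Open the saved system dump in SDA (dump file location is an assumption)
$ ANALYZE/CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
SDA> CLUE CRASH      ! crash summary: bugcheck type, failing PC/instruction
SDA> CLUE REGISTER   ! registers, with any data structures CLUE can decode
SDA> EXIT
```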

The system uptime makes me wonder: 2 hours in SYSINIT? Or was there an immediate time change when the node joined the cluster?

You're running the most recent SYSINIT.EXE, but the problem most likely is in the circumstances or environment making SYSINIT just a victim.

In the 'good old' OpenVMS days, every crash should have been escalated to HP OpenVMS engineering to enable them to solve those kinds of problems and prevent recurrences. Access to the current source code listings and the dump is required to analyze the underlying problem leading to this crash.

I still do collect crash footprints and I've never seen this specific crash.

Volker.
Mario Dhaenens
Frequent Advisor

Re: SYSINIT SSRVEXCEPT, Unexpected system service exception

Hi,

This is the output of Clue Register

Current Registers: Process index: 0002 Process name: SYSINIT PCB: A1A3D700 (CPU 0)
------------------------------------------------------------------------------------------
R0 = 00000000.00000000
GP = 00000000.00310000
R2 = 00000000.00000000
R3 = FFFFFFFF.A20E9AD8
R4 = FFFFFFFF.F55BE8C4
R5 = 00000000.00010464
R6 = 00000000.00000006
R7 = 00000000.00000000
R8 = 00000000.534E4114
R9 = 00000000.534E4180
R10 = 00000000.00000000
R11 = 00000000.00000000
SP = 00000000.7FF678F0
TP = 00000000.0105A1C8
R14 = 00000000.00000000
R15 = 00000000.00120768
R16 = 00000000.00010464
R17 = FFFFFFFF.FFFFFFF8
R18 = 00000000.00000008
R19 = 00000000.00000010
R20 = 00000000.00010460
R21 = 00000000.00000000
R22 = 00000000.00000020
R23 = 00000000.00000103
R24 = 00000000.00120760
R25 = 00000000.00000002
R26 = 00000000.00000000
R27 = 00000000.00015663
R28 = 00000000.00010460
R29 = 00000000.7FF43EB0
R30 = 000007FD.C0000258
R31 = 00000000.00010460

Volker Halle
Honored Contributor

Re: SYSINIT SSRVEXCEPT, Unexpected system service exception

Doesn't help. None of the registers point to data structures that CLUE could decode.

Log a call with HP. Send them the full CLUE file from CLUE$COLLECT:CLUE$NVO_ddmmyy_hhmm.LIS and be prepared to also provide the system dump file.

Volker.
Volker Halle
Honored Contributor

Re: SYSINIT SSRVEXCEPT, Unexpected system service exception

Toine,

does this happen on the first reboot after AUTOGEN after creating a new root? Look at the creation time of SYS$SYSTEM:SYS$ERRLOG.DMP. Did that file get created after the initial boot and before the reboot that caused the crash?

SYSINIT will read that file - if it exists - and try to extract the errorlog entries, so that they could be logged by ERRFMT. This file does not exist for the initial boot and is empty at the time of the first reboot, which crashes. It will be written to during that crash, so on the next boot, it will have valid entries...

You could try to delete the newly created SYS$ERRLOG.DMP file in that root (from another node in that cluster) before the first reboot after AUTOGEN. If the reboot then succeeds, you have isolated a piece in the puzzle...

Note that the above is pure speculation about what may be happening, but it would at least explain what you are seeing.
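
A rough sketch of that test, run from another cluster member. It assumes that node boots from the same common system disk (so its SYS$SYSDEVICE is the disk holding the new root); SYSn is a placeholder for the new node's root, not a real directory name:

```
$ ! SYSn is a placeholder for the new root; SYS$SYSDEVICE assumes a
$ ! common system disk shared with the new node
$ ! Check when the file was created and how big it is
$ DIRECTORY/DATE=CREATED/SIZE=ALL SYS$SYSDEVICE:[SYSn.SYSEXE]SYS$ERRLOG.DMP
$ ! Remove it before the first reboot after AUTOGEN
$ DELETE/LOG SYS$SYSDEVICE:[SYSn.SYSEXE]SYS$ERRLOG.DMP;*
```

If the creation date falls between the initial boot and the crashing reboot, that would fit the theory.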

Volker.
Volker Halle
Honored Contributor

Re: SYSINIT SSRVEXCEPT, Unexpected system service exception

... and some general advice on crashes:

FIRST assume it's a software problem. Look at the errlog entries immediately preceding the crash with:

SDA> CLUE ERRLOG

If there are no entries in the last second before the crash itself, don't bother thinking about hardware.

You should only immediately assume the problem to be hardware if the crash is one of MACHINECHK, IOMACHINECHK or MCHECKPAL.

Volker.
H.Becker
Honored Contributor

Re: SYSINIT SSRVEXCEPT, Unexpected system service exception

The crash is in "restore error log". SYSINIT found error log information, copied it into the pool, and now tries to prepare it for ERRFMT.

If you followed the recommended procedures to set up the new member of the cluster, then file a problem report.
Volker Halle
Honored Contributor

Re: SYSINIT SSRVEXCEPT, Unexpected system service exception

Toine,

do you happen to have high-water marking DISABLED on your system disk? Does SHOW DEV/FULL SYS$SYSDEVICE show the string 'file high-water marking' in the Volume Status: line?

If file high-water marking is disabled, the newly created SYS$ERRLOG.DMP file may contain garbage from previous contents of the allocated disk blocks. This could confuse SYSINIT when trying to read/copy errlog entries from that file.

Consider using $ SET VOLUME/HIGH SYS$SYSDEVICE to enable high-water marking, at least during the creation of SYS$ERRLOG.DMP.
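
Spelled out, the check and the change might look like the sketch below. It assumes you have the privileges to modify the volume. Whether the setting takes effect on other nodes that already have the disk mounted is something to verify on your cluster:

```
$ ! Look for 'file high-water marking' in the Volume Status: line
$ SHOW DEVICE/FULL SYS$SYSDEVICE
$ ! Enable high-water marking before the new root's SYS$ERRLOG.DMP is created
$ SET VOLUME/HIGHWATER_MARKING SYS$SYSDEVICE
$ ! Optional: switch it off again later if you prefer the old setting
$ SET VOLUME/NOHIGHWATER_MARKING SYS$SYSDEVICE
```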

Volker.
Mario Dhaenens
Frequent Advisor

Re: SYSINIT SSRVEXCEPT, Unexpected system service exception

Hello,

High-water marking is disabled on the system disk.

Volume Status: ODS-5, subject to mount verification, protected subsystems
enabled, write-through caching enabled, special files enabled.
Volume is also mounted on NVO, NVJ, NVL.


I will try your proposed solution.
Thank you.

/Toine
Volker Halle
Honored Contributor

Re: SYSINIT SSRVEXCEPT, Unexpected system service exception

Toine,

if you have high-water marking disabled, then just DUMP [SYSn.SYSEXE]SYS$ERRLOG.DMP before the first reboot. If there are any blocks with NON-zero contents, this may cause a crash.

If you've enabled high-water marking, all blocks will dump as all-zeroes before the first reboot.
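
For example, something along these lines (a sketch only; SYSn is again a placeholder for the new root, and the block count of 8 is an arbitrary choice for a quick look, so omit /BLOCKS to dump the whole file):

```
$ ! SYSn is a placeholder; COUNT:8 is an arbitrary sample size
$ DUMP/BLOCKS=(START:1,COUNT:8) SYS$SYSDEVICE:[SYSn.SYSEXE]SYS$ERRLOG.DMP
```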

Please report back whether the analysis was correct and this solution has worked.

Volker.
Mario Dhaenens
Frequent Advisor

Re: SYSINIT SSRVEXCEPT, Unexpected system service exception

Hello,

The crash was caused by SYS$ERRLOG.DMP.

Thank you all for the troubleshooting.

P.S.
Now I have a Cluster with 6 Alpha and 4 I64 servers.

/Toine
Mario Dhaenens
Frequent Advisor

Re: SYSINIT SSRVEXCEPT, Unexpected system service exception

Issue solved and closed