Operating System - OpenVMS

Kernel stack not valid - after updates

Willem Grooters
Honored Contributor


Environment: OpenVMS 8.4 on Alpha (PWS500, Firmware 7.2-1).

I installed OpenVMS 8.4, which runs. I applied patches UPDATE 1, SYS and UPDATE 4, with a reboot in between, without a problem. Next I installed all that were superseded (according to ITRC) in UPDATE 4, including DECnet and TCPIP ECO 2, in one go.
That may have been a mistake, because now startup ends with:

halt code = 2
Kernel stack not valid halt
PC = 0

My guess is that it happens when loading the bootstrap loader, because it's the very next thing shown after "jumping to bootstrap code". Booting conversationally, or with any other flag for that matter, fails as well.

It's just my test box and re-installing the OS is always possible. But I would like to know what causes the problem.

(Because the system doesn't boot, I cannot give the patch history in detail...)
Willem Grooters
OpenVMS Developer & System Manager
11 REPLIES
John Gillings
Honored Contributor

Re: Kernel stack not valid - after updates

A patch has possibly caused a serious issue. Please don't guess. Log an urgent case with HP customer support.
A crucible of informative mistakes
Walt McGaw
Occasional Advisor

Re: Kernel stack not valid - after updates

Hello Willem,

John is correct: you should log a case with us here at HP support. The issue you ran into was noted in a customer advisory concerning the Update V4 patch. In a nutshell, the patch replaced the VMB.EXE image, but a write boot command was not issued to point the boot block at the correct VMB.EXE on disk. When you added other patches and the "undo" data was overwritten, the old VMB.EXE that had been moved to the PCSI$UNDO directory was removed, causing the boot failure.

There is an UPDATE V5 kit for Alpha V8.4 being released in the very near future that will correct that issue. In the meantime, if you can boot another drive (even the install CD for 8.4), you can run writeboot to fix the issue.

Best regards,
Walt McGaw
Hoff
Honored Contributor
Solution

Re: Kernel stack not valid - after updates

That should be APB.EXE and not VMB.EXE; the former is Alpha and the latter is for VAX.

Boot the CD distro or a spare disk, get to DCL, and issue

SET BOOTBLOCK ddcu:

or

RUN SYS$SYSTEM:SYS$SETBOOT
...answer the questions...

or

RUN SYS$SYSTEM:WRITEBOOT
...answer the questions...

with the device targeting your system disk.

The first command is usually the easiest.
Ian Miller.
Honored Contributor

Re: Kernel stack not valid - after updates

CUSTOMER ADVISORY: VMS84A_UPDATE-V0400: Boot failure on Alpha - ID: c02720117
____________________
Purely Personal Opinion
Willem Grooters
Honored Contributor

Re: Kernel stack not valid - after updates

The problem did not occur on the installation of UPDATE 4, since I could reboot without a problem. It was after I next installed the patches that were superseded, according to the master ECO list. Not one at a time, followed by a reboot after each install, but all in one PROD INSTALL - and the next reboot failed.

I'll try to recover as Hoff described, try to gather some evidence, and log a message to programs@hp.com (I cannot issue a call since I don't have a support contract...)
Willem Grooters
OpenVMS Developer & System Manager
P Muralidhar Kini
Honored Contributor

Re: Kernel stack not valid - after updates

Hi Willem,

>> and log a message to programs@hp.com
The email address of the Office of OpenVMS Programs is OpenVMS.Programs@hp.com.
Please route your query to HP via this address.

Regards,
Murali
Let There Be Rock - AC/DC
Walt McGaw
Occasional Advisor

Re: Kernel stack not valid - after updates

Sorry Hoff, you are correct. I knew this was APB.EXE since I went through this on my Alpha as well, but for some reason I typed VMB.EXE. My apologies.
Hoff
Honored Contributor

Re: Kernel stack not valid - after updates

>The problem did not occur on the installation of UPDATE 4, since I could reboot without a problem.

Expected.

When this update happens, APB.EXE gets archived or gets deleted, but the contents of the disk blocks (unless you have erase on delete enabled) aren't immediately overwritten.

But the BTBDEF boot block pointers are aimed at what will eventually be overwritten storage.

Until the blocks containing the old APB.EXE contents are overwritten, it'll still work. (Well, delta whatever was fixed in the newer APB.EXE image, given it's the older version that's still running. For now.)

One path toward long-term remediation would be the implementation of a flag in PCSI to spawn WRITEBOOT or (as BACKUP does) to call the sys$setbootshr API when the boot file gets replaced. That automates the process of updating the boot block, and avoids a kitting-level mistake that has arisen occasionally over the years.

> It was after I next installed the patches that were superseded, according to the master ECO list.

Which then overwrote some or all of the storage from the deleted APB.EXE, and which triggered what amounts to the bootstrap leaping into hyperspace.
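For anyone who finds the sequence easier to follow as code, here is a rough toy model in Python. It is only a sketch of the idea that the boot block records raw block addresses rather than a file name; the ToyDisk class, its methods, the file names with version numbers and the block contents are all made up for illustration and are not OpenVMS interfaces or on-disk structures.

```python
# Toy model: a boot block that stores raw block numbers (LBNs), not a file name.
# Every name here is hypothetical; this is not how OpenVMS is implemented.

class ToyDisk:
    def __init__(self, nblocks):
        self.blocks = [None] * nblocks    # contents of each block
        self.files = {}                   # file name -> (start LBN, block count)
        self.free = set(range(nblocks))   # LBNs not allocated to any file
        self.boot_ptr = None              # boot block pointer: (start LBN, count)

    def create(self, name, data):
        """Write data into the lowest free contiguous run of blocks."""
        n = len(data)
        for start in range(len(self.blocks) - n + 1):
            run = range(start, start + n)
            if all(b in self.free for b in run):
                for b, d in zip(run, data):
                    self.blocks[b] = d
                self.free -= set(run)
                self.files[name] = (start, n)
                return start
        raise RuntimeError("no contiguous free space")

    def delete(self, name):
        """Free the blocks but leave their contents alone (no erase on delete)."""
        start, n = self.files.pop(name)
        self.free |= set(range(start, start + n))

    def write_boot_block(self, name):
        """Stand-in for rewriting the boot block: record where the file lives now."""
        self.boot_ptr = self.files[name]

    def boot(self):
        """The loader reads whatever happens to sit at the recorded LBNs."""
        start, n = self.boot_ptr
        return self.blocks[start:start + n]


disk = ToyDisk(16)
disk.create("APB.EXE;1", ["apb-old"] * 3)
disk.write_boot_block("APB.EXE;1")

# The update kit installs a new bootstrap and removes the old one,
# but nobody rewrites the boot block.
disk.create("APB.EXE;2", ["apb-new"] * 3)
disk.delete("APB.EXE;1")
after_update = disk.boot()          # stale pointer, old contents still intact

# A later install reuses the freed blocks.
disk.create("PATCH.EXE", ["patch"] * 2)
after_more_patches = disk.boot()    # pointer now lands on unrelated data

# Recovery: point the boot block at the current file.
disk.write_boot_block("APB.EXE;2")
after_rewrite = disk.boot()

print(after_update)
print(after_more_patches)
print(after_rewrite)
```

In this model the first boot after the update still loads the old image, the boot after the extra install reads a mix of patch data and leftover old blocks, and rewriting the pointer brings back a clean load of the new image. That mirrors the order of events in this thread, though the real allocation behavior on an ODS volume is of course far more involved.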

Here concludes the lesson in Alpha bootstrap blocks. If you're interested in more detail:

http://labs.hoffmanlabs.com/node/343
http://labs.hoffmanlabs.com/node/28
Hoff
Honored Contributor

Re: Kernel stack not valid - after updates

ps: if you use the PCSI archive/rollback/preserve option, it'll be an update or three before the rollback copies will get nuked. So this error can lurk for a while, and the box will boot nicely until you install something that causes PCSI to trigger the deletion of the rollback copies and then something (else) overwrites the blocks of APB.EXE.