Operating System - OpenVMS

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

 
Galen Tackett
Valued Contributor

"Kernel stack not valid" during system startup's SYSMAN IO AUTO

I have a DS10L 617MHz running VMS V7.3-1 that is getting a Kernel Stack Not Valid fault. See the attachment for configuration details, and for an explanation of why we're pretty much tied to VMS V7.3-1 at present.

The error occurs pretty early in system startup, right near the beginning of VMS$DEVICE_CONFIG.COM, when the SYSMAN command IO AUTO is executed.

I increased KSTACKPAGES as high as 20 with no effect and then started digging into the startup process.
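
For anyone retracing this step: one common way to try a different KSTACKPAGES value for a single boot, when the system won't come up far enough to use SYSGEN, is a conversational boot. What follows is only a sketch. It assumes the system disk is DQA0 (the device mentioned in this post) and system root 0, so adjust both for your own configuration.

```dcl
>>> boot -flags 0,1 dqa0          ! root 0, flag 1 = conversational boot (device name is an assumption)
SYSBOOT> SHOW KSTACKPAGES         ! check the value currently in effect
SYSBOOT> SET KSTACKPAGES 20       ! the highest value tried in this post
SYSBOOT> CONTINUE                 ! carry on booting with the changed value
```

If a value set this way turns out to help, it would normally also go into MODPARAMS.DAT so that a later AUTOGEN run does not undo it.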

After a lot of experimenting I finally pinned this down. I worked around it as follows.

In SYCONFIG.COM
----------------
$ MCR SYSMAN IO SET EXCLUDE=GFA0

In SYLOGICALS.COM
-----------------
$ MCR SYSMAN IO SET EXCLUDE=""
$ MCR SYSMAN IO AUTO/SELECT=GFA0


A probably unrelated problem, though it also involves GFA0:, is that the graphics display can't be changed from 1024 x 1024 with 60 Hz refresh no matter what I try to set in DECW$PRIVATE_SERVER_SETUP.COM.

This system was formerly problem free running VMS V7.3-2 for another application. Before installing V7.3-1 a DELETE/ERASE was done on DQA0:, so there's no chance of some bit of the V7.3-2 environment still being there.

The floor is now open for discussion :-)
Wim Van den Wyngaert
Honored Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

I had the same resolution problem on my AS500. I had to open the box and change a switch. Maybe check the documentation for your graphics card?

Wim
Joseph Huber_1
Honored Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

I think VMS731_Update 6 does not include
VMS731_GRAPHICS-V0400

Maybe it cures (some of) the problems.
http://www.mpp.mpg.de/~huber
Galen Tackett
Valued Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

I assigned few or zero points to the responses above because they didn't really help at all. However, do be aware that I value the time and thought that went into them. Thanks, Joseph and Wim.

Joseph,

Graphics V4.0 is listed as one of the installed ECOs. Perhaps you overlooked it somehow.

All,

I forgot to mention that the graphics card is a PBXGF-AB (PCI Oxygen VX1.) It does not have a switch of any kind. The resolution and refresh rates are supposed to be settable by software.

I took a look at DECW$DEVICE_CONFIG_GF.COM. The default resolution for this thing is 1024x768 and the default refresh is 70 Hz.

xdpyinfo shows the resolution as 1280x1024, which is what I specified in DECW$PRIVATE_SERVER_SETUP.COM.

I have the same model monitor (L1925) on my desk working at 1280x1024 with 75 Hz refresh, so I know the monitor supports this.
Galen Tackett
Valued Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

Also, my main problem is the kernel stack not valid error. I don't want to leave my workaround in place indefinitely. But we can get by indefinitely with the incorrect resolution. It just makes things a little ugly.
Wim Van den Wyngaert
Honored Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

Galen Tackett
Valued Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

System firmware is V7.2-1. (This was listed as "Console firmware" in my attachment but I guess the two terms are pretty interchangeable.)
Wim Van den Wyngaert
Honored Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

I would repeat the test with a lower firmware version (CD 6.2 is the lowest allowed) to see if the problem is firmware related.

fwiw

Wim
Volker Halle
Honored Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

Galen,

to find out more about the reason for the KRNLSTAKNV crash, could you post the full CLUE file (see CLUE$COLLECT:CLUE$node_ddmmyy_hhmm.LIS) ?

Whereas it may not be possible to actually solve this type of problem by looking at the data in a CLUE file, it may give additional hints.

Note that you need to set AUTO_ACTION RESTART to obtain a crash and not just a kernel stack not valid HALT.

Volker.
Galen Tackett
Valued Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

I tried setting AUTO_ACTION to RESTART, but it looks like CLUE doesn't get started until after the point in startup where the crash occurs.

The crash happens right at the top of SYS$STARTUP:VMS$DEVICE_STARTUP.COM, where there's an invocation of SYSMAN that does this:
IO AUTOCONFIGURE FTA0...
IO AUTOCONFIGURE MPA0...
IO AUTOCONFIGURE ALL
The third IO AUTOCONFIGURE is where the crash occurs.

CLUE doesn't get started until later in this command procedure.

I'll see if I can [temporarily] edit the appropriate startup .COM files so that I get something from CLUE.

Volker Halle
Honored Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

Galen,

once you've rebooted your system with the workaround in place, you could run CLUE$STARTUP.COM manually to look at the most recent dumpfile:

$ SET PROC/NAME=STARTUP
$ @SYS$STARTUP:CLUE$STARTUP
$ SET PROC/NAME=

Otherwise, you could issue the following command manually:

$ ANAL/CRASH SYS$SYSTEM:
SDA> CLUE HISTORY
if it complains about 'already analyzed', use
SDA> CLUE HISTORY/OVER

Volker.
Galen Tackett
Valued Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

CLUE didn't produce any kind of file when it was run.

ANA/CRASH SYS$SYSTEM: gives me:
%SDA-E-NOTALPHADUMP

I'm not too surprised at that.

When it halted I wrote down the contents of the registers, pc, psl, sp, etc. but I doubt that it's worth listing them here.
Volker Halle
Honored Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

Galen,

when AUTO_ACTION is set to RESTART, the system should write a crashdump and reboot, if any unexpected CPU HALT occurs (like for a kernel stack not valid). Did this happen ? Did you see those bugcheck messages on the console ?

Without a dump, further analysis may not be possible.

Volker.
Galen Tackett
Valued Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

Volker,

Although AUTO_ACTION is set to REBOOT, the system doesn't write a crash dump. There's no VMS bugcheck output at all. I just abruptly get:

Halted CPU 0

halt code = 2
kernel stack not valid halt


So we may not be able to go much further with this.

I did a register dump myself and tediously by hand copied the register contents to paper. But I doubt it would be worth anything to post them.

Any ideas on my other problem?

> A probably unrelated problem, though it also involves GFA0:, is that the
> graphics display can't be changed from 1024 x 1024 with 60 Hz refresh
> no matter what I try to set in DECW$PRIVATE_SERVER_SETUP.COM.
Volker Halle
Honored Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

Galen,


> Although AUTO_ACTION is set to REBOOT...


Did you really mean to write REBOOT? It should be RESTART - and in that case, VMS should write a dump and reboot afterwards.

Volker.
Galen Tackett
Valued Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

OOPS!

Yes, I did mean RESTART instead of REBOOT.

But there is definitely no sign of a dump getting written. And ANA/CRASH SYS$SYSTEM: tells me:

%SDA-E-NOTALPHADUMP

Is it perhaps possible for a kernel stack not valid halt to be uncatchable, or impossible to handle as a normal bugcheck, in some circumstances?
Andy Bustamante
Honored Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO


What is the value of the system parameter DUMPSTYLE?

Andy

If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
Galen Tackett
Valued Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

DUMPSTYLE=9
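
For readers decoding that value: DUMPSTYLE is a bit mask. The sketch below shows one way to display it and how 9 would break down. The bit meanings are as I recall them from the Alpha system parameter documentation, so treat them as an assumption and check the parameter help on your own version.

```dcl
$ ! Display the parameter (SYSMAN shows the values for the current environment)
$ MCR SYSMAN PARAMETERS SHOW DUMPSTYLE
$ !
$ ! DUMPSTYLE = 9 = 8 + 1
$ !   bit 0 (1) set   : selective dump rather than a full memory dump
$ !   bit 1 (2) clear : minimal (not full) console output at bugcheck time
$ !   bit 2 (4) clear : dump goes to the system disk (no dump off system disk)
$ !   bit 3 (8) set   : dump is written compressed
```

Read that way, 9 asks for a compressed selective dump on the system disk, and nothing in the value itself would send the dump somewhere else.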
Volker Halle
Honored Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

Galen,

I have seen enough KRNLSTAKNV crashdumps to believe that this mechanism generally works. Once the CPU halts unexpectedly, the SRM console firmware is responsible for restarting OpenVMS (if AUTO_ACTION = RESTART) at the restart entry point (if memory is still valid). OpenVMS then decides that it's being restarted and writes a 'restart' crashdump (e.g. KRNLSTAKNV, MCHECKPAL, HALT etc.).

You should at least get the bugcheck output written on the console terminal (blue screen). Any errors writing the dump should also be output to the console terminal.

Could you record the console output after switching to a serial console ?

Volker.
Galen Tackett
Valued Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

OK, here's what I did:

[at graphics console]
>>> set console serial
>>> set auto_action restart
>>> init

[a few moments later, at serial console]
>>> boot

And, guess what--are you sure you're ready for this?

IT DIDN'T CRASH.

That's right. No "kernel stack not valid" message, no nothin' :-)

I booted it up this way again a couple of times and saw the same.

So I set the console back to graphics (leaving AUTO_ACTION at RESTART), and the "kernel stack not valid" (with no crash dump) came right back.

???
Galen Tackett
Valued Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

To clarify a glaring ambiguity in my previous reply, I must state that "no nothin'" means that the system booted normally.

:-)
Volker Halle
Honored Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

Galen,

looks like you've found another workaround ;-)

So when booting with the graphics console and AUTO_ACTION = RESTART, you're saying that you get a 'kernel stack not valid' console message (with PC and PSL) and the system just drops to the console prompt >>>

If that's true, I would consider this a bug in the firmware. But as the graphics controller is also involved here (in the display of these messages), things may be more complicated.

I'll try to write a little MACRO program to produce a clean KRNLSTAKNV crash and you may want to give it a try...

Volker.
Galen Tackett
Valued Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

> you're saying that you get a 'kernel stack not valid' console message (with PC
> and PSL) and the system just drops to the console prompt >>>

Yep.

> If that's true, I would consider this a bug in the firmware.

I'm thinking along the same lines.

I've thought about backing the firmware down to, say, V6.4 or 6.5. Maybe I'll have a chance to do so today.

(I wish I could also reinstall VMS V7.3-2 to check my memory on whether that version crashed or not. I'm not sure I ever booted it with firmware V7.2.)
Volker Halle
Honored Contributor

Re: "Kernel stack not valid" during system startup's SYSMAN IO AUTO

Galen,

please find attached a little Macro-32 program to force a clean KRNLSTAKNV crash and the console output from running this program. Your DS10L should behave in the same way.

NOTE: this program will crash your system immediately.

Volker.