Operating System - Tru64 Unix

Michael Schulte zur Sur
Honored Contributor

crash alpha 4100

Hi,

has anyone seen this error?
Feb 2 17:42:20 fra10d vmunix: pmap_update_send: missing ack from cpu 3
Feb 2 17:42:20 fra10d vmunix: panic (cpu 0): tb_shoot ack timeout

Just before the crash, I had run:
consvar -s sys_serial_num xxxxxxxx

thanks for all input,

Michael
Michael Schulte zur Sur
Honored Contributor

Re: crash alpha 4100

Oh,

the machine is running 5.1A pk6

Michael
Ann Majeske
Honored Contributor

Re: crash alpha 4100

Hi Michael,

I asked the kernel developers if they'd heard about this. The summary is, if doing that consvar command causes your system to panic, don't do that consvar command.

They're going to look into this further and talk to the firmware engineers to see if they can narrow it down a little. It may just be that it's not allowed to set the sys_serial_num using consvar. It may be that Tru64 should be funnelling this particular request to the master cpu (since the panic was on cpu 3 it looks like it isn't).

There have been previous problems with the 4100 when trying to set console variables using consvar that are not allowed to be set. Sounds like the bottom line is that you should avoid using consvar on a 4100 if possible.

Ann
Ann Majeske
Honored Contributor
Solution

Re: crash alpha 4100

I've got the "official" answer, they're pretty quick!

Official answer is: "the sys_serial_num should only be set from the console command, it is not allowed to set the sys_serial_num using consvar".

Apparently there's a long, highly technical, explanation why this is true, but I'll only push them for it if you're interested in all the gory details.

Ann
Michael Schulte zur Sur
Honored Contributor

Re: crash alpha 4100

Hi,

thanks for your answers. I found it out myself using Google. Well, this does not make much sense to me; I use consvar to set other parameters.
I used it on another machine, and yes, with the same result. I am interested in the details because I am certainly going to be asked why. I can't help but think it must be a bad implementation.

thanks,

Michael
Michael Schulte zur Sur
Honored Contributor

Re: crash alpha 4100

Ann,

can you imagine how embarrassing it is to take down a production machine with a simple command? I have set the serial number on other machines without a problem. How could I have anticipated that? Assuming it was a hardware problem, I ran the same command on another 4100 last night, with the same result.

Michael
Johan Brusche
Honored Contributor

Re: crash alpha 4100


Michael,

"consvar -s" is only SUPPORTED for parameters you can show with "consvar -l" (look for the word "supported" in the consvar man page).

About the gory details..... Setting some parameters makes the CPU execute a long run of instructions in console firmware context, leaving the kernel no chance to do its job. To guarantee coherency across CPUs, the kernel allows only a maximum time interval in which the translation lookaside buffer (TLB) must be updated on all CPUs. If it is not updated in time ==> panic.
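A simplified model of the timeout described above, sketched in Python: one thread stands in for each remote CPU, and the initiating CPU waits a bounded time for every acknowledgement. This is only an illustration under stated assumptions; `tb_shootdown` and `remote_cpu` are hypothetical names, and the real Tru64 pmap code uses interprocessor interrupts, not Python threads.

```python
import threading
import time


class ShootdownTimeout(Exception):
    """Raised when a CPU fails to acknowledge a TLB shootdown in time."""


def remote_cpu(busy_seconds, request, ack):
    # Wait for the shootdown request (stands in for the interprocessor interrupt).
    request.wait()
    # Model a CPU stuck elsewhere (e.g. running console firmware) that
    # cannot service the request for busy_seconds.
    time.sleep(busy_seconds)
    # Invalidate the local TLB entry (not modelled here) and acknowledge.
    ack.set()


def tb_shootdown(busy, timeout):
    """Ask every CPU in `busy` to flush its TLB and wait for all acks.

    `busy` maps a CPU number to the seconds it stays unresponsive.
    Raises ShootdownTimeout if any ack misses the shared deadline.
    """
    request = threading.Event()
    acks = {cpu: threading.Event() for cpu in busy}
    for cpu, seconds in busy.items():
        threading.Thread(
            target=remote_cpu, args=(seconds, request, acks[cpu]), daemon=True
        ).start()

    request.set()
    deadline = time.monotonic() + timeout
    for cpu in sorted(acks):
        remaining = max(deadline - time.monotonic(), 0.0)
        if not acks[cpu].wait(remaining):
            raise ShootdownTimeout(f"missing ack from cpu {cpu}")


if __name__ == "__main__":
    tb_shootdown({1: 0.0, 2: 0.0, 3: 0.0}, timeout=0.5)
    print("all CPUs acknowledged")
    try:
        # CPU 3 is busy far longer than the deadline allows.
        tb_shootdown({1: 0.0, 2: 0.0, 3: 1.0}, timeout=0.2)
    except ShootdownTimeout as exc:
        print("panic:", exc)
```

In the model, a single unresponsive CPU is enough to fail the whole shootdown, which matches the shape of the "missing ack from cpu 3" / "tb_shoot ack timeout" messages in the original log.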

You can probably get away with this kind of consvar command on systems with only one CPU, but on multi-CPU systems the risk of a panic is high, especially if the consvar is executed by CPU0.

So... the man page told you so, and no, it is NOT an implementation issue.

__ Johan ;-)

_JB_
Michael Schulte zur Sur
Honored Contributor

Re: crash alpha 4100

Johan,

thank you for your explanation. It explains why it does not happen on a single-CPU machine. However, I do not see where the consvar man page warns me of any danger.

greetings,

Michael
Johan Brusche
Honored Contributor

Re: crash alpha 4100


The man page does not explicitly tell you there is a danger, but the description of the "-l" switch tells you how to find out which variables are supported. If setting one of the parameters in that output causes a panic, then there is reason to log a case.

__ Johan.

_JB_
Michael Schulte zur Sur
Honored Contributor

Re: crash alpha 4100

Johan,

if something is disabled, it should be rejected, not crash the machine. For example, the first 16 blocks of a disk are read-only, and any attempt to write to them results in an error. So this is, after all, a bad implementation.
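A minimal sketch, in Python, of the kind of up-front rejection argued for here, combined with Johan's advice to only set variables that appear in "consvar -l". The helper names `parse_supported` and `build_set_command` are hypothetical, and the listing format (one variable per line, name first) is an assumption; the real "consvar -l" output may differ. The sketch only builds the command line and does not run it.

```python
def parse_supported(listing):
    """Collect variable names from a listing of supported console variables.

    Assumption: one variable per line with the name as the first field.
    The real `consvar -l` output format may differ; adjust accordingly.
    """
    names = set()
    for line in listing.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def build_set_command(name, value, supported):
    """Return the argv for `consvar -s name value`, or reject the request.

    Refusing names that are not listed turns a potential panic into an error.
    """
    if name not in supported:
        raise ValueError(
            f"{name!r} is not listed as supported; set it from the console instead"
        )
    return ["consvar", "-s", name, value]


if __name__ == "__main__":
    # Made-up listing for illustration only, not real consvar output.
    listing = "auto_action\nboot_osflags\n"
    supported = parse_supported(listing)
    print(build_set_command("auto_action", "HALT", supported))
    try:
        build_set_command("sys_serial_num", "xxxxxxxx", supported)
    except ValueError as exc:
        print("rejected:", exc)
```

Checking against an allowlist, rather than a blocklist of known-dangerous variables, means any variable the tool has not been told about is refused by default.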

thanks for your time,

Michael