crash alpha 4100
02-02-2005 03:57 AM
Has anyone seen this error?
Feb 2 17:42:20 fra10d vmunix: pmap_update_send: missing ack from cpu 3
Feb 2 17:42:20 fra10d vmunix: panic (cpu 0): tb_shoot ack timeout
I had been running:
consvar -s sys_serial_num xxxxxxxx
Thanks for all input,
Michael
02-02-2005 04:02 AM
Re: crash alpha 4100
The machine is running 5.1A pk6.
Michael
02-02-2005 06:14 AM
Re: crash alpha 4100
I asked the kernel developers if they'd heard about this. The summary is, if doing that consvar command causes your system to panic, don't do that consvar command.
They're going to look into this further and talk to the firmware engineers to see if they can narrow it down a little. It may just be that it's not allowed to set the sys_serial_num using consvar. It may be that Tru64 should be funnelling this particular request to the master cpu (since the panic was on cpu 3 it looks like it isn't).
There have been previous problems with the 4100 when trying to set console variables using consvar that are not allowed to be set. Sounds like the bottom line is that you should avoid using consvar on a 4100 if possible.
Ann
02-02-2005 06:20 AM
Solution
Official answer is: "the sys_serial_num should only be set from the console command, it is not allowed to set the sys_serial_num using consvar".
Apparently there's a long, highly technical, explanation why this is true, but I'll only push them for it if you're interested in all the gory details.
Ann
02-02-2005 06:37 PM
Re: crash alpha 4100
Thanks for your answers. I found it out by myself using Google. Well, this does not make much sense to me. I am using consvar to set other parameters.
I used it on another machine and yes, with the same result. I am interested in the details because I am certainly going to be asked why this happened. I can't help but think it must be a bad implementation.
Thanks,
Michael
02-02-2005 11:01 PM
Re: crash alpha 4100
Can you imagine how embarrassing it is to shoot down a production machine with a simple command? I have set the serial number on other machines without a problem. How could I have anticipated that? Figuring it would be a hardware problem, I used that command on another 4100 last night, with the same result.
Michael
02-03-2005 03:20 AM
Re: crash alpha 4100
Michael,
"consvar -s" is only SUPPORTED for parameters you can show with "consvar -l" (look for the word "supported" in the consvar manpage).
About the gory details: setting some parameters gives the CPU a lot of instructions to execute in console firmware context, leaving no chance for the kernel to do its job. To guarantee CPU cache coherency, the kernel has a maximum time interval within which the translation look-aside buffer has to be updated on all CPUs. If it is not updated in time ==> panic.
You can probably get away with this kind of consvar command on systems with only one CPU, but on multi-CPU systems the risk of a panic is high, especially if the consvar is executed by CPU0.
So... the manpage told you so, and no, NOT an implementation issue.
__ Johan ;-)
_JB_
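The timeout mechanism described above can be illustrated with a toy model. This is only a sketch, not Tru64 kernel code: the `Cpu` class, the `shootdown` function, the tick-based clock, and the message text (which loosely echoes the log in the first post) are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cpu:
    cpu_id: int
    # Hypothetical: number of ticks this CPU stays in console firmware
    # context, during which it cannot acknowledge a shootdown request.
    busy_in_firmware_ticks: int = 0


def shootdown(initiator: int, cpus: list[Cpu], timeout_ticks: int) -> str:
    """Simulate one TLB shootdown: every other CPU must ack before the timeout."""
    pending = {c.cpu_id for c in cpus if c.cpu_id != initiator}
    for tick in range(timeout_ticks):
        for c in cpus:
            # A CPU can only ack once it has left firmware context.
            if c.cpu_id in pending and tick >= c.busy_in_firmware_ticks:
                pending.discard(c.cpu_id)
        if not pending:
            return "ok"
    # At least one CPU never acknowledged within the allowed interval.
    return f"panic (cpu {initiator}): missing ack from cpu {min(pending)}"


if __name__ == "__main__":
    # CPU 3 is stuck in firmware for 50 ticks; the kernel only waits 10.
    cpus = [Cpu(0), Cpu(1), Cpu(2), Cpu(3, busy_in_firmware_ticks=50)]
    print(shootdown(0, cpus, timeout_ticks=10))
```

With a single CPU there is nobody to wait for, so the same call returns "ok" regardless of how long the firmware work takes, which matches the observation that uniprocessor systems tend to get away with it.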
02-03-2005 03:37 AM
Re: crash alpha 4100
Thank you for your explanation. It explains why it does not happen on a single-CPU machine. However, I do not see where man consvar tells me of any danger.
Greetings,
Michael
02-03-2005 03:57 AM
Re: crash alpha 4100
The manpage does not explicitly tell you there is a danger, but the text for the "-l" switch tells you how to find out which variables are supported. If setting one of the parameters in that output causes a panic, then there is reason to log a case.
__ Johan.
_JB_
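The "check the -l listing first" rule could be scripted roughly as follows. This is a sketch under a loud assumption: it presumes that `consvar -l` prints one `name = value` pair per line, which is not confirmed anywhere in this thread. The sample listing is hypothetical, and `parse_supported` and `may_set` are made-up helper names.

```python
from __future__ import annotations


def parse_supported(listing: str) -> dict[str, str]:
    """Parse text assumed to look like 'name = value', one variable per line."""
    supported = {}
    for line in listing.splitlines():
        name, sep, value = line.partition("=")
        # Skip lines that have no '=' or no variable name before it.
        if sep and name.strip():
            supported[name.strip()] = value.strip()
    return supported


def may_set(variable: str, listing: str) -> bool:
    """Return True only if the variable appears in the supported listing."""
    return variable in parse_supported(listing)


# Hypothetical sample, NOT captured from a real system.
SAMPLE_LISTING = """\
auto_action = HALT
boot_osflags = A
"""

if __name__ == "__main__":
    for var in ("auto_action", "sys_serial_num"):
        verdict = "listed" if may_set(var, SAMPLE_LISTING) else "not listed, do not set"
        print(f"{var}: {verdict}")
```

On a real system the listing would come from the command itself, and the parsing would need to be adjusted to whatever the actual output looks like.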
02-03-2005 04:11 AM
Re: crash alpha 4100
If something is disabled, then it should be rejected and not crash the machine. For example, the first 16 blocks of a disk are read-only, and any attempt to write to them results in an error. So this is, after all, a bad implementation.
Thanks for your time,
Michael
02-03-2005 07:53 AM
Re: crash alpha 4100
Let me first of all say that I am truly sorry that you ran into this on a production machine. You have my sympathy. You also have my full attention.
I agree that we should prevent this particular crash from occurring. The manpage explicitly states that an exception database is used for purposes such as this - when some fw versions don't behave. The problem here might be that the exception database might not be up to date. I promise to look into it (I will file an internal development problem report).
As for the fix:
If we know ahead of time that setting some particular variable, on some particular platform, with some particular version of fw (or any version of fw, in this case), will cause a panic, it should be an easy change to make. I think we can say that these conditions are true, and it is now a matter of implementing a fix. I think I can just update the existing exception database.
This solution adopts a reactive approach to the generic problem (i.e. that some variables might be unsafe to set). It involves taking known panic cases (such as yours) and entering them into the database so they won't happen with future versions of the OS.
So far, the only two instances I've ever heard of that can crash a system are both on the 4100 family, and they are the variables:
-> sys_serial_num
-> ewa0_mode (or any other ew*_mode)
Don't try these at home, kids.
However, a proactive solution, in which no variable is allowed to be set unless it is on a "good" list, would be unrealistic. There are too many platform/fw-version/variable combinations to test every one.
Also, we risk breaking binary compatibility for unsupported uses of consvar, in case someone is using consvar to store some info in console variables.
Best regards, and be as careful with consvar as possible.
Aaron Biver
Tru64 Kernel
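The reactive "exception database" idea can be sketched as a small deny list that rejects a known-bad request with an error instead of letting it reach the firmware. This is an illustration only, not the real consvar implementation: the platform string "AlphaServer 4100", the table layout, and the function names are assumptions; only the two variable names come from the post above.

```python
from fnmatch import fnmatchcase

# Hypothetical exception table: (platform pattern, variable pattern).
# The variable names are the two known panic cases listed in the thread.
UNSAFE_TO_SET = [
    ("AlphaServer 4100", "sys_serial_num"),
    ("AlphaServer 4100", "ew*_mode"),
]


def is_unsafe(platform: str, variable: str) -> bool:
    """True if this platform/variable pair matches a known panic case."""
    return any(
        fnmatchcase(platform, plat_pat) and fnmatchcase(variable, var_pat)
        for plat_pat, var_pat in UNSAFE_TO_SET
    )


def set_console_variable(platform: str, variable: str, value: str, store: dict) -> None:
    """Reject known-bad requests; otherwise record the value in a stand-in store."""
    if is_unsafe(platform, variable):
        raise PermissionError(f"setting {variable} is not allowed on {platform}")
    store[variable] = value  # 'store' stands in for the console's NVRAM


if __name__ == "__main__":
    nvram = {}
    set_console_variable("AlphaServer 4100", "auto_action", "HALT", nvram)
    try:
        set_console_variable("AlphaServer 4100", "ewa0_mode", "FastFD", nvram)
    except PermissionError as err:
        print(err)
```

Anything not in the table is still allowed, so user-defined variables keep working, which is the trade-off of a deny list over a "good" list.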
02-03-2005 07:46 PM
Re: crash alpha 4100
Thanks for your sympathy. So far I tend to believe Johan that it is more likely to happen on multiprocessor machines. I have tried it on a single-processor 4100 and it worked. To me sys_serial_num was just a piece of text stored in one place in the NVRAM. Why this could cause a panic was a mystery to me. To me all these parameters were things only important at startup.
The idea of a positive list may also not be good because you can create your own parameters.
Who has a 4100 at home? Not me! LOL!!
Although I wonder if I could get our 4-CPU, 4 GB 4100 when it is retired. It would be a nice machine to have. ;-)
Can you shed some light on what is done with the sys_serial_num that the change could cause a panic?
thanks,
Michael