Operating System - Tru64 Unix

crash alpha 4100

 
Aaron Biver_2
Frequent Advisor

Re: crash alpha 4100

Michael,

Let me first of all say that I am truly sorry that you ran into this on a production machine. You have my sympathy. You also have my full attention.

I agree that we should prevent this particular crash from occurring. The manpage explicitly states that an exception database is used for purposes such as this, when some fw versions don't behave. The problem here may be that the exception database is not up to date. I promise to look into it (I will file an internal development problem report).

As for the fix:
If we know ahead of time that setting some particular variable, on some particular platform, with some particular version of fw (or any version of fw, in this case), will cause a panic, it should be an easy change to make. I think we can say that these conditions are true, and it is now a matter of implementing a fix. I think I can just update the existing exception database.

This solution adopts a reactive approach to the generic problem (i.e. that some variables might be unsafe to set). It involves taking known panic cases (such as yours) and entering them into the database so they won't happen with future versions of the OS.


So far, the only two instances I've ever heard of that can crash a system are both on the 4100 family, and they are the variables:
-> sys_serial_num
-> ewa0_mode (or any other ew*_mode)
Don't try these at home, kids.
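
As a rough illustration only (this is not the actual consvar code, and the real exception database format is not shown in this thread), a reactive check of this kind could be sketched as below. The tuple layout, the platform string "4100", the "*" wildcard meaning "any", the function name, and the firmware string in the usage lines are all assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical exception entries: (platform, firmware version, variable).
# Only the two cases named in this thread are listed; "*" means "any".
EXCEPTIONS = [
    ("4100", "*", "sys_serial_num"),
    ("4100", "*", "ew*_mode"),  # ewa0_mode, ewb0_mode, ...
]


def is_unsafe_to_set(platform, fw_version, variable, exceptions=EXCEPTIONS):
    """Return True if this combination matches a known panic case."""
    return any(
        fnmatchcase(platform, p)
        and fnmatchcase(fw_version, f)
        and fnmatchcase(variable, v)
        for p, f, v in exceptions
    )


# Usage: refuse the set operation instead of letting the system panic.
# "5.3" is an arbitrary placeholder firmware string.
if is_unsafe_to_set("4100", "5.3", "ewa0_mode"):
    print("refusing to set ewa0_mode: known to panic on this platform")
```

Anything not listed is still allowed, which is what makes the approach reactive: a new panic case is only blocked once someone has reported it and it has been added.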

However, a proactive solution, in which no variable is allowed to be set unless it is on a "good" list, would be unrealistic. There are too many platform/fw-version/variable combinations to test every one.
Also, we risk breaking binary compatibility for unsupported uses of consvar, in case someone is using consvar to store some info in console variables.
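
To make the compatibility concern concrete, here is a minimal sketch of what a "good" list check would look like. The list contents, the function name, and the variable my_site_tag are hypothetical and chosen only for illustration; nothing here reflects how consvar is actually implemented.

```python
# Hypothetical "good" list: only variables that have been tested may be set.
# Treating these two names as tested is an assumption for the example.
TESTED_SAFE = {"auto_action", "bootdef_dev"}


def allowed_by_good_list(variable, good_list=TESTED_SAFE):
    """Return True only if the variable is explicitly on the good list."""
    return variable in good_list


# A site that stores its own info in a console variable would be locked out.
print(allowed_by_good_list("auto_action"))   # True
print(allowed_by_good_list("my_site_tag"))   # False
```

Under such a scheme every untested or user-created variable is rejected by default, which is where existing unsupported uses would break.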

Best regards, and be as careful with consvar as possible.

Aaron Biver
Tru64 Kernel
Michael Schulte zur Sur
Honored Contributor

Re: crash alpha 4100

Aaron,

thanks for your sympathy. So far I tend to believe Johan that it is more likely to happen on multiprocessor machines. I have tried it on a single-processor 4100 and it worked. To me sys_serial_num was just a text string stored in one place in the NVRAM. Why this could cause a panic was a mystery to me. To me all these parameters were things only important at startup.
The idea of a positive list may also not be good because you can create your own parameters.
Who has a 4100 at home? Not me! LOL!!
Although I wonder if I could get our 4-CPU, 4 GB 4100 when they are sorted out. It would be a nice machine to have. ;-)
Can you shed some light on what is done with the sys_serial_num that the change could cause a panic?

thanks,

Michael