Operating System - Tru64 Unix

Alpha DS20 shuts down unexpectedly?

 
SOLVED

OS: Compaq Tru64 UNIX V5.1B (Rev. 2650)
Platform: Alpha DS20
LOG: See attached file please

Dear all,
We have an Alpha server that shuts down unexpectedly. I checked both the logs and the crash data but am still not sure of the cause of the problem.

I suspect it's either a CPU or a memory problem (or maybe both), but I cannot tell which at this point.

Any help on this issue is welcome!

Below is an excerpt from the log:
Feb 3 11:26:46 ds20-2 vmunix:
Feb 3 11:26:46 ds20-2 vmunix: simple_lock: time limit exceeded
Feb 3 11:26:47 ds20-2 vmunix:
Feb 3 11:26:47 ds20-2 vmunix: pc of caller: 0xffffffff00526154
Feb 3 11:26:47 ds20-2 vmunix: lock address: 0xfffffc003fdaf900
Feb 3 11:26:47 ds20-2 vmunix: lock info addr: 0xfffffc0000ff3790
Feb 3 11:26:47 ds20-2 vmunix: lock class name: cam_pd_device3
Feb 3 11:26:47 ds20-2 vmunix: current lock state: 0xc400017800581455 (cpu=?,pc=0xffffffff00581454,busy)
Feb 3 11:26:47 ds20-2 vmunix:
Feb 3 11:26:47 ds20-2 vmunix: panic (cpu 0): simple_lock: time limit exceeded
Feb 3 11:26:47 ds20-2 vmunix: syncing disks... 29
Feb 3 11:26:47 ds20-2 vmunix: Memory trolling not supported, cpu Major id 8, Minor id 4
Feb 3 11:26:47 ds20-2 vmunix: Alpha boot: available memory from 0x2c7a000 to 0x3ff40000
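
When comparing several panics, it can help to pull the lock details out of the syslog lines mechanically. The following is only a minimal Python sketch, assuming the line layout shown in the excerpt above; the names `parse_panic_fields`, `panic_cpu` and `panic` are hypothetical and not part of any Tru64 tool.

```python
import re

# "... vmunix: <key>: <value>" lines, e.g. "lock class name: cam_pd_device3".
FIELD_RE = re.compile(r"vmunix:\s*(?P<key>[a-z_ ]+?):\s*(?P<value>.+)$")
# The panic line itself, e.g. "panic (cpu 0): simple_lock: time limit exceeded".
PANIC_RE = re.compile(r"vmunix:\s*panic \(cpu (?P<cpu>\d+)\):\s*(?P<msg>.+)$")

def parse_panic_fields(lines):
    """Collect the key/value details of a panic from vmunix syslog lines."""
    fields = {}
    for line in lines:
        panic = PANIC_RE.search(line)
        if panic:
            fields["panic_cpu"] = panic.group("cpu")
            fields["panic"] = panic.group("msg").strip()
            continue
        field = FIELD_RE.search(line)
        if field:
            # Keep the first value seen for each key.
            fields.setdefault(field.group("key").strip(),
                              field.group("value").strip())
    return fields

SAMPLE = [
    "Feb 3 11:26:46 ds20-2 vmunix: simple_lock: time limit exceeded",
    "Feb 3 11:26:47 ds20-2 vmunix: pc of caller: 0xffffffff00526154",
    "Feb 3 11:26:47 ds20-2 vmunix: lock class name: cam_pd_device3",
    "Feb 3 11:26:47 ds20-2 vmunix: panic (cpu 0): simple_lock: time limit exceeded",
    "Feb 3 11:26:47 ds20-2 vmunix: syncing disks... 29",
]

if __name__ == "__main__":
    for key, value in parse_panic_fields(SAMPLE).items():
        print(f"{key} = {value}")
```

The lock class name (`cam_pd_device3` here) is usually the most useful field to search for in patch descriptions.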

Thanks in advance,
CET
9 REPLIES
Ivan Ferreira
Honored Contributor

Re: Alpha DS20 shuts down unexpectedly?

You should first search the patch database. I have found this:

http://www11.itrc.hp.com/service/patch/patchDetail.do?patchid=TCRKIT1000454-V51BB26-E-20060317&sel={tru:tru64:5.1b,}&BC=main|search|

Also, be sure to install WEBES, which will help you analyze hardware problems and system crashes.
Why do it the hard way, when you can do it the easy way?

Re: Alpha DS20 shuts down unexpectedly?

Thanks for the feedback Ivan.
But are you sure that this patch is applicable to my system? Its description says:
PRODUCT: HP TruCluster Server [R] V5.1B-3
TITLE: HP Tru64 UNIX - Version 5.1B-3
which doesn't seem to match my system.

Regards,
CET
Martin Moore
HPE Pro

Re: Alpha DS20 shuts down unexpectedly?

The problem is that "simple_lock timeout" and its variations is one of the most generic panic messages, second only to "kernel memory fault" IMO. Tru64 UNIX uses simple locks to synchronize data structure access in multiple CPU systems. If the kernel tries to take out such a lock on a data structure, but it doesn't get the lock (because it's already held by another thread) within the timeout period -- which is far longer than a lock is normally held -- then it indicates a serious problem and the system panics.
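
The "time limit exceeded" behaviour described above can be illustrated with a user-space analogy. This is only a sketch in Python, not how the Tru64 kernel implements simple locks: an ordinary `threading.Lock` stands in for the kernel lock, an exception stands in for the panic, and the names `acquire_or_fail` and `LockTimeLimitExceeded` are made up for the example.

```python
import threading

class LockTimeLimitExceeded(RuntimeError):
    """Raised when a lock is not obtained within the time limit."""

def acquire_or_fail(lock, time_limit_s):
    """Try to take `lock`; treat exceeding the time limit as a fatal error."""
    # The limit is meant to be far longer than any normal hold time, so
    # hitting it suggests the holder is stuck, not merely slow.
    if not lock.acquire(timeout=time_limit_s):
        raise LockTimeLimitExceeded("simple_lock: time limit exceeded")
    return lock

def demo():
    lock = threading.Lock()
    lock.acquire()  # simulate a holder that never releases the lock
    try:
        acquire_or_fail(lock, 0.05)
    except LockTimeLimitExceeded as exc:
        return str(exc)
    return "acquired"

if __name__ == "__main__":
    print(demo())
```

The exception only says that the wait was too long. It says nothing about why the holder never released the lock, which is why the message alone rarely identifies the root cause.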

The tricky part is to determine why the lock didn't get released in the first place; there are numerous reasons, both hardware and software, why this could happen. If you have a support contract with HP, you can get the crash dump analyzed to try and determine the cause. Otherwise, about all you can do is to look for hardware problems and make sure your patches are up to date.

The patch cited above may or may not be applicable to your crash. First, it's possible that you ARE at V5.1B-3; all flavors of V5.1B report the same revision level in uname -a. You'd have to use "dupatch -track -type kit" to see which patches are installed. But since there are so many possible causes of a simple lock timeout, there is no single patch or group of patches that can prevent all of them. The patch cited corrects one specific problem that leads to this type of panic. It would require analysis of the crash dump to see if your panic has the same cause. If it does, the patch will help; but if it has a different cause, the patch won't help.

Hope this helps,
Martin
I work for HPE
A quick resolution to technical issues for your HPE products is just a click away HPE Support Center
See Self Help Post for more details


Martin Moore
HPE Pro
Solution

Re: Alpha DS20 shuts down unexpectedly?

I looked a little further at your log, and realized that the patch cited by Ivan won't help in this case. That patch is for clustered systems. Based on your log, your system appears to be a standalone.

Martin

Re: Alpha DS20 shuts down unexpectedly?

Thanks for this detailed information, Martin. Yes, this server is standalone, which is why I hesitated to install that patch in the first place. I will follow your recommendations and will post an update if I make any progress.

Just out of curiosity: where can I find this crash dump file?

Regards,
CET
Martin Moore
HPE Pro

Re: Alpha DS20 shuts down unexpectedly?

Crash dumps are located in /var/adm/crash by default; you can configure the location by setting SAVECORE_DIR in /etc/rc.config. The crash dump files are named vmunix.N (the kernel file at the time of the crash) and vmzcore.N (the contents of memory at the time of the crash) where N increments with each crash. The crash-data.N file, which gives a summary of crash information, is generated from these files.
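
To check quickly which crashes have a complete set of files, something like the following could be used. It is a minimal Python sketch that relies only on the naming scheme described above (vmunix.N, vmzcore.N, crash-data.N); the function name `group_crash_files` is hypothetical, and the default directory should be changed if SAVECORE_DIR points elsewhere.

```python
import re
from pathlib import Path

# File kinds named in the post above, each followed by the crash number N.
CRASH_FILE_RE = re.compile(r"^(vmunix|vmzcore|crash-data)\.(\d+)$")

def group_crash_files(crash_dir="/var/adm/crash"):
    """Return {N: [kinds present]} for the crash files found in crash_dir."""
    crashes = {}
    directory = Path(crash_dir)
    if not directory.is_dir():
        return crashes
    for entry in directory.iterdir():
        match = CRASH_FILE_RE.match(entry.name)
        if match:
            kind, number = match.group(1), int(match.group(2))
            crashes.setdefault(number, []).append(kind)
    # Sort by crash number, and sort the kinds for stable output.
    return {n: sorted(kinds) for n, kinds in sorted(crashes.items())}

if __name__ == "__main__":
    for number, kinds in group_crash_files().items():
        print(number, ", ".join(kinds))
```

A crash number that lists only some of the three kinds would suggest the dump was not saved completely.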

Martin

Uwe_9
Advisor

Re: Alpha DS20 shuts down unexpectedly?

Hi,
I just stumbled across your crash-data and had a look at it.
Here is what I found:

Systeminfo:
-----------
AlphaServer DS20 500 MHz
physical memory = 1024.00 megabytes
ncpus: 2
Firmware revision: 7.2-1


Crashtime: struct {
tv_sec = 1233656593
--> Tue Feb 3 11:23:13 2009
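
The tv_sec value is a count of seconds since the Unix epoch, so the conversion above can be reproduced with a few lines of Python. This is a sketch only: the UTC+1 offset is an assumption, chosen because it reproduces the local time quoted above, and `format_crash_time` is a made-up helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server's local time was UTC+1 at the time of the crash.
ASSUMED_LOCAL_TZ = timezone(timedelta(hours=1))

def format_crash_time(tv_sec, tz=ASSUMED_LOCAL_TZ):
    """Render a crash-data tv_sec value as a readable timestamp."""
    return datetime.fromtimestamp(tv_sec, tz).strftime("%a %b %d %H:%M:%S %Y")

if __name__ == "__main__":
    print(format_crash_time(1233656593))                # Tue Feb 03 11:23:13 2009
    print(format_crash_time(1233656593, timezone.utc))  # Tue Feb 03 10:23:13 2009
```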


from startup we see this:
-------------------------
FCA-2384 : Driver Rev 2.17 : F/W Rev 1.91X6(1.40A0) : wwn 1000-0000-c93b-f4a4
From this we can conclude that the system is at BL26 = Patchkit 5 = V5.1B-3.



last gasp message:
------------------
simple_lock: time limit exceeded

pc of caller: 0xffffffff00526154
lock address: 0xfffffc003fdaf900
lock info addr: 0xfffffc0000ff3790
lock class name: cam_pd_device3
current lock state: 0xc400017800581455 (cpu=?,pc=0xffffffff00581454,busy)

panic (cpu 0): simple_lock: time limit exceeded



the stacktrace:
---------------
1 panic
2 simple_lock_fault
3 simple_lock_time_violation
4 cdisk_bbr_comp
5 xpt_callback_thread

that is:
--------
The PD lock was already held when a bad block replacement (BBR) procedure
tried to complete and needed that lock.

been there, done that :)
------------------------
There is a patch for this scenario (OSFPAT02703500540) that made it into patchkit 6,
and it is also included in the latest patchkit 7.

So you might want to get either patchkit 6 or 7
(T64V51BB27AS0006-20061208.tar / T64V51BB28AS0007-20090312.tar) and install it.

Beware: there are likely some ERPs that you need to install in addition to the patchkit.

regards,
--Uwe.
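
The two kit file names mentioned above appear to encode the baselevel, the kit number and a date. The Python sketch below splits them on that assumption; the reading of the fields (BB27 as baselevel 27, AS0006 as kit 6, the trailing digits as a YYYYMMDD date) is inferred only from the names and the "BL26 = Patchkit 5" remark in this thread, not from HP documentation, and `parse_kit_name` is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed layout: T64V51BB<baselevel>AS<kit>-<YYYYMMDD>.tar
KIT_RE = re.compile(
    r"^T64V51BB(?P<baselevel>\d+)AS(?P<kit>\d{4})-(?P<date>\d{8})\.tar$"
)

def parse_kit_name(filename):
    """Split a V5.1B patch kit file name into its assumed components."""
    match = KIT_RE.match(filename)
    if match is None:
        return None
    return {
        "baselevel": int(match.group("baselevel")),
        "kit": int(match.group("kit")),
        "date": datetime.strptime(match.group("date"), "%Y%m%d").date(),
    }

if __name__ == "__main__":
    for name in ("T64V51BB27AS0006-20061208.tar",
                 "T64V51BB28AS0007-20090312.tar"):
        print(name, parse_kit_name(name))
```

Under that reading, a system at BL26 (patchkit 5) would need to move up to BL27 or BL28 to pick up the fix.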

Re: Alpha DS20 shuts down unexpectedly?

I appreciate your kind support on this issue.
I will follow your suggestions for sure and thanks once more...

Re: Alpha DS20 shuts down unexpectedly?

Thanks to everyone who spent some time helping with this issue.