L2000 keeps crashing

OscarL
Occasional Contributor

Hi all,

I have an L2000 running HP-UX 11.11 that keeps crashing every other day. Unfortunately, it is no longer under HP support.

Checking the tombstones, I see a repeating pattern every time the system crashes: CPU #0 logs a date in the past (Oct 22nd) and CPU #3 does not log a valid timestamp. Does this mean that one of the CPUs is faulty?

See ts99 attached.

TIA,

OscarL



8 REPLIES
Avinash20
Honored Contributor

Re: L2000 keeps crashing

Please provide us the output of

# cat /var/adm/shutdownlog

Also check whether any crash dumps have been saved in

# ll -R /var/adm/crash

If you have the crashinfo.bin script, please post the output of

# ./crashinfo -v -c

## crashinfo is a script that can be obtained from HP.
"Light travels faster than sound. That's why some people appear bright until you hear them speak."
OscarL
Occasional Contributor

Re: L2000 keeps crashing

Thanks for the reply.

1. shutdownlog:

# more /var/adm/shutdownlog
12:51 Mon Dec 15, 2008. Reboot:
12:33 Mon Dec 15, 2008. Reboot: (by SAM)
12:33 Mon Dec 15, 2008. Reboot: (by L2000!root)
12:56 Mon Dec 15, 2008. Reboot: (by L2000!root)
13:36 Mon Dec 15, 2008. Reboot:
14:04 Mon Dec 15, 2008. Reboot: (by SAM)
14:04 Mon Dec 15, 2008. Reboot: (by L2000!root)
16:08 Mon Dec 15, 2008. Reboot:
14:48 Wed Dec 31, 2008. Reboot: (by L2000!root)
14:56 Tue Jan 13, 2009. Reboot: (by L2000!root)
15:06 Tue Jan 13, 2009. Halt: (by L2000!root)
15:25 Tue Jan 13, 2009. Reboot: (by L2000!root)

And the tombstones:

# cd /var/tombstones
# ll
total 864
-rw-r--r-- 1 root root 22913 Dec 15 11:43 ts82
-rw-r--r-- 1 root root 22913 Dec 15 11:58 ts83
-rw-r--r-- 1 root root 22913 Dec 15 12:40 ts84
-rw-r--r-- 1 root root 22913 Dec 15 13:04 ts85
-rw-r--r-- 1 root root 22913 Dec 15 13:42 ts86
-rw-r--r-- 1 root root 22913 Dec 15 14:11 ts87
-rw-r--r-- 1 root root 22913 Dec 15 16:15 ts88
-rw-r--r-- 1 root root 22913 Dec 31 10:07 ts89
-rw-r--r-- 1 root root 22913 Dec 31 14:54 ts90
-rw-r--r-- 1 root root 22913 Jan 5 11:29 ts91
-rw-r--r-- 1 root root 22913 Jan 8 10:02 ts92
-rw-r--r-- 1 root root 22913 Jan 9 09:18 ts93
-rw-r--r-- 1 root root 22913 Jan 12 08:19 ts94
-rw-r--r-- 1 root root 22913 Jan 13 15:02 ts95
-rw-r--r-- 1 root root 22913 Jan 13 15:21 ts96
-rw-r--r-- 1 root root 22913 Jan 13 15:31 ts97
-rw-r--r-- 1 root root 22913 Jan 13 15:49 ts98
-rw-r--r-- 1 root root 22913 Jan 13 16:00 ts99

2. There is nothing under /var/adm/crash.

3. Sorry, I couldn't find that crashinfo.bin script.

Regards,
OscarL
Mel Burslan
Honored Contributor

Re: L2000 keeps crashing

The data you provided does not give a consistent picture of what happened, because you did not say whether those Jan 13 reboots were actually the last time this server restarted, or whether something else happened (say, on Jan 14) that did not get logged in shutdownlog. If you get a reboot line in shutdownlog every time the server goes down and comes back up, it may very well mean that some application or person with root privileges is deliberately rebooting this server, for unknown reasons. But if you see a reboot after a panic, or no information about the reboot at all, it is time to start worrying about the health of your hardware.
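To apply this check quickly, the shutdownlog entries can be sorted into ones attributed to someone ("(by ...)"), ones that mention a panic, and bare "Reboot:" or "Halt:" lines that carry no explanation. This is a minimal sketch, assuming the line format shown in the excerpt earlier in this thread; the function name classify_shutdownlog and the labels it prints are hypothetical.

```shell
# Hypothetical helper: label each shutdownlog entry.
# Assumes lines look like "12:33 Mon Dec 15, 2008. Reboot: (by SAM)",
# as in the excerpt posted in this thread.
classify_shutdownlog() {
    awk '
        # Any mention of a panic deserves a closer look at the hardware.
        tolower($0) ~ /panic/ { printf "%-12s %s\n", "PANIC", $0; next }
        # "(by SAM)" or "(by host!user)": a deliberate shutdown or reboot.
        /\(by [^)]*\)/ { printf "%-12s %s\n", "BY-USER", $0; next }
        # A bare "Reboot:" or "Halt:" with nothing after it.
        /(Reboot|Halt):[ \t]*$/ { printf "%-12s %s\n", "UNEXPLAINED", $0; next }
        { printf "%-12s %s\n", "OTHER", $0 }
    ' "${1:-/var/adm/shutdownlog}"
}
```

Running `classify_shutdownlog | grep -v BY-USER` hides the deliberate reboots. On the excerpt posted above, the 12:51, 13:36 and 16:08 reboots on Dec 15 would be labelled UNEXPLAINED.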

I am wondering, since you are no longer under support, what kind of server this is. A sandbox, maybe? If so, just go to fleabay and look for used L2000s. They are cheap; you can buy one and cannibalize it for parts. If it is more than a sandbox, that is, if people such as developers or analysts depend on it to do their work, I think it is time to reevaluate that "no support" policy.

My 2 cents.
________________________________
UNIX because I majored in cryptology...
TTr
Honored Contributor

Re: L2000 keeps crashing

There are people in this forum who can read ts99 and help; I hope one of them replies.
I only noticed the last two lines of ts99: one CPU is a PA8600 and the other is a PA8500. I don't know whether this is OK or whether the two have to be the same.
Did you add the second CPU?
Why is the second CPU shown as CPU #3? Shouldn't it be CPU #1? Is it installed in the wrong socket?
It may be worth removing one CPU at a time (I think you need to have a CPU in socket 0) to see whether the system becomes stable. Do you run diagnostics (STM) on this server? Are there any hardware errors in the syslog?
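For the syslog question, one quick way to look is to grep the current and previous syslog files for machine-check and error messages. This is a rough sketch: the two paths are the usual HP-UX 11.x syslog locations, but the search patterns (HPMC/LPMC machine checks, panics, generic "error"/"fatal" text) are an assumption, not a complete list of what HP-UX logs for hardware faults, and hw_errors is a hypothetical name.

```shell
# Hypothetical helper: print syslog lines that may point at hardware trouble.
# With no arguments it scans the current and previous HP-UX syslog files.
hw_errors() {
    [ $# -eq 0 ] && set -- /var/adm/syslog/syslog.log /var/adm/syslog/OLDsyslog.log
    # Pattern list is an assumption: machine checks, panics and generic errors.
    # Missing files are skipped quietly; grep exits 1 when nothing matches.
    grep -i -E 'hpmc|lpmc|machine check|panic|fatal|error' "$@" 2>/dev/null
}
```

With more than one file, grep prefixes each match with its file name, which makes it easy to see whether the errors are recent or from the previous log.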
OscarL
Occasional Contributor

Re: L2000 keeps crashing

@Mel: Thank you for your reply. The reboots on the 13th (yesterday) were the result of some testing I performed: I swapped the positions of the CPUs to see whether the ts file looked any different. Nothing changed, though.

The following reboots were not intended:

-rw-r--r-- 1 root root 22913 Jan 5 11:29 ts91
-rw-r--r-- 1 root root 22913 Jan 8 10:02 ts92
-rw-r--r-- 1 root root 22913 Jan 9 09:18 ts93
-rw-r--r-- 1 root root 22913 Jan 12 08:19 ts94
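These four are exactly the tombstones whose dates have no matching line in the shutdownlog posted earlier, which fits the idea of reboots that were never logged. A minimal sketch of that cross-check follows, assuming a new tsNN file is written at each boot (as the listing suggests) and that the ll output has the usual nine columns; unlogged_boots is a hypothetical name. It compares calendar days only, so a crash on a day that also has a logged reboot will not be flagged (ts89, at 10:07 on Dec 31, may be such a case).

```shell
# Hypothetical helper: list tombstones whose date has no shutdownlog entry.
#   $1     shutdownlog path, e.g. /var/adm/shutdownlog
#   stdin  output of "ll /var/tombstones"
# Matching is by calendar day ("Jan 5"), not by exact time.
unlogged_boots() {
    awk -v logfile="$1" '
        FILENAME == logfile {
            # shutdownlog line: "14:48 Wed Dec 31, 2008. Reboot: ..."
            day = $4
            sub(/,.*/, "", day)                 # "31," -> "31"
            logged[$3 " " (day + 0)] = 1        # key such as "Dec 31"
            next
        }
        # ll line: "-rw-r--r-- 1 root root 22913 Jan 5 11:29 ts91"
        NF >= 9 && $9 ~ /^ts[0-9]+$/ {
            key = $6 " " ($7 + 0)
            if (!(key in logged)) print $9, $6, $7, $8
        }
    ' "$1" -
}
```

Usage: `ll /var/tombstones | unlogged_boots /var/adm/shutdownlog`. On the data in this thread it should list ts91 through ts94.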

@TTr: Thanks to you too. If I am not mistaken, the CPU load order for an L-Class is 0, 3, 1, 2, so I guess it is the correct socket. I had also noticed the mismatch between the CPUs, but it looked the same after I swapped the CPU positions, so I am starting to think that the system board may be to blame...
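If the 0, 3, 1, 2 load order quoted here is right (it is taken from this post and not verified), the sockets to populate for N CPUs are simply the first N entries of that list, so a two-CPU box would use sockets 0 and 3, which matches the CPU #0 and CPU #3 seen in the tombstones. A trivial sketch with a hypothetical function name:

```shell
# Hypothetical helper: sockets to populate for N CPUs, assuming the
# L-Class load order 0, 3, 1, 2 quoted in this thread.
# Plain POSIX sh has no "local", so n, result and socket are globals.
sockets_for() {
    n=$1
    result=""
    for socket in 0 3 1 2; do
        [ "$n" -le 0 ] && break
        result="${result:+$result }$socket"   # append, space-separated
        n=$((n - 1))
    done
    echo "$result"
}
```

For example, `sockets_for 2` prints `0 3`.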

OscarL
Occasional Contributor

Re: L2000 keeps crashing

I should also add that I exercised the CPUs using STM and the test finished successfully.

Regards,
OscarL
Andrew Rutter
Honored Contributor

Re: L2000 keeps crashing

Hi,

What is the system date?

Have you checked the remaining life of the coin-cell batteries? They may need replacing. Also check the one on the GSP card.

Can you log in to the GSP and post the logs from there?

Log in to the GSP, then enter:
sl
e
n (for filter)

There may be something in there we can look at.

Also, what is the version of the GSP? You can get it by typing he at the GSP prompt.

Your PDC firmware is also quite an old version.

The slight CPU mismatch shouldn't be a problem; I have seen this many times. They will have the same part number and will be compatible.

Andy

cnb
Honored Contributor

Re: L2000 keeps crashing

I'm surprised it's working at all. Several components (CPU, memory and the system bus) are reporting multiple fatal errors. I suggest you pull CPU 3 and insert a known-good PA8600 3.1 module, or a Runway terminator, in CPU slot 3 to see whether the system stabilizes, then run cstm and logtool to check the rest of the system components.


# echo "sel dev all;info;wait;il" | /usr/sbin/cstm

# cstm
cstm> ru logtool
logtool> rs
logtool> fl
logtool> vd

Your PDC needs to be updated:
readme:
ftp://ftp.itrc.hp.com/firmware_patches/hp/cpu/PF_CRHW4428.txt
Update file:
ftp://ftp.itrc.hp.com/firmware_patches/hp/cpu/PF_CRHW4428.tar.gz


HTH,