Operating System - Tru64 Unix

GS80 System Crash

 
Emad Omar
Regular Advisor

GS80 System Crash

Hi,

I have a crash issue on an AlphaServer GS80 running Tru64 V5.1A. I collected the crash, decevent and /var/adm/messages files and have attached them in the hope that they help find the cause.
Note: The crash happened on 12 May 2005. Another system crash also occurred while I was writing this.
amrelsayed
Frequent Advisor

Re: GS80 System Crash

Hi Emad,

It looks like a memory problem. If you can, send me the binary.errlog file so I can analyze it, or, if you are using SEA, try analyzing the file with it.

I also advise you to reconfigure your memory by interchanging the MMB on your system. For example, if your system is running with 8 GB, try it with 2 GB, wait one or two days, and continue in steps until you reach the complete set. I think this will help, as I have tried it before.

Note: it is important to send the binary.errlog.

Best Regards,
Amr
Try To Be Smart
Emad Omar
Regular Advisor

Re: GS80 System Crash

Hi Amr,

Thank you for your kind reply. The binary.errlog zip file is more than 1 MB in size. Please let me know how I can provide you with this file.

Emad Omar
amrelsayed
Frequent Advisor

Re: GS80 System Crash

Hi Emad,

You can contact me at the following email address:
aelsayed@ncs.com.kw

You can send your file to this address; it will be my pleasure.

Best Regards,

Amr
Try To Be Smart
amrelsayed
Frequent Advisor
Solution

Re: GS80 System Crash

Hello Emad,

I have analyzed your binary.errlog and found the following:

First, it looks like you have only one QBB system drawer with 4 CPUs, all 4 of which are active in your system.

Second, there is a Double Bit Istream Bcache Error generated by CPU1 on QBB0. This problem generated many correctable processor events and then uncorrectable processor events, all of which refer to CPU module 1 on QBB0.

So my advice is: remove the faulty CPU (CPU1 on QBB0) and start your system with the remaining CPUs. If the system is stable for a while, try reinstalling this CPU in another slot. If you still face the same problem, replace the CPU.

Please let me know how it goes.

All my best wishes,
Amr
Try To Be Smart
Emad Omar
Regular Advisor

Re: GS80 System Crash

Dear Amr,

Thank you for your kind help. In fact, I swapped CPU1 and CPU3 yesterday. After doing that, I found that CPU3 had a critical status when I issued the command:
# hwmgr -stat comp -ngood
So from this result, I think CPU1 is not faulty.

Please let me know how I can verify this problem and fix it soon.

Thank you for your help. . .

Emad Omar
amrelsayed
Frequent Advisor

Re: GS80 System Crash

Hi Emad,

I'm sorry that you are still stuck with this problem, but don't worry, we will overcome it soon.

First, I need you to run this procedure on your machine to clear up any confusion:

# /sbin/init.d/syslog stop
# mv /var/cluster/members/{memb}/adm/binary.errlog /var/cluster/members/{memb}/adm/binary.errlog.8june05
# /sbin/init.d/syslog start

Now you will have a new binary.errlog file.

Then shut down the machine, return CPU3 to its original place, and remove CPU1 from the machine.

Then boot the machine with the remaining 3 CPUs only, without CPU1, and keep it under test for a week. If the machine crashes before the week is over, send the binary.errlog directly to my email. If not, send it to me after that week anyway so I can analyze it for you.
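
To make the one-week test easier to review, a small heartbeat log can show when the machine was last up before a crash. This is only a minimal sketch and not part of the procedure above; the function name log_heartbeat and the default log path /var/adm/cpu_soak.log are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper for the one-week test: append a timestamp and the
# current uptime summary to a log file.
# The default log path below is an assumption, not a Tru64 convention.
log_heartbeat() {
    logfile="${1:-/var/adm/cpu_soak.log}"
    # One line per call: current date, then the output of uptime.
    echo "`date` | `uptime`" >> "$logfile"
}
```

If this is called from cron at a fixed interval, a gap in the timestamps brackets the time of any crash, which may help when matching entries in binary.errlog.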

with my all best wishes,
Amr
Try To Be Smart
Johan Brusche
Honored Contributor

Re: GS80 System Crash


The method described in the reply above to reset the binary.errlog file might have been valid in the V4.x stream, but for V5.1x systems I strongly encourage you to use the method described in the binlogd manpage:

kill -USR1 `cat /var/run/binlogd.pid`

This method creates a clean binary.errlog with (if available) a FRU log as the first entry, so that other tools like WEBES/CA can tell which module part number is most suspect.
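
The one-liner above can be wrapped with a few sanity checks so that it fails cleanly when the pid file is missing or does not contain a number. This is only a sketch: the function name rotate_binlog and the optional pid file argument are assumptions added for illustration; the signal and the default path are the ones quoted above.

```shell
#!/bin/sh
# Hypothetical wrapper around: kill -USR1 `cat /var/run/binlogd.pid`
# The optional first argument overrides the pid file path (an assumption
# made here so the function can be tried without touching binlogd).
rotate_binlog() {
    pidfile="${1:-/var/run/binlogd.pid}"
    if [ ! -r "$pidfile" ]; then
        echo "cannot read $pidfile" >&2
        return 1
    fi
    pid=`cat "$pidfile"`
    # Refuse to signal anything that is empty or not purely numeric.
    case "$pid" in
        ''|*[!0-9]*)
            echo "unexpected content in $pidfile" >&2
            return 1
            ;;
    esac
    kill -USR1 "$pid"
}
```

Calling rotate_binlog with no argument, as root, sends the same signal as the command above.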

rgds,

___ Johan.

_JB_