Operating System - Tru64 Unix

Karthik S S
Honored Contributor

messages - CPU error

Hi,

I get the following error repeatedly on our AlphaServer 4100 (Tru64 OSF1 V5.1). What could be the problem?
------------------------------------
Mar 15 12:37:08 kyle last message repeated 2 times
Mar 15 12:37:08 kyle vmunix: WARNING: too many System corrected errors detected on cpu 24. Reporting suspended.
Mar 15 12:37:08 kyle vmunix: WARNING: too many System corrected errors detected on cpu 16. Reporting suspended.
Mar 15 12:37:08 kyle last message repeated 2 times
Mar 15 12:37:08 kyle vmunix: WARNING: too many Processor corrected errors detected on cpu 24. Reporting suspended.
Mar 15 12:37:08 kyle vmunix: WARNING: too many Processor corrected errors detected on cpu 24. Reporting suspended.
Mar 15 12:37:08 kyle vmunix: datalink: links=128, macs=6
Mar 15 12:37:09 kyle vmunix: /var: file system full
Mar 15 12:37:09 kyle vmunix: WARNING: too many Processor corrected errors detected on cpu 24. Reporting suspended.
Mar 15 12:37:25 kyle vmunix: Environmental Monitoring Subsystem Configured.
Mar 15 12:37:52 kyle vmunix: SuperLAT. Copyright 1994 Meridian Technology Corp. All rights reserved.
Mar 15 12:38:00 kyle vmunix: netbeui_configure(CFG_OP_CONFIGURE)
------------------------------------

psrinfo reports no error.

Please help.

Thanks,
Karthik S S
For a list of all the ways technology has failed to improve the quality of life, please press three. - Alice Kahn
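A quick way to see which CPUs the warnings name, and how often, is to tally them from the messages file. The following is only a minimal sketch, not a Tru64 tool: the function names `rejoin` and `tally` are made up for illustration, and it assumes the standard syslog timestamp prefix and the usual syslog convention that "last message repeated N times" refers to the entry immediately before it. The sample reuses a few entries from the log above, wrapped at 80 columns to exercise the rejoining step.

```python
import re
from collections import Counter

# Kernel warning text; \s* tolerates a space lost at a wrap point.
WARNING_RE = re.compile(
    r"too many (System|Processor) corrected errors detected\s*on cpu (\d+)"
)
REPEAT_RE = re.compile(r"last message repeated (\d+) times")
# A syslog entry starts with e.g. "Mar 15 12:37:08 ".
TIMESTAMP_RE = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d ")


def rejoin(lines):
    """Glue continuation lines (no leading timestamp) onto the previous entry."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP_RE.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += line
    return entries


def tally(lines):
    """Count warnings per (error kind, cpu number), including syslog repeats."""
    counts = Counter()
    last_key = None  # the warning that a following "repeated" line refers to
    for entry in rejoin(lines):
        warning = WARNING_RE.search(entry)
        if warning:
            last_key = (warning.group(1), int(warning.group(2)))
            counts[last_key] += 1
            continue
        repeat = REPEAT_RE.search(entry)
        if repeat and last_key is not None:
            counts[last_key] += int(repeat.group(1))
        elif not repeat:
            last_key = None  # an unrelated message breaks the repeat chain
    return counts


# Sample entries copied from the thread, in 80-column-wrapped form.
SAMPLE = [
    "Mar 15 12:37:08 kyle last message repeated 2 times",
    "Mar 15 12:37:08 kyle vmunix: WARNING: too many System corrected errors detected",
    "on cpu 24. Reporting suspended.",
    "Mar 15 12:37:08 kyle vmunix: WARNING: too many System corrected errors detected",
    "on cpu 16. Reporting suspended.",
    "Mar 15 12:37:08 kyle last message repeated 2 times",
    "Mar 15 12:37:08 kyle vmunix: WARNING: too many Processor corrected errors detect",
    "ed on cpu 24. Reporting suspended.",
    "Mar 15 12:37:08 kyle vmunix: WARNING: too many Processor corrected errors detect",
    "ed on cpu 24. Reporting suspended.",
    "Mar 15 12:37:08 kyle vmunix: datalink: links=128, macs=6",
    "Mar 15 12:37:09 kyle vmunix: /var: file system full",
    "Mar 15 12:37:09 kyle vmunix: WARNING: too many Processor corrected errors detect",
    "ed on cpu 24. Reporting suspended.",
]
```

Calling `tally(SAMPLE)` gives three counts: System errors on cpu 24 once, System errors on cpu 16 three times (one entry plus two repeats), and Processor errors on cpu 24 three times. The first "repeated" line is ignored because the entry it refers to is not in the excerpt. For a real file, something like `tally(open("/var/adm/messages"))` should work, since the function only needs an iterable of lines.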
22 REPLIES
Karthik S S
Honored Contributor

Re: messages - CPU error

uerf -r 100 shows the following (output truncated):

# uerf -r 100 | more
uerf version 4.2-011 (122)


********************************* ENTRY 1. *********************************
----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 100. CPU EXCEPTION
SEQUENCE NUMBER 2.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Wed May 28 18:18:10 2003
OCCURRED ON SYSTEM kyle
SYSTEM ID x00070016
SYSTYPE x00000002
PROCESSOR COUNT 4.
PROCESSOR WHO LOGGED x00000000

----- UNIT INFORMATION -----

UNIT CLASS CPU

********************************* ENTRY 2. *********************************



-----------

-Karthik S S
Michael Schulte zur Sur
Honored Contributor

Re: messages - CPU error

Hi,

Besides the /var full problem, which I hope you have corrected by now, I would assume it is a CPU problem. Use DECevent to look into the binary error log; it is much more detailed than uerf. In any case, this calls for opening a call with HP.

greetings,

Michael
Karthik S S
Honored Contributor

Re: messages - CPU error

Hi Michael,

I am new to Tru64 and I am not in front of the system. In fact, I am helping another user with this problem. After reading your reply, I just realized that /var is full! But I wonder, how did you assume that I had corrected this problem? :-))

I will try to free up some space. By the way, what is the relation between the /var filesystem and the CPU error messages?

Thanks,
Karthik S S
Karthik S S
Honored Contributor

Re: messages - CPU error

Oh my ...

That info is right there in the messages file :-( I didn't go through it properly.

Thanks,
Karthik S S
Michael Schulte zur Sur
Honored Contributor

Re: messages - CPU error

Hi Karthik,

You were kind enough to post it! ;-))
Mar 15 12:37:09 kyle vmunix: /var: file system full

And I didn't want to insult you by assuming you would not see it! ;-)

Now to your question:
There is no relation between file system full and cpu errors.

I have seen these errors more than once.

Call HP.

greetings,

Michael
Hein van den Heuvel
Honored Contributor

Re: messages - CPU error

> Now to your question:
> There is no relation between file system full and cpu errors.

Other than /var/adm/messages, syslog.dated and other stuff filling up with those error messages :-).

Why does it report "cpu 16" and "cpu 24"?
Are those the hardware IDs for your CPUs?
Maybe check with 'hwmgr -v h'?

If this happened to a box of mine, I would give it one chance with a 'hardware reset': power down, re-seat the CPU modules, power up. If it comes back, then it was a serious problem, like a CPU cache failure.

fwiw,
Hein.

Michael Schulte zur Sur
Honored Contributor

Re: messages - CPU error

Hi,

I think Hein might be right about the hardware ID.

If it comes back, then it was a serious problem. If not, then it is a serious problem.
Oh, now I see, you meant the problem and not the machine. ;-))

If you can shut the machine down for 30 minutes, you can run a test from the console prompt.

greetings,

Michael

Dawn Urey
Occasional Advisor

Re: messages - CPU error

I recently had the same problem with my ES40 and had to replace memory DIMMs on the system. The memory was actually reporting problems, which showed up in my error logs as machine checks. You may want to get HP to diagnose your binary.errlog.
Karthik S S
Honored Contributor

Re: messages - CPU error

hwmgr output (truncated):

----------
HWID: hardware hierarchy
-----------------------------------------------------------------
1: platform AlphaServer 4100 5/600 8MB
2: cpu CPU0
3: cpu CPU1
4: cpu CPU2
5: cpu CPU3
9: bus mcbus0
10: connection mcbus0slot5
11: bus pci1
12: connection pci1slot1
22: scsi_adapter psiop0
23: scsi_bus scsi0
52: disk bus-0-targ-5-lun-0 cdrom0
14: connection pci1slot2
24: scsi_adapter isp0
25: scsi_bus scsi1
53: disk bus-1-targ-1-lun-0 dsk0
54: disk bus-1-targ-3-lun-0 dsk1
55: disk bus-1-targ-5-lun-0 dsk2
16: connection pci1slot3
----------

No idea why it reports CPU 16 and 24.

-Karthik S S
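To test Hein's guess that "cpu 16" and "cpu 24" might be hardware IDs, the numbers can be looked up in the hierarchy listing. This is a rough sketch: `parse_hierarchy` is a hypothetical helper, the regular expression is fitted only to the lines posted in this thread, and the listing is the truncated one above, so entries may be missing.

```python
import re

# "HWID: category name..." as it appears in the listing posted above.
ENTRY_RE = re.compile(r"^\s*(\d+):\s+(\S+)\s+(.*)$")

# Truncated hierarchy listing, copied from the thread.
HWMGR_OUTPUT = """\
HWID: hardware hierarchy
-----------------------------------------------------------------
1: platform AlphaServer 4100 5/600 8MB
2: cpu CPU0
3: cpu CPU1
4: cpu CPU2
5: cpu CPU3
9: bus mcbus0
10: connection mcbus0slot5
11: bus pci1
12: connection pci1slot1
22: scsi_adapter psiop0
23: scsi_bus scsi0
52: disk bus-0-targ-5-lun-0 cdrom0
14: connection pci1slot2
24: scsi_adapter isp0
25: scsi_bus scsi1
53: disk bus-1-targ-1-lun-0 dsk0
54: disk bus-1-targ-3-lun-0 dsk1
55: disk bus-1-targ-5-lun-0 dsk2
16: connection pci1slot3
"""


def parse_hierarchy(text):
    """Return {hwid: (category, name)}; header and rule lines are skipped."""
    table = {}
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            table[int(match.group(1))] = (match.group(2), match.group(3).strip())
    return table


table = parse_hierarchy(HWMGR_OUTPUT)

# Hardware IDs that the listing assigns to CPUs.
cpu_hwids = sorted(hwid for hwid, (category, _) in table.items() if category == "cpu")

# What the listing has at the numbers named in the kernel warnings.
reported = {number: table.get(number) for number in (16, 24)}
```

In this listing the CPUs have hardware IDs 2 through 5, while 16 is the connection pci1slot3 and 24 is the SCSI adapter isp0. So, at least going by the truncated output, the numbers in the warnings do not look like hwmgr hardware IDs for the CPUs.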