Operating System - Tru64 Unix

Re: vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended

Azim_3
Occasional Advisor

vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended

Hi

My server (workstation 400a) crashes within 2-3 hours, giving the above error. The OS is Digital UNIX 4.0F.

After browsing similar threads in the forum, I assume the problem may be with memory, CPU, or cache.

Can anybody pinpoint the actual problem: memory, CPU, or cache? I am attaching the binary.errlog, /var/adm/messages, and kern.log files.

Any help is appreciated.

Kind regards
AZIM
Ralf Puchner
Honored Contributor

OK, read the other threads and open a call with the HP support center. I'm not sure anyone outside of HP has a machine check decoder ;-)
Help() { FirstReadManual(urgently); Go_to_it;; }
Mobeen_1
Esteemed Contributor

Azim,
Time to call your friendly HP Tech Support.

regards
Mobeen
Vladimir Fabecic
Honored Contributor

Hello
Yes, it can be memory, CPU, or cache. But from my experience, I'd put 80% odds on memory. What you can do is:
1. Put the server under heavy load (run an application that uses a lot of memory) and wait for the machine to crash.
2. Do not turn off the machine; halt it (get the >>> prompt).
3. Run the memory tests (sometimes it takes a few hours to get a result) and be patient.
If it does not help, call HP support.
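As a rough illustration of step 1, a user-space program can keep memory busy by repeatedly writing and re-reading test patterns. This is a minimal sketch, assuming a Python 3 interpreter is available on the machine (which may well not be the case on Digital UNIX 4.0F); the function name `exercise_memory` and its parameters are hypothetical. It only generates load: errors corrected by ECC are invisible to user space and show up only in the kernel error log, so a zero mismatch count does not prove the memory is good. The console memory tests in step 3 remain the real diagnostic.

```python
import sys

# Test patterns: all zeros, all ones, and two alternating-bit patterns.
PATTERNS = (0x00, 0xFF, 0xAA, 0x55)


def exercise_memory(size_mb=64, passes=1):
    """Hypothetical load generator: fill a buffer with each pattern and
    read it back, returning the number of bytes that did not match.

    Corrected (ECC) errors are not visible here; they only appear in the
    kernel error log, so this mainly serves to keep memory busy.
    """
    size = size_mb * 1024 * 1024
    buf = bytearray(size)  # allocate (and zero) the buffer up front
    mismatches = 0
    for _ in range(passes):
        for pattern in PATTERNS:
            expected = bytes([pattern]) * size
            buf[:] = expected          # write the pattern across the buffer
            if buf != expected:        # read it back and compare
                # Count individual bad bytes only when something differs.
                mismatches += sum(1 for a, b in zip(buf, expected) if a != b)
    return mismatches


if __name__ == "__main__":
    # Usage (hypothetical): python exercise_memory.py [size_mb] [passes]
    mb = int(sys.argv[1]) if len(sys.argv) > 1 else 64
    n_passes = int(sys.argv[2]) if len(sys.argv) > 2 else 1
    print("mismatched bytes:", exercise_memory(size_mb=mb, passes=n_passes))
```

Running several copies in parallel, each with a large size and many passes, is one way to approximate the "heavy load" described above while watching /var/adm/messages for new warnings.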

In vino veritas, in VMS cluster
Mobeen_1
Esteemed Contributor

Azim,
Probably it went unnoticed, but there is no log attached to your initial message.

I agree with the previous post: sometimes correctable errors are logged for memory modules etc., and the odd one can be ignored.

But if your server stays up for a while and then crashes, it's certainly an indication of some failure on the CPU board.

rgds
Mobeen
David_854
Frequent Advisor

That would be difficult, since this is 4.0F and it is no longer supported.

Based on my crash analysis experience, you can look at the following files:
binary.errlog
/var/adm/messages
vmunix
vmzcore
and work out the error from those. Running sys_check -escalate would also be very helpful in identifying the problem.
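Before escalating, it can help to see how often the warning appears and on which CPU. This is a minimal sketch, assuming the kernel messages in /var/adm/messages contain the warning text exactly as quoted in this thread; the function name `count_corrected_error_warnings` is hypothetical. It only counts the text warnings; decoding the actual machine checks recorded in binary.errlog still needs HP's tools.

```python
import re
import sys
from collections import Counter

# Matches the kernel warning quoted in this thread, e.g.
# "vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended"
# The captured word before "corrected errors" is "Processor" here; other
# wordings are an assumption.
WARNING_RE = re.compile(r"too many (\w+) corrected errors detected on cpu (\d+)")


def count_corrected_error_warnings(lines):
    """Tally 'reporting suspended' warnings by (error source, cpu number)."""
    counts = Counter()
    for line in lines:
        m = WARNING_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/adm/messages"
    try:
        with open(path, errors="replace") as f:
            counts = count_corrected_error_warnings(f)
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
    else:
        for (source, cpu), n in sorted(counts.items()):
            print(f"{source} corrected-error warnings on cpu {cpu}: {n}")
```

If the warnings cluster on a single CPU, that points toward the CPU module or its cache; if they track memory load instead, memory is the more likely suspect. Either way, this is only a first filter before sending binary.errlog to HP.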

Good luck, or send me an email with any links and I will give you an idea.

David
Ralf Puchner
Honored Contributor

Funny. Why not simply use a machine check decoder? But these tools are only available at HP... so we are back at the beginning of the thread: open a call with HP.

By the way, if it is memory related, replace the memory anyway, because correcting errors using the checksum slows the machine down.

Send binary.errlog to HP, that's it!

Stuart Fuller
New Member

What kind of machine is this? "(workstation 400a)" doesn't show up on my list of system types.
BIJU P K
Occasional Contributor

I faced a similar problem with a Digital Alpha 4100 server running Digital UNIX 4.0D.
That time, the problem was due to the power supply. Please check the power supply connector to the board and make sure it is seated properly. From the system console, you can check the power supply status using the "show power" command.

Regards
biju
Azim_3
Occasional Advisor

Finally, I replaced both memory modules and kept the machine under observation.
It has been running in half-duplex mode since 19:00 hrs with no further error messages.
I noticed that when I changed tu0 from half to full duplex, it started giving a "FIFO error" and lost connectivity for a few seconds.
Does anyone know the procedure to break the SRM login password?
Any help appreciated.
The workstation is a Digital 500au.

Regards
AZIM