vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended

Azim_3
Occasional Advisor

Hi

My server (workstation 400a) crashes every 2-3 hours with the above error. The OS is Digital UNIX 4.0F.

After browsing similar threads in the forum, I assume the problem may be with the memory, CPU or cache.

Can anybody pinpoint what the actual problem could be: memory, CPU or cache? I am attaching the binary.errlog, /var/adm/messages and kern.log files.

Any help is appreciated.

Kind regards
AZIM
12 REPLIES
Ralf Puchner
Honored Contributor

Re: vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended

OK, read the other threads and open a call with the HP support center. I'm not sure anyone outside of HP has a machine check decoder ;-)
Help() { FirstReadManual(urgently); Go_to_it;; }
Mobeen_1
Esteemed Contributor

Re: vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended

Azim,
Time to call your friendly HP Tech Support.

regards
Mobeen
Vladimir Fabecic
Honored Contributor

Re: vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended

Hello
Yes, it can be memory, CPU or cache. But from my experience, I would say there is an 80% chance that the problem is memory. What you can do is:
1. Put the server under heavy load (run an application that uses a lot of memory) and wait for the machine to crash.
2. Do not turn off the machine; halt it (to get the >>> prompt).
3. Run the memory tests and be patient (sometimes it takes a few hours to get a result).
If that does not help, call HP support.

In vino veritas, in VMS cluster
Mobeen_1
Esteemed Contributor

Re: vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended

Azim,
It probably went unnoticed, but there is no log attached to your initial message.

I agree with the previous post. Sometimes a few correctable errors get logged against memory modules and the like, and those odd ones can be ignored.

But if your server stays up for a while and then crashes, it is certainly an indication of some failure on the CPU board.

rgds
Mobeen
David_854
Frequent Advisor

Re: vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended

That would be difficult, since this is 4.0F and it is not supported anymore.

Based on my crash analysis experience, you can look at the following files:
binary.errlog
/var/adm/messages
vmunix
vmzcore
From those you can work out the error.
A sys_check -escalate would also be very helpful in identifying the problem.
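
As a rough first pass over /var/adm/messages, something like the following could count how often the warning from this thread's title appears per CPU. This is only a sketch: the regular expression is built from the warning text quoted in the thread title, the function name count_suspended_warnings is made up for illustration, and the sample lines are hypothetical rather than taken from a real log.

```python
import re
from collections import Counter

# Pattern built from the warning text quoted in this thread's title.
WARNING_RE = re.compile(
    r"too many Processor corrected errors detected on cpu (\d+)"
)


def count_suspended_warnings(lines):
    """Return a Counter mapping CPU number -> number of matching warning lines."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; real input would be read from /var/adm/messages.
    sample = [
        "vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended",
        "vmunix: some unrelated message",
        "vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended",
    ]
    for cpu, n in sorted(count_suspended_warnings(sample).items()):
        print(f"cpu {cpu}: {n} warning(s)")
```

To run it against a real file, pass an open file object instead of the sample list, for example count_suspended_warnings(open("/var/adm/messages")). The count only shows how often reporting was suspended; it does not say which component is failing.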

Good luck, or send me an email with any links and I will give you an idea.

David
Ralf Puchner
Honored Contributor

Re: vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended

Funny. Why not simply use a machine check decoder? But these tools are only available at HP, so we are back at the beginning of the thread: open a call with HP.

By the way, if it is memory related, replace the memory anyway, because checksum-based error correction slows the machine down.

Send the binary.errlog to HP, that's it!

Help() { FirstReadManual(urgently); Go_to_it;; }
Stuart Fuller
Occasional Visitor

Re: vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended

What kind of machine is this? "(workstation 400a)" doesn't show up on my list of system types.
BIJU P K
Occasional Contributor

Re: vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended

I faced a similar problem with a Digital Alpha 4100 server running Digital UNIX 4.0D.
That time the problem was due to the power supply. Please check the power supply connector to the board and seat it properly. From the system console you can check the power supply status using the "show power" command.

Regards
biju
Azim_3
Occasional Advisor

Re: vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended

I have finally replaced both memory modules and am keeping the system under observation.
It has been up in half-duplex mode since last 19:00 hrs, with no further error messages.
I have noticed that when I changed tu0 from half to full duplex mode, it started giving "FIFO error" and dropped connectivity for a few seconds.
Does anyone know the procedure to break the SRM login password?
Any help is appreciated.
The workstation is a Digital 500au.

Regards
AZIM
Azim_3
Occasional Advisor

Re: vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended

My system crashed and hung again with the following errors:

I/O timeout error,I/O read write failed to execute in one second
While CIA/PYXIS_ERR_CSR was locked,and I/O timeout occured.

CIA/PYXIS ERR STAT = 0000000000000010
CIA/PYXIS ERR SYNC = 00XXXXXXXXXXXXXX
CIA/PYXIS MEM ERR0 = 00XXXXXXXXXXXXXX
CIA/PYXIS MEM ERR1 = 00XXXXXXXXXXXXXX
CIA/PYXIS PCI ERR0 = 00XXXXXXXXXXXXXX
CIA/PYXIS PCI ERR1 = 00XXXXXXXXXXXXXX
CIA/PYXIS PCI ERR2 = 00XXXXXXXXXXXXXX
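
When comparing dumps like this across several crashes, it can help to pull the register values out mechanically and list which bit positions are set. The sketch below assumes only the "NAME = hex value" layout shown above; the function names parse_registers and set_bits are invented for illustration, values containing X are treated as unknown, and no meaning is assigned to any bit, since that needs the chipset documentation or a machine check decoder.

```python
import re

# Matches lines such as "CIA/PYXIS ERR STAT = 0000000000000010".
REG_RE = re.compile(r"^\s*(CIA/PYXIS [A-Z0-9 ]+?)\s*=\s*([0-9A-Fa-fXx]+)\s*$")


def parse_registers(text):
    """Map register name -> integer value, or None if the value contains X."""
    result = {}
    for line in text.splitlines():
        m = REG_RE.match(line)
        if not m:
            continue  # skip anything that is not a register line
        name, raw = m.group(1), m.group(2)
        if "X" in raw.upper():
            result[name] = None  # masked or unknown digits in the dump
        else:
            result[name] = int(raw, 16)
    return result


def set_bits(value):
    """Return the positions (0 = least significant) of the bits set in value."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


if __name__ == "__main__":
    dump = """\
CIA/PYXIS ERR STAT = 0000000000000010
CIA/PYXIS ERR SYNC = 00XXXXXXXXXXXXXX
"""
    for name, value in parse_registers(dump).items():
        if value is None:
            print(f"{name}: value not available")
        else:
            print(f"{name}: bits set {set_bits(value)}")
```

For the ERR STAT value above this reports bit 4 as the only bit set; what that bit signifies is not something the script can tell you.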

panic(cpu0): system uncorrectable machine check (retry set ????

Could this problem be caused by the power supply? I ask because my system started having problems after a power failure (i.e. an inverter supply failure).

Regards
AZIM

Re: vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended

The machine check error codes can be found in the binary.errlog. These codes and other useful information from the binary.errlog can help to determine the actual failure.

Most support people have a script that takes the machine check codes and determines the exact cause.

The first post of this thread said the binary.errlog was included. I don't see it.

See if the code is a 660 (CPU) or a 620 (memory).
Azim_3
Occasional Advisor

Re: vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended

Hello All,

Please find the binary.errlog file attached.
Any help is appreciated.

Regards
AZIM