Operating System - Tru64 Unix

Graham Allan
Advisor

ES40 CPU compatibility

This is a hardware question rather than Tru64 but I'm still hoping someone might be able to help me!

Our main ES40 has four KN610-CA 833MHz CPUs, of which one has gone bad (I think it was a cache failure error; at any rate the system fails it).

Our "spare parts" ES40 also has an 833MHz CPU.

However, if I try to install the spare CPU in the first ES40 (alongside the three remaining good ones), I get an immediate error on the OCP, "Bad CPU ROM data", which supposedly means "Invalid data in EEROM on the CPU".

I'm guessing this may be because the CPUs seem to be at different revisions; the originals in the main server are 54-30362-B3 C01, while the spare is 54-30362-B3 B01. (The error message doesn't really seem to relate to this, but I haven't found any further explanation anywhere.)

Is this an invalid configuration or is there some way to make it work? Both servers have been updated to current firmware, SRM V7.3-1.
I have searched and cannot find any public reference to CPU matching. The only CPU configuration rules I see are that CPUs should be of identical speed and cache size.

Thanks for any help!

Graham
Steven Schweda
Honored Contributor

Re: ES40 CPU compatibility

I know nothing, but the upgrade instructions
do mention a Firmware Update CD-ROM:

http://h18002.www1.hp.com/alphaserver/archive/es40/es40_tech.html
http://h18002.www1.hp.com/alphaserver/download/es40_cpu_upgrade.pdf

The instructions for the slower CPUs are more
scary:

http://h18002.www1.hp.com/alphaserver/download/es40_500mhz_cpu.pdf

Does ">>>show config" say anything
interesting on your box?
Vladimir Fabecic
Honored Contributor

Re: ES40 CPU compatibility

Does this "spare" CPU work OK if it is the only CPU in the machine?
In vino veritas, in VMS cluster
Rob Leadbeater
Honored Contributor

Re: ES40 CPU compatibility

Hi Graham,

The options list doesn't mention any restrictions for the 833MHz processors...

Could you try removing all of the original CPUs and just try booting up off the single spare one.

That could help narrow things down...

Cheers,

Rob
Graham Allan
Advisor

Re: ES40 CPU compatibility

I haven't tried this "spare" cpu as the only cpu in the "real" server (yet), but it does work fine as the only CPU in the "spare" server.

"show config" does show a difference between the CPUs in the two servers though, which is what makes me wonder if the "B0" / "C0" designations are important.

The "B0" cpu in the spare server shows this:

Processors
CPU 0 Alpha EV68A pass 2.1 or 2.1A or 3.0 833 MHz 8MB Bcache

while the three "C0" CPUs in the main server show this:

Processors
CPU 0 Alpha EV68A pass 2.2 833 MHz 8MB Bcache
CPU 1 Alpha EV68A pass 2.2 833 MHz 8MB Bcache
CPU 2 Alpha EV68A pass 2.2 833 MHz 8MB Bcache

The PDFs you gave me above did refer to different CPU passes being incompatible with each other, for 500MHz models, so perhaps the same thing is going on here.
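For anyone wanting to check a mixed configuration like this, here is a minimal Python sketch (an illustration only, not an HP tool) that parses the "Processors" lines printed by "show config", applies the published rule of identical speed and cache size, and separately flags mixed passes. The line format is assumed from the two outputs above, and the combined four-CPU listing in the sample is hypothetical, assembled from those two outputs.

```python
import re

# Processor line format as printed by SRM "show config" in this thread (assumed).
CPU_LINE = re.compile(
    r"^CPU\s+(?P<slot>\d+)\s+Alpha\s+(?P<chip>\S+)\s+pass\s+(?P<passes>.+?)\s+"
    r"(?P<mhz>\d+)\s*MHz\s+(?P<cache_mb>\d+)MB\s+Bcache$"
)


def parse_cpus(text):
    """Return one dict per CPU line found in show config output."""
    cpus = []
    for line in text.splitlines():
        m = CPU_LINE.match(line.strip())
        if m:
            cpus.append({
                "slot": int(m["slot"]),
                "chip": m["chip"],
                "pass": m["passes"],
                "mhz": int(m["mhz"]),
                "cache_mb": int(m["cache_mb"]),
            })
    return cpus


def check_mix(cpus):
    """Apply the published rule (same speed, same cache) and flag mixed passes."""
    same_speed = len({c["mhz"] for c in cpus}) == 1
    same_cache = len({c["cache_mb"] for c in cpus}) == 1
    mixed_pass = len({c["pass"] for c in cpus}) > 1
    return {"rule_ok": same_speed and same_cache, "mixed_pass": mixed_pass}


# Hypothetical combined listing: three original CPUs plus the spare.
combined = """
CPU 0 Alpha EV68A pass 2.2 833 MHz 8MB Bcache
CPU 1 Alpha EV68A pass 2.2 833 MHz 8MB Bcache
CPU 2 Alpha EV68A pass 2.2 833 MHz 8MB Bcache
CPU 3 Alpha EV68A pass 2.1 or 2.1A or 3.0 833 MHz 8MB Bcache
"""

print(check_mix(parse_cpus(combined)))  # {'rule_ok': True, 'mixed_pass': True}
```

With this input the speed/cache rule is satisfied while the passes differ, so the published rule on its own would not predict the "Bad CPU ROM data" error.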
Graham Allan
Advisor

Re: ES40 CPU compatibility

Sorry, in the above message I mean "B01" instead of "B0" and "C01" instead of "C0"!
Steven Schweda
Honored Contributor

Re: ES40 CPU compatibility

> [...] so perhaps the same thing is going on
> here.

I still know nothing, but it's suspiciously
similar, I'd say. Perhaps when the original
"same speed" requirement was written, they
believed that that was good enough. Then
some clever engineer(s) came along and found
a way to make things incompatible at the same
speed. Everything's complicated.
Rob Leadbeater
Honored Contributor

Re: ES40 CPU compatibility

Hi,

I managed to dig out an old copy of an OARS CD. This does reference incompatibility problems with 2.1 and 2.2 pass CPUs but only for OpenVMS rather than Tru64.

"667 and 833 MHZ CPU modules with the same clock speed and part number may be of different manufacturing passes. Unpatched Open VMS versions 7.2-1 and 7.1-2 will not run these processors correctly in a multiprocessing environment."


The only reference to Tru64 is this:

"The recognition of specific CPU chip rev is done by the PALcode. It knows what features/fixes are in the revision, and what workarounds it can avoid. PALcode also converts the chip rev info into something that the OS understands, and puts it in the HWRPB. The OS, especially Tru64, also makes some exception handling decisions based on chip rev. If the PAL code doesn't understand a chip rev it will put some bugus number in the HWRPB. If Tru64 doesn't know what the rev is, it assumes worst case, which was a prototype EV6, which was not able to handle correctable errors. Tru64 issues a warning during boot. Net result is that Tru64 will turn all errors into uncorrectables, and therefore panics on a correctable ecc error."


As the PALcode is upgraded by the firmware process, I would probably sanity check that everything is up to the latest version (7.3 IIRC).

Cheers,

Rob
Rob Leadbeater
Honored Contributor

Re: ES40 CPU compatibility

Hi Graham,

Did a little more searching...

There is an engineering advisory that looks relevant:

AE020508_EW01_0 ES40 Bad CPU ROM DATA Issue

Unfortunately I can't seem to find the CD with the details of the advisory at the moment. If I manage to dig it out, I'll see what it says, otherwise you might want to ask Support...

Cheers,

Rob
cnb
Honored Contributor

Re: ES40 CPU compatibility

Hi Graham,

...
"├В┬╖ There are two pass versions of the 833 MHz CPU for the ES40, mixing the two revisions is supported."
...
"P00> Show Config
Output of the Command
CPU 0 Alpha EV68A pass 2.2 833 MHz 8MB Bcache
CPU 1 Alpha EV68A pass 2.1 (or 2.1A or 3.0) 833 MHz 8MB Bcache"

~

They are supported together, but have you made sure that abios, srom, rmc and tig have been manually updated (after the UPD> Update all) on your spare to match the existing production versions before inserting the spare CPU?

See manual firmware update:
ftp://ftp.hp.com/pub/alphaserver/firmware/current_platforms/v7.3_release/DOC/es40_v73_fw_relnote.pdf

UPD> exit
Do you want to do a manual
update? [y/(n)] yes
UPD> update
Confirm update on:
Abios
SRM
rmc
tig [Y/(N)]Y


Check your versions on both systems to make sure all BIOS revisions are the same, and update to the latest code. This 'usually' resolves SROM incompatibility issues when introducing supported spare parts.
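As a rough aid for that comparison, a small Python sketch (hypothetical helper names, not an HP utility) can diff the "Firmware" section of "show config" from two machines. The main-server values are taken from the output posted later in this thread; the spare's older serial ROM version is invented purely to show a mismatch.

```python
def parse_firmware(text):
    """Map each 'Name: version' line of a show config Firmware section to its value."""
    versions = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            versions[name.strip()] = value.strip()
    return versions


def firmware_mismatches(a, b):
    """Return {component: (version_a, version_b)} for every component that differs."""
    components = sorted(set(a) | set(b))
    return {c: (a.get(c), b.get(c)) for c in components if a.get(c) != b.get(c)}


# Firmware section as shown by the main ES40 in this thread.
main_es40 = """\
SRM Console: V7.3-1
ARC Console: v5.71
PALcode: OpenVMS PALcode V1.98-104, Tru64 UNIX PALcode V1.92-105
Serial ROM: V2.22-G
RMC ROM: V1.0
RMC Flash ROM: V2.8
"""

# Hypothetical spare system whose serial ROM was never updated.
spare_es40 = main_es40.replace("Serial ROM: V2.22-G", "Serial ROM: V2.20-F")

print(firmware_mismatches(parse_firmware(main_es40), parse_firmware(spare_es40)))
# {'Serial ROM': ('V2.22-G', 'V2.20-F')}
```

An empty result would mean every listed component matches; components present on only one machine show up with None on the other side.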


HTH,


Graham Allan
Advisor

Re: ES40 CPU compatibility

Thanks for all the information - and thanks cnb for confirming that this configuration really SHOULD work together.

I did try the manual firmware updates, with no improvement. I have since tried the lone B01 CPU in the main server, replacing the C01 cpus, and it works fine there by itself. Also tried yet another complete manual firmware update while in this mode, but adding the C01 cpus back in gives the same immediate fault. This was using the most recent (7.3) firmware CD.

It's interesting that the system halts right away - in fact it even cuts power immediately on displaying the error - so there is no chance of a Tru64 incompatibility at this point - the system doesn't even get as far as loading SRM.

Another hint I tried was the "clear_error all" command which clears entries from EEROM - didn't sound very likely to help though, and didn't!

Rob, that engineering advisory does sound really interesting if you can find any more details! I do have access to a really ancient PROSIC CD which came my way sometime, but it has nothing (it dates from 2001, before the 833MHz CPUs were released, I think).

Graham

Rob Leadbeater
Honored Contributor
Solution

Re: ES40 CPU compatibility

Hi Graham,

I found details of the advisory...

Some KN610-C CPUs had their EPROM set up incorrectly, which causes the error you're seeing when they are mixed with CPUs that are set up correctly.

You should be able to fix it with this:

P00>>> buildfru -s SMB0.CPU0 6 2
P00>>> buildfru -s SMB0.CPU0 8 62
P00>>> buildfru -s SMB0.CPU0 9 79

As that references CPU0, run this on your spare parts ES40, with just the spare CPU.

Hope this helps,

Regards,

Rob
Graham Allan
Advisor

Re: ES40 CPU compatibility

> You should be able to fix it with this:
>
> P00>>> buildfru -s SMB0.CPU0 6 2
> P00>>> buildfru -s SMB0.CPU0 8 62
> P00>>> buildfru -s SMB0.CPU0 9 79
>
> As that references CPU0, run this on your
> spare parts ES40, with just the spare CPU.

THANK YOU!

I tried this at first on the spare CPU; it made no difference, and my heart sank.

But of course that just meant it was the three "C01" CPUs which had the incorrect values. After updating each of them with the above commands, all four CPUs are now coexisting:

P00>>>show conf
hp AlphaServer ES40

Firmware
SRM Console: V7.3-1
ARC Console: v5.71
PALcode: OpenVMS PALcode V1.98-104, Tru64 UNIX PALcode V1.92-105
Serial ROM: V2.22-G
RMC ROM: V1.0
RMC Flash ROM: V2.8

Processors
CPU 0 Alpha EV68A pass 2.2 833 MHz 8MB Bcache
CPU 1 Alpha EV68A pass 2.2 833 MHz 8MB Bcache
CPU 2 Alpha EV68A pass 2.1 or 2.1A or 3.0 833 MHz 8MB Bcache
CPU 3 Alpha EV68A pass 2.2 833 MHz 8MB Bcache



Graham
Graham Allan
Advisor

Re: ES40 CPU compatibility

I thought I should mention that Tru64 also booted up again just fine with the four processors installed, after the above fix.

Thanks again for all the help.