ProLiant Servers (ML,DL,SL)

TheWalkingTree
Occasional Advisor

ML350 G5: Trying to add second processor problem

Hey guys, I have been hacking away on this problem for days with no success. It just lights up an amber light next to CPU2 and gives me a 212 processor 2 failure. I've searched everything I could find, exhausted the manuals and instructions, and am about to give up on upgrading this server. It has the latest firmware, and I've tried clearing CMOS and NVRAM, SmartStart > Clear IML, swapping the PPM, and swapping processors. The funny part is that we have an identical server and installed the same type of PPM and processor into it; that one had the same error, but clearing NVRAM fixed it. Not so on this one. The CPUs are the same model, but I have not had a chance to identify which steppings they are. The motherboard diagnostic merely shows the second CPU as installed but failed, while the diagnostic in SmartStart shows 100% okay for both CPUs when they are installed, yet I cannot get around the 212 error. It does this exact same thing no matter which CPU is in which socket, even with just one stick of RAM. Is my suspicion/fear that it's the mobo valid?

 

Thank you in advance!

8 REPLIES
waaronb
Respected Contributor

Re: ML350 G5: Trying to add second processor problem

The IML shows fan failures... are the extra fans installed?

I might be wrong, but I thought that when you're installing a 2nd CPU in most models, it's required to have the extra fans installed for proper cooling.

I'm also not sure about this, but wouldn't each CPU need at least one memory module installed in its banks? You mentioned 1 stick of RAM, but it seems like you'd need at least two, one for each processor bank?

I don't know if any of that would cause the types of errors you're seeing, but I'm pretty sure the fans and memory things have to be there for it to work at all?
TheWalkingTree
Occasional Advisor

Re: ML350 G5: Trying to add second processor problem

Thank you, I hadn't really thought about it!  Is there a mechanism requiring extra fans?  I don't see any housings for adding fans but there are headers that I think it is referring to in the IML.  I can't recall but I think the other ML350 has the identical fans and headers attached which is why I didn't put much thought into it but perhaps this server wants those installed...? 

 

As far as the RAM goes, we had all slots occupied but I read that just having one in for troubleshooting purposes might help, still no dice.

waaronb
Respected Contributor
Solution

Re: ML350 G5: Trying to add second processor problem

Well, I guess I'm not familiar enough with the ML350 G5... I found the user guide for it:
http://h20566.www2.hp.com/portal/site/hpsc/template.BINARYPORTLET/public/kb/docDisplay/resource.process/?spf_p.tpst=kbDocDisplay_ws_BI&spf_p.rid_kbDocDisplay=docDisplayResURL&javax.portlet.begCacheTok=com.vignette.cachetoken&spf_p.rst_kbDocDisplay=wsrp-resourceState%3DdocId%253Demr_na-c00709040-5%257CdocLocale%253Den_US&javax.portlet.endCacheTok=com.vignette.cachetoken

I guess each CPU has a fan built into the heatsink, so there is that, but the system does also support an extra 2 fans for redundancy. I didn't see it mention that the extra fans are required though.

And even more about what I don't know about the ML350 G5 (it's older than I thought), it doesn't have separate mem banks for each CPU I guess? In my mind I was comparing the ML350 G5 to a similar DL3x0 G5 model, but seems that they're very different. The ML model G5 is a lot older than the DL of the same generation. :)

Anyway, with that said, the errors in the IML about the fans not being redundant is technically true... I wonder if it gives the same caution messages if the 2nd CPU slot is empty? Does it only bug you if it sees a processor in the 2nd socket?

And now that I see it's an older model that uses a PPM for the 2nd CPU power regulation, I do recall some older DL3xx models that also used PPM's, and I've had several of those go bad. Just to make sure, you could try removing that PPM and re-inserting it to make sure it's nice and snug in the slot. You mentioned having another server that you upgraded successfully... did you try swapping the PPM from that one with this system, just to try?

You're probably fine with just the one memory module... you're right, it's definitely easier to troubleshoot when you strip a system down to the bare minimum. The memory is fine, otherwise it probably wouldn't have booted up at all.

I'd also check that the latest firmware is there. Your new CPU might have a different stepping than the original... sometimes that can cause problems. The latest BIOS can help by making sure any Intel errata are fixed, and that can sometimes help the different microcode steppings play nice with each other.

If you called HP though, they'd probably tell you that if the steppings are different then they can't support it. Just FYI.

If all else fails, it is an old server... might be time to let it limp along on one processor and save for a new one. Hello ebay! :)
TheWalkingTree
Occasional Advisor

Re: ML350 G5: Trying to add second processor problem

Thank you so much! Yes, with one processor installed it has no problems or log entries besides the fans, and it runs fine with either CPU installed alone in socket 1. I did try swapping PPMs with the working one, with the same result. I did notice the other server has these blue retention brackets for the PPM, and although the PPM is pretty snug without one in the server that won't run the pair, I wonder if the server can sense that the bracket is not installed or something. It does have the latest BIOS, which was my first thought.

 

I did read that the server is sensitive to stepping and sub-revision differences, so that is what I'm leaning towards being the problem as well. I'll have to run CPU-Z with it installed to check it out, but for now we are going to use the working server for our Spiceworks/WDS VMs :)  Thank you for your suggestions!
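
For anyone doing the same comparison, here's a rough Python sketch of lining up two CPUs field by field once the values are copied down from CPU-Z. The field names (`model`, `stepping`, `vid`) and the sample values are made up for illustration; they are not readings from either server in this thread.

```python
def cpu_mismatches(cpu_a, cpu_b, fields=("model", "stepping", "vid")):
    """Return {field: (value_a, value_b)} for every field where the two CPUs differ."""
    diffs = {}
    for field in fields:
        a, b = cpu_a.get(field), cpu_b.get(field)
        if a != b:
            diffs[field] = (a, b)
    return diffs


# Placeholder values only (hypothetical, typed in by hand from a tool like CPU-Z).
cpu1 = {"model": "E5430", "stepping": "E0", "vid": 1.2125}
cpu2 = {"model": "E5430", "stepping": "E0", "vid": 1.1875}

for field, (a, b) in cpu_mismatches(cpu1, cpu2).items():
    print(f"{field}: socket 1 = {a}, socket 2 = {b}")
```

An empty result only means the listed fields match; it says nothing about the socket or the board itself.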

TheWalkingTree
Occasional Advisor

Found the problem! edit- :(

The CPUs have identical steppings but the VID is different! The working server has the identical VID for both processors, so since that is the less intensive server (core with file storage role), we are going to pop those into the other Hyper-V server!

 

Thanks for your help man, didn't realize how sensitive the system is to processor differentials!

 

edit- After cleaning off the identical-in-every-way processors from the server that likes the pair and moving them over, the problem server still throws up a 212 error. So that will be the one to house the file storage and the working guy gets the pair for VM action. Is there an inexpensive motherboard replacement?

waaronb
Respected Contributor

Re: Found the problem! edit- :(

The different VID readings might be something weird in the way that one system is picking up the processor specs... I just checked and the E5430 says it has an operating voltage range of 0.85 - 1.35 V, probably depending on load, if it's throttling down to save power.
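
To put that in concrete terms, here's a quick Python sketch of checking a reported VID against that 0.85 - 1.35 V range. The function name and the sample readings are placeholders I made up, not values from either server.

```python
# Operating voltage range quoted above for the E5430, in volts.
VID_MIN_V = 0.85
VID_MAX_V = 1.35


def vid_in_spec(vid_volts, low=VID_MIN_V, high=VID_MAX_V):
    """True if the reported VID falls inside the quoted range (inclusive)."""
    return low <= vid_volts <= high


# Hypothetical readings, for illustration only.
for reading in (1.10, 1.25, 1.50):
    status = "within" if vid_in_spec(reading) else "outside"
    print(f"{reading:.2f} V is {status} the quoted range")
```

If both CPUs report different values that are each inside the range, the difference by itself may not mean much.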

Then again, there's an L5430 as well with a TDP of 50W... no idea if that might be identical enough that it's reported as an E5430 or something.

Well, it's too bad you couldn't get them playing nice in that one system after all.

You can probably find used motherboards on ebay... servers of that vintage tend to start having other parts die off, or businesses replace them with newer systems so they sell them to resellers who part them out.

I have a handful of DL3x0 G5's that are in that situation... they're in a rack plugged in but powered off... I'm scavenging them for replacements here and there, but another group at our company had some 3rd party come in and pick up all their old equipment, and then I guess they recycle it or part it out.

I guess it's a thriving enough business... a G5 motherboard might not be too hard to find.
TheWalkingTree
Occasional Advisor

Re: Found the problem! edit- :(

Oh I see, LOL. It was the only difference I saw, but guess what: the working CPUs do not work in tandem in the problem server either. But look at what I spotted on the second socket......

 

I managed to reposition the pin with a jeweler screwdriver but it's still throwing the 212 error so there must be something wrong with the socket.  Anywho!  We might look into replacing the mobo but at least the first socket works for now.

waaronb
Respected Contributor

Re: Found the problem! edit- :(

Ooooh, yeah. That socket might be blown if the pin shorted something, even though you bent it back. Hard to know for sure.

After my last post, I was curious so I checked ebay and there were a couple of ML350 G5 mainboards up there... $40 - $50, so that might be something to check if you're happy with the system overall and don't mind spending that much.

Proliant main boards are pretty easy to swap out, in general. Hardest part is just unplugging and removing everything first, but otherwise they tend to slide right out once that's all done.

I'll have to remind myself to have a jeweler's loupe or something handy next time I'm troubleshooting a misbehaving board. Those sockets get smaller and smaller (or I'm getting older and older) and it's harder to see if something got munged like that.