Failed CPUs, nPars and vPars
06-01-2009 05:06 AM
General question for you all before I potentially stick my foot in my mouth and raise a stink over an issue I'm seeing.
I've got about a dozen rx8640s, all running HP-UX 11i v2, and all are running vPars - anywhere from 4 to 8 vPars per rx8640. Nothing out of the ordinary there.
Lately I've had a string of failed CPUs, and each time it's caused an HPMC and the whole nPar gets reset, which of course pulls the rug out from under all the running vPars. Ugh.
Unless I'm way off base, servers such as the rx8640 are built with lots of fault tolerance. They can 'handle' a failed CPU, DIMM, etc., without too much fuss. Or at least that's how it's supposed to work. I don't recall my PA-RISC machines rebooting upon detecting a failed CPU in the past. Even the technician who's been replacing the CPUs has told me that a failed CPU shouldn't be causing the whole server to reset.
Has anyone else seen this behavior? We're running vPars A.04.04.04, for what it's worth.
I had another failure over the weekend, and I'm waiting on Support to go over the chassis logs. If it's another CPU I'm going to escalate... I really don't think this should be happening.
Am I way off base in my thinking? Feedback appreciated.
06-01-2009 05:23 AM
Solution
Eric,
Sorry, but you are wrong here...
We absolutely had HPMCs on PA-RISC systems, and they would always result in the reboot of a whole nPar. All HP-UX boxes have always had something called "Dynamic Processor Resiliency" (DPR), which can detect a "failing" CPU and take it offline, but you have to "detect" that failure... some CPU failures can be caught and flagged (from which we can do a DPR deactivation), and some can't. It seems to me you've just been getting unlucky.
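To make the distinction concrete, here is a rough toy model in Python. It is only a sketch of the idea described above, not how HP-UX or the firmware actually implements DPR: the `Cpu` class, the `handle_cpu_event` function and the threshold value are all hypothetical names and numbers invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold for illustration only; not a real HP-UX tunable.
CORRECTED_ERROR_THRESHOLD = 3


@dataclass
class Cpu:
    cpu_id: int
    corrected_errors: int = 0
    online: bool = True


def handle_cpu_event(cpu: Cpu, correctable: bool) -> str:
    """Return the outcome of one error event on `cpu` in this toy model."""
    if not correctable:
        # No early warning was flagged, so there is nothing to deactivate:
        # the error surfaces as an HPMC and the whole nPar resets,
        # taking every vPar on it down as well.
        return "HPMC: nPar reset"
    cpu.corrected_errors += 1
    if cpu.online and cpu.corrected_errors >= CORRECTED_ERROR_THRESHOLD:
        # The failure was caught and flagged in time, so the CPU
        # can be taken offline while the partition keeps running.
        cpu.online = False
        return "CPU deactivated"
    return "logged"


if __name__ == "__main__":
    failing = Cpu(cpu_id=0)
    print([handle_cpu_event(failing, correctable=True) for _ in range(3)])
    print(handle_cpu_event(Cpu(cpu_id=1), correctable=False))
```

The point of the sketch is the first branch: a CPU that fails without any flagged warning never reaches the deactivation path at all.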
The problem is that some folks read the "DPR" material and interpret it as "HP UNIX servers can withstand CPU failures *under all conditions*", and that isn't and won't be the case. The same is true for Sun UNIX systems, IBM UNIX systems, in fact pretty much every "open system" that's out there in the market. There are exceptions like the HP NonStop server, but they get round the issue by "lock stepping" the same instructions through 2 CPUs for resiliency.
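As a very loose illustration of the lock-stepping idea, the Python sketch below runs the same instruction through two simulated CPUs and compares the results. It makes no claim about how NonStop hardware really works; `healthy_cpu`, `faulty_cpu`, `run_lockstep` and `LockstepMismatch` are hypothetical names made up for this example.

```python
from typing import Callable

Instruction = Callable[[int], int]
Cpu = Callable[[Instruction, int], int]


class LockstepMismatch(Exception):
    """Raised when the two CPUs disagree on a result."""


def healthy_cpu(instruction: Instruction, operand: int) -> int:
    return instruction(operand)


def faulty_cpu(instruction: Instruction, operand: int) -> int:
    # Simulate a silent hardware fault by flipping the lowest result bit.
    return instruction(operand) ^ 1


def run_lockstep(cpu_a: Cpu, cpu_b: Cpu,
                 instruction: Instruction, operand: int) -> int:
    """Execute the same instruction on both CPUs and compare the results."""
    result_a = cpu_a(instruction, operand)
    result_b = cpu_b(instruction, operand)
    if result_a != result_b:
        # The fault is caught at the moment it happens, before the bad
        # result can propagate any further.
        raise LockstepMismatch(f"{result_a} != {result_b}")
    return result_a


if __name__ == "__main__":
    increment = lambda x: x + 1
    print(run_lockstep(healthy_cpu, healthy_cpu, increment, 41))
    try:
        run_lockstep(healthy_cpu, faulty_cpu, increment, 41)
    except LockstepMismatch as err:
        print("mismatch detected:", err)
```

The trade-off is visible even in the toy: every instruction is paid for twice, which is a large part of why general-purpose "open systems" don't take this approach.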
And if the technician who's swapping the CPUs (presumably an HP employee) thinks that HP UNIX servers can catch all CPU failures without a reboot, he needs to speak with some of the internal labs folks to get that straight.
HTH
Duncan
I am an HPE Employee
06-01-2009 05:30 AM
Re: Failed CPUs, nPars and vPars
Perhaps it's just a string of bad luck in my datacenter then... pretty frustrating though to see what may be our fourth failed Integrity CPU in a two-week time span!
It's both our Montvales and Montecitos - so I can't really blame a particular CPU.
Guess I'll just grin and bear it. And tell my management to do the same. :-)
Thanks!