02-01-2007 01:14 AM
Faulty processor on a 2 CPU rp2405
I spoke to HP and they said that one of the processors was faulty and wanted to arrange its replacement. Now, as it runs 24x7, it's pretty difficult to take it down to do that sort of maintenance.
We've got a shutdown scheduled for May, but we'd like to get a feel for whether the problem may occur again.
Are there any (online) diagnostics that I could run that would indicate whether we are likely to have a similar episode in the near future?
I've got Online Diagnostics installed (Dec 2006) and have had a look at xstm. For the processors, it allows me to 'Exercise' them (they were both OK), but the 'Diagnose' option is greyed out. When I double-click one of the CPUs it says that the diagnostic tool isn't installed, but I installed the Dec 2006 Online Diagnostics this morning!
So, my questions are:
1) How thorough a test is the 'Exercise' option on the CPU item?
2) What do I need to do to be able to run 'Diagnose' on the CPU?
3) How thorough a test would 'Diagnose' be?
4) I found a document entitled "Dynamic Processor Deallocation" - http://docs.hp.com/en/diag/dynamic.pdf which implies that I can deallocate one of my processors, so that it can't cause the system to panic and crash. The dynamic deallocation doesn't seem to have happened. How can I deallocate a processor manually? (It sounds like I can't if it's the monarch CPU, i.e. the processor upon which the HP-UX kernel is running.)
How do I tell which is the monarch CPU?
Urgent help would be greatly appreciated.
Thanks,
Gary
02-01-2007 09:09 PM
Re: Faulty processor on a 2 CPU rp2405
Thanks in advance for your shared wisdom.
Gary
02-02-2007 02:24 PM
Re: Faulty processor on a 2 CPU rp2405
If both are still active, then there is always a chance that something could trigger the error again. If only one processor is active, then the one that failed will definitely NOT cause you a problem again. However, if the other CPU fails for some reason then you are really in trouble since you now have zero active CPUs.
The only way I know of to "deallocate" a processor is via the BCH menu, which requires the server to be rebooted.
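Whether both processors are still active can be checked from the running system. The following is a minimal sketch, not a definitive procedure: it assumes the usual `ioscan -f` column layout (Class, Instance, H/W Path, Driver, S/W State, ...), and the helper name `count_claimed_cpus` is made up for illustration. It only counts processors the kernel has claimed; it says nothing about their health.

```shell
#!/bin/sh
# Sketch: count the processors that HP-UX currently reports as CLAIMED.
# Assumption: ioscan -f prints "processor" in column 1 and the software
# state (e.g. CLAIMED) in column 5 for each CPU.

# Hypothetical helper: reads `ioscan -fkC processor` style output on stdin
# and prints the number of claimed processors (0 if none are listed).
count_claimed_cpus() {
    awk '$1 == "processor" && $5 == "CLAIMED" { n++ } END { print n + 0 }'
}

if command -v ioscan >/dev/null 2>&1; then
    # -k reads the kernel's existing I/O structures instead of rescanning hardware.
    active=$(ioscan -fkC processor | count_claimed_cpus)
    echo "Processors claimed by the kernel: $active"
else
    echo "ioscan not found; run this on the HP-UX host" >&2
fi
```

On a 2 CPU rp2405 a count of 2 would suggest that neither processor has been deconfigured; a count of 1 would suggest that one has.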
02-03-2007 05:37 AM
Re: Faulty processor on a 2 CPU rp2405
If the server is still running OK, and the exercise tests run OK, then it could be that the CPUs are actually fine. The reboot could have been caused by just a hung process, or by some I/O problem.
The exercise tests are quite good, although they run for only about 10 minutes and verify that the CPU is working; they do not reproduce the load of a server with many processes running.
The Diagnose option and other CPU tests, along with many other tests within STM, are password-protected and can only be run by HP engineers, unless you can obtain the password.
I would get HP in and have them run tests on the server.
The only other diagnostic-type software is EMS, part of the STM diagnostics, which can be configured to mail or page when a problem is occurring.
It may be worth considering a move to a Serviceguard configuration if it's this critical, to ensure you keep running. Waiting from Jan to May is a long time, with potential problems hanging over you.
Andy
02-03-2007 12:06 PM
Re: Faulty processor on a 2 CPU rp2405
The rp2405 is a great computer, but if you are running 24x7 with virtually no downtime allowed, you have the wrong configuration. This processor failure may have been due to a single CPU cache memory access that failed, or any of thousands of intermittent failures. When and if a CPU goes bad, your system will be down for a long time (hours, maybe days). A hardware failure will prevent the system from booting up at all, so you have to call HP, wait for the service engineer to arrive and repair the unit, then hope that the failure did not corrupt your data on the disk and possibly require a restore or even a reinstall.
For such a critical system, you need a second rp2405, a shareable disk storage cabinet, and MC/ServiceGuard software. With this configuration, a processor (or memory, LAN, etc.) failure will transfer the applications to the backup system. Repair of the failing system can then take place without interrupting the applications. Additionally, you can patch one system while the other one is running.
I would not worry about this one reboot but instead concentrate on what is really required for 24x7 operations.
Bill Hassell, sysadmin