Lost in possible hardware problem
06-17-2004 01:01 AM
Yesterday an ioscan -fn hung on this server, and I was unable to kill it even with kill -9. Every subsequent ioscan on the system hung as well. I power cycled the server and found that it panics after the alloc_pdc_pages section of the boot. Unfortunately, it reports that it was unable to do a dump. I am able to get the server to init state 3 after starting in maintenance mode. Thinking this was a hardware problem, I went into XSTM and found everything in the green. I did an information check on the 3 processors and found that one CPU had HPMC codes and unknown errors. No other CPU had these symptoms. I chose to disable this CPU from the BCH and reset. When the server came back up, it panicked again. I again brought it up to init state 3 from maintenance mode and found that one of the other CPUs is now showing the same errors and HPMC codes. What can account for this shift? Is this a problem with the system board?
Thanks,
Jason
06-17-2004 01:06 AM
Re: Lost in possible hardware problem
I'd recommend calling in HP Hardware.
Rgrds,
Rita
06-17-2004 01:10 AM
Re: Lost in possible hardware problem
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=405806
Rgrds,
Rita
06-17-2004 06:57 AM
Re: Lost in possible hardware problem
I had the hardware guys systematically disassemble this server to test the memory and each CPU individually.
Unfortunately, the system panics before it gets to the point where it can create a dump. All tests on these hardware pieces and on all PCI devices came up good. This leaves two big possibilities: a software problem or the motherboard. Does anyone know whether software is even involved at the point where the system panics, right after it reallocates the PDC (alloc_pdc_pages) during boot? Could software corruption cause this, or is this definitely a hardware stage?
06-17-2004 07:08 AM
Re: Lost in possible hardware problem
I suspect the system is panicking because it can't allocate swap at boot time.
Also, a lot of systems are set up with dump and swap on the same device...
I'm not sure about this, but maybe your swap could have corrupted the memory/CPU...
-----------------------------------------------------------
I know this may be hindsight, but I have been burned by rebooting a box that is having trouble as yours did....
I have found that I am better off leaving the box up if it will remain semi-stable. And then troubleshooting it from there.
Sometimes rebooting can add additional problems/symptoms which can hide the true source of the problem.
06-17-2004 07:13 AM
Re: Lost in possible hardware problem
At this point two things can be happening: either there is a memory hardware problem, or the PDC checksum between ROM and RAM is failing. What version of PDC are you at? Either the PDC is corrupted, which might need a system board change, or it might need a PDC upgrade.
Have HP take a look into these areas as well.
06-17-2004 07:14 AM
Re: Lost in possible hardware problem
06-17-2004 07:47 AM
Re: Lost in possible hardware problem
06-17-2004 07:53 AM
Re: Lost in possible hardware problem
The key is that you said your bdf command was hanging. That is usually indicative of a hard disk, controller, or similar failure where the system can no longer access all of your VGs and LVs.
Since the system is now panicking when you try to boot, I would try some things:
1) Unhook ALL external devices (both fibre and SCSI) and try booting the system. If it succeeds, hook the devices back up one at a time until it fails again.
2) If #1 doesn't work, you may have to start pulling I/O cards out to see if that has any effect. If the machine boots after a card is removed, you may very well have found your culprit.
06-17-2004 08:04 AM
Re: Lost in possible hardware problem
The ioscan command is what hung and prompted us to reboot, not bdf. We have removed all external devices and all PCI cards, and the system still fails right after alloc_pdc_pages. We have also removed the processors and memory one by one to eliminate them as culprits. Any other ideas?
06-17-2004 10:52 PM
Re: Lost in possible hardware problem
Thanks for the help,
Jason
06-18-2004 01:50 AM
Re: Lost in possible hardware problem
I would set this parameter; that might have gotten you past this error. I say "might" because I have never seen this error before.
Usually, though, it should work.
06-18-2004 01:59 AM
Re: Lost in possible hardware problem
sea
Are all your disks that you expect present?
If not the earlier suggestions are good.
A physical inspection of the disks, cables, power, and termination is in order.
There are some diagnostics that the hardware folks can run from the ISL prompt.
SEP
06-18-2004 02:15 AM
Solution
# echo 2400?20X | adb /dev/dsk/cxtxdx
should return this info:
2400: 44454645 43543031 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
Any other non-zero numbers indicate bad blocks, and that disk should be replaced.
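The check described above could be scripted. What follows is only a minimal sketch, assuming the adb output has exactly the layout of the sample (an address label ending in a colon, followed by hexadecimal words). The function name check_defect_dump is hypothetical, and the rule that any further non-zero word means bad blocks is taken from this post, not from HP documentation.

```shell
# Hypothetical helper: reads adb output like the sample above on stdin.
# It skips the address label ("2400:") and the first two words, which in
# the sample are 44454645 43543031 (the ASCII bytes for "DEFECT01"),
# then counts every remaining word that is not all zeros.
check_defect_dump() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /:$/) continue      # address label such as "2400:"
                n++
                if (n <= 2) continue         # the two header words from the sample
                if ($i !~ /^0+$/) bad++      # any other non-zero word
            }
        }
        END {
            if (bad > 0) { print "non-zero words found: " bad; exit 1 }
            print "no unexpected non-zero words"
        }
    '
}
```

It would be used by piping the adb command from this post into it, for example: echo '2400?20X' | adb /dev/dsk/cxtxdx | check_defect_dump, where cxtxdx remains a placeholder for the real device file. The function exits non-zero when it finds extra non-zero words, so it can be used in an if statement.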
Another option would be to try a dd read of both drives, but that could take some time.
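A dd read test of this kind might look like the sketch below. It assumes only that dd exits with a non-zero status when a read fails; the function name read_test and the 1024k block size are illustrative choices, not something given in this thread.

```shell
# Hypothetical helper: read every block of a device (or file) and throw
# the data away. dd exits non-zero if it hits a read error, so the exit
# status tells us whether the whole device was readable.
read_test() {
    dev=$1
    if dd if="$dev" of=/dev/null bs=1024k; then
        echo "read of $dev completed"
    else
        echo "read of $dev failed" >&2
        return 1
    fi
}
```

It would be run once per drive, for example read_test /dev/rdsk/cxtxdx, where the device file name is a placeholder. The raw device file is assumed here because that is the usual choice for a full read on HP-UX. The read time grows with the size of the disk, which matches the warning above that this could take a while.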