Operating System - HP-UX

sheevm
Regular Advisor

memory failure and CPU failure? in HP-UX rp4440



How can I prevent a system crash if either of the above (memory or CPU) fails?

Can someone explain this for both Serviceguard and non-Serviceguard environments?

Rajim
be good and do good
9 REPLIES 9
Patrick Wallek
Honored Contributor

Re: memory failure and CPU failure? in HP-UX rp4440

There is not really any way to prevent the system from crashing if the circumstances are just right (or just wrong, I guess).

It is possible to have a CPU fail and the system NOT crash. Whether it keeps running is up to the OS: if you have more than one CPU, the system can automatically deconfigure the failed CPU and keep running. Again, this depends entirely on how it fails.

With RAM failure, again, it depends. You can have LPMCs (low priority machine checks) and single-bit memory errors, which will not cause a crash. If you run into an HPMC (high priority machine check) or a DIMM/SIMM fails completely, there is nothing you can do: the machine will crash.
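Why a single-bit memory error is survivable while a double-bit one is not comes down to the error-correcting code (ECC) used by server memory. The sketch below is a toy illustration, assuming a SECDED (single-error-correct, double-error-detect) extended Hamming(8,4) code over a 4-bit value; real memory controllers protect much wider words and may use stronger schemes, and the functions `encode` and `decode` are hypothetical names, not part of any HP-UX interface.

```python
from __future__ import annotations

# Toy SECDED sketch: extended Hamming(8,4) over a 4-bit value.
# Index 0 holds an overall parity bit; indices 1..7 are classic Hamming(7,4)
# with parity bits at positions 1, 2 and 4.

def encode(nibble: int) -> list[int]:
    """Encode 4 data bits into an 8-bit SECDED codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d[0], d[1], d[2], d[3]
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # overall parity: makes double errors detectable
    return code

def decode(code: list[int]) -> tuple[str, int | None]:
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    c = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos  # XOR of set positions points at a single flipped bit
    overall = sum(c) % 2     # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Assume exactly one flip; syndrome 0 means the overall parity bit itself.
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None
    nibble = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return status, nibble

word = encode(0b1011)
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(word)
two_flips[5] ^= 1
two_flips[6] ^= 1
print(decode(word))       # ('ok', 11)
print(decode(one_flip))   # ('corrected', 11)
print(decode(two_flips))  # ('uncorrectable', None)
```

A single flipped bit is located by the syndrome and silently repaired, which matches the "no crash" case; two flipped bits are detected but cannot be repaired, which is the kind of uncorrectable error described in this thread as taking the machine down.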

Serviceguard does not come into play with these. Serviceguard will allow your application to come up on another machine should the primary machine fail. However, there is no real way to prevent a machine from crashing if the right (or wrong) component fails.
Ludovic Derlyn
Esteemed Contributor

Re: memory failure and CPU failure? in HP-UX rp4440

Hi,

In Serviceguard, we create a cluster.
The cluster can be configured to monitor hardware only, or both hardware and software.

The cluster protects the service rather than the server itself: if a server crashes, its workload is transferred to another server so the service continues.

It's also possible to monitor software resources, for example to detect a crash and transfer the workload to a secondary node.
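To make the point concrete that a cluster moves the service rather than saving the crashed node, here is a minimal sketch of ordered failover. Everything in it (`Node`, `Package`, `place_package`, the node and package names) is hypothetical and is not Serviceguard's API or configuration; real Serviceguard detects failures through heartbeats and applies far more rules than "first healthy node in the list".

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Node:
    """Hypothetical cluster member; `up` is False once the node has crashed."""
    name: str
    up: bool = True

@dataclass
class Package:
    """Hypothetical stand-in for a clustered application."""
    name: str
    node_order: list[str]          # preferred node first, then adoptive nodes
    running_on: str | None = None

def place_package(pkg: Package, nodes: dict[str, Node]) -> str | None:
    """Run pkg on the first healthy node in its order; return that node's name or None."""
    for name in pkg.node_order:
        node = nodes.get(name)
        if node is not None and node.up:
            pkg.running_on = name
            return name
    pkg.running_on = None  # no healthy node left: the service is down
    return None

nodes = {"node1": Node("node1"), "node2": Node("node2")}
pkg = Package("app_pkg", ["node1", "node2"])
print(place_package(pkg, nodes))  # node1
nodes["node1"].up = False         # simulate node1 crashing, e.g. after an HPMC
print(place_package(pkg, nodes))  # node2
```

Note that node1 still crashes in this model; the cluster only restarts the application elsewhere, which is the same point made earlier in the thread.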

regards

L-DERLYN
Torsten.
Acclaimed Contributor

Re: memory failure and CPU failure? in HP-UX rp4440

If you really want to prevent a crash in case of a failed CPU or a double-bit memory error, you need redundant hardware such as mirrored RAM. That is not (yet? I don't know) available for HP-UX systems. But HP provides a totally different family of systems (keyword: NonStop).

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

If you feel this was helpful please click the KUDOS! thumb below!   
Ninad_1
Honored Contributor

Re: memory failure and CPU failure? in HP-UX rp4440

As everyone has already said, you really cannot prevent a crash in some circumstances: if the system encounters, say, a double-bit memory error or an unrecoverable CPU error (an HPMC), your system will crash.
Only when the system encounters recoverable CPU errors and single-bit memory errors (LPMCs) will it recover from the error and not actually crash. But these single-bit errors and recoverable CPU errors may eventually turn into unrecoverable errors, and then the system crashes.
So the best thing would be to monitor for these recoverable errors and check whether they are very frequent or repetitive, in which case there could be a hardware problem. But some errors may not really be due to faulty hardware and may just occur occasionally.
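The "monitor, but tolerate the occasional error" advice amounts to counting recoverable errors in a sliding time window. This is a minimal sketch under stated assumptions: the class `ErrorRateMonitor`, its window and threshold are hypothetical and illustrative (not HP recommendations), and the caller is assumed to supply timestamps of recoverable errors, in non-decreasing order, already extracted from whatever logs or monitors the system provides. It does not parse any HP-UX log format.

```python
from collections import deque

class ErrorRateMonitor:
    """Hypothetical helper: flag a component whose recoverable errors repeat too often."""

    def __init__(self, window_s: float = 24 * 3600, threshold: int = 3):
        self.window_s = window_s    # illustrative window length in seconds
        self.threshold = threshold  # errors within the window that look suspicious
        self.events = deque()       # timestamps of errors inside the current window

    def record(self, ts: float) -> bool:
        """Record one recoverable error at time ts; return True if the rate looks suspicious."""
        self.events.append(ts)
        # Drop errors that have fallen out of the sliding window.
        while self.events and ts - self.events[0] > self.window_s:
            self.events.popleft()
        return len(self.events) >= self.threshold

mon = ErrorRateMonitor(window_s=3600, threshold=3)
print(mon.record(0))        # False: a single, occasional error
print(mon.record(100_000))  # False: the first error has aged out of the window
print(mon.record(100_600))  # False: two errors within an hour
print(mon.record(101_200))  # True: three errors within an hour
```

A lone error never trips the flag, while a burst of repeats does, which is the distinction between a transient event and a component that is likely failing.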
So the final answer is the same: you cannot do much to prevent a system from crashing.
But again, Torsten has provided a different perspective, as he usually does: HP NonStop servers.

Regards,
Ninad
Torsten.
Acclaimed Contributor

Re: memory failure and CPU failure? in HP-UX rp4440

It may be a little bit off topic, but I want to mention that computer systems with truly highly available components already exist. The family I pointed to is one example. Another is the high-end ProLiant family with "hot plug Memory RAID" (DL580). Neither can run HP-UX (or HP-UX cannot run on these systems, however you like to put it).
Maybe we will see this in combination with HP-UX someday, but it shows that HA ideas such as RAID1 or RAID5 across RAM are not just theory.
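RAID1 over RAM is plain mirroring; the RAID5 idea is less obvious, so here is a minimal sketch of XOR parity, the principle RAID5 relies on: with one parity block, any single lost block can be rebuilt from the survivors. The function names and the two-byte "memory banks" are hypothetical, and this illustrates the general parity principle only, not how the DL580's hot plug RAID memory is actually implemented.

```python
from __future__ import annotations
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks: list[bytes]) -> bytes:
    """Parity block: XOR of all data blocks."""
    return xor_blocks(data_blocks)

def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving) + [parity])

banks = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]  # hypothetical contents of three banks
parity = make_parity(banks)
lost = 1                                         # pretend bank 1 failed
survivors = [b for i, b in enumerate(banks) if i != lost]
print(rebuild(survivors, parity) == banks[lost])  # True
```

Because XOR is its own inverse, XOR-ing the parity with every surviving block cancels them out and leaves exactly the missing block; losing two blocks at once, however, is not recoverable with a single parity block.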

Hope this helps!
Regards
Torsten.

Phillip Thayer
Esteemed Contributor

Re: memory failure and CPU failure? in HP-UX rp4440

What would really be needed is a system that could handle double-bit error correction at the chip level, with memory RAID capability and "CPU shadowing", so that if a CPU in a CPU shadow pair fails, the shadow copy takes over.

Now THAT would be a truly high-availability system. If HP can come out with something like this that runs HP-UX/Linux/OpenVMS/RHEL/MS Windows/etc., people would be flocking to it.

Phil
Once it's in production it's all bugs after that.
sheevm
Regular Advisor

Re: memory failure and CPU failure? in HP-UX rp4440

Thanks for all the comments.

Another question: in a non-Serviceguard environment, is there a way to configure automatic failover to a secondary LAN card in case of a LAN failure?
Torsten.
Acclaimed Contributor

Re: memory failure and CPU failure? in HP-UX rp4440

Yes, but you have to purchase the APA (Auto Port Aggregation) software product.
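The behaviour being asked for is active/standby failover between network ports. The sketch below only illustrates that idea under stated assumptions: `Port`, `FailoverGroup` and `select` are hypothetical names, `lan0`/`lan1` are used purely as labels, and none of this is APA's API, configuration syntax or actual algorithm.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Port:
    """Hypothetical network port; link_up reflects link state."""
    name: str
    link_up: bool = True

class FailoverGroup:
    """Hypothetical active/standby group: traffic uses one port with link at a time."""

    def __init__(self, ports: list[Port]):
        self.ports = ports  # ordered: primary first, then standby ports
        self.active = None  # the Port currently carrying traffic, if any

    def select(self) -> str | None:
        """Re-evaluate link state and return the name of the port carrying traffic."""
        # Stay on the current port while its link is up (avoids needless flapping).
        if self.active is not None and self.active.link_up:
            return self.active.name
        for port in self.ports:
            if port.link_up:
                self.active = port
                return port.name
        self.active = None  # every port is down
        return None

lan0, lan1 = Port("lan0"), Port("lan1")
group = FailoverGroup([lan0, lan1])
print(group.select())  # lan0
lan0.link_up = False   # simulate a cable or card failure
print(group.select())  # lan1
lan0.link_up = True
print(group.select())  # lan1 (no automatic failback in this sketch)
```

Staying on the standby port after the primary recovers is a deliberate design choice here to avoid flapping; a real product may instead fail back to the primary automatically.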

Hope this helps!
Regards
Torsten.

Torsten.
Acclaimed Contributor
Solution

Re: memory failure and CPU failure? in HP-UX rp4440

More information about APA is available here:

http://docs.hp.com/en/J4240-90031/index.html

If you find the postings from all of us helpful, consider assigning points to them. Points are always welcome and a nice gesture.

Have fun!

Hope this helps!
Regards
Torsten.
