08-03-2021 04:27 AM - last edited on 08-07-2021 07:13 AM by support_s
How does HP/Tandem NonStop achieve single failure FT without spares?
As far as I could gather from Wikipedia and the mind-boggling HPE website, the claim to fame of the NonStop system architecture is that it achieves single-failure fault tolerance (FT) without having to allocate excessive amounts of spare capacity (e.g. in a lockstepped architecture you would typically need to overprovision by 3x).
This seems a desirable property, yet I couldn't find more details about the approach they use and its caveats: what assumptions they make about the network, the kinds of failures they tolerate, the assumed client behavior, the acceptable time to recover, the workflows they run, etc.
Could anybody briefly describe how the NonStop system solves the typical problems of failure detection and failure correction? Is it a generic magical solution at the system level, or does it require applications to be written to use certain transaction facilities and to checkpoint data and communications?
Thanks a lot!
08-03-2021 09:39 PM
Re: How does HP/Tandem NonStop achieve single failure FT without spares?
I was able to get some technical papers. Not sure if they'll help.
08-11-2021 05:39 AM
Re: How does HP/Tandem NonStop achieve single failure FT without spares?
At a very high level, NonStop servers use all of their components all the time, with no idle spares, by employing software that fails over the work of a failed component to a "backup" component. Because every part is in use, you must build in a small amount of excess capacity to accommodate failover when a failure occurs. This concept applies to processors, controllers, disks, network adapters, and the system bus. Components generate "Alive" messages, which software monitors to enable fast failover. This is made possible by a message-based operating system, in which the OS can redirect messages based on the current state of any component. It is a little more complicated for software components, called processes, which can either fail over (if coded as NonStop) or be recreated quickly after a failure (if context free).
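To make the "Alive message plus message redirection" idea a bit more concrete, here is a rough Python sketch. It is not NonStop code or any HPE API: the names `Component`, `Router`, `send_alive`, and the 2-second timeout are all made up for illustration, and time is passed in explicitly so the example is deterministic. It also simplifies by leaving the backup idle until failover, whereas on a real system the backup would be doing its own work as well.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical unit (CPU, controller, ...) that emits alive messages."""
    name: str
    last_alive: float = 0.0
    handled: list = field(default_factory=list)

    def send_alive(self, now: float) -> None:
        # Record the time of the most recent alive message.
        self.last_alive = now

    def handle(self, message: str) -> None:
        self.handled.append(message)


class Router:
    """Delivers each message to the primary, or to the backup once the
    primary has missed its alive deadline (assumed timeout, in seconds)."""

    def __init__(self, primary: Component, backup: Component, timeout: float):
        self.primary = primary
        self.backup = backup
        self.timeout = timeout

    def is_alive(self, component: Component, now: float) -> bool:
        return now - component.last_alive <= self.timeout

    def check(self, now: float) -> bool:
        """Swap roles if the primary looks dead and the backup looks alive."""
        if not self.is_alive(self.primary, now) and self.is_alive(self.backup, now):
            self.primary, self.backup = self.backup, self.primary
            return True
        return False

    def send(self, message: str, now: float) -> str:
        """Route a message based on current component state; return the handler's name."""
        self.check(now)
        self.primary.handle(message)
        return self.primary.name


cpu0 = Component("cpu0")
cpu1 = Component("cpu1")
router = Router(cpu0, cpu1, timeout=2.0)

# Both components report in at t=0; the first message goes to the primary.
cpu0.send_alive(0.0)
cpu1.send_alive(0.0)
first = router.send("debit account", now=1.0)

# cpu0 goes silent; cpu1 keeps sending alive messages.
cpu1.send_alive(2.0)
cpu1.send_alive(4.0)
second = router.send("credit account", now=4.5)

print(first, second)
```

The sender never needs to know which component is currently primary; it just sends a message and the router decides where it lands. What the sketch does not cover is state: a process that fails over has to have checkpointed its context to the backup beforehand, which is the "coded as NonStop" part mentioned above.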