06-07-2011 10:25 PM
Re: Server Reboot
Yes, complete network failure between the 2 nodes in the cluster by the look of it - this should never be able to happen unless the aggregates from both lan900 and lan901 run through the same networking kit. So first port of call is to talk to your network team and ask them why all their network switches failed at the same time...
After the network failed, the remote node (bilprddb) was ejected from the cluster following a race for the cluster lock disk - this is normal cluster behaviour when 2 nodes in a cluster cannot communicate over any LAN interfaces.
bilprdci formed a one node cluster, and attempted to start the dbPRD package, which failed (reason unknown - you would need to look at the package log for this, but most likely due to the complete network failure).
Later bilprddb rejoined the cluster and someone manually stopped and started ciPRD on bilprdci.
So my advice here is:
1. Review your cluster package logs as well, as they may throw more light on the nature of the failure(s) here.
2. You need a ground up review of the network design within this cluster - a good cluster design should never be able to lose all network links at the same time.
3. Lots of nasty NFS issues in here too, no doubt caused by the network outage - however you should check that you are following NFS best practice for use in a cluster
4. You need to check your name resolution standards in /etc/nsswitch.conf. In a cluster you really need to have name resolution handled first by files and only then by DNS, and you need to make sure all the interfaces are consistently named in /etc/hosts on both cluster nodes
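The check in point 4 could be scripted along these lines. This is only a rough sketch, not a Serviceguard tool: it assumes the usual `hosts: files [criteria] dns` layout of /etc/nsswitch.conf and the plain "address name [aliases]" layout of /etc/hosts. The function names are made up for illustration, and you would feed it the file contents collected from each node yourself.

```python
import re


def hosts_sources(nsswitch_text):
    """Return the ordered lookup sources on the 'hosts:' line, or [] if there is none."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line.lower().startswith("hosts:"):
            continue
        spec = line.split(":", 1)[1]
        # Drop bracketed criteria such as [NOTFOUND=continue]
        spec = re.sub(r"\[[^\]]*\]", " ", spec)
        return spec.split()
    return []


def files_before_dns(nsswitch_text):
    """True if 'files' is consulted and comes before 'dns' (or dns is not used)."""
    sources = hosts_sources(nsswitch_text)
    if "files" not in sources:
        return False
    if "dns" not in sources:
        return True
    return sources.index("files") < sources.index("dns")


def parse_hosts(hosts_text):
    """Map each address to its list of names, ignoring comments and blank lines."""
    table = {}
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2:
            table[fields[0]] = fields[1:]
    return table


def hosts_differences(text_a, text_b):
    """Return the addresses whose names differ between two /etc/hosts files."""
    a, b = parse_hosts(text_a), parse_hosts(text_b)
    return sorted(ip for ip in set(a) | set(b) if a.get(ip) != b.get(ip))
```

Run `files_before_dns` against the nsswitch.conf from each node, and `hosts_differences` against the two /etc/hosts files; any address it returns is one that is named differently (or missing) on one of the nodes.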
HTH
Duncan
I am an HPE Employee
06-08-2011 03:08 AM
Re: Server Reboot
Thank you all for the support. It's a 2-node cluster. One more thing: if the HB LAN fails, is it natural that the other node gets rebooted? Here the HB LAN failed and my primary node got rebooted. Is that natural when the HB LAN fails? Or, even if the HB LAN fails, should it only switch over the packages and leave the server intact?
06-08-2011 03:13 AM
Solution
In a 2 node cluster, if all the heartbeat LANs between the 2 nodes fail, then one of the nodes is going to get rebooted... this is to ensure that your data is not corrupted.
If neither node can talk to the other, how do they know whether the other node is running one of the packages in the cluster or not... they can't, so what happens is they both try and obtain the cluster lock and the node that "loses" the race for the cluster lock reboots itself. It could just as easily have been the other node that lost the race for the cluster lock...
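To make the tie-break concrete, here is a toy model of the race. It is purely illustrative and is not how Serviceguard is implemented: the `ClusterLock` class and `resolve_partition` function are invented for this sketch, and the shuffle just stands in for the fact that you cannot predict which node reaches the lock first.

```python
import random


class ClusterLock:
    """Toy stand-in for a cluster lock disk: only the first claimant succeeds."""

    def __init__(self):
        self.owner = None

    def try_acquire(self, node):
        if self.owner is None:
            self.owner = node
        return self.owner == node


def resolve_partition(nodes, rng=random):
    """Both nodes race for the lock; return (survivor, list_of_rebooted_nodes)."""
    order = list(nodes)
    rng.shuffle(order)  # arrival order at the lock is unpredictable
    lock = ClusterLock()
    survivor = None
    rebooted = []
    for node in order:
        if lock.try_acquire(node):
            survivor = node  # winner re-forms the cluster on its own
        else:
            rebooted.append(node)  # loser resets itself to protect the data
    return survivor, rebooted


if __name__ == "__main__":
    winner, losers = resolve_partition(["bilprdci", "bilprddb"])
    print("survivor:", winner, "rebooted:", losers)
```

Whichever node comes second in the shuffled order is the one that reboots, so repeated runs will pick either node.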
HTH
Duncan
I am an HPE Employee
06-08-2011 03:39 AM
Re: Server Reboot
Thanks Duncan... I was looking for the same.