Re: Problem running only one node
05-23-2007 11:02 PM
I have a Serviceguard (SG) cluster on two DL580 servers with an MSA1000, using QLogic HBAs.
The cluster works perfectly: all the switchovers work, we have run the whole battery of tests, and everything works.
However, when both nodes are down and I power on only one of them, everything starts well. The cluster waits for one minute, because I set AUTO_START_TIMEOUT to one minute in the configuration file.
But after that, once the login screen appears, I run cmviewcl and find the cluster in the "unknown" state.
If I try to run cmruncl, it says that the cluster is waiting for the other node to start.
Is there another parameter that tells the cluster not to wait for the other node?
If the cluster and both nodes are up and running and I switch off one node, everything works fine, so I do not think this is a lock LUN problem.
Thanks again.
05-24-2007 05:16 AM
If you wish to start the cluster on just one node, you will need to wait for the autostart interval to run out, then use:
cmruncl -n <node_name>
on the node you wish to run as a single-node cluster.
See: man cmruncl
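As a rough sketch of the step above, the command can be wrapped so that it is printed for review before anyone runs it. The function name start_single_node and the DRY_RUN variable are made up for this example; only cmruncl and its -v and -n options come from Serviceguard. The sketch assumes cmruncl is in PATH when DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: build (and optionally run) the single-node start command.
# start_single_node and DRY_RUN are hypothetical names for this example.

DRY_RUN="${DRY_RUN:-1}"   # default: print the command instead of running it

start_single_node() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: start_single_node <node_name>" >&2
        return 2
    fi
    if [ "$DRY_RUN" = "1" ]; then
        # Show what would be run so an operator can review it first
        echo "cmruncl -v -n $node"
        return 0
    fi
    # Really start the cluster on this one node (needs Serviceguard installed)
    cmruncl -v -n "$node"
}

# Example: print the command for the local host name
start_single_node "$(uname -n)"
```

The local host name is only a stand-in here; the argument must be the node name as it appears in the cluster configuration.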
05-24-2007 09:41 PM
At this point, I would like to know whether there is a script I can put in the init level to run the command cmruncl -n automatically.
This cluster is going to an unmanned server farm, and I need to take control of the cluster if the power fails and only one node starts.
I am sure there is some script that can solve this problem.
Thanks a lot.
05-28-2007 02:42 AM
At this point in the init script:
#
# Check to see if the daemon is already running
#
findproc cmcld
if [ "$pid" = "" ]
then
#
# The daemon isn't running already
#
+ isnodeup ingrids2
+ if [ "$node_status" = "down" ]
+ then
+ action "Node ingrids2 is down; starting the cluster with node ingrids1 only"
+ ${SGSBIN}/cmruncl -v -f -n ingrids1
+ exit 0
+ fi
if [ -f ${SGSBIN}/cmrunnode ]
then
#
# Attempt to join the cluster
#
I added the lines marked with +.
As you can see, on this node I check whether node "ingrids2" is up, and on the other node I check for "ingrids1".
Could this cause any problems?
Thanks a lot.
05-28-2007 09:02 PM
The "cmruncl -n" is intended to be used only when it is *absolutely certain* that the other node is not running.
ServiceGuard cannot tell these two situations apart:
1.) one node is being started while the other node has lost power or has failed in some other way
2.) both nodes are actually starting at the same time, but the network connections between them have completely failed, i.e. the heartbeat of each node has no way of reaching the other node.
In situation 1), you can start the cluster using one node and have the other node join the cluster later when its problems have been fixed.
In situation 2), one or the other node *MUST NOT BE ALLOWED TO START*, since both nodes would assume the other one has failed, would mount the shared disks and start the applications. If two nodes use the same filesystem simultaneously without knowledge of each other, the result is *CERTAIN FILESYSTEM CORRUPTION*.
The idea behind the cluster lock is that whenever the 2-node cluster loses the heartbeat connections, only one node may continue processing while the other does a hard reboot (to stop the use of the package resources *instantly*) and stops in ServiceGuard startup phase to wait until the network connections are restored. The waiting node will not touch the package resources, because it must assume the other node is using them. Your change would allow the rebooting node to avoid this wait and just blindly assume the other node is down - the problem is that your script *cannot know that* for sure.
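The reasoning above can be put into a small decision sketch. This is an illustration only and not part of ServiceGuard: the function startup_action and its two inputs are hypothetical names. The point it shows is that a missing heartbeat alone never justifies starting alone; something other than the heartbeat network has to confirm that the other node is really down.

```shell
#!/bin/sh
# Illustrative sketch only; not part of ServiceGuard.
# startup_action, heartbeat_seen and peer_verified_down are hypothetical names.
#
# heartbeat_seen:     "yes" if this node can exchange heartbeats with its peer
# peer_verified_down: "yes" only if someone has confirmed, by means other than
#                     the heartbeat network, that the peer is really not running
startup_action() {
    heartbeat_seen="$1"
    peer_verified_down="$2"

    if [ "$heartbeat_seen" = "yes" ]; then
        # Both nodes can talk to each other: normal cluster formation
        echo "form-cluster"
    elif [ "$peer_verified_down" = "yes" ]; then
        # Situation 1: the peer is known to be down, starting alone is safe
        echo "start-alone"
    else
        # Could be situation 1 or 2: do not touch the shared disks
        echo "wait"
    fi
}

# Print the decision for each combination of inputs
for hb in yes no; do
    for down in yes no; do
        echo "heartbeat=$hb verified_down=$down -> $(startup_action "$hb" "$down")"
    done
done
```

A startup script that checks only the heartbeat can never set the second input to "yes" by itself, which is why it ends up in the "wait" branch.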
If your servers are located in an unmanned server farm, you should implement remote consoles. Which generation is your DL580? If it's reasonably modern, it should have iLO remote console functionality built-in. You can even control the server's power switch through iLO.
You could also use Wake-on-LAN to restart your servers after a power interruption, but in my opinion, WOL is a poor substitute for a real remote console.
MK
05-29-2007 05:54 AM
Worse, you leave an opening for possible corruption of your data.
SG is working the way it was designed.
Meddle with this at your peril.
05-29-2007 08:42 PM
Finally, the installation is finished.
The customer accepts that this is how SG is supposed to work.
I agree with you: SG for Linux works exactly this way, and no other way.
SG is designed to tolerate N-1 possible failures. It is not designed to handle both nodes going down at the same time and then one node staying broken.
Thanks a lot for everything.
05-29-2007 11:52 PM