
Clustering question.

 
The Brit
Honored Contributor

Clustering question.

A quick question.

I am making changes to the interconnectivity of my (mixed architecture, I64/Alpha) cluster. Basically, I run 3 bl860c blades as the core of my cluster; however, I also have an Alpha (DS10) which is a non-voting member of my cluster. I need the Alpha to run some old legacy stuff (Advanced Server, Faxsr, etc.).

Because the alpha is in a different network, I currently have a direct connection from one of the Alpha NICs to the VC module on my c7000. This connection allows the SCS traffic to move between the Alpha and the Blades.

This connection is, however, a single point of failure and is a problem when blade enclosure maintenance is taking place. Because of this, I am in the process of connecting the Alpha to the blades through the network, using a defined VLAN (801) to overcome the network problems. Since I am not a network person (and this is not really a network question), I have been trying to come up with a method of determining whether the SCS communication between the Alpha and the blades is good or not after moving to the network connection. The problem is that when the Alpha boots, it mounts the Itanium system disk in order to access the QMan files, Sysuaf, etc. Therefore I really need to know that the SCS communications are working before the Alpha mounts the Itanium disk and partitions the cluster.

I know that I can avoid a partition by setting VOTES to 0 and EXPECTED_VOTES to 1 on the Alpha, so that boot can only take place if another cluster node supplies the required vote. (The actual settings are VOTES=0, EXPECTED_VOTES=3, so all three blades are required for the Alpha to boot.) I am considering setting EXPECTED_VOTES down to 1 so that I can resolve this communication question with only one blade up, which would save time.

So my question is this: if I have one (IA64) cluster node up, supplying 1 vote, and the Alpha boots with VOTES=0 and EXPECTED_VOTES greater than the available votes (i.e. > 1), what will happen when the Alpha gets to the

"Waiting to form or join an VMSCluster" point?

Will it just hang because it doesn't have enough votes?

And if it does, will OPCOM on the Blade still get the

"Node Blade received VMScluster membership request from node Alpha" message?

I am thinking that if the blade receives the Membership request from the Alpha, then this confirms that the communication path is OK, even though there are not enough votes for the Alpha to join the cluster.

Is this reasonable? Or am I missing something?

Thanks

Dave.
11 REPLIES
labadie_1
Honored Contributor

Re: Clustering question.

Dave

By default, unless you run lavc_stop_bus or stop an interface with scacp, SCS is started on all network interfaces.

If you want to see the SCS traffic, you have several options; two spring to mind:

1) $ sh clu/c
add loc_proc_nam
remove conn/type=noopen

2) $ mc scacp
sh vc
sh chan

among others
The Brit
Honored Contributor

Re: Clustering question.

Basically what I want to do is confirm that SCS traffic can pass between my blades and the alpha, without letting the alpha boot into the cluster.

I can obviously stop the alpha from booting into the cluster, or booting as a standalone by setting its VOTES=0 and EXPECTED_VOTES=3. Now there is only 1 vote available (from the booted blade) and this is not enough for the Alpha to boot.

But I'm thinking that it must have communicated with the existing cluster node in order to find out if there were enough votes.

So I am assuming that the "Membership Request" message will appear on the OPCOM of the existing node.

The point is, does this "prove" that SCS traffic will flow between the Alpha and the blade? I am assuming yes.

Dave.
Hoff
Honored Contributor

Re: Clustering question.

With the correct setting for VOTES and EXPECTED_VOTES on all hosts in the cluster, the Alpha will wait at the cluster formation during bootstrap, specifically awaiting the arrival of sufficient votes to satisfy the EXPECTED_VOTES setting.
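The vote arithmetic behind that wait can be sketched roughly as follows. This is only an illustration, assuming the commonly documented OpenVMS formula quorum = (EXPECTED_VOTES + 2) / 2 with the fraction truncated; the function names are hypothetical, and a real cluster also weighs the settings of the other members and any quorum disk.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES, using integer (truncating) division."""
    return (expected_votes + 2) // 2


def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """True if the votes currently visible meet or exceed the quorum value."""
    return votes_present >= quorum(expected_votes)


if __name__ == "__main__":
    # One blade is up and contributes 1 vote; the Alpha contributes 0.
    votes_present = 1
    for expected_votes in (3, 1):
        state = "can join" if has_quorum(votes_present, expected_votes) else "waits"
        print(f"EXPECTED_VOTES={expected_votes}: quorum={quorum(expected_votes)}, Alpha {state}")
```

Under these assumptions, an Alpha with EXPECTED_VOTES=3 needs to see 2 votes, so a single blade supplying 1 vote leaves it waiting; with EXPECTED_VOTES=1, that single vote is enough.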
The Brit
Honored Contributor

Re: Clustering question.

Thanks Hoff,

But will the "Membership Request" show up on the existing cluster node's OPCOM?

The other question is:

Will the Alpha console show anything to indicate that it is "..waiting for Votes.."?

Dave.
Bill Hall
Honored Contributor

Re: Clustering question.

The Alpha will "hang" waiting to join the cluster if the sysgen parameters are correct.

Bill
Bill Hall
abrsvc
Respected Contributor

Re: Clustering question.

Dave

The Alpha will show a message akin to:
Waiting to join...

I will test and post the receiving system's messages (if any).

Dan
Hoff
Honored Contributor

Re: Clustering question.

You'll get some OPCOM chatter.

You'll get a wait for votes, too.

This is the disk data integrity and data consistency preservation wait state. Also known as the quorum hang.

I don't know off-hand if the OPCOM messages will repeat at periodic intervals; I usually don't have boxes kicking around in that state for very long, and (when I do) I'm usually looking to fix whatever was triggering that wait...

This requires VOTES and EXPECTED_VOTES to be set correctly on all hosts in the cluster, or (for some specific cluster cases where this is required) that the startup sequence be carefully established and manually followed, lest Bad Things happen.

Bad Things here include massive disk data corruption well before the login prompt is reached, if that prompt can even be reached, given the corruption.
Steve Reece_3
Trusted Contributor

Re: Clustering question.

The other option here (thinking outside the box) is to buy a cheap DE500 off eBay and put that in the DS10 if you have a spare slot. You could then connect one built-in port to one c7000 switch and the DE500 to the other. A $20 fix for the single point of failure, and one that you know will work.

You're probably spending more than the $20 in time looking at the networking question at the moment...

FWIW, I *think* that the "waiting to form or join a VMScluster..." message repeats but, like Hoff, I don't normally have a system in that state long enough to verify. One thing I do know is that in the days of 6.2-1H3 the cluster formation broadcasts could go across the network that I had between Bracknell and North Carolina but the normal SCS cluster comm's wouldn't. Since I'm not a network person I'm not sure what the config was.
The Brit
Honored Contributor

Re: Clustering question.

Thanks for all your comments. I have decided to test the situation using copies of the two system disks.

Will boot the Blade from a current backup of the I64 system disk.

Will also boot the Alpha from a current backup of the Alpha System Disk. The only difference will be that the Mount of the I64 system disk will be commented out of the sylogicals.com.

Alpha will have Votes=0, Expected_Votes=1, the blade will have Votes=1, Expected_Votes=1.

The alpha will either succeed in joining the cluster, indicating that the network setup is OK. In this case I can shutdown both nodes and boot correctly from the correct disks.

Or it will fail, indicating that the network setup is NOT OK. In which case, "no harm done", back to the drawing board.

I should say that with VOTES and EXPECTED_VOTES set correctly, I have (almost) total confidence that there will not be an issue. (I am just being overly paranoid, having been recently bitten.)

thanks

Dave.