Operating System - HP-UX

Re: Oracle RAC crashing/setup help

 
Marvin Strong
Honored Contributor

Oracle RAC crashing/setup help

Well I'll try to keep this short. I'm not sure if my terminology is correct as far as RAC goes. This is a test environment.

PROBLEM:
If I pull the network cables from the RAC Master, the Slave crashes and reboots. The Master is off the network, so we have a failure.
But if I pull the Slave's cables, the cluster reforms and continues working.

QUESTION:
1) Is it required to have packages configured on the nodes for the RAC instances, or can the Oracle startup scripts perform the vgchange -a s?

Does it fail because the interconnect is also a heartbeat connection, so I have to pull it as well?

Any help would be appreciated.
9 REPLIES
Stephen Doud
Honored Contributor

Re: Oracle RAC crashing/setup help

Prerequisites to operate Oracle RAC on HP-UX servers:
- both Serviceguard and Serviceguard Extension for RAC (SGeRAC) must be installed on the servers
- a Serviceguard eRAC cluster must be configured, with "shared" activation mode volume group(s).
- the cluster must be running to enable "shared" VGs to be activated.

Although you don't have to use a package control script to activate the VGs in shared mode, the cluster must be running anyway, so you may as well take advantage of the pre-written, HP-supported code to activate system resources for RAC.

As far as RAC is concerned, it allows multiple servers to access the same shared storage array concurrently. From my understanding, it is not a prerequisite that you configure multiple nodes to do so.
However, if you do, the VGs must be configured to activate in shared mode:

# vgchange -c y -S y

To activate such a VG requires cmlvmd to be running before executing:

# vgchange -a s
(which is normally done in the package control script automatically, when the package starts)
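
For anyone activating the VG by hand instead of from a package, the ordering above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names, the `DRY_RUN` switch, and the VG name `/dev/vg_rac` are hypothetical, and detecting cmlvmd by looking for its name in `ps -e` output is an assumption, not an HP-documented check.

```sh
#!/bin/sh
# Sketch only: refuse to activate a shared-mode VG unless cmlvmd is running.
# All names here (functions, DRY_RUN, /dev/vg_rac) are hypothetical.

# Default to printing the command instead of running it.
DRY_RUN=${DRY_RUN:-1}

# Return 0 if a process named cmlvmd appears in the process list.
# Assumption: the daemon shows up under that name in the last ps column.
cmlvmd_running() {
    ps -e | awk '{print $NF}' | grep -q '^cmlvmd$'
}

# Activate the given VG in shared mode, or explain why not.
activate_shared_vg() {
    vg=$1
    if ! cmlvmd_running; then
        echo "cmlvmd is not running; start the cluster before vgchange -a s" >&2
        return 1
    fi
    if [ "$DRY_RUN" = "1" ]; then
        echo "vgchange -a s $vg"
    else
        vgchange -a s "$vg"
    fi
}
```

With the default `DRY_RUN=1`, calling `activate_shared_vg /dev/vg_rac` only prints the vgchange command it would run, so it is safe to try on a test node first.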

-StephenD
Hein van den Heuvel
Honored Contributor

Re: Oracle RAC crashing/setup help

> PROBLEM: If I pull the network cables from the RAC Master

I think the problem is in that last word.
For Oracle RAC there is no master/slave.
There will be a first, second and so on, but all members are equal.
If you perceive one node as a master, and there is something in the system setup that is responsible for that (so it is not just a master 'in your mind'), then you need to address those differences.
Sure, one node can be bigger or smaller, and one node can be the target of more work or of specific work, but functionally they should all be the same!

fwiw,
Hein.
Marvin Strong
Honored Contributor

Re: Oracle RAC crashing/setup help

As I stated, I didn't know whether my terminology was correct. I know very little about RAC, and was given the project after someone else did the setup.

If there is no master/slave then:
Unplugging the first node causes the second node to crash, but the second node can be unplugged just fine.

Also, there are no packages configured on these machines; we just have Serviceguard checking the heartbeat.
Marvin Strong
Honored Contributor

Re: Oracle RAC crashing/setup help

Stephen, we do have all the prereqs. Actually, when the project was thrown onto my plate, I noticed right away that the VGs were not in shared mode, so I made the cluster modifications and activated them that way.

Still, no packages exist. And right now even Oracle is started manually, as this is a proof of concept more than anything.

Actually, everything works fine unless I pull the cables out of the first node. Pulling them from the second node causes a cluster reorg, but everything is fine.
JW_8
Occasional Advisor

Re: Oracle RAC crashing/setup help

Hi Marvin,
> If there is no master/slave then:
> Unplugging the First Node causes Second
> node to crash, but second node can be
> unplugged just fine.
A properly configured SG and RAC cluster should be symmetrical. Your description of the problem indicates something may not be right. I suggest you check the wiring diagram and which interconnect(s) are used by RAC DLM traffic.
Stephen Doud
Honored Contributor

Re: Oracle RAC crashing/setup help

Marvin,

In a multinode cluster, Serviceguard requires the use of at least one Heartbeat LAN to transmit "I'm alive" packets to one another. This permits SG to reorganize a cluster and re-distribute packages if one of the nodes should be unreachable due to networking problems or a hang/crash.

In a simple 2-node cluster, with only a single heartbeat path, disconnecting the HB LAN will cause both nodes to think the other node has failed. This scenario requires the use of a cluster arbitration device such as a cluster lock disk or quorum server.

Normally, when a HB network outage occurs in a 2-node cluster, both nodes race to the arbitration device. If it responds as hoped, the node that reaches it first claims it and reforms a 1-node cluster. The node that reaches the arbitration device last is required by Serviceguard to TOC (reboot, saving a dump). This is necessary to ensure that only one node is operating a given package (and touching the data).

When a HB LAN cable is pulled, no matter where it is disconnected between the nodes, SG cannot know which end the cable was disconnected from. Hence, only the arbitration device race can determine which node gets immunity and survives ;)

After a 2-node cluster has become a 1-node cluster after such a HB path failure, HB is no longer necessary (since failure of a 1-node cluster terminates the cluster). Therefore, removing the HB cable from the last node has no effect on the 1-node cluster.
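
The race described above can be pictured with a toy shell model. This is an analogy only, not Serviceguard code: it uses the fact that `mkdir` is atomic, so exactly one caller can create a given directory, to stand in for the cluster lock disk or quorum server. The function name `claim_lock`, the lock path, and the node names are hypothetical.

```sh
#!/bin/sh
# Toy model of the 2-node tie-breaker race. NOT how Serviceguard is implemented.
# mkdir is atomic: only the first caller can create the lock directory.

claim_lock() {
    lock=$1   # hypothetical path standing in for the arbitration device
    node=$2   # hypothetical node name
    if mkdir "$lock" 2>/dev/null; then
        # First to arrive: this node survives and reforms the cluster alone.
        echo "$node: won the lock, reforms as a 1-node cluster"
    else
        # Lock already taken: this node is the one that would have to TOC.
        echo "$node: lost the race, must TOC"
        return 1
    fi
}
```

Calling `claim_lock /tmp/demo_lock node2` and then `claim_lock /tmp/demo_lock node1` reports node2 as the survivor and node1 as the loser, whichever node's cable was actually pulled. The model decides the outcome purely by arrival order at the lock, which is the point of the explanation above.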

-StephenD.
Marvin Strong
Honored Contributor

Re: Oracle RAC crashing/setup help

I have pretty much deferred this to the DBAs at this point because I feel my SG configuration is good.

However, attached is my Cluster.ascii file if anyone wants to look it over and point out any problems.

I changed the machine names to node1 and node2; those are not the real machine names.

There are no packages associated with this cluster.

Other than there being no standby LAN, I don't see a problem with this cluster.
As this is a test environment, management didn't want to get an extra LAN card.
This shouldn't be a problem though, and even someone I was talking with at HP claims this is a valid test environment.
Stephen Doud
Honored Contributor

Re: Oracle RAC crashing/setup help

The cluster ASCII config file looks good.
Noted:
There are 2 HB NICs per node (this is good practice).

The cluster configuration file contains these lines:
# Warning: There are no standby network interfaces for lan1.
# Warning: There are no standby network interfaces for lan2.

The above comments were added by SG when the cmquerycl was performed.
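
To pull such warnings out of a cluster ASCII file without reading the whole thing, a one-line grep wrapped in a function is enough. A minimal sketch, assuming the warnings always start with `# Warning:` as in the two lines quoted above; the function name `list_cmquerycl_warnings` is hypothetical.

```sh
#!/bin/sh
# Sketch: list the "# Warning:" comment lines in a cluster ASCII file.
# Assumes the warnings use the same prefix as the two lines quoted above.

list_cmquerycl_warnings() {
    file=$1
    # -n prefixes each match with its line number in the file.
    grep -n '^# Warning:' "$file"
}
```

For example, `list_cmquerycl_warnings Cluster.ascii` would print each warning line with its line number.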

Since HB is transmitted over 2 LANs, the next-most plausible reason for a node to go down is that a package is configured with SERVICE_FAIL_FAST_ENABLED or NODE_FAIL_FAST_ENABLED set to YES!
This causes a node to TOC (reboot itself) if a package should be unable to run.
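
One way to rule this out is to scan the package configuration files for fail-fast settings. The sketch below is an assumption-laden illustration: it assumes legacy-style package ASCII files in which the parameter name and its value sit on one line separated by whitespace (for example `NODE_FAIL_FAST_ENABLED YES`), and the function name `find_fail_fast` is hypothetical. On a cluster with no packages it should simply print nothing.

```sh
#!/bin/sh
# Sketch: report package config lines that enable node or service fail-fast.
# Assumes "PARAMETER   VALUE" lines, as in legacy package ASCII files.

find_fail_fast() {
    dir=$1
    pat='^[[:space:]]*(NODE|SERVICE)_FAIL_FAST_ENABLED[[:space:]]+YES'
    # Passing /dev/null as a second file makes grep print the file name
    # in front of every match, even when it is given a single real file.
    find "$dir" -type f -exec grep -E "$pat" {} /dev/null \;
}
```

A typical call would be `find_fail_fast /etc/cmcluster`, the usual Serviceguard configuration directory on HP-UX.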

-sd-
Marvin Strong
Honored Contributor

Re: Oracle RAC crashing/setup help

I would agree, Stephen, only there are no packages. That is why I think the way RAC is set up might have something flaky in it, or something overlooked. From what I can tell, all the SG stuff looks just fine.

Thanks