Will Serviceguard with RAC cause a reboot if both LAN cables are out?

 
Sammy_2
Super Advisor

Will Serviceguard with RAC cause a reboot if both LAN cables are out?

Two SGeRAC (Serviceguard) test questions:

1) If I manually pull both the primary and secondary LAN cables on one node, is the node supposed to panic and reboot in a 2-node Serviceguard RAC cluster? If yes, is there a setting for this in the cluster configuration file? My node did not reboot.


2) Is the primary path to the disks supposed to fail over to the secondary path on a shared Oracle raw VG (shared between the 2 nodes) if I pull the cable to the primary path? It did not fail over, although the disks exclusive to just one node failed over from c8 to c9.

Thanks



HP-UX 11.11
=======
B3935DA A.11.16.00 Serviceguard
T1859BA A.11.16.00 Serviceguard Extension for RAC
good judgement comes from experience and experience comes from bad judgement.
Devender Khatana
Honored Contributor

Re: Will Serviceguard with RAC cause a reboot if both LAN cables are out?

Hi,

When there are two nodes in a cluster and you remove both cables from one node, it is not necessarily that node that will reboot. Although the heartbeat is lost, the node that obtains the cluster lock first will continue to serve the cluster, and package behaviour will be as configured.

Did your second node reboot at this point? Also, what was the output of "cmviewcl -v" on the node from which the cables were removed?

If the configuration is correct and alternate paths are defined in the VGs, then they should switch over. If that is not happening, it means you have not added the alternate paths to the VG configuration.

A "vgdisplay -v" output for the VG will give you the details.

HTH,
Devender
Impossible itself mentions "I m possible"
melvyn burnard
Honored Contributor

Re: Will Serviceguard with RAC cause a reboot if both LAN cables are out?

1) If I manually pull both the primary and secondary LAN cables on one node, is the node supposed to panic and reboot in a 2-node Serviceguard RAC cluster? If yes, is there a setting for this in the cluster configuration file? My node did not reboot.

Well, what is your network and heartbeat configuration set up like?
If you have multiple heartbeat paths, then the nodes should maintain contact and the cluster should stay up.
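You can check this by dumping the cluster configuration and counting the heartbeat entries per node (a sketch; escript_prod is your cluster name from cmviewcl):

# cmgetconf -c escript_prod /tmp/cluster.ascii
# grep HEARTBEAT_IP /tmp/cluster.ascii

Two HEARTBEAT_IP lines per node means two independent heartbeat subnets.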



2) Is the primary path to the disks supposed to fail over to the secondary path on a shared Oracle raw VG (shared between the 2 nodes) if I pull the cable to the primary path? It did not fail over, although the disks exclusive to just one node failed over from c8 to c9.

Yes, it should, but how long did you give it? There are timeouts that must expire before a failover occurs.
Check syslog.log for any messages about these failures.
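For example (assuming the standard HP-UX syslog location):

# grep -i -e "lvm" -e "pvlink" /var/adm/syslog/syslog.log | tail -20

You should see PVLink switch/failure messages there if the path really went down.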
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Sammy_2
Super Advisor

Re: Will Serviceguard with RAC cause a reboot if both LAN cables are out?

Devender,
The node from which both network cables were pulled did not reboot. Obviously, the other one stayed up as expected.
Below are the current cmviewcl -v and vgdisplay -v outputs. cmviewcl showed the node as being up at the time.

On SG with RAC (on the shared raw VG), I am not allowed to manually vgreduce the primary path (see below).
Is that normal?

Does that mean that when the primary path fails in real life, the secondary will not take over, since the VG is in shared mode?

# vgreduce /dev/ops /dev/dsk/c9t0d3
vgreduce: The volume group "/dev/ops" is active in Shared Mode.
Cannot perform configuration change.




Mel,
Below is the cmviewcl -v output. I think I have multiple heartbeats; I forgot how to check. Would that cause the system to stay up with no network connectivity?



On the failover of the paths to the disks, I gave it sufficient time (about 30 seconds). I saw the disks exclusive to one server fail over within seconds.



# cmviewcl -v
CLUSTER STATUS
escript_prod up

NODE STATUS STATE
inbphes5 up running

Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 0/1/2/0 lan0
PRIMARY up 0/5/2/0 lan3
STANDBY up 0/4/1/0 lan2

PACKAGE STATUS STATE AUTO_RUN NODE
inbphes5sg up running enabled inbphes5

Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual

Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 10.108.65.0

Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled inbphes5 (current)

NODE STATUS STATE
inbphes6 up running

Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 0/1/2/0 lan0
STANDBY down 0/4/1/0 lan2
PRIMARY up 0/5/2/0 lan3

UNOWNED_PACKAGES

PACKAGE STATUS STATE AUTO_RUN NODE
inbphes6sg down halted disabled unowned

Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual

Script_Parameters:
ITEM STATUS NODE_NAME NAME
Subnet up inbphes6 10.108.65.0

Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled inbphes6
=================================
# vgdisplay -v


--- Volume groups ---
VG Name /dev/vg00
VG Write Access read/write
VG Status available
Max LV 255
Cur LV 8
Open LV 8
Max PV 16
Cur PV 2
Act PV 2
Max PE per PV 4384
VGDA 4
PE Size (Mbytes) 16
Total PE 8748
Alloc PE 7672
Free PE 1076
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0

--- Logical volumes ---
LV Name /dev/vg00/lvol1
LV Status available/syncd
LV Size (Mbytes) 352
Current LE 22
Allocated PE 44
Used PV 2

...
....

...

LV Name /dev/vg00/lvol5
LV Status available/syncd
LV Size (Mbytes) 7008
Current LE 438
Allocated PE 876
Used PV 2

LV Name /dev/vg00/lvol6
LV Status available/syncd
LV Size (Mbytes) 4000
Current LE 250
Allocated PE 500
Used PV 2

...
...


--- Physical volumes ---
PV Name /dev/dsk/c2t1d0
PV Status available
Total PE 4374
Free PE 538
Autoswitch On

PV Name /dev/dsk/c3t0d0
PV Status available
Total PE 4374
Free PE 538
Autoswitch On


VG Name /dev/ops
VG Write Access read/write
VG Status available, shared, client
Max LV 255
Cur LV 73
Open LV 73
Max PV 13
Cur PV 1
Act PV 1
Max PE per PV 36684
VGDA 2
PE Size (Mbytes) 4
Total PE 36678
Alloc PE 12311
Free PE 24367
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0

--- ---

inbphes6 Server
inbphes5 Client

--- Logical volumes ---
LV Name /dev/ops/dtiarchivetbl04
LV Status available/syncd
LV Size (Mbytes) 2000
Current LE 500
Allocated PE 500
Used PV 1



LV Name /dev/ops/dtiaidx01
LV Status available/syncd
LV Size (Mbytes) 100
Current LE 25
Allocated PE 25

...
...
..





LV Name /dev/apps/es_apps
LV Status available/syncd
LV Size (Mbytes) 3000
Current LE 750
Allocated PE 750
Used PV 1


--- Physical volumes ---
PV Name /dev/dsk/c9t0d1
PV Name /dev/dsk/c8t0d1 Alternate Link
PV Status available
Total PE 17260
Free PE 4618
Autoswitch On

good judgement comes from experience and experience comes from bad judgement.
Sammy_2
Super Advisor

Re: Will Serviceguard with RAC cause a reboot if both LAN cables are out?

Attached is the cmviewconf output.
Thanks
good judgement comes from experience and experience comes from bad judgement.
melvyn burnard
Honored Contributor

Re: Will Serviceguard with RAC cause a reboot if both LAN cables are out?

Well, the answer to question 1 is that there are indeed 2 heartbeat subnets.
So if you only pull the primary and standby for one subnet, the heartbeats are still carried over the second heartbeat subnet, and the nodes stay formed as a cluster.

As for question 2: you cannot make any configuration changes while it is configured as a shareable VG, but the vgdisplay shows there is an alternate link. So you need to monitor syslog and check, when you pull the PRIMARY link, whether there are messages about the VG losing its primary path.
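If you do need to make the change, the usual sequence is to take the VG out of shared mode first, roughly like this (a sketch only; Oracle must be down on that VG, and check the SGeRAC documentation for your version):

# vgchange -a n /dev/ops          (deactivate on ALL nodes)
# vgchange -a e /dev/ops          (activate in exclusive mode on ONE node)
# vgreduce /dev/ops /dev/dsk/c9t0d3
# vgchange -a n /dev/ops
# vgchange -a s /dev/ops          (reactivate in shared mode on all nodes)

That is why vgreduce refused while the VG was "active in Shared Mode".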
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Kent Ostby
Honored Contributor
Solution

Re: Will Serviceguard with RAC cause a reboot if both LAN cables are out?

Sammy --

In a non-RAC setting, if you had lan1 and lan2 configured as a primary and standby for the PACKAGE and both were pulled, it would fail the PACKAGE over to the secondary machine.

With RAC, the "package" is already running on both servers, so there is no failover per se.

As melvyn mentions, the node won't go down because you have a second heartbeat.

Now, the one thing you have to be careful of is that you cannot control which node gets the cluster lock disk if you pull both heartbeats, unless you are also running with the RS-232 hookup, which allows the proper box to be chosen to stay up in that case.

Also note that if you need more insight on future tests, post the syslog.log entries from the time of the test.
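The cluster lock and the serial heartbeat both live in the cluster ASCII file; the relevant lines look something like this (illustrative values only, yours will differ):

FIRST_CLUSTER_LOCK_VG   /dev/vglock
NODE_NAME               inbphes5
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c2t1d0
  SERIAL_DEVICE_FILE    /dev/tty0p0

You can pull your actual values with cmgetconf and check whether SERIAL_DEVICE_FILE is configured at all.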

Best regards,

Kent M. Ostby
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
Sammy_2
Super Advisor

Re: Will Serviceguard with RAC cause a reboot if both LAN cables are out?

Thanks, Mel and Kent.
Actually, we had problems on the secondary node as well when I pulled both cables on the primary node: the app (Oracle) on the secondary node had issues. We are working with HP to see why.
Does the cluster attempt to deactivate the VG on the shared disks when both cables are pulled from one node, and is that why the problems were seen on the other node?
But I do see why the node did not reboot, as I have 2 heartbeats. Thanks to Mel, Kent, and Devender.


Second issue

The entries below (from syslog.log) are from when the fibre cable was pulled and then put back in.


Is there supposed to be an entry like the ones below for each LUN that failed over? Because I only see one entry.
The mutually exclusive LUN failed over fine, but the shared LUN did not: it stayed the same, and when I ran diskinfo on the primary disk path, it spewed errors. I may try this again.


Oct 20 14:40:31 inbphes5 vmunix: LVM: Performed a switch for Lun ID = 0 (pv = 0x000000004e69e000), from raw device 0x1f090100 (with priority: 0, and current flags: 0x40) to raw device 0x1f080100 (with priority: 1, and current flags: 0x0).
Oct 20 14:40:31 inbphes5 vmunix: LVM: VG 64 0x020000: PVLink 31 0x090100 Failed! The PV is still accessible.
Oct 20 14:42:11 inbphes5 vmunix: LVM: Performed a switch for Lun ID = 0 (pv = 0x000000004e69e000), from raw device 0x1f080100 (with priority: 1, and current flags: 0x0) to raw device 0x1f090100 (with priority: 0, and current flags: 0x80).
Oct 20 14:42:11 inbphes5 vmunix: LVM: VG 64 0x020000: PVLink 31 0x090100 Recovered.
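When I retry, I plan to verify each path directly while the cable is out (device files taken from my vgdisplay output above):

# diskinfo /dev/rdsk/c9t0d1     (primary path; should error while the cable is pulled)
# diskinfo /dev/rdsk/c8t0d1     (alternate link; should still respond)
# ioscan -fnC disk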
good judgement comes from experience and experience comes from bad judgement.