Operating System - HP-UX

Tonya Underwood
Regular Advisor

ServiceGuard Failover Testing Questions

This past weekend, we performed SG testing after upgrading to 11.16.

Anyway, our standard tests of pulling network cables (one, then both) all worked.

But, someone wanted us to test the pvlinks. So, we pulled one fibre cable. Great! We switched over.

So, they said pull both. I've never done this in SG testing, but there was no DB running on this server, so fine. We pulled both.

I have included lvdisplay and vgdisplay output below. But here is my question:

We pulled both fibre cables, so there was no path to the storage. I looked and saw the PVs as unavailable. However, I was still able to run lvdisplay on the LVs with NO PROBLEM, and they showed as synced. We waited for more than 20 minutes and SG never even noticed; all the I/O just remained pending.

So, SG doesn't care? We waited for it to fail over, but nothing happened. We finally plugged the cables back in: there was no hung I/O, and everything carried on normally.

I have the LV IO timeout at its default, which I THOUGHT was 90 seconds. However, I received this reply from another person I asked:

"By nature, Serviceguard doesn't monitor LVM I/O transfers to switch the package over if there is an I/O failure, as in removing a fibre path. There are other products outside of the base Serviceguard product that do this, one being ISEE, as well as EMS and the EMS monitors of Serviceguard. The nature of Serviceguard is that it doesn't monitor I/O from a disk after the initial activation of the VG. The standard I/O timeout, as per the LVM man page for lvchange, is "forever"."

Either we are talking about two totally different things, OR one of us is wrong.

Would someone kindly give an explanation?

THANKS!
Tonya Underwood

Here is a sample lv and vg...


--- Logical volumes ---
LV Name /dev/vg06d/lvappl01
VG Name /dev/vg06d
LV Permission read/write
LV Status available/syncd
Mirror copies 0
Consistency Recovery MWC
Schedule parallel
LV Size (Mbytes) 23872
Current LE 1492
Allocated PE 1492
Stripes 0
Stripe Size (Kbytes) 0
Bad block NONE
Allocation strict
IO Timeout (Seconds) default

--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c16t2d0 982 982
/dev/dsk/c16t2d2 510 510

--- Volume groups ---
VG Name /dev/vg06d
VG Write Access read/write
VG Status available, exclusive
Max LV 255
Cur LV 8
Open LV 8
Max PV 112
Cur PV 3
Act PV 3
Max PE per PV 6468
VGDA 6
PE Size (Mbytes) 16
Total PE 12942
Alloc PE 12370
Free PE 572
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0

--- Logical volumes ---

IT_2007
Honored Contributor

Correct. SG won't monitor I/O failures in order to fail over a package. You need either EMC PowerPath, if you have EMC Symmetrix / CLARiiON storage, or a third-party product.

Since the I/Os are queued and pending, nothing happened to the logical volumes, so the package won't fail over.

Tonya Underwood
Regular Advisor

Yes, yes... I understand that SG does not monitor I/O. But, maybe my understanding of lv IO Timeout value is incorrect?

Is default "forever", really? I thought it was 90 seconds...

With a 90 second I/O timeout, should the lv not have become unavailable? And would SG then have known about it?

Thanks
IT_2007
Honored Contributor

See the man page for lvdisplay: the default is "forever".

You are confusing it with the 90-second physical volume timeout, which you can change with the pvchange -t command.

==========
The IO timeout used by LVM for all IO to this
logical volume. A value of default, indicates
that the system will use the value of
"forever". (Note: the actual duration of a
request may exceed this timeout value when
the underlying physical volume(s) have
timeouts which either exceed this value or
are not integer multiples thereof.)

=====================
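To make the man-page rule above concrete, here is a small Python sketch that reads the "IO Timeout (Seconds)" line from lvdisplay output like the sample earlier in this thread and treats "default" as "forever". The function name is hypothetical, and it assumes the field is formatted exactly as in that sample; other HP-UX releases may lay it out differently. A value of 0 is also mapped to "forever", following the lvchange man page text quoted later in the thread.

```python
import re
from typing import Optional


def parse_lv_io_timeout(lvdisplay_text: str) -> Optional[int]:
    """Return the LV IO timeout in seconds, or None meaning "forever".

    Hypothetical helper; assumes the 'IO Timeout (Seconds)' line format
    shown in the lvdisplay sample in this thread.
    """
    for line in lvdisplay_text.splitlines():
        match = re.match(r"\s*IO Timeout \(Seconds\)\s+(\S+)", line)
        if match:
            value = match.group(1)
            # "default" (and 0, per the lvchange man page) both mean "forever".
            if value == "default" or value == "0":
                return None
            return int(value)
    raise ValueError("no 'IO Timeout (Seconds)' line found")


sample = """\
LV Name /dev/vg06d/lvappl01
LV Status available/syncd
IO Timeout (Seconds) default
"""

timeout = parse_lv_io_timeout(sample)
print("forever" if timeout is None else f"{timeout} s")
```

With "default", as in the test above, the function returns None: pending I/O is never timed out by LVM, which matches the 20+ minutes of queued I/O.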
Tonya Underwood
Regular Advisor

AHHHHH

Thank you!

So, if I changed the LV IO timeout, what value is recommended? And is changing it recommended at all?

If I changed this so it is no longer "forever" and then pulled the fibre cables, would SG THEN see that the LV had timed out and fail the package over, or not?
IT_2007
Honored Contributor

You can set it to 90 seconds with the lvchange -t command; see man lvchange.

I have never tested this with SG. I think it may fail over the package if you match the LV and PV timeout values, but I suggest trying this only in a TEST environment.

========
Set the IO_timeout for the logical
volume to the number of seconds
indicated. This value will be used to
determine how long to wait for IO
requests to complete before concluding
that an IO request cannot be completed.
An IO_timeout value of zero (0) causes
the system to use the default value of
"forever". NOTE: The actual duration of
the request may exceed the specified
IO_timeout value when the underlying
physical volume(s) have timeouts which
either exceed this IO_timeout value or
are not integer multiples of this value.
=================
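The NOTE in the excerpt above (the actual wait can exceed the LV timeout when PV timeouts are larger or are not integer multiples of it) can be illustrated with a rough Python model. The assumption here, which is not stated in the man page, is that LVM waits in whole PV-timeout intervals, so the effective wait is rounded up to the next multiple of the PV timeout. The function name and the timeout values are illustrative only. Note that even a finite result only bounds how long an I/O waits before failing back to the application; on its own it does not make Serviceguard monitor the disks.

```python
import math
from typing import Optional


def worst_case_io_wait(lv_timeout: Optional[int], pv_timeout: int) -> Optional[int]:
    """Rough upper bound (seconds) on how long one I/O waits before LVM fails it.

    ASSUMPTION: LVM retries in whole PV-timeout intervals, so the wait is
    rounded up to the next multiple of pv_timeout. An lv_timeout of None
    or 0 means "forever", so the I/O never fails and None is returned.
    """
    if pv_timeout <= 0:
        raise ValueError("pv_timeout must be positive")
    if not lv_timeout:
        return None  # "forever": I/O stays queued, as seen in the cable-pull test
    return math.ceil(lv_timeout / pv_timeout) * pv_timeout


# Illustrative (LV timeout, PV timeout) pairs, in seconds.
for lv, pv in [(None, 30), (90, 30), (90, 60), (30, 90)]:
    print(lv, pv, worst_case_io_wait(lv, pv))
```

Under this model, matching the LV timeout to a multiple of the PV timeout (for example 90 and 30) keeps the effective wait at the LV value, while 90 over 60 stretches it to 120 seconds.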
Tonya Underwood
Regular Advisor

Thanks... I understand how to change the value.

My question NOW is, is the change recommended or is it not recommended? And how is SG gonna react?

Are there any SG experts out there who definitely know how SG would handle this?
melvyn burnard
Honored Contributor
Solution

Serviceguard NEVER monitors the disk paths by default. It was not designed to, so it does not react if both links get pulled. This test also goes against the design: SG is designed to cater for SPOFs (single points of failure), and by pulling BOTH FC cables you create a Multiple Point of Failure, or MPOF.
To let SG react to losing both links, you would need to use the EMS monitors, set them up as a resource or service in your main package configuration, and set NODE_FAIL_FAST to YES. If both links fail, the node is then forced to TOC, and TOC'ing the node forces the package to switch.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
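The cases discussed in this thread can be summarized as a small decision table, sketched here in Python. This is not Serviceguard code and calls no Serviceguard or EMS interface; the function and outcome names are hypothetical. It encodes one path lost (PV links switch), both paths lost with no EMS disk resource (nothing happens), and both paths lost with an EMS resource plus node fail-fast (the node TOCs and the package moves). The fourth case, an EMS resource without fail-fast, is an assumption and is not stated in the thread.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for what happens after a disk-path failure."""
    PVLINKS_SWITCH = "alternate PV link takes over; package keeps running"
    NO_ACTION = "I/O stays queued; Serviceguard takes no action"
    NODE_TOC = "node TOCs; package restarts on an adoptive node"
    PACKAGE_HALT = "package halt requested (assumption: may stall on hung I/O)"


def expected_outcome(paths_up: int, ems_disk_resource: bool, node_fail_fast: bool) -> Outcome:
    """Model of the failover cases described in this thread (not Serviceguard code)."""
    if paths_up < 0:
        raise ValueError("paths_up cannot be negative")
    if paths_up > 0:
        # Single point of failure: LVM switches to the surviving PV link.
        return Outcome.PVLINKS_SWITCH
    if not ems_disk_resource:
        # Base Serviceguard does not watch disk I/O after VG activation.
        return Outcome.NO_ACTION
    if node_fail_fast:
        # EMS marks the resource down; fail-fast TOCs the node, forcing the switch.
        return Outcome.NODE_TOC
    # ASSUMPTION: without fail-fast, Serviceguard attempts a normal package halt.
    return Outcome.PACKAGE_HALT


# The weekend test: both fibre cables pulled, no EMS disk resource configured.
print(expected_outcome(paths_up=0, ems_disk_resource=False, node_fail_fast=False).value)
```

The weekend test falls into the NO_ACTION case, which is why the package never moved.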
Tonya Underwood
Regular Advisor

Thank you, yes... I see that.

And we do have the HA EMC package installed.

However, the more I think about it, the less I think this would be a good thing...

Thank you all!