Operating System - HP-UX

Re: Oracle Packages failed to switch

 
Brent W. Moll
Advisor

Oracle Packages failed to switch

We had a network interruption that caused a LAN failure. Some of the Oracle packages did not fail over properly and stayed in a "down" state.

I noticed this in the package log:

umount: cannot unmount /u02/oradata/QTM3 : Device busy
ERROR: Function umount_fs
ERROR: Failed to unmount /dev/vg_qtm3/lvol1
Oct 23 11:17:48 - Node "sdatpp02": Deactivating volume group vg_qtm3
vgchange: Couldn't deactivate volume group "vg_qtm3":
Device busy
ERROR: Function deactivate_volume_group
ERROR: Failed to deactivate vg_qtm3

It looks as if the logical volume did not unmount cleanly because processes were still running at the time of the service interruption. The package could not fail over to its alternate server because the volume group was still active on the primary. The result was a down package that should not have been down.

How do I implement the fuser -k command in the shutdown script for Oracle databases to ensure a 'clean' failover?
Ashwani Kashyap
Honored Contributor

Re: Oracle Packages failed to switch

Your logical volumes did not unmount at all because they were definitely being held open by some process.

fuser -k is already incorporated in the package control file. When a package shutdown begins, it runs the customer-defined halt scripts first, then does an fuser -ku on every mounted logical volume of that package, then unmounts the file systems and deactivates the VG.

fuser -ku will kill the processes occupying that logical volume, but there might be some IPC (inter-process communication) resources left behind by processes that were started from that logical volume, and fuser won't be able to remove them. That would result in the logical volume not being able to unmount. This is particularly true for Oracle logical volumes.

You can write a script that runs ipcrm against those IPC resources and put it in your customer-defined halt scripts.
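
A minimal sketch of such a cleanup helper follows. It only prints the ipcrm commands it would run, so they can be reviewed first. The function name ipc_cleanup_cmds is made up for illustration, and the parsing assumes HP-UX style ipcs output in which each entry starts with a type letter (m for shared memory, s for semaphores) followed by the ID, key, mode, owner and group; check this against your own ipcs output before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read `ipcs` output on stdin and print one ipcrm
# command per shared memory segment or semaphore set owned by a given user.
# Assumed column layout: type letter, ID, key, mode, owner, group.
ipc_cleanup_cmds() {
    owner="$1"
    awk -v owner="$owner" '
        $1 == "m" && $5 == owner { print "ipcrm -m " $2 }   # shared memory
        $1 == "s" && $5 == owner { print "ipcrm -s " $2 }   # semaphore set
    '
}
```

Something like `ipcs -ms | ipc_cleanup_cmds oracle` would list the candidates. Only remove them once the database processes are really gone, since deleting IPC resources under a live instance will crash it.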
Christopher McCray_1
Honored Contributor
Solution

Re: Oracle Packages failed to switch

Hello,

Assuming you are using the ORACLE.sh from the toolkit, what is the flag that you are using in the customer_defined_halt_commands?

Are you using shutdown or halt?

The shutdown flag does a shutdown immediate, which is fine for normal operations but wreaks havoc in a true failover situation, because it has to roll back active transactions before stopping and that can take a while.

The halt flag issues a shutdown abort, which "kicks" everybody off the database and allows the clean failover you desire.

There is a lot of controversy as to whether you should use a shutdown abort or a shutdown immediate. I chose to use the abort. I have never had a problem and my packages have run without incident for years.

I am assuming you are doing a normal shutdown, and I will let you decide whether or not to change it.
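
As a rough sketch of what the change looks like in the package control script: the halt hook calls the toolkit script with the halt argument instead of shutdown. The path in ORACLE_TOOLKIT is an assumption for illustration, and the function name customer_defined_halt_cmds is the one used in the control script templates I have seen; keep whatever name your own script already uses.

```shell
#!/bin/sh
# Hypothetical location of the toolkit script; adjust to your package directory.
ORACLE_TOOLKIT="${ORACLE_TOOLKIT:-/etc/cmcluster/pkg/ORACLE.sh}"

# Run by the package control script when the package is halted.
customer_defined_halt_cmds() {
    # "halt" is described above as issuing a shutdown abort;
    # "shutdown" would issue a shutdown immediate instead.
    "$ORACLE_TOOLKIT" halt
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "WARNING: $ORACLE_TOOLKIT halt exited with status $rc" >&2
    fi
    return "$rc"
}
```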

Hope this helps

Chris
It wasn't me!!!!
MANOJ SRIVASTAVA
Honored Contributor

Re: Oracle Packages failed to switch

Hi Brent


You can do it like this:

fuser -ck /mount_point ; umount /mount_point

Put both commands on the same line. This ensures that the second command runs immediately after the first, so that no process grabs the mount point in between.
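
Wrapped as a small function, the same idea might look like the sketch below. The name kill_and_umount is made up for illustration; the only commands used are the fuser -ck and umount calls from the line above.

```shell
#!/bin/sh
# Kill whatever holds the mount point, then unmount it straight away.
kill_and_umount() {
    mnt="$1"
    # -c treats the argument as a mount point, -k kills each process using it.
    fuser -ck "$mnt"
    # Runs whether or not fuser found anything; its status is the return value.
    umount "$mnt"
}
```

For the file system in the original log this would be called as `kill_and_umount /u02/oradata/QTM3`.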


Manoj Srivastava
melvyn burnard
Honored Contributor

Re: Oracle Packages failed to switch

Well, the package would have tried to use fuser, but fuser has its limitations.
Also, you do not say which version of SG you are on; later versions were changed to retry file system unmounts, to allow for delays in processes closing down.
One thing we do know is that Oracle often has an open socket connection into a directory, which fuser cannot detect.
For this you can use the unsupported utility lsof, obtainable from the porting archives around the world.
It lists open files, including those opened through a network socket, so that the processes holding them can then be closed off.
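
A minimal sketch of using lsof for this, assuming it is installed and on the PATH (it is not part of a base HP-UX install). The function name pids_on_fs is hypothetical. Given a mount point, lsof reports the files open on that file system, and -t reduces the output to bare PIDs.

```shell
#!/bin/sh
# Print the PIDs that lsof reports as having files open on a file system.
pids_on_fs() {
    mnt="$1"
    # -t gives terse output: one PID per line, no header.
    # sort -u drops duplicates when a process shows up more than once.
    lsof -t "$mnt" 2>/dev/null | sort -u
}
```

Running `pids_on_fs /u02/oradata/QTM3` and then inspecting each PID with ps before killing anything is safer than piping the list straight into kill.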
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Christian Gebhardt
Honored Contributor

Re: Oracle Packages failed to switch

Hi
I agree with Christopher.
We use "shutdown abort" to bring down the database, then start it up again and shut it down with "shutdown immediate".
We have used this procedure for years and it works fine; we have never had problems unmounting filesystems because of processes or IPCs.

Chris
Brent W. Moll
Advisor

Re: Oracle Packages failed to switch

Thanks to ALL of you for your assistance with this.

The version of MC/SG we are using is 11.09 on these servers in a preproduction cluster.

We use the default toolkit scripts for Oracle also.

Our cntl script uses LV_UMOUNT_COUNT=1

We also issue a shutdown command with the startup and shutdown script. Could we change this to ".sh abort" in the control script?

Thank you again for your help :)
Brent W. Moll
Advisor

Re: Oracle Packages failed to switch

I apologize, the version of MC/SG is 11.13 on this cluster.
melvyn burnard
Honored Contributor

Re: Oracle Packages failed to switch

Well, have you patched this SG version? Run: what /usr/lbin/cmcld to check; otherwise install PHSS_27087.

You could also look at changing the variable FS_UMOUNT_COUNT from 1 to something like 4; the package will then try up to 4 times to umount the file system on shutdown.
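
To show why a retry count helps, here is a rough sketch of a retry loop around the unmount. This is not the actual ServiceGuard control script code; the function name umount_with_retries and the UMOUNT_RETRY_DELAY variable are made up for illustration.

```shell
#!/bin/sh
# Try to unmount a file system several times, killing its users before each try.
# Illustrative only; not taken from the ServiceGuard control script.
umount_with_retries() {
    mnt="$1"
    tries="${2:-4}"                      # plays the role of FS_UMOUNT_COUNT
    delay="${UMOUNT_RETRY_DELAY:-1}"     # seconds to wait between attempts
    i=1
    while [ "$i" -le "$tries" ]; do
        fuser -ck "$mnt" >/dev/null 2>&1
        if umount "$mnt"; then
            return 0
        fi
        echo "umount of $mnt failed (attempt $i of $tries)" >&2
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

With a count of 1 a single "Device busy" fails the package halt, which matches the log in the original post. A higher count gives processes that are still exiting a little time to release the file system.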
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Brent W. Moll
Advisor

Re: Oracle Packages failed to switch

These servers get patched regularly.
PHSS_25124 1.0 MC/ServiceGuard and SG-OPS Edition A.11.13
PHSS_26928 1.0 MC/ServiceGuard and SG-OPS Edition A.11.13
PHSS_27087 1.0 MC/ServiceGuard and SG-OPS Edition A.11.13
Stephen Doud
Honored Contributor

Re: Oracle Packages failed to switch

Hi Brent,

You indicated:
"Our cntl script uses LV_UMOUNT_COUNT=1 "

Have you thought of increasing that value to give Oracle more time to wrap up and exit? Try it - you might like it :)

-s.
Christopher McCray_1
Honored Contributor

Re: Oracle Packages failed to switch

In response to your last post, Brent:

"Could we change this to ".sh abort" in the control script?"

No, the argument you would use to do a shutdown abort is ".sh halt".

I would also look into patching your ServiceGuard version as necessary, as Melvyn suggested.

The reason for issuing a shutdown abort is the premise that you have a node that has failed or is failing, and the database needs to be shut down NOW, which is what the halt flag (shutdown abort) accomplishes.

For your consideration.

Hope this helps

Chris

It wasn't me!!!!
Brent W. Moll
Advisor

Re: Oracle Packages failed to switch

Thank you ALL for your assistance. I'm taking your responses to our group meeting to hash out before we implement them; what we do will probably be a combination of several of them.

Warmest Regards

Brent W. Moll
Atlanta GA