Operating System - HP-UX

Re: Filesystem not getting unmounted

 
SOLVED
raiden
Regular Advisor

Filesystem not getting unmounted

Let me explain the background of this problem.

We have the /cbsora filesystem integrated in cluster package xxx. This filesystem, /cbsora, is the home for oracle.

The problem is that whenever we halt package xxx, this filesystem doesn't unmount. It says: cannot unmount /cbsora. Device busy.

We have also found the exact cause. This happens when any database administrator has done sudo to the oracle ID (home of oracle: /cbsora/ora10g) at the moment the package is halted; the FS /cbsora then doesn't get unmounted.

What needs to be solved is why /cbsora is not getting unmounted. When the package halt script runs fuser -ku, the filesystem should unmount, but it does not.

The Oracle database shuts down properly during package halt, so Oracle has told us that the problem is on the HP-UX side.

Guys, how do we resolve this issue? It is creating a huge problem because package failover does not happen in case of a disaster on our primary node: unless the package halt is successful on the primary node, the package will not start on the secondary node. Please help.
Lijeesh N G_1
Respected Contributor

Re: Filesystem not getting unmounted

Hi,

1) Check the package control log and syslog. What do they say?

2) Is manual unmounting successful? What is its output?

3) Provide some more details about the cluster, for example:
# cmviewcl -v

Regards,
LIJEESH N G
Ganesh.A
Advisor

Re: Filesystem not getting unmounted

Hi,

Please run the following command to see whether any process is still using the FS, and kill/stop them as necessary.

fuser -cu /cbsora
Raj D.
Honored Contributor

Re: Filesystem not getting unmounted

Raiden,
- You need to check in the control file, /etc/cmcluster/pkg.cntl, whether the oracle shutdown and kill commands are given.

- Also, during package shutdown, keep an eye on the log file, pkg.cntl.log.

- What was the last error before it says it is unable to unmount the /cbsora filesystem?

- Also keep another session open with # fuser -cu /cbsora/ora10g; you will see what processes are holding it from getting unmounted.

- Can you unmount /cbsora manually after package halt? If not, check what processes are attached to this FS and troubleshoot further.

Cheers,
Raj.

" If u think u can , If u think u cannot , - You are always Right . "
sen_ux
Valued Contributor

Re: Filesystem not getting unmounted

Since it is the home directory of the oracle user, can it be unmounted while the oracle user is logged in?
Log out the oracle user and try a manual switch of the package. It should succeed.
raiden
Regular Advisor

Re: Filesystem not getting unmounted

@ Lijeesh , Raj

The package control log says that the filesystem /cbsora cannot be unmounted: filesystem busy.
Manual unmounting is also not possible after package halt unless I tell the oracle user to log out.

In the package control file I see some root- and oracle-owned processes using that filesystem, and I guess these are processes created from the oracle user logins.

We and the database team are sure that oracle is shut down properly before the FS unmounting starts during package halt.

@ sen

The FS cannot be unmounted when any oracle user is logged in. The only solution we have so far during package failover is to tell the oracle users to log out of their sessions.

But what if a disaster happens? That is the only concern for us and my management, because last week the primary node was shut down by mistake, and due to these existing oracle logins the package did not halt successfully on the primary and hence failed to start on the secondary.

How do we take care of this? Please suggest, guys, because all the blame is on Unix. Why is the fuser -ku command not working for the filesystem /cbsora during package halt? Please help.
Raj D.
Honored Contributor

Re: Filesystem not getting unmounted

Raiden,

"Manual unmounting is also not possible after package halt unless i tell the oracle user to logout. "

I am wondering how oracle shuts itself down during pkg halt without killing/closing the 'oracle users' sessions!

You mentioned that manual unmount is also not possible unless you tell the oracle user to log out. The oracle shutdown step should kill/close any open oracle session, and then it should shut down/dismount the database. Oracle should take care of this. Make sure you have the dbhalt script properly in place in the pkg.cntl file under the customer defined function.


hth,
Raj.
" If u think u can , If u think u cannot , - You are always Right . "
sen_ux
Valued Contributor

Re: Filesystem not getting unmounted

Can you paste the fuser command section of the package script?
raiden
Regular Advisor

Re: Filesystem not getting unmounted

@ Raj

Please understand that these are not database-level logins. I am talking about system-level logins, where a normal database team user logs in with his own ID and then does sudo to oracle to connect to the SQL prompt.

Oracle is doing its job properly. It properly disconnects all users who are connected through the application before shutting down. The problem is only with this stupid HP-UX box, which is not able to kill the user-level logins that are using the /cbsora filesystem (the home of the oracle ID).
Raj D.
Honored Contributor

Re: Filesystem not getting unmounted

Raiden,

So here you go.
Put the extra lines below in the customer_defined_start script, after the oracle db halt commands in the pkg.cntl file:
#---------
kill -9 `who -u | grep oracle | awk '{print $7}' | xargs`
fuser -cu /cbsora
fuser -ku /cbsora
echo "Checking /cbsora active processes again .."
fuser -cu /cbsora >> fuser.cbsora.final.txt
#---------


Hope this will help to kill the leftover processes on /cbsora so that it unmounts fine. Also, the fuser.cbsora.final.txt file should be blank if everything was killed properly by the extra lines.


Cheers,
Raj.

" If u think u can , If u think u cannot , - You are always Right . "
Raj D.
Honored Contributor

Re: Filesystem not getting unmounted

Correction to the above:

The extra lines go in the "customer_defined_halt" script, not the start script.

Cheers,
Raj.
" If u think u can , If u think u cannot , - You are always Right . "
raiden
Regular Advisor

Re: Filesystem not getting unmounted

Raj,

The solution you have given is already implemented in the cluster functionality by default.

Whenever the package is halted, fuser -ku is executed by default to kill any process using the cluster package filesystem.

In my case, too, fuser -ku is being executed, but the processes using /cbsora are still not killed, and that is the only concern.

Please, guys, suggest some alternative.
BUPA IS
Respected Contributor
Solution

Re: Filesystem not getting unmounted

Hello,

I have found that fuser does not always report all the processes using a file system. If you have lsof installed, try it with the -t option and use its output as input to kill -9; you can use this instead of the fuser -k command.

lsof /cbsora to list them

kill -9 `lsof -t /cbsora`
sleep 30
lsof /cbsora

lsof for HP-UX can be downloaded from the porting and archiving center if you do not have it. Choose a mirror near you.

http://hpux.connect.org.uk/
I hope this may be of some use
Mike
Help is out there always!!!!!
R.K. #
Honored Contributor

Re: Filesystem not getting unmounted

Hello Raiden,

You can also check:
# bdf /cbsora
# du -sk /cbsora

There will be a noticeable difference, because bdf counts the space still held by files that were deleted but are kept open by unkilled processes, while du does not count them.

This will list the processes that are using deleted files.
# lsof +aL1 /ora_temp

List open files that are using file system /ora_temp
# lsof /ora_temp

As said you need to install lsof.

Regds..
Don't fix what ain't broke
Bill Hassell
Honored Contributor

Re: Filesystem not getting unmounted

> fuser -ku

This is a very dangerous command to use without first running fuser -u. It will kill processes that currently have the filesystem open and without knowing which processes are being killed, you could cause filesystem corruption.

Secondly, fuser is pitifully inadequate to discover the reason that a mountpoint is busy. It only works about 50% of the time. Your only choice is to download lsof (the right version for your OS). Then check the output carefully. You'll want to script a reliable method to terminate processes that have the mountpoint open. NOTE: a big problem involves users who log in and then cd to the mountpoint directory, which is neither necessary nor desirable.
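
One way to approach the "check the output carefully" step is to log what holds the mount point before any signal is sent. The following is only a minimal sketch, assuming lsof is installed and on the PATH. The function names holder_pids and log_holders are made up for illustration and are not part of lsof or Serviceguard.

```shell
#!/bin/sh
# Sketch: record which processes hold a mount point open BEFORE killing anything.
# Assumption: lsof is installed; "lsof -t" prints bare PIDs, one per line.

# Print the PIDs with open files on the given mount point as one line.
holder_pids() {
    # lsof exits non-zero when it finds nothing; that simply yields an empty list
    found=$(lsof -t "$1" 2>/dev/null | sort -un)
    echo $found    # unquoted on purpose: joins the lines with single spaces
}

# Append a timestamped listing of the holders to a log file.
# Returns 0 when the mount point is free, 1 when something still holds it.
log_holders() {
    mnt=$1
    logfile=$2
    pids=$(holder_pids "$mnt")
    if [ -z "$pids" ]; then
        echo "$(date) $mnt: no open files found" >> "$logfile"
        return 0
    fi
    echo "$(date) $mnt is held open by: $pids" >> "$logfile"
    # ps -fp accepts a comma-separated PID list
    ps -fp "$(echo "$pids" | tr ' ' ',')" >> "$logfile" 2>&1 || true
    return 1
}
```

A call such as log_holders /cbsora /var/tmp/cbsora.holders.log (the log path is hypothetical) leaves a record of the owners and command names, so that only known processes are terminated afterwards.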

And if the oracle login uses /cbsora/ora10g as the HOME directory, change this immediately. There is no reason that the Oracle DBA has to be in this directory. I am sure that the DBA learned to cd to the directory to avoid typing long pathnames like /cbsora/ora10g/some-Oracle-command, but it is trivial to fix (set the DBA's $PATH correctly).
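
Setting the PATH as suggested could look like the fragment below, placed in the oracle login's profile. It assumes the Oracle executables sit in /cbsora/ora10g/bin, which the thread does not state. ORACLE_HOME is the conventional Oracle variable name, and the value used here is only a guess based on the home directory mentioned above.

```shell
# Hypothetical fragment for the oracle login's ~/.profile.
# Assumption: Oracle binaries live under /cbsora/ora10g/bin; adjust to the real layout.
ORACLE_HOME=/cbsora/ora10g
PATH=$PATH:$ORACLE_HOME/bin
export ORACLE_HOME PATH
```

With this in place the DBA can run Oracle commands from any directory, so the login shell no longer needs its working directory on /cbsora.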


Bill Hassell, sysadmin
raiden
Regular Advisor

Re: Filesystem not getting unmounted

@ BUPA IS

Thanks for the solution. It is working perfectly, as we wanted. Indeed, it killed all the processes. Way better than fuser -ku.

I will implement this in our cluster halt script.
kill -9 `lsof -t /cbsora`
sleep 30
lsof /cbsora

Now I have a reason to cheer. Thanks again.
Bill Hassell
Honored Contributor

Re: Filesystem not getting unmounted

> kill -9 `lsof -t /cbsora`

Please, never use kill -9, especially on complicated applications and database programs. This is the worst possible way to terminate a program and almost guarantees corrupted data files. Although beginner Unix courses teach kill -9, they always leave out the serious consequences.

There are many, many kill signals (man kill) and most can be handled within the program to perform an orderly shutdown. kill with no value (which is actually kill -15) is always the correct signal to stop a program. If the program needs another signal (and you have no documentation on what is proper), use kill -1 and then kill -2. Each of these signals can be sensed by a program or script to perform a proper close of open files and release shared memory.

kill -9 is very dangerous in that it gives the program no chance at all to perform a proper shutdown. It should only be used manually (never in a script) when you can identify the program and know that a kill -9 will not damage files. kill -9 can leave shared memory badly fragmented and will require a reboot to fix. Your script needs to write the process names that are running into a log, then issue kill -15 for each PID, sleep for 10-20 seconds, then check again. Log any PIDs that still exist and issue kill -1 against those processes. When the script fails to stop all the processes, log the process names with ps -fpPID,PID.. so you can fix the broken code at a later time, then issue the kill -9.

Complicated programs and databases must respond to kill -15, -1 or -2 or they must be repaired for proper operation.
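
The sequence described above (log the processes, kill -15, wait, re-check, kill -1, and only then a logged kill -9) can be sketched roughly as follows. This is an illustration under stated assumptions, not a tested Serviceguard halt function: WAIT, LOG, still_alive, log_pids and stop_pids are invented names, and the log path is hypothetical.

```shell
#!/bin/sh
# Staged shutdown sketch: kill -15, then kill -1, then kill -9 as a logged last resort.
# WAIT, LOG, still_alive, log_pids and stop_pids are illustrative names only.

WAIT=${WAIT:-15}                  # seconds to wait after each signal
LOG=${LOG:-/tmp/stop_pids.log}    # hypothetical log file

# Print the subset of the given PIDs that still exist.
still_alive() {
    alive=""
    for pid in "$@"; do
        # kill -0 delivers no signal; it only tests that the PID can be signalled
        if kill -0 "$pid" 2>/dev/null; then
            alive="$alive $pid"
        fi
    done
    echo $alive
}

# Append a timestamped note and a ps listing of the given PIDs to the log.
log_pids() {
    note=$1
    shift
    echo "$(date) $note: $*" >> "$LOG"
    ps -fp "$(echo "$*" | tr ' ' ',')" >> "$LOG" 2>&1 || true
}

# Stop the given PIDs, escalating only for those that survive each step.
# Returns 0 when none of them is left.
stop_pids() {
    pids=$(still_alive "$@")
    for sig in 15 1; do
        [ -z "$pids" ] && return 0
        log_pids "sending signal $sig" $pids
        kill -"$sig" $pids 2>/dev/null
        sleep "$WAIT"
        pids=$(still_alive $pids)
    done
    [ -z "$pids" ] && return 0
    # Last resort: record the survivors so the cause can be fixed later.
    log_pids "survived 15 and 1, sending signal 9" $pids
    kill -9 $pids 2>/dev/null
    sleep 1
    [ -z "$(still_alive $pids)" ]
}
```

In a halt script this might be called as stop_pids `lsof -t /cbsora`, after the database itself has been shut down. Because the PID list is taken once and logged before each signal, the log shows exactly which processes needed escalation.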


Bill Hassell, sysadmin
raiden
Regular Advisor

Re: Filesystem not getting unmounted

@ Bill

Thanks for the Information Bill. It was valuable.

But the problem I am facing is on a database server. During package halt, the script shuts down the database properly.

The only processes using the /cbsora filesystem are from the system-level user logins (which I guess were created when database team users did sudo to oracle).

So I think I can safely execute kill -9 on these processes. Your advice, please.
Bill Hassell
Honored Contributor

Re: Filesystem not getting unmounted

Yes, it is safe to kill these shell processes, but use kill -15 rather than kill -9. Then change the HOME location (/cbsora/ora10g) for the sudo logins. The ability to switch the package is much more important than the convenience of a DBA login. The key to automatically killing these processes is to first identify what they are and then use kill -15.


Bill Hassell, sysadmin