
George Barbitsas
Frequent Advisor

SG(SLES) vs NFS(soft and/or hard mounts)

Hi,

We have a cluster made up of two x86 servers with two packages configured. Furthermore, both nodes are configured as NFS server and client; no autofs is used.

Usually both packages are running on their preferred node, but there is an application (not an SG package) on the other node that uses one of the shared filesystems over NFS.
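For reference, the components on the other node mount the share with an ordinary static NFS mount, along these lines (the export path comes from the log below; the mount point, options, and server name here are only illustrative):

# /etc/fstab on the other node -- illustrative entry, not copied from the system;
# "pkg-virtual-host" stands for the relocatable hostname the package serves NFS on
pkg-virtual-host:/export/sapmnt/PP7  /sapmnt/PP7  nfs  rw,hard,intr  0 0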

My issue arises when I issue a cmhaltpkg on the package that shares an NFS filesystem with that other application. Here is an excerpt from my log.

May 20 21:08:31 - Node "lhsap10": Stoping rmtab synchronization process
May 20 21:08:31 - Node "lhsap10": Unexporting filesystem on lsapepodb:/export/sapmnt/PP7
May 20 21:08:31 - Node "lhsap10": Unexporting filesystem on lhsap10-be:/export/sapmnt/PP7
May 20 21:08:31 - Node "lhsap10": Unexporting filesystem on lhsap20-be:/export/sapmnt/PP7
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /usr/sap/PP7/SCS80
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/mirrlogB
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/mirrlogA
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/origlogB
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/origlogA
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/oraarch
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/sapreorg
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/sapdatatemp
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/sapdata4
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/sapdata3
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/sapdata2
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/sapdata1
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7
May 20 21:08:32 - Node "lhsap10": Unmounting filesystem on /export/sapmnt/PP7
WARNING: Running fuser to remove anyone using the file system directly.
Cannot stat file /proc/4989/fd/100: Permission denied
Cannot stat file /proc/4990/fd/102: Permission denied
Cannot stat file /proc/4991/fd/102: Permission denied
umount: /export/sapmnt/PP7: device is busy
umount: /export/sapmnt/PP7: device is busy
ERROR: Function umount_fs; Failed to unmount /dev/vgPP7FIXE/lvsapmnt
May 20 21:08:32 - Node "lhsap10": Deactivating volume group vgPP7FIXE
VG vgPP7FIXE is busy, will try deactivation...
[the line above is repeated 21 more times]


I have to log in to the other node and kill processes by hand (oracle, sap, java) in order to free up the lock on the filesystem. I have tried both soft and hard mounts with no success.
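For what it's worth, this is roughly the manual cleanup I end up doing (the server-side path is from the log above; the client-side mount point is only a stand-in):

# On the halting node: list what still has files open under the export
fuser -vm /export/sapmnt/PP7
lsof /export/sapmnt/PP7

# On the other node: find, then kill, whatever still uses the NFS mount
fuser -vm /sapmnt/PP7     # list the holders first
fuser -km /sapmnt/PP7     # then SIGKILL them -- use with care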

Any insight would be greatly appreciated.
George Barbitsas
Frequent Advisor

Re: SG(SLES) vs NFS(soft and/or hard mounts)

I would like to fix a typo I made in the title:

SG(SLES) vs NFS(soft or hard mounts)
Steven E. Protter
Exalted Contributor

Re: SG(SLES) vs NFS(soft and/or hard mounts)

Shalom,

I think one of the nodes needs to be rebooted.

This issue is not related to the NFS mount options.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
George Barbitsas
Frequent Advisor

Re: SG(SLES) vs NFS(soft and/or hard mounts)

The servers were rebooted many times, especially when the umount didn't go through. After killing the programs by hand I rebooted the boxes, and the same thing happened.
Armin Kunaschik
Esteemed Contributor

Re: SG(SLES) vs NFS(soft and/or hard mounts)

This is an architectural problem and has nothing to do with mount options. Never mount/umount NFS shares in the package start/stop script! You are not able to kill processes with pending I/O!
Get HA-NFS (or place the NFS share on a third, highly available NFS server such as a NetApp filer cluster) and mount it statically or via autofs! Don't use overlapping mount points, e.g. mounting /usr/sap from SAN disks and /usr/sap/trans from NFS.
First make sure the NFS share is back (HA-NFS will do this for you) and only then kill/stop leftover processes; it doesn't work the other way around.
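A direct autofs map for a share like this could look roughly as follows (a sketch only; the mount point, options, and server name are placeholders to adapt to your environment):

# /etc/auto.master -- register a direct map
/-  /etc/auto.direct

# /etc/auto.direct -- one line per direct mount
/sapmnt/PP7  -fstype=nfs,rw,hard,intr  pkg-virtual-host:/export/sapmnt/PP7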

My 2 cents,
Armin
And now for something completely different...
George Barbitsas
Frequent Advisor

Re: SG(SLES) vs NFS(soft and/or hard mounts)

HA-NFS is installed and configured. I'll have another look at the configuration, but if we look at the log I posted, we clearly see that the processes are not being killed and the umount never takes place.
George Barbitsas
Frequent Advisor

Re: SG(SLES) vs NFS(soft and/or hard mounts)

Here is the log in full detail:


####### Node "lhsap10": Halting package at Wed May 20 21:07:01 EDT 2009 #######
May 20 21:07:21 - Node "lhsap10": *** Begin: Executing script [/opt/cmcluster/PP7/PP7.cntl] args [stop PP7]
May 20 21:07:21 - Node "lhsap10": (get_source): Found /etc/cmcluster.conf - Source it
May 20 21:07:21 - Node "lhsap10": *** Begin: Executing script [/opt/cmcluster/conf/PP7/sapwas.sh] args [spawn]
May 20 21:07:21 - Node "lhsap10": (get_source): Found /etc/cmcluster.conf - Source it
May 20 21:07:21 - Node "lhsap10": (sapwas_main): Entering SGeSAP stop runtime steps ...
May 20 21:07:21 - Node "lhsap10": (sapwas_main): A.02.00.00
May 20 21:07:21 - Node "lhsap10": (get_source): Found /opt/cmcluster/conf/PP7/sap.config - Source it
May 20 21:07:21 - Node "lhsap10": (get_source): Found /opt/cmcluster/sap/sap.functions - Source it
May 20 21:07:21 - Node "lhsap10": (get_source): Found /opt/cmcluster/conf/sap.functions - Source it
May 20 21:07:21 - Node "lhsap10": (get_source): Found /opt/cmcluster/sap/SID/customer.functions - Source it
May 20 21:07:21 - Node "lhsap10": (checksum_files): Check if files to run source command on are identical
May 20 21:07:21 - Node "lhsap10": (checksum_files): Files to checksum are [ /opt/cmcluster/conf/PP7/sap.config]
May 20 21:07:21 - Node "lhsap10": (checksum_files): Checksums are identical
May 20 21:07:21 - Node "lhsap10": (checksum_files): Check if files to run source command on are identical
May 20 21:07:21 - Node "lhsap10": (checksum_files): Files to checksum are [ /opt/cmcluster/sap/sap.functions /opt/cmcluster/conf/sap.functions]
May 20 21:07:21 - Node "lhsap10": (checksum_files): Checksums are identical
May 20 21:07:21 - Node "lhsap10": *** Begin: Executing script [/opt/cmcluster/conf/PP7/sapwas.sh] MODE [stop]
May 20 21:07:21 - Node "lhsap10": (initialize): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (check_version): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (check_version): SGeSAP: A.02.00.00 Sg: A.11.18.00 Linux: 2.6.16.60-0.33-smp
May 20 21:07:21 - Node "lhsap10": (check_perl): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (check_parameters): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (ip_mapper): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (ip_mapper): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (ip_mapper): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (ip_mapper): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (ip_mapper): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (ip_mapper 10.1.1.230): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (check_parameters): Package will handle SAP J2EE database service
May 20 21:07:21 - Node "lhsap10": (ip_mapper 131.195.119.230): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (check_parameters): Package will handle SAP J2EE system central services
May 20 21:07:21 - Node "lhsap10": (check_access /usr/sap/PP7/SYS/exe/ctrun): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (check_access): WARNING: NFS-Server not specified for /usr/sap/PP7/SYS/exe/ctrun. Skipping step.
May 20 21:07:21 - Node "lhsap10": (check_access /usr/sap/trans/bin): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (check_access): WARNING: NFS-Server not specified for /usr/sap/trans/bin. Skipping step.
May 20 21:07:21 - Node "lhsap10": (app_handler stop): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (check_own_app): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (check_own_app): Starting to check lhsap10-be for instance JC90 ...
May 20 21:07:21 - Node "lhsap10": (is_node_alive lhsap10-be): TRACE POINT
May 20 21:07:21 - Node "lhsap10": (watchdog): Watchdog timer initiated for (PID: 29126 Timeout: 260 secs)
May 20 21:07:21 - Node "lhsap10": (watchdog): Watchdog process itself: WDPID=[29127]
May 20 21:07:23 - Node "lhsap10": (login_check lhsap10-be): TRACE POINT
May 20 21:07:25 - Node "lhsap10": (login_check): ssh -p 224 -o ConnectTimeout=30 access of root to App-Server host lhsap10-be working
May 20 21:07:26 - Node "lhsap10": (login_check): ssh -p 224 -o ConnectTimeout=30 access of pp7adm to App-Server host lhsap10-be working
May 20 21:07:26 - Node "lhsap10": (check_own_app): LINUX App-Server host lhsap10-be for instance JC90 responding
May 20 21:07:26 - Node "lhsap10": (check_own_app): Starting to check lhsap20-be for instance J93 ...
May 20 21:07:27 - Node "lhsap10": (is_node_alive lhsap20-be): TRACE POINT
May 20 21:07:27 - Node "lhsap10": (watchdog): Watchdog timer initiated for (PID: 29350 Timeout: 260 secs)
May 20 21:07:27 - Node "lhsap10": (watchdog): Watchdog process itself: WDPID=[29353]
May 20 21:07:29 - Node "lhsap10": (login_check lhsap20-be): TRACE POINT
May 20 21:07:30 - Node "lhsap10": (login_check): ssh -p 224 -o ConnectTimeout=30 access of root to App-Server host lhsap20-be working
May 20 21:07:31 - Node "lhsap10": (login_check): ssh -p 224 -o ConnectTimeout=30 access of pp7adm to App-Server host lhsap20-be working
May 20 21:07:31 - Node "lhsap10": (check_own_app): LINUX App-Server host lhsap20-be for instance J93 responding
May 20 21:07:31 - Node "lhsap10": (stop_own_app 2): TRACE POINT
May 20 21:07:31 - Node "lhsap10": (treatment_test): (3 /\ 2)
May 20 21:07:31 - Node "lhsap10": (stop_own_app): Instance JC90 on host lhsap10-be not running - skipping step
May 20 21:07:31 - Node "lhsap10": (treatment_test): (1 /\ 2)
May 20 21:07:31 - Node "lhsap10": (stop_own_app): Instance J93 on host lhsap20-be is configured to be excluded - skipping step
May 20 21:07:31 - Node "lhsap10": (ci_remove_shmem normal SCS 80): TRACE POINT
May 20 21:07:31 - Node "lhsap10": (clean_ipc SCS 80 pp7adm): TRACE POINT
May 20 21:07:32 - Node "lhsap10": (clean_ipc): WARNING: shmem has processes attached
May 20 21:07:32 - Node "lhsap10": (app_remove_shmem): TRACE POINT
May 20 21:07:32 - Node "lhsap10": (check_own_app): TRACE POINT
May 20 21:07:32 - Node "lhsap10": (check_own_app): Starting to check lhsap10-be for instance JC90 ...
May 20 21:07:32 - Node "lhsap10": (is_node_alive lhsap10-be): TRACE POINT
May 20 21:07:32 - Node "lhsap10": (watchdog): Watchdog timer initiated for (PID: 29589 Timeout: 260 secs)
May 20 21:07:32 - Node "lhsap10": (watchdog): Watchdog process itself: WDPID=[29591]
May 20 21:07:35 - Node "lhsap10": (login_check lhsap10-be): TRACE POINT
May 20 21:07:37 - Node "lhsap10": (login_check): ssh -p 224 -o ConnectTimeout=30 access of root to App-Server host lhsap10-be working
May 20 21:07:38 - Node "lhsap10": (login_check): ssh -p 224 -o ConnectTimeout=30 access of pp7adm to App-Server host lhsap10-be working
May 20 21:07:38 - Node "lhsap10": (check_own_app): LINUX App-Server host lhsap10-be for instance JC90 responding
May 20 21:07:38 - Node "lhsap10": (check_own_app): Starting to check lhsap20-be for instance J93 ...
May 20 21:07:38 - Node "lhsap10": (is_node_alive lhsap20-be): TRACE POINT
May 20 21:07:38 - Node "lhsap10": (watchdog): Watchdog timer initiated for (PID: 29841 Timeout: 260 secs)
May 20 21:07:38 - Node "lhsap10": (watchdog): Watchdog process itself: WDPID=[29844]
May 20 21:07:40 - Node "lhsap10": (login_check lhsap20-be): TRACE POINT
May 20 21:07:42 - Node "lhsap10": (login_check): ssh -p 224 -o ConnectTimeout=30 access of root to App-Server host lhsap20-be working
May 20 21:07:43 - Node "lhsap10": (login_check): ssh -p 224 -o ConnectTimeout=30 access of pp7adm to App-Server host lhsap20-be working
May 20 21:07:43 - Node "lhsap10": (check_own_app): LINUX App-Server host lhsap20-be for instance J93 responding
May 20 21:07:43 - Node "lhsap10": (stop_own_app 2): TRACE POINT
May 20 21:07:43 - Node "lhsap10": (treatment_test): (3 /\ 2)
May 20 21:07:43 - Node "lhsap10": (stop_own_app): Instance JC90 on host lhsap10-be not running - skipping step
May 20 21:07:43 - Node "lhsap10": (treatment_test): (1 /\ 2)
May 20 21:07:43 - Node "lhsap10": (stop_own_app): Instance J93 on host lhsap20-be is configured to be excluded - skipping step
May 20 21:07:43 - Node "lhsap10": (stop_saposcol_app): TRACE POINT
May 20 21:07:43 - Node "lhsap10": (stop_saposcol_app): Configured to be skipped
May 20 21:07:43 - Node "lhsap10": (stop_addons_prejci): TRACE POINT
May 20 21:07:43 - Node "lhsap10": (stop_cs SCS): TRACE POINT
May 20 21:07:43 - Node "lhsap10": (stop_cs): Halt Java System Central Services Instance ...
May 20 21:07:43 - Node "lhsap10": (stop_direct SCS 80 131.195.119.230): TRACE POINT
May 20 21:07:43 - Node "lhsap10": (stop_direct): Direct shutdown attempt on local host...
May 20 21:07:45 - Node "lhsap10": (stop_direct): Instance on lsapepoci stopped
May 20 21:07:45 - Node "lhsap10": (stop_direct): Waiting for cleanup of resources.....
May 20 21:07:45 - Node "lhsap10": (stop_direct): Waiting for cleanup of resources with ps -ef|grep SCS80_lsapepoci|grep -v sapstartsrv
May 20 21:07:45 - Node "lhsap10": (watchdog): Watchdog timer initiated for (PID: 31215 Timeout: 260 secs)
May 20 21:07:45 - Node "lhsap10": (watchdog): Watchdog process itself: WDPID=[31217]
May 20 21:07:51 - Node "lhsap10": (stop_sapstartsrv SCS 80 131.195.119.230 LINUX pp7adm): TRACE POINT
May 20 21:07:51 - Node "lhsap10": (is_ip_local 131.195.119.230): TRACE POINT
May 20 21:07:51 - Node "lhsap10": (is_ip_local): 131.195.119.230 considered to be local
May 20 21:07:51 - Node "lhsap10": (stop_sapstartsrv): Instance Service shutdown attempt on local host...
May 20 21:07:53 - Node "lhsap10": (stop_sapstartsrv): There was no local instance service running for SCS80
May 20 21:07:53 - Node "lhsap10": (crit_test_app lsapepoci 80 pp7adm SCS 1): TRACE POINT
May 20 21:07:53 - Node "lhsap10": (crit_test_app): Trying to connect enqueue service of instance SCS80 ...
May 20 21:08:04 - Node "lhsap10": (crit_test_app): No connection to instance SCS80: rc=8
May 20 21:08:04 - Node "lhsap10": (crit_test_app): Instance SCS80 not responding
May 20 21:08:04 - Node "lhsap10": (stop_addons_postjci): TRACE POINT
May 20 21:08:04 - Node "lhsap10": (stop_addons_predb): TRACE POINT
May 20 21:08:04 - Node "lhsap10": (stop_ORACLE_jdb): TRACE POINT
May 20 21:08:04 - Node "lhsap10": (ora_setenv): TRACE POINT
May 20 21:08:06 - Node "lhsap10": (ora_setenv): ORASID=orapp7 ORACLE_SID=PP7 ORACLE_HOME=/oracle/PP7/102_64 SAPDATA_HOME=/oracle/PP7
May 20 21:08:06 - Node "lhsap10": (stop_ORACLE_jdb): Halting J2EE database ...
May 20 21:08:21 - Node "lhsap10": (stop_ORACLE_jdb): J2EE Database stopped successfully
May 20 21:08:21 - Node "lhsap10": (ora_stop_listener): TRACE POINT
May 20 21:08:21 - Node "lhsap10": (ora_stop_listener): Stopping ORACLE listener LIST_PP7
May 20 21:08:30 - Node "lhsap10": (ora_stop_listener): The command completed successfully
May 20 21:08:30 - Node "lhsap10": (ora_wait): TRACE POINT
May 20 21:08:30 - Node "lhsap10": (ora_wait): Wait for Oracle shadow process cleanup (Timeout: 260 secs)
May 20 21:08:30 - Node "lhsap10": (ora_wait): 3792 ? 00:00:00 oracle
May 20 21:08:31 - Node "lhsap10": (stop_addons_postdb): TRACE POINT
May 20 21:08:31 - Node "lhsap10": (stop_saposcol): TRACE POINT
May 20 21:08:31 - Node "lhsap10": *** Done: Executing script [/opt/cmcluster/conf/PP7/sapwas.sh] MODE [stop]
May 20 21:08:31 - Node "lhsap10": (sapwas_main): Leaving SGeSAP stop runtime steps
May 20 21:08:31 - Node "lhsap10": *** Done: Executing script [/opt/cmcluster/conf/PP7/sapwas.sh] args [spawn]
May 20 21:08:31 - Node "lhsap10": Remove IP address 10.1.1.230 from subnet 10.1.1.0
May 20 21:08:31 - Node "lhsap10": Remove IP address 131.195.119.230 from subnet 131.195.119.0
May 20 21:08:31 - Node "lhsap10": Stoping rmtab synchronization process
May 20 21:08:31 - Node "lhsap10": Unexporting filesystem on lsapepodb:/export/sapmnt/PP7
May 20 21:08:31 - Node "lhsap10": Unexporting filesystem on lhsap10-be:/export/sapmnt/PP7
May 20 21:08:31 - Node "lhsap10": Unexporting filesystem on lhsap20-be:/export/sapmnt/PP7
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /usr/sap/PP7/SCS80
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/mirrlogB
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/mirrlogA
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/origlogB
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/origlogA
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/oraarch
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/sapreorg
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/sapdatatemp
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/sapdata4
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/sapdata3
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/sapdata2
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7/sapdata1
May 20 21:08:31 - Node "lhsap10": Unmounting filesystem on /oracle/PP7
May 20 21:08:32 - Node "lhsap10": Unmounting filesystem on /export/sapmnt/PP7
WARNING: Running fuser to remove anyone using the file system directly.
Cannot stat file /proc/4989/fd/100: Permission denied
Cannot stat file /proc/4990/fd/102: Permission denied
Cannot stat file /proc/4991/fd/102: Permission denied
umount: /export/sapmnt/PP7: device is busy
umount: /export/sapmnt/PP7: device is busy
ERROR: Function umount_fs; Failed to unmount /dev/vgPP7FIXE/lvsapmnt
May 20 21:08:32 - Node "lhsap10": Deactivating volume group vgPP7FIXE
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
VG vgPP7FIXE is busy, will try deactivation...
Can't deactivate volume group "vgPP7FIXE" with 1 open logical volume(s)
ERROR: Function deactivate_volume_group; Failed to deactivate vgPP7FIXE
Attempting to deltag to vg vgPP7FIXE...
deltag was successful on vg vgPP7FIXE.
May 20 21:09:02 - Node "lhsap10": Deactivating volume group vgPP7db01
Attempting to deltag to vg vgPP7db01...
deltag was successful on vg vgPP7db01.
###### Node "lhsap10": Package halted with ERROR at Wed May 20 21:09:03 EDT 2009 ######
wci
Frequent Advisor

Re: SG(SLES) vs NFS(soft and/or hard mounts)

Hi

I think your un_export_fs function runs fine, since it does not complain about unexporting the file system.
Can you still see /export/sapmnt/PP7 mounted on your client after the package halt script passes the unexport step?

Check whether your NFS and LVM patches are up to date.
Do you kill processes only from the client side? (See the sketch below.)
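Something along these lines (the client-side mount point is a placeholder):

# on the client, after the halt script has unexported the share on the server
grep PP7 /proc/mounts     # is the share still mounted here?
fuser -vm /sapmnt/PP7     # which client processes still hold it?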

Regs
George Barbitsas
Frequent Advisor

Re: SG(SLES) vs NFS(soft and/or hard mounts)

Yes, when the application is not up on the other node and I issue a cmhaltpkg, everything usually goes well.

When the application is up on the other node, the filesystem can't be unmounted because of pending I/O. I'm not sure, but this seems to be an architectural problem, as one poster suggested.

Anyone else have any ideas?
Armin Kunaschik
Esteemed Contributor

Re: SG(SLES) vs NFS(soft and/or hard mounts)

Is this, by chance, a 2-node SAP cluster with the central instance on one node and a dialogue instance (of the same SID) on the other?

Armin
And now for something completely different...
George Barbitsas
Frequent Advisor

Re: SG(SLES) vs NFS(soft and/or hard mounts)

It is a 2-node cluster with 2 packages on one node and some unclustered SAP components on the other node, but those components NFS-mount the filesystem that is used by one of the two packages.

When I issue a cmhaltpkg on that particular package, the exported mount point cannot be unmounted because of pending I/O.
Armin Kunaschik
Esteemed Contributor

Re: SG(SLES) vs NFS(soft and/or hard mounts)

As I said before, this is an architecture problem!
Place the other application into an SG package, regardless of whether it's switchable or not. In a working cluster, any application should be able to fail over to other nodes. The benefit is that you don't need to fail back the production application to get the non-cluster-aware application back to work.
Once you've done that, it's easy to insert a cmhaltpkg into the stop command section of the prod package.
On the start side you should create a dependency so that the application package does not start until the prod package is up (see the sketch below). A bit more scripting/configuration is involved if production crashes and fails over to the other node: in that case you need to bring up HA-NFS first and then the application, in that order. With 11.18 you can create pre-scripts and run the necessary actions.
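With the modular package format, the dependency itself is only a few attribute lines, roughly like this (a sketch; the names are placeholders, so check the template that cmmakepkg generates on your A.11.18 installation for the exact syntax and supported values):

# excerpt from the non-clustered application's package configuration file
dependency_name        needs_prod
dependency_condition   PP7 = up
dependency_location    any_node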

And the last thing: SGeSAP is not a big help with this setup. If you're able to script the SAP/application startup, you don't need SGeSAP. SGeSAP is bloated and slow and too expensive... but this is only a personal opinion.

My 2 cents,
Armin
And now for something completely different...