Re: HA NFS Hangs during the system failover

UVA · ‎08-31-2008

HA NFS Hangs during the system reboot

HP-UX 11.31
Serviceguard A.11.18.00
NFS toolkit - A.11.31.02
SGeSAP - B.04.51

Hi,

We have implemented a Serviceguard cluster for SAP with two packages. The first, the
central instance package, runs on node1; the second, the dialog instances package, runs on node2.

We configured the packages and filesystems and tested failover and so on; everything
worked perfectly. After the SAP installation was done, we again tested failover, network
failure, and so on, and everything worked fine.

Now comes the SGeSAP with HA-NFS part.

After installing and configuring SGeSAP with HA-NFS, we rechecked failover
in the following manner:

1. Reboot node1 while the central package is running on node1.

It took some time to stop the cluster services on node1, and the node then waited for access
to the NFS server. After a long time it hung, still trying to reach the NFS server.
During this time I can't run cmviewcl, bdf, ping, etc.

Finally the package failed over to node2, but node1 was still stuck stopping the NFS
client subsystem. In the end I had to reset the system.

I don't know what is causing the problem. If we disable HA-NFS and SGeSAP entirely and
fail over in the same manner, it works fine.

Does anybody have a hint about this problem? Do we have to install a patch or something similar?

Steven E. Protter · ‎08-31-2008

Shalom,

Quick checklist here:

The NFS toolkit needs:
1) Its own floating IP address
2) Both nodes need access to the NFS storage, which must be shared.
3) The NFS share must not be in the same volume group as the SAP package, or the volume group activation during package failover will hang when one package fails over and the other does not.

In essence there must be complete independence if the NFS and SAP packages are to operate independently. They can use the same storage array, but not the same volume group. If they are in some way dependent, then they need to be part of an integrated package.
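To make the independence rule concrete, here is a minimal Python sketch that compares two package definitions and reports any volume group or IP address they share. The dictionaries, volume group name, and IP addresses are hypothetical placeholders made up for illustration; they are not read from any Serviceguard configuration file, and the function name is my own.

```python
def shared_resources(pkg_a, pkg_b):
    """Return the volume groups and IP addresses two package definitions have in common."""
    return {
        "volume_groups": sorted(set(pkg_a["volume_groups"]) & set(pkg_b["volume_groups"])),
        "ip_addresses": sorted(set(pkg_a["ip_addresses"]) & set(pkg_b["ip_addresses"])),
    }


# Hypothetical package definitions (assumed values, not from a real cluster).
sap_pkg = {"volume_groups": ["/dev/vgsap"], "ip_addresses": ["192.0.2.10"]}
nfs_pkg = {"volume_groups": ["/dev/vgsap"], "ip_addresses": ["192.0.2.11"]}

overlap = shared_resources(sap_pkg, nfs_pkg)

# The packages count as independent only if nothing at all is shared.
independent = not any(overlap.values())
print(overlap, independent)
```

If the check reports any overlap, the two packages are not independent in the sense described above and would either need separate resources or need to be merged into one package.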

Run tail -f on all the NFS package logs and on /var/adm/syslog/syslog.log, and then conduct a failover test.

You will see something, and that will lead to you either posting the error here or having a big surprise and diagnosing and solving the problem yourself.

If the latter outcome occurs, please post the error code and solution here so others might benefit from your experience.

What we have here is the need for some detective work. Now you have the tools you need to, well, detect.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

UVA · ‎08-31-2008

Hi Steven,

Thanks for the reply.

In our case we are using the same volume group for SAP and the shared directory,

but different logical volumes, and HA-NFS uses the IP address of the package.

These are the logs from node1, the node I rebooted to trigger the failover.

Aug 31 03:54:56 nakpprd1 cmcld[3115]: Request from node nakpprd1 to enable global switching for package jdbjciPPR.
Aug 31 03:54:56 nakpprd1 cmcld[3115]: Enabled switching for package jdbjciPPR.
Aug 31 03:54:56 nakpprd1 cmcld[3115]: Service cmdisklockd terminated due to an exit(0).
Aug 31 03:54:56 nakpprd1 cmcld[3115]: Service cmlvmd terminated due to an exit(0).
Aug 31 03:54:56 nakpprd1 cmcld[3115]: Service cmlockd terminated due to an exit(0).
Aug 31 03:54:56 nakpprd1 cmcld[3115]: Turning off safety time: node halting
Aug 31 03:54:56 nakpprd1 cmcld[3115]: Service cmnetd terminated due to an exit(0).
Aug 31 03:54:56 nakpprd1 cmsrvassistd[3119]: Service assistant daemon halted.
Aug 31 03:55:01 nakpprd1 cmcld[3115]: This node (nakpprd1) has ceased cluster activities.
Aug 31 03:55:01 nakpprd1 cmcld[3115]: Daemon exiting
Aug 31 03:57:53 nakpprd1 vmunix: NFS server 172.20.101.69 not responding still trying

During this time I can't do anything on either node.
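For anyone comparing their own syslog against the excerpt above, here is a rough Python sketch that measures how long after cmcld exits the kernel first reports the NFS server as not responding. It assumes the syslog line layout shown in the excerpt; the regular expression, the function names, and the year 2008 (syslog lines carry no year, so it is taken from the post date) are my assumptions.

```python
import re
from datetime import datetime

# Assumed layout: "Mon DD HH:MM:SS host program[pid]: message" (pid is optional).
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (\S+) ([^:\[]+)(?:\[\d+\])?: (.*)$"
)


def parse(line, year=2008):
    """Split one syslog line into (timestamp, host, program, message), or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    stamp, host, prog, msg = m.groups()
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, host, prog, msg


def seconds_until_nfs_timeout(lines):
    """Seconds between cmcld's 'Daemon exiting' and the next NFS 'not responding' message."""
    exit_time = None
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        when, _host, prog, msg = parsed
        if prog == "cmcld" and msg.startswith("Daemon exiting"):
            exit_time = when
        elif (prog == "vmunix" and exit_time is not None
              and "NFS server" in msg and "not responding" in msg):
            return (when - exit_time).total_seconds()
    return None


# The last two lines of the excerpt posted in this thread.
sample = [
    "Aug 31 03:55:01 nakpprd1 cmcld[3115]: Daemon exiting",
    "Aug 31 03:57:53 nakpprd1 vmunix: NFS server 172.20.101.69 not responding still trying",
]
gap = seconds_until_nfs_timeout(sample)
print(gap)  # 172.0 seconds for the excerpt above
```

In the excerpt, the NFS client on node1 starts complaining a little under three minutes after the cluster daemon exits, which fits a node that still has an NFS mount from an address that is no longer being served locally.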

Steven E. Protter · ‎08-31-2008

Shalom,

My suggestion is that you reconfigure the NFS package to not share an IP address with the SAP package.

In most failure scenarios everything is going to fail over to one node or the other.

There either needs to be complete dependence (1 package for both), or complete independence.

The physical configuration will keep the packages running on the same node.

SEP

UVA · ‎08-31-2008

Hi,

We have complete dependence: one package for both SAP and HA-NFS. So why is it creating a problem?

1. Same VG

2. Same IP

Thanks
