Operating System - HP-UX

Stale File System in SG environment

 
Andrea Petronio
Occasional Advisor

Stale File System in SG environment

I am exporting a file system in a ServiceGuard environment (consider a 2-node cluster composed of nodes A and B). The file system to be exported is on the shared disk. I execute the following sequence (HP-UX version is 11.11):

1. mount the disk containing the fs to be exported in exclusive mode on node A
2. activate the floating ip and name on node A
3. export (exportfs -i )
4. mount the fs on an HP node external to the cluster (say node C), using the floating name as the remote host name (OK so far)
5. un-export (exportfs -u -i )
6. de-activate the floating ip and name on node A
7. unmount the disk containing the fs on node A
8. the mounted fs goes stale on C (OK so far)
9. mount the disk containing the fs to be exported in exclusive mode on node B
10. activate the floating ip and name on node B
11. export (exportfs -i )
12. the fs on node C stays stale (NOT OK)

Unmounting and re-mounting the fs on node C recovers the situation.

Any suggestions to determine the cause of the problem?
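For reference, here is the same sequence as rough HP-UX shell commands; the VG name /dev/vg_nfs, the mount point /export/fs and the subnet below are placeholders, not the real configuration (ServiceGuard normally drives these steps from the package control script):

On node A (package start):
# vgchange -a e /dev/vg_nfs                      # activate the shared (cluster-aware) VG exclusively
# mount /dev/vg_nfs/lvol1 /export/fs             # mount the fs to be exported
# cmmodnet -a -i 172.16.115.183 172.16.115.0     # add the floating (relocatable) ip
# exportfs -i /export/fs                         # export it

On node C (outside the cluster):
# mount -F nfs mv36a:/export/fs /rnmc            # mount via the floating name

On node A (package halt) the reverse is done: exportfs -u, remove the floating ip, umount, vgchange -a n; node B then repeats the start steps with the same floating ip and export.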
Geoff Wild
Honored Contributor

Re: Stale File System in SG environment

Use autofs on node c:

In /etc/auto_master:

/- /etc/auto.direct proto=tcp

In /etc/auto.direct:

/local/mount/point floatingdnsname:/export/your/nfs/filesystem

In /etc/rc.config.d/nfsconf:
AUTOFS=1
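After changing the maps, the automounter needs to pick them up; on 11.11 that is normally done through the NFS client startup script, something like:

# /sbin/init.d/nfs.client stop
# /sbin/init.d/nfs.client start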

Rgds...Geoff


Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

I've tried the suggested settings, but no joy.
Currently the configuration is:

/etc/auto_master
/- /etc/auto.direct (the proto=tcp option was giving an error in automount option parsing)

/etc/auto.direct
/rnmc mv36a:/disk_mv36a/opt/mv36/core/nmc

where
/rnmc is my mountpoint
mv36a is the floating name
/disk.... is the exported directory

option AUTOFS=1 is set.

I have rebooted the node after applying the changes.
The behaviour is unfortunately unchanged.
Note that, in case the floating ip is stopped and restarted on the same node, there is no problem.
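(The parse error on proto=tcp is probably just the missing leading dash; options in auto_master are usually written as -proto=tcp.) What node C actually sees at this point can be checked with something like:

# nfsstat -m                 # NFS mount options and the server address actually in use
# mount -v | grep rnmc       # whether /rnmc is currently mounted and from which server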
RAC_1
Honored Contributor

Re: Stale File System in SG environment

You need to do a complete unmount on node C and then a fresh mount from node B. It looks like NFS binds to the floating ip (when on node A) with some reference to node A, so when you try to mount from node B it creates a problem.

A complete and clean unmount from node C should resolve it.
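A rough sketch of a complete, clean unmount on node C (/rnmc is the mount point mentioned above; be careful with fuser -k on a production node):

# fuser -cu /rnmc            # list the processes holding the mount point
# fuser -cku /rnmc           # kill them so the unmount can proceed
# umount /rnmc
# mount /rnmc                # fresh mount (assuming an fstab or autofs entry for /rnmc)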

Anil
There is no substitute to HARDWORK
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

This is the work-around but not the solution.
I have an application on node C bound to the application on the cluster. If NFS is not auto-recovered, high availability of the application is not guaranteed, since the system administrator has to unmount/remount to recover system availability.
RAC_1
Honored Contributor

Re: Stale File System in SG environment

Does your application bind to the floating ip only, and not to the owner of the floating ip? What happens when you do nslookup on the floating ip when it is owned by node A and when it is owned by node B?

Anil
There is no substitute to HARDWORK
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

The application on node C is a client application that requires access to a remote fs via NFS on the server host (in this case running in the SG environment).
NFS is mounted using the floating name.
The result of nslookup is identical whether the server is running on node A, not running (floating ip not active), or running on node B, as follows:

# nslookup 172.16.115.183
Using /etc/hosts on: gehp183

looking up FILES
Name:
Address: 172.16.115.183

# nslookup 172.16.115.183
Using /etc/hosts on: gehp183

looking up FILES
Name:
Address: 172.16.115.183

# nslookup 172.16.115.183
Using /etc/hosts on: gehp183

looking up FILES
Name:
Address: 172.16.115.183
RAC_1
Honored Contributor

Re: Stale File System in SG environment

Did that ever work? You may need to go for NFS under MC/ServiceGuard (the HA NFS toolkit).
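To check whether the toolkit is installed and letting it drive the exports (the variable name below is recalled from the toolkit documentation and may differ by version):

# swlist -l product | grep -i nfs     # should show SG-NFS-Tool if the toolkit is installed

In the package's hanfs.sh the exported file systems are listed along the lines of:
XFS[0]="-o anon=65534 /disk_mv36a/opt/mv36/core/nmc"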

Anil
There is no substitute to HARDWORK
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

I've tried connecting node C to a different cluster and got no problem, but I've not been able to find any meaningful difference between the two clusters' set-ups. I was looking for hints to spot the difference in set-up that causes the problem.
RAC_1
Honored Contributor

Re: Stale File System in SG environment

You mean to say that when you connect node C (the apps on node C) to a different cluster and do the same exercise you mentioned, it works?

What does showmount -e "floating_ip/hostname" show from node C, in both cases: when the floating ip is bound to node A and when it is bound to node B?

Anil
There is no substitute to HARDWORK
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

showmount -e always reports the same output:

export list for :
/disk_mv36a/opt/mv36/core/nmc (everyone)

where the argument can be the floating name, the floating ip or the hostname (of course only when the floating address and ip are active).
In case they are not active (during the package switch) it reports an RPC timeout failure.
Note that showmount returns the export list successfully immediately after the package switch, but nevertheless the fs keeps its stale status.
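One way to see exactly when this happens is to run a small watcher on node C across the package switch; it shows the server answering showmount while local access to the mount point is still failing (mv36a and /rnmc as above):

while true
do
    date
    showmount -e mv36a                     # succeeds again as soon as node B exports the fs
    ls /rnmc > /dev/null 2>&1 || echo "local access to /rnmc still failing"
    sleep 5
done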
RAC_1
Honored Contributor

Re: Stale File System in SG environment

You did not answer this.

You mean to say that when you connect node C (the apps on node C) to a different cluster and do the same exercise you mentioned, it works?
There is no substitute to HARDWORK
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

Sorry, I missed your first question.
The complete architecture is:
- cluster composed of nodes A & B
- node C outside the cluster.
A client application is running on node C and is looking for its master on the cluster.
If I change the node C configuration to connect to another cluster (say composed of nodes D and E) I get no problem.
My problem is that I cannot find any meaningful difference between the configuration of the cluster composed of nodes A and B and the one composed of nodes D and E.
RAC_1
Honored Contributor

Re: Stale File System in SG environment

What are the NFS patch levels on D and E, and those on A and B?
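They can be compared with something like:

# swlist -l patch | grep -i nfs       # NFS-related patches (PHNE_*, PHKL_*)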

Anil
There is no substitute to HARDWORK
Sheriff Andy
Trusted Contributor

Re: Stale File System in SG environment

Do nodes D & E have the NFS toolkit, vs. nodes A & B?
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

The cluster having problems has the following patch levels:
NFS B.11.11 ONC/NFS; Network-File System,Information Services,Utilities
PHKL_25238 1.0 11.00 NFS nfsd deadlock
PHKL_25993 1.0 thread nostop for NFS, rlimit, Ufalloc fix
PHKL_28185 1.0 Tunable;vxportal;vx_maxlink;DMAPI NFS hang
PHKL_29335 1.0 vx_nospace on NFS write.
PHKL_30920 1.0 (u)mount,final close,NFS umount,Busy syncer
PHNE_30661 1.0 ONC/NFS General Release/Performance Patch
SG-NFS-Tool A.11.11.02 MC/ServiceGuard NFS Script Templates

The cluster working fine has:

NFS B.11.11 ONC/NFS; Network-File System,Information Services,Utilities
PHKL_25238 1.0 11.00 NFS nfsd deadlock
PHKL_25993 1.0 thread nostop for NFS, rlimit, Ufalloc fix
PHKL_28185 1.0 Tunable;vxportal;vx_maxlink;DMAPI NFS hang
PHKL_29335 1.0 vx_nospace on NFS write.
PHKL_30920 1.0 (u)mount,final close,NFS umount,Busy syncer
PHNE_29883 1.0 ONC/NFS General Release/Performance Patch
SG-NFS-Tool A.11.11.02 MC/ServiceGuard NFS Script Templates

Please note that I've added the new NFS general patch on the cluster behaving badly to check whether it would help, but the two clusters initially had the same NFS patch level.
RAC_1
Honored Contributor

Re: Stale File System in SG environment

Send a SIGUSR2 signal to rpc.mountd and check the debug log.
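Something like:

# ps -ef | grep rpc.mountd | grep -v grep     # note the PID of rpc.mountd
# kill -USR2 <mountd_pid>                     # toggle mountd debug logging as described above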

Also, get Dave Olker on one of those NFS threads and send him an email!!!!

Anil
There is no substitute to HARDWORK
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

The problem was due to different minor numbers provided when creating the volume groups on the nodes of the cluster.
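For anyone hitting the same thing: the NFS file handle includes the device ID of the exported file system, so if the volume group's group device has a different minor number on each node, the handles given out by node A go stale the moment node B takes over. A rough sketch of checking and aligning them; vg_nfs, the minor number 0x010000 and the map file path are placeholders:

On both nodes, compare the minor number of the group file:
# ll /dev/vg_nfs/group                           # e.g. crw-r-----  1 root  sys  64 0x010000 ...

If they differ, re-create the VG definition on the mismatching node (with the VG deactivated there):
# vgexport -p -s -m /tmp/vg_nfs.map /dev/vg_nfs  # on the good node: write a map file, copy it over
# vgexport /dev/vg_nfs                           # on the bad node: drop the old definition
# mkdir /dev/vg_nfs
# mknod /dev/vg_nfs/group c 64 0x010000          # same minor number as on the good node
# vgimport -s -m /tmp/vg_nfs.map /dev/vg_nfs     # re-import using the map file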

Thanks to everybody for the support.