Operating System - HP-UX

Stale File System in SG environment

 
Andrea Petronio
Occasional Advisor

Stale File System in SG environment

I am exporting a file system in a ServiceGuard environment (consider a 2-node cluster composed of nodes A and B). The file system to be exported is on the shared disk. I execute the following sequence (HP-UX version is 11.11):

1. mount the disk containing the fs to be exported in exclusive mode on node A
2. activate the floating ip and name on node A
3. export (exportfs -i )
4. mount the fs on an HP node external to the cluster (say node C), using the floating name as the remote host name (OK so far)
5. un-export (exportfs -u -i )
6. de-activate the floating ip and name on node A
7. unmount the disk containing the fs on node A
8. the mounted fs goes stale on C (OK so far)
9. mount the disk containing the fs to be exported in exclusive mode on node B
10. activate the floating ip and name on node B
11. export (exportfs -i )
12. the fs on node C stays stale (NOT OK)

Unmounting and re-mounting the fs on node C recovers the situation.

Any suggestions to determine the cause of the problem?
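For reference, here is the same sequence as rough HP-UX shell commands; the VG name /dev/vg_nfs, the mount point /export/fs and the subnet below are placeholders, not the real configuration (ServiceGuard normally drives these steps from the package control script):

On node A (package start):
# vgchange -a e /dev/vg_nfs                      # activate the shared (cluster-aware) VG exclusively
# mount /dev/vg_nfs/lvol1 /export/fs             # mount the fs to be exported
# cmmodnet -a -i 172.16.115.183 172.16.115.0     # add the floating (relocatable) ip
# exportfs -i /export/fs                         # export it

On node C (outside the cluster):
# mount -F nfs mv36a:/export/fs /rnmc            # mount via the floating name

On node A (package halt) the reverse is done: exportfs -u, remove the floating ip, umount, vgchange -a n; node B then repeats the start steps with the same floating ip and export.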
Geoff Wild
Honored Contributor

Re: Stale File System in SG environment

Use autofs on node c:

In /etc/auto_master:

/- /etc/auto.direct proto=tcp

In /etc/auto.direct:

/local/mount/point floatingdnsname:/export/your/nfs/filesystem

In /etc/rc.config.d/nfsconf:
AUTOFS=1
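After changing the maps, the automounter needs to pick them up; on 11.11 that is normally done through the NFS client startup script, something like:

# /sbin/init.d/nfs.client stop
# /sbin/init.d/nfs.client start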

Rgds...Geoff


Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

I've tried the suggested settings, but no joy.
Currently the configuration is:

/etc/auto_master
/- /etc/auto.direct (the proto=tcp option was giving an error in automount option parsing)

/etc/auto.direct
/rnmc mv36a:/disk_mv36a/opt/mv36/core/nmc

where
/rnmc is my mountpoint
mv36a is the floating name
/disk.... is the exported directory

option AUTOFS=1 is set.

I have rebooted the node after applying the changes.
The behaviour is unfortunately unchanged.
Note that, in case the floating ip is stopped and restarted on the same node, there is no problem.
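(The parse error on proto=tcp is probably just the missing leading dash; options in auto_master are usually written as -proto=tcp.) What node C actually sees at this point can be checked with something like:

# nfsstat -m                 # NFS mount options and the server address actually in use
# mount -v | grep rnmc       # whether /rnmc is currently mounted and from which server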
RAC_1
Honored Contributor

Re: Stale File System in SG environment

You need to do a complete unmount on node C and then a fresh mount from node B. It looks like NFS binds to the floating ip (when on node A) with some reference to node A, so when you try to mount from node B it creates a problem.

A complete and clean unmount from node C should resolve it.
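A rough sketch of a complete, clean unmount on node C (/rnmc is the mount point mentioned above; be careful with fuser -k on a production node):

# fuser -cu /rnmc            # list the processes holding the mount point
# fuser -cku /rnmc           # kill them so the unmount can proceed
# umount /rnmc
# mount /rnmc                # fresh mount (assuming an fstab or autofs entry for /rnmc)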

Anil
There is no substitute to HARDWORK
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

This is the work-around but not the solution.
I have an application on node C bound to the application on the cluster. If NFS is not auto-recovered, high availability of the application is not guaranteed, since the system administrator has to unmount/remount to recover system availability.
RAC_1
Honored Contributor

Re: Stale File System in SG environment

Does your application bind to the floating ip only, and not to the owner of the floating ip? What happens when you do nslookup on the floating ip when it is owned by node A and when it is owned by node B?

Anil
There is no substitute to HARDWORK
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

The application on node C is a client application that requires access to a remote fs via NFS on the server host (in this case running in the SG environment).
NFS is mounted using the floating name.
The result of nslookup is identical whether the server is running on node A, not running (floating ip not active), or running on node B, as follows:

# nslookup 172.16.115.183
Using /etc/hosts on: gehp183

looking up FILES
Name:
Address: 172.16.115.183

# nslookup 172.16.115.183
Using /etc/hosts on: gehp183

looking up FILES
Name:
Address: 172.16.115.183

# nslookup 172.16.115.183
Using /etc/hosts on: gehp183

looking up FILES
Name:
Address: 172.16.115.183
RAC_1
Honored Contributor

Re: Stale File System in SG environment

Did that ever work? You may need to go for NFS under MC/ServiceGuard (the HA NFS toolkit).
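To check whether the toolkit is installed and letting it drive the exports (the variable name below is recalled from the toolkit documentation and may differ by version):

# swlist -l product | grep -i nfs     # should show SG-NFS-Tool if the toolkit is installed

In the package's hanfs.sh the exported file systems are listed along the lines of:
XFS[0]="-o anon=65534 /disk_mv36a/opt/mv36/core/nmc"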

Anil
There is no substitute to HARDWORK
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

I've tried connecting node C to a different cluster and got no problem, but I've not been able to find any meaningful difference between the two clusters' set-ups. I was looking for hints to spot the difference in set-up that causes the problem.
RAC_1
Honored Contributor

Re: Stale File System in SG environment

You mean to say that when you connect node C (the apps on node C) to a different cluster and do the same exercise you mentioned, it works?

What does showmount -e "floating_ip/hostname" show from node C, in both cases: when the floating ip is bound to node A and when it is bound to node B?

Anil
There is no substitute to HARDWORK
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

showmount -e always reports the same output:

export list for :
/disk_mv36a/opt/mv36/core/nmc (everyone)

where the argument can be the floating name, the floating ip or the hostname (of course only when the floating address and ip are active).
In case they are not active (during the package switch) it reports an RPC timeout failure.
Note that showmount returns the export list successfully immediately after the package switch, but nevertheless the fs keeps its stale status.
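One way to see exactly when this happens is to run a small watcher on node C across the package switch; it shows the server answering showmount while local access to the mount point is still failing (mv36a and /rnmc as above):

while true
do
    date
    showmount -e mv36a                     # succeeds again as soon as node B exports the fs
    ls /rnmc > /dev/null 2>&1 || echo "local access to /rnmc still failing"
    sleep 5
done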
RAC_1
Honored Contributor

Re: Stale File System in SG environment

You did not answer this.

You mean to say that when you connect node C (the apps on node C) to a different cluster and do the same exercise you mentioned, it works?
There is no substitute to HARDWORK
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

Sorry, I missed your first question.
The complete architecture is:
- cluster composed of nodes A & B
- node C outside the cluster.
A client application is running on node C and is looking for its master on the cluster.
If I change the node C configuration to connect to another cluster (say composed of nodes D and E) I get no problem.
My problem is that I cannot find any meaningful difference between the configuration of the cluster composed of nodes A and B and the one composed of nodes D and E.
RAC_1
Honored Contributor

Re: Stale File System in SG environment

What are the NFS patch levels on D and E, and those on A and B?
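They can be compared with something like:

# swlist -l patch | grep -i nfs       # NFS-related patches (PHNE_*, PHKL_*)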

Anil
There is no substitute to HARDWORK
Sheriff Andy
Trusted Contributor

Re: Stale File System in SG environment

Do nodes D & E have the NFS toolkit, vs. nodes A & B?
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

The cluster having problems has the following patch levels:
NFS B.11.11 ONC/NFS; Network-File System,Information Services,Utilities
PHKL_25238 1.0 11.00 NFS nfsd deadlock
PHKL_25993 1.0 thread nostop for NFS, rlimit, Ufalloc fix
PHKL_28185 1.0 Tunable;vxportal;vx_maxlink;DMAPI NFS hang
PHKL_29335 1.0 vx_nospace on NFS write.
PHKL_30920 1.0 (u)mount,final close,NFS umount,Busy syncer
PHNE_30661 1.0 ONC/NFS General Release/Performance Patch
SG-NFS-Tool A.11.11.02 MC/ServiceGuard NFS Script Templates

The cluster working fine has:

NFS B.11.11 ONC/NFS; Network-File System,Information Services,Utilities
PHKL_25238 1.0 11.00 NFS nfsd deadlock
PHKL_25993 1.0 thread nostop for NFS, rlimit, Ufalloc fix
PHKL_28185 1.0 Tunable;vxportal;vx_maxlink;DMAPI NFS hang
PHKL_29335 1.0 vx_nospace on NFS write.
PHKL_30920 1.0 (u)mount,final close,NFS umount,Busy syncer
PHNE_29883 1.0 ONC/NFS General Release/Performance Patch
SG-NFS-Tool A.11.11.02 MC/ServiceGuard NFS Script Templates

Please note that I've added the new NFS general patch on the cluster behaving badly to check whether it would help, but the two clusters initially had the same NFS patch level.
RAC_1
Honored Contributor

Re: Stale File System in SG environment

Send a SIGUSR2 signal to rpc.mountd and check the debug log.
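Something like:

# ps -ef | grep rpc.mountd | grep -v grep     # note the PID of rpc.mountd
# kill -USR2 <mountd_pid>                     # toggle mountd debug logging as described above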

Also, get Dave Olker on one of those NFS threads and send him an email!!!!

Anil
There is no substitute to HARDWORK
Andrea Petronio
Occasional Advisor

Re: Stale File System in SG environment

The problem was due to different minor numbers provided when creating the volume groups on the nodes of the cluster.
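For anyone hitting the same thing: the NFS file handle includes the device ID of the exported file system, so if the volume group's group device has a different minor number on each node, the handles given out by node A go stale the moment node B takes over. A rough sketch of checking and aligning them; vg_nfs, the minor number 0x010000 and the map file path are placeholders:

On both nodes, compare the minor number of the group file:
# ll /dev/vg_nfs/group                           # e.g. crw-r-----  1 root  sys  64 0x010000 ...

If they differ, re-create the VG definition on the mismatching node (with the VG deactivated there):
# vgexport -p -s -m /tmp/vg_nfs.map /dev/vg_nfs  # on the good node: write a map file, copy it over
# vgexport /dev/vg_nfs                           # on the bad node: drop the old definition
# mkdir /dev/vg_nfs
# mknod /dev/vg_nfs/group c 64 0x010000          # same minor number as on the good node
# vgimport -s -m /tmp/vg_nfs.map /dev/vg_nfs     # re-import using the map file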

Thanks to everybody for the support.