
A strange NFS (?) problem

 
claudio_22
Regular Advisor

A strange NFS (?) problem

Hello,

Let me try to explain.

We have a cluster with two nodes, which I will call node1 and node2. The package pkg1 normally runs on node2.

As a test I switched pkg1 to node1. When I moved pkg1 back to node2, the package failed to start because it could not mount the directory /export/sapmnt/PCP locally (through automount) as /sapmnt/PCP.

However, I was able to mount two other file systems that are exported by the same server and also use automount.

I could go into /sapmnt, but any command on PCP hung (for example cd PCP, rm -r PCP and so on). No response.

In the end I rebooted node2 and started it in single-user mode. At that level I could go into /sapmnt but not into the PCP directory: I got "access denied".

I switched the package to node1 and everything was OK. Then I switched the package back to node2 again and everything is OK now.

I tried stopping and starting the package on node2 a couple of times and it now works fine.

We no longer have the problem, but I would like to understand what caused that behavior.

Any suggestions? Any idea of a possible cause? Maybe an NFS problem. Has anyone ever experienced a similar situation?

Both servers are ia64 HP rx7620 servers running
HP-UX 11.23 with the QPK patch bundle of May 2005.

We go into production next week. It is possible that the problem will not reappear, but we would like to get a better understanding of what it could have been.

Thanks for any reply and effort


Regards




Steven E. Protter
Exalted Contributor

Re: A strange NFS (?) problem

Shalom,

The problem is likely to re-appear.

If you have the /var/adm/syslog/syslog.log files for both servers and the logs in the /etc/cmcluster directories, this would be very helpful in figuring out the problem.

Have you installed NFS4 into the two 11.23 servers? It might help.

SEP
Steven E Protter
claudio_22
Regular Advisor

Re: A strange NFS (?) problem

Thanks Steve.

Yes, of course I have both the syslog.log files and the package logs.

What should I look for in the node2 logs?

And, excuse my limited knowledge of NFS, how can I check whether I am using NFS4?

Regards
claudio_22
Regular Advisor

Re: A strange NFS (?) problem

Hello ,

On both servers we are using version 3 .



Server nfs:
calls       badcalls
70210       0
Version 2: (0 calls)
null        getattr     setattr
0 0%        0 0%        0 0%
root        lookup      readlink
0 0%        0 0%        0 0%
read        wrcache     write
0 0%        0 0%        0 0%
create      remove      rename
0 0%        0 0%        0 0%
link        symlink     mkdir
0 0%        0 0%        0 0%
rmdir       readdir     statfs
0 0%        0 0%        0 0%
Version 3: (70211 calls)
null        getattr     setattr
118 0%      8970 12%    670 0%
lookup      access      readlink
50376 71%   1328 1%     23 0%
read        write       create
4615 6%     1366 1%     347 0%
mkdir       symlink     mknod
0 0%        14 0%       0 0%
remove      rmdir       rename
358 0%      0 0%        80 0%
link        readdir     readdir+
0 0%        16 0%       858 1%
fsstat      fsinfo      pathconf
48 0%       20 0%       108 0%
commit
896 1%

Eric SAUBIGNAC
Honored Contributor

Re: A strange NFS (?) problem

Hi claudio,

Does /export/sapmnt/PCP belong to the package pkg1?

If so, I think the problem you have comes from NFS over TCP.

You should modify your automount map on both nodes to add the option proto=udp.
Otherwise we can expect the problem to reappear, maybe on node1, maybe on node2.

Hope this will help

Eric

(PBFWME;-)

Eric SAUBIGNAC
Honored Contributor

Re: A strange NFS (?) problem

Hi claudio,

Some explanations, in spite of my poor English ;-(

When your directory is unexported, no notification is sent to the clients. So there is no reason for autofs to unmount /sapmnt/PCP before its timeout expires. Right?

Now suppose that /sapmnt/PCP is mounted by autofs on node2 when pkg1 leaves node2. Suppose also that the timeout for unmounting /sapmnt/PCP on node2 has not yet expired when pkg1 comes back from node1 to node2.

We might think that once the directory is exported again, automount will be able to access the share again.

That is true if the protocol is UDP, as it is a "connectionless" protocol. But with TCP, automount simply tries to continue the existing connection, which no longer exists in spite of the floating IP. So you are stuck waiting for the TCP and automount timeouts.

I have no HP-UX box near me to check the automount timeout. But I am almost sure that if you wait long enough (> 10 min?) between leaving node2 and coming back to node2, you will have no trouble.

In other words, if you want to force the problem to reappear, run this test (without the proto=udp option):

- start the package on node1
- before shutting down the package on node1, access /sapmnt/PCP from node2 and verify with mount -p that /sapmnt/PCP is mounted.
- now quickly stop the package on node1, then restart it on node2 before the automount timeout expires.

==> there is a good chance that the package will not start: it tries to access startsap in /sapmnt/PCP/exe, which is no longer accessible because it relies on a connection opened by autofs from node2 to node1.
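
The test above could be scripted roughly as follows. This is only a sketch under assumptions: it is meant to be run as root on node2 while the package is up on node1, it uses the Serviceguard commands cmhaltpkg and cmrunpkg, and the helper names run and repro_failback are made up for this example. By default it only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the fail-back reproduction test. DRY_RUN=1 (the default) only
# prints each command; set DRY_RUN=0 to really execute them.
# pkg1, node2 and /sapmnt/PCP are the names used in this thread.
PKG=${PKG:-pkg1}
TARGET=${TARGET:-node2}
MNT=${MNT:-/sapmnt/PCP}

# Hypothetical helper: print or execute one command.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: the three steps of the test, in order.
repro_failback() {
    # 1. Package is on node1: touch the automounted path from node2.
    run ls "$MNT"
    # 2. List the mounts so you can confirm /sapmnt/PCP is mounted.
    run mount -p
    # 3. Halt the package, then restart it on node2 before the
    #    automount idle timeout expires.
    run cmhaltpkg "$PKG"
    run cmrunpkg -n "$TARGET" "$PKG"
}
```

Calling repro_failback with the defaults prints the four commands so they can be reviewed first; running it with DRY_RUN=0 on a production cluster will of course take the package down.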

Hope this will help

Eric

(PBFWME;-)
Lolupee
Regular Advisor

Re: A strange NFS (?) problem

Claudio,

Do you have the NFS part of the package in the package script, or are you relying on automount?

There is a product from HP for NFS file systems on HA clusters. This product is very easy to use and would solve your issue right away. I believe it is called the Service Guard tools.

If you do not want to go down that path, I would advise you to forget automount and mount your file systems within your script.

The issues you may be encountering are as follows.

You need to unmount and unexport the file systems whenever the package comes down; automount will not unexport for you. Moreover, it depends on how you are using the mounts: are the volume groups in cluster mode when they are vgchanged? This is easy when you use the tool and when you move the mounting and unmounting into the package itself.
Dave Olker
Neighborhood Moderator

Re: A strange NFS (?) problem

Hello Claudio and Steven,

I'd like to point out a couple of things:

1) NFS Version 4 is *not* available on HP-UX yet. It will first arrive on HP-UX 11i v3. It is not available for 11i v1 or v2.

2) If you're using HP-UX 11i v2 and HA/NFS with automounted directories, be sure to use the -L option for automountd. This option tells AutoFS to *not* use any LOFS - loopback filesystem - mounts (which would occur when server1 is trying to mount a filesystem that resides on server1).

The automountd man page discusses this:

Options

-L Force all mounts to the local host to be NFS mounts instead of the default LOFS mounts. This is necessary for highly available NFS mounts.


There are known problems when LOFS mounts get involved with HA/NFS. The -L option was added by the NFS lab specifically to counter these problems.

The best way to make sure this doesn't happen is to modify the /etc/rc.config.d/nfsconf file as follows:

AUTOMOUNTD_OPTIONS="-L"

You should make this change on *any* 11i v2 NFS server participating in an HA/NFS cluster.
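
A quick way to check the setting on each node might look like the sketch below. It assumes the file is a plain shell fragment that can be sourced (which is how the rc scripts read it) and that -L appears as a separate word in AUTOMOUNTD_OPTIONS; the function name automountd_has_L is made up for this example.

```shell
# Return 0 if the given nfsconf-style file sets AUTOMOUNTD_OPTIONS so that
# it contains -L as a separate word, 1 if it does not, 2 if the file
# cannot be read. automountd_has_L is a hypothetical helper name.
automountd_has_L() {
    conf=${1:-/etc/rc.config.d/nfsconf}
    (
        # Source the file in a subshell so nothing leaks into the caller.
        AUTOMOUNTD_OPTIONS=
        . "$conf" >/dev/null 2>&1 || exit 2
        case " $AUTOMOUNTD_OPTIONS " in
            *" -L "*) exit 0 ;;
            *)        exit 1 ;;
        esac
    )
}

# Possible usage:
#   automountd_has_L && echo "-L is configured" || echo "-L is missing"
```

This only checks the configuration file. The running daemon picks up the option after AutoFS is restarted, which can be confirmed by looking at the automountd command line in ps -ef.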

Hope this helps,

Dave


Eric SAUBIGNAC
Honored Contributor

Re: A strange NFS (?) problem

Hi everybody,

lolupee, automount is the method recommended by HP with "MC/ServiceGuard Extension for SAP". In an SAP cluster the HA NFS toolkit is only used for the server side of NFS, to export and unexport directories like /export/sapmnt/ or /usr/sap/trans, not to mount or unmount NFS directories.

Another point that would not be solved by whether or not you use automount in the cluster: in an SAP environment you can have multiple Application Servers that are not part of the cluster but need to NFS-mount /sapmnt/ from :/export/sapmnt/. So, unless you run a remote script to mount or unmount the NFS directories, you may experience the same behavior on these "external" AS, whether you mount the NFS directories statically with /etc/fstab or dynamically with automount.

To my mind, an HA NFS server should always be reached by clients through UDP, not TCP. It is not THE solution, just one component of the solution.

Claudio, I don't know whether you use MC/ServiceGuard Extension for SAP, but it is strongly recommended in order to obtain support from HP for your cluster. It is another component of the solution.

(Thanks to Dave: I did not know that automount achieves cross mounting through LOFS.)

Hope this will help

Eric

(PBFWME;-)
claudio_22
Regular Advisor

Re: A strange NFS (?) problem

Hi all ,

Dave, I have tried adding the -L option for automountd, but the problem persists.


When I start the package on node2, /sapmnt/PCP becomes unavailable. I can go into /sapmnt, but when I try to go into PCP I get:

cd /sapmnt/PCP
sh: /sapmnt/PCP: Permission denied.

If I try ll PCP I get:

PCP unreadable
total 0

What is strange is that when the package is up on node2, the node exports other file systems to itself and they work.

Our workaround is to stop the package on node2, start it on node1, stop it on node1 and then start it on node1; that works, and /sapmnt/PCP is then accessible.

Regards

Eric SAUBIGNAC
Honored Contributor

Re: A strange NFS (?) problem

Claudio,

Did you try the option proto=udp?
Even at the risk of getting on your nerves ;-), I do insist on it.

here is a small example :

file /etc/auto_master

/net -hosts -nosuid,soft
/- /etc/auto.sap

file /etc/auto.sap

/sapmnt/PCP -soft,proto=udp :/export/sapmnt/PCP
/usr/sap/trans -soft,proto=udp :/export/usr/sap/trans
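
To double-check that no entry was forgotten, a map file can be scanned with something like the sketch below. It assumes the simple one-line "mountpoint -options location" layout used in the example above (no continuation lines, no multiple locations); the function name map_entries_without_udp is made up for this example.

```shell
# Print the mount point of every map entry whose options lack proto=udp.
# Assumption: one entry per line, "key [-options] location".
# map_entries_without_udp is a hypothetical helper name.
map_entries_without_udp() {
    awk '
        /^[ \t]*#/ || NF < 2 { next }      # skip comments and blank lines
        $2 !~ /^-/ { print $1; next }      # entry has no options field at all
        {
            opts = "," substr($2, 2) ","   # drop leading "-", pad with commas
            if (index(opts, ",proto=udp,") == 0) print $1
        }
    ' "$1"
}

# Possible usage:
#   map_entries_without_udp /etc/auto.sap
```

An empty result means every entry in the file carries proto=udp. Note that this only inspects the map; an already mounted file system keeps the options it was mounted with until it is unmounted and mounted again.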


Some more questions:

- I presume that /export/sapmnt/PCP is exported/unexported by the package's control script?
- How do you export/unexport it? Did you add lines of your own to the control script, or do you use hanfs.sh?
- What about the access permissions in the export?

- I suppose that you mount the NFS directories with the relocatable IP of the package as the target, not the official hostname?
- Are both nodes NFS clients for :/export/sapmnt/PCP and the other directories?

Eric

(PBFWME;)
claudio_22
Regular Advisor

Re: A strange NFS (?) problem

Hi Eric ,

Yes, I use proto=udp.

This is what is in auto.direct on both nodes:

/sapmnt/PCP -bg,proto=udp liposcs:/export/sapmnt/PCP
/document -bg,proto=udp liposcs:/export/document
/usr/sap -bg,proto=udp liposcs:/export/usr/sap

What I found different between the nodes concerns the exported file system /export/sapmnt/PCP:
the permissions on the PCP directory are 775 root:sys on node1 (which works), while on node2 (which does not work) they were 755 root:sys. I have changed them to 775.

But our opinion is that /sapmnt/PCP on node2 is being accessed by some application. So tomorrow we need to run the test again while the applications are stopped.

Thanks for your effort

Regards
claudio_22
Regular Advisor

Re: A strange NFS (?) problem

BTW, yes, I use hanfs.sh. Everything is OK on that side, because the other two file systems are mounted normally.