
SOLVED
Marco Santerre
Honored Contributor

Service Guard and NFS Issues

Hello gurus,

I have a bit of a situation that I can't seem to find a way out of.

Though I understand that NFS is flaky, I have no choice but to use it.

I have a two-node cluster running a SAP package with a DB on one node and the CI on the other node.

The problem I have with NFS is that ServiceGuard exports 2-3 file systems and they are automounted by the different servers. Naturally, when I kill the packages, if the file systems are mounted (which is almost always the case), /etc/mnttab still lists the NFS file systems, which hangs bdf and, often enough, backups as well. If I unmount the NFS file systems on all nodes prior to halting the packages, my DB package has a problem coming down because it uses one of those file systems. As you can see, it is kind of a catch-22.

Is there a way to make sure that bdf won't hang if I bring down my packages? Or to remove those NFS file systems from /etc/mnttab?

Any help is appreciated.
Cooperation is doing with a smile what you have to do anyhow.
23 REPLIES
Elmar P. Kolkman
Honored Contributor

Re: Service Guard and NFS Issues

Why not add the NFS mounts in the mount list of the packages? That way they are always mounted after starting the packages and always unmounted when stopping them.
Only remember to stop the NFS server package last...
Every problem has at least one solution. Only some solutions are harder to find.
malki_3
Frequent Advisor

Re: Service Guard and NFS Issues

Why don't you use the NFS Toolkit for Serviceguard? It's a way to export and import file systems under a cluster configuration.
Kent Ostby
Honored Contributor

Re: Service Guard and NFS Issues

Would it be possible to convert the setup so that the DB package used a shared filesystem rather than an NFS filesystem?

Not sure what your DB package is doing with that filesystem, but if it's relatively minor (in size and usage), then a shared filesystem would at least eliminate one dependency between the NFS and the DB package.

Best regards,

Kent M. Ostby


"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
melvyn burnard
Honored Contributor

Re: Service Guard and NFS Issues

Are you using the Serviceguard Extensions for SAP to run the DB and CI packages? If so, you must use the NFS Toolkit to do the NFS side.
There is also a specific way of configuring the SAP instances to mount these exported NFS file systems to prevent this "hanging" scenario.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Marco Santerre
Honored Contributor

Re: Service Guard and NFS Issues

OK, I see that I forgot some details, and I apologize for that.

Yes I am using the SAP extension toolkit for SG. And yes, I am also using hanfs.

hanfs has been configured to export all of the said file systems. The problem once again lies with the mounted NFS file system, and not the exported file system.

E.g.: the DB package exports /export/usr/sap/trans. Node 1, Node 2 and Node 3 will automount (using /etc/auto.direct) node1floatingaddress:/export/usr/sap/trans on /usr/sap/trans.

If my DB package comes down, /etc/mnttab will still show node1floatingaddress:/export/usr/sap/trans /usr/sap/trans and will cause the hang. But, on Node 1, I will see that /export/usr/sap/trans is now unmounted.
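
To see in advance which entries could make bdf or a backup block, the NFS lines can be pulled out of /etc/mnttab. The following is only a minimal sketch: it assumes the usual whitespace-separated mnttab layout (source, mount point, type, options, ...), and the function name list_nfs_mounts is made up for illustration.

```shell
# Print "mountpoint source" for every NFS entry in a mnttab-style file.
# Assumption: field 1 = source, field 2 = mount point, field 3 = fs type.
# With no argument the function reads /etc/mnttab.
list_nfs_mounts() {
    awk '$3 ~ /^nfs/ { print $2, $1 }' "${1:-/etc/mnttab}"
}
```

The mount points it prints (for example /usr/sap/trans) could then be excluded from a host backup or checked before running bdf while the package is down.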
Cooperation is doing with a smile what you have to do anyhow.

Re: Service Guard and NFS Issues

So when you halt your cluster for maintenance, you should always halt the CI package first... then the DB package.

If the DB package is just 'failing over' rather than being stopped for maintenance, then the NFS mounts should re-establish themselves when the package restarts *as long as* you are using the UDP protocol for NFS. (NFS v3 defaults to TCP; you need to change this to UDP by specifying 'proto=udp' in the mount options in your automount maps.)
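
For illustration, a direct map entry with the transport forced to UDP might look like the line below. This is a sketch, not a tested configuration: it assumes the map is the /etc/auto.direct file mentioned earlier in the thread and that mount_nfs on your release accepts the option in the form proto=udp (check mount_nfs(1M) before relying on it).

```
/usr/sap/trans  -proto=udp  node1floatingaddress:/export/usr/sap/trans
```

After changing a direct map, AutoFS normally has to be told to re-read it before the new options take effect.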

HTH

Duncan

I am an HPE Employee
Marco Santerre
Honored Contributor

Re: Service Guard and NFS Issues

I didn't know about the UDP option, so that is interesting, because I do not have it set. The problem doesn't lie so much in the NFS mounts coming back up and re-establishing themselves, but more in what happens when the package comes down. Basically, I'd like NFS to time out (or something similar) and remove itself from mnttab so that my backups don't have any issues and my bdf doesn't hang.
Cooperation is doing with a smile what you have to do anyhow.

Re: Service Guard and NFS Issues

So are you stopping your DB package in order to do backups? Why not just run a stopdb, and leave the package itself up?

HTH

Duncan

I am an HPE Employee
Marco Santerre
Honored Contributor

Re: Service Guard and NFS Issues

Usually, yes. I do a stopdb and leave the package up and running. But when I do maintenance, I have to bring the packages down. And when I do, if I just issue a normal host backup, it hangs, because some NFS mount points are still in /etc/mnttab after the package has brought down the floating IP address they are mounted from.

Cooperation is doing with a smile what you have to do anyhow.
Marco Santerre
Honored Contributor

Re: Service Guard and NFS Issues

I hate to do this, but in case people were on vacation and might have some new ideas..
Cooperation is doing with a smile what you have to do anyhow.
Geoff Wild
Honored Contributor

Re: Service Guard and NFS Issues

Are the boxes 11i?

Is AUTOFS set to 1 in:

/etc/rc.config.d/nfsconf

I have somewhat the same setup, except we merged the CI with the DB. I also have 3 app servers as well as a QA and a Dev server which automount the trans filesystem, among others.

NFS still hangs sometimes...but not too often...

Rgds...Geoff



Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Ashwani Kashyap
Honored Contributor

Re: Service Guard and NFS Issues

What kind of NFS mount options are you using, hard or soft?

Soft mounts should time out after a few tries.
Marco Santerre
Honored Contributor

Re: Service Guard and NFS Issues

Geoff,

Yes I'm at 11i and yes I'm using autofs.

Ashwani,

Interesting. I don't think I'm using any options at all, which I assume means I'm using a hard mount. What are the impacts of making it a soft mount? And if it does time out, does it get removed from /etc/mnttab?
Cooperation is doing with a smile what you have to do anyhow.
Massimo Bianchi
Honored Contributor

Re: Service Guard and NFS Issues

Take care with hard and soft mounting. Hard mount must be used if you are sharing executable files, otherwise you will have some unexpected core dumps if there is any network problem.

soft mount should be used just for the trans, since transports are not so critical....

Massimo
Marco Santerre
Honored Contributor

Re: Service Guard and NFS Issues

ok so that means I should be using a soft mount then on sapmnt for example?
Cooperation is doing with a smile what you have to do anyhow.
Marco Santerre
Honored Contributor

Re: Service Guard and NFS Issues

Reading through the different NFS mount options: could I keep a hard mount but put a retry in my option list? Would that remove it from /etc/mnttab?
Cooperation is doing with a smile what you have to do anyhow.
Ashwani Kashyap
Honored Contributor

Re: Service Guard and NFS Issues

The difference between hard and soft mounts is that a hard mount keeps retrying requests until the server responds again, whereas a soft mount returns an error after the maximum number of retransmits is reached.

You can experiment with the retry and timeo options with the soft mounts.

I do not know whether the soft mount is removed from mnttab after the maximum number of retries is reached. I will have to try that.
Massimo Bianchi
Honored Contributor

Re: Service Guard and NFS Issues

sapmnt has the executables on it (/sapmnt/SID/exe/*); it is THE vital directory for SAP. It needs a HARD NFS mount.

The retry count is only meaningful for soft mounts; for hard mounts, autofs will keep trying forever.

Massimo

Ashwani Kashyap
Honored Contributor

Re: Service Guard and NFS Issues

OK, I tested it.

I exported a file system from a UX 10.20 box and soft mounted it on a UX 11.11 box.

Then I unexported the file system from the 10.20 box.

On the 11.11 box, bdf reported an error but completed and did not hang. But the entry was still there in the mnttab.

Then I unmounted the NFS directory on the 11.11 box. The umount completed without any errors and removed the entry from the mnttab.
Marco Santerre
Honored Contributor

Re: Service Guard and NFS Issues

So if I understand correctly, Massimo, I'm back to my original problem, where I have to keep a hard mounted NFS directory, and AutoFS will keep trying forever.

So, when I'm in maintenance mode, basically, /etc/mnttab will retain that particular NFS mount directory and "hang" until the floating address is back up again.

Am I right?
Cooperation is doing with a smile what you have to do anyhow.
Ashwani Kashyap
Honored Contributor
Solution

Re: Service Guard and NFS Issues

In my setup here, I have the CI and the DB on the same node, and the app servers NFS mount the exported directories.

I do not use the SAP extensions, all my NFS mounts are soft mounts with a timeo of 20, and I have seldom encountered problems with NFS.
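
As an illustration of such a setup, a soft-mounted direct map entry could look like the line below. This is a sketch only: it reuses the trans file system and /etc/auto.direct from earlier in the thread as an assumed example, not this poster's actual configuration. timeo is expressed in tenths of a second, so 20 means an initial timeout of 2 seconds.

```
/usr/sap/trans  -soft,timeo=20  node1floatingaddress:/export/usr/sap/trans
```

As discussed earlier in the thread, file systems that hold executables (such as sapmnt) are the ones where a soft mount carries the most risk.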
Massimo Bianchi
Honored Contributor

Re: Service Guard and NFS Issues

When you are in maintenance, you should do a clean umount of the file system and create an empty /etc/auto.direct file.

After that, have autofs re-read the configuration: kill -HUP

Undo all of this after the maintenance.

In the event of unplanned maintenance, you could always bring up the virtual IP by hand, export, umount the fs, and clean it up by hand.

ifconfig lanx:y virtual_ip ....
exportfs -i /export/sapmnt...



I used this logic sometimes, when things go wrong...
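
The planned-maintenance steps above could be wrapped in a small script along these lines. It is a hedged sketch, not a tested procedure: the ".maint" backup name, the function names and the DRY_RUN switch are invented for illustration, and the re-read command is an assumption (the post suggests kill -HUP, while on AutoFS running the automount command is the usual way to re-read the maps; set REREAD to whatever applies on your system). By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the planned-maintenance steps described above.
# Assumptions to adapt: map path, ".maint" backup name, re-read command.
MAP=${MAP:-/etc/auto.direct}
REREAD=${REREAD:-/usr/sbin/automount}
DRY_RUN=${DRY_RUN:-1}            # 1 = only print the commands

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enter_maintenance() {            # args: NFS mount points to release
    for mp in "$@"; do
        run umount "$mp"
    done
    run cp -p "$MAP" "$MAP.maint"    # keep a copy of the real map
    run cp /dev/null "$MAP"          # leave an empty map in place
    run "$REREAD"                    # have AutoFS pick up the empty map
}

leave_maintenance() {
    run mv "$MAP.maint" "$MAP"       # restore the real map
    run "$REREAD"
}
```

For example, `enter_maintenance /usr/sap/trans` before halting the packages and `leave_maintenance` once they are back up; setting DRY_RUN=0 makes the script actually run the commands.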

HTH,
Massimo
Marco Santerre
Honored Contributor

Re: Service Guard and NFS Issues

I may be tempted to try the soft mounts, even though it is discouraged just so that I can have the timeouts in place.

I realize it's not the best solution. I was hoping for something a little more robust, but at the same time, unmounting the NFS prior to bringing the package down also creates problems.

Since I have redundancy on my network, I may be inclined to work with the core dumps if they don't happen too often.
Cooperation is doing with a smile what you have to do anyhow.