
Re: NFS with SG

 
Jeff_Traigle
Honored Contributor

NFS with SG

This ugly topic again. I found several threads on it. I remember using the NFS Toolkit in class in August. We don't have it here, although we are using NFS in a couple of clusters. If I remember correctly, the Toolkit stops the daemons when the package halts. In one of our clusters, that is essentially what someone tried to duplicate in the control script. The problem is that two packages in the cluster both export file systems, so stopping the daemons when either package halts disrupts the other.

My own thinking was that it should be sufficient to leave the NFS daemons running on both nodes at all times. The packages would simply export and unexport their particular file systems as they run and halt. All should be good in the world as the package IP addresses move from one node to the other and the exports become available again.
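
For illustration, here's a minimal sketch of what I have in mind for the package control script. The path, client names, and export options below are made up; the real ones would match whatever is in /etc/exports today:

    function customer_defined_run_cmds
    {
        # Package file systems are already mounted at this point.
        # -i exports the directory without needing an /etc/exports entry.
        /usr/sbin/exportfs -i -o access=clienta:clientb /pkg1/export
    }

    function customer_defined_halt_cmds
    {
        # Unexport before the package unmounts the file system.
        /usr/sbin/exportfs -u /pkg1/export
    }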

I tried my theory this morning and it seemed to work in a limited test (i.e., I only halted and restarted the package on a single node). The client, as expected, hung while the package was halted and proceeded normally once the package was running again. (The client was just running a simple while loop to append to a file on the NFS-mounted file system.)

Is there something I'm not considering that would keep this from working consistently and reliably when the package moves to another node in the cluster?
--
Jeff Traigle
Dave Olker
Neighborhood Moderator
Solution

Re: NFS with SG

Hi Jeff,


------------ DISCLAIMER -------------

I haven't given this any real thought - these are just ramblings off the top of my head, so please don't hold me or HP to any of this. I'm not recommending anyone change the logic of the NFS Toolkit scripts based on my comments. This is just a friendly discussion among peers.

------------ DISCLAIMER -------------


When you say "leave the NFS daemons running", can you be more specific about which daemons you're referring to? Are you talking about the nfsd/nfsktcpd daemons? rpc.mountd? All NFS daemons including rpc.lockd/rpc.statd?

Leaving the nfsd and nfsktcpd daemons running *might* be safe in certain situations, but there is the potential that they'll continue trying to field requests for the filesystems you're trying to migrate. I'm concerned about the nfsds holding buffer cache pages busy on the server that are associated with files on the filesystem you're trying to terminate and keeping the mount point "busy". If the filesystems are busy then they won't unmount.
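
(If you want to check for that during testing, fuser can at least show user-space processes holding the mount point busy, though it won't show buffer cache pages the nfsds have pinned. Against a hypothetical mount point:)

    # -c treats the argument as a mount point; -u adds the login name.
    /usr/sbin/fuser -cu /pkg1/export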

Leaving rpc.mountd running is probably safe, as it should recognize that filesystems are not exported any longer and stop granting MOUNT requests to new clients. That one is *probably* safe.

As for rpc.lockd and rpc.statd, these guys have to be terminated and restarted if you're using the File Lock Migration feature of the NFS toolkit. The way this mechanism works is we copy the /var/statmon/sm entries from the primary to the adoptive node and then terminate/restart the rpc.lockd/statd daemons so that they will notify the clients using the migrated filesystem that they need to reclaim their locks. This is a critical step to making this lock migration feature work.
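
In rough outline, the adoptive-node side of that looks something like this. This is just a sketch of the idea, not the toolkit's actual code, and the paths and PID handling are simplified:

    # Restore the sm entries copied over from the primary node.
    cp -p /pkg1/sm_save/* /var/statmon/sm/

    # Terminate and restart the lock daemons.  At startup they move
    # /var/statmon/sm to sm.bak and send crash recovery notifications
    # to those clients, prompting them to reclaim their locks.
    kill `ps -e | awk '$NF ~ /rpc.lockd|rpc.statd/ {print $1}'`
    /usr/sbin/rpc.statd
    /usr/sbin/rpc.lockd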


There are probably other scenarios I'm not thinking of at the moment that would lead to problems if the daemons aren't stopped/started as part of the package migration.

Anyway, that's my first pass at this.

Regards,

Dave



Steven E. Protter
Exalted Contributor

Re: NFS with SG

Shalom,

Interesting, Jeff. Normally you let the cluster start services. You can see the logic in Dave's comments as to why you might want to leave certain NFS services running.

Surely the nfs.core and nfs.client services should be left running. This is a situation that requires a lot of testing to make sure every scenario keeps the server running the way you'd like it to.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Jeff_Traigle
Honored Contributor

Re: NFS with SG

I figured I'd get Dave's attention on this one. :) Don't worry... I know this configuration is a bit loony and I wouldn't dream of holding HP or you responsible.

I guess I should also mention these are 11.11 systems I'm working with.

As for the daemons I was referring to, I meant all of them. The one control script, as it stands now (and has for who knows how long before I came along), uses the nfs.server script, so it kills everything that script takes care of when the package halts. In addition, the package that stops NFS services when it halts also shuts off nfskd (by way of the -q option). In the start sections of both scripts, they run nfs.core and nfs.server to make sure all of the necessary daemons are running.
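
(Roughly, the halt and start logic amounts to the fragment below. This is my reconstruction for discussion, not the actual script, and I've left out the nfskd handling:)

    # Halt section: stop everything the stock init script manages.
    /sbin/init.d/nfs.server stop

    # Start section: make sure the core RPC services and the server
    # daemons are all running.
    /sbin/init.d/nfs.core start
    /sbin/init.d/nfs.server start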

I'll have to give this some more thought based on Dave's input too.
--
Jeff Traigle
Prashant Zanwar_4
Respected Contributor

Re: NFS with SG

The way we practiced it was to keep a separate package that activates resources, like the NFS resources, and also acts as the IP package. Then, depending on which node the services package needs to run on, you either keep the daemons running there or control them accordingly.

Hope it helps
Thanks
Prashant
"Intellect distinguishes between the possible and the impossible; reason distinguishes between the sensible and the senseless. Even the possible can be senseless."
Dave Olker
Neighborhood Moderator

Re: NFS with SG

Just a quick comment on Jeff's post regarding nfskd.

The nfskd daemon does nothing on HP-UX. It was a placeholder daemon we created back when we were going to implement UDP kernel threads to replace the nfsd daemons. We abandoned that work and just implemented the nfsktcpd process for tracking the pools of NFS/TCP kernel threads. All of this goes away in 11i v3 when we move to a single system-wide thread pool that handles both UDP and TCP requests.

The point is nfskd should not be consuming any resources, nor should it be stopping any packages from successfully migrating. If it is, please let me know.

Regards,

Dave


Jeff_Traigle
Honored Contributor

Re: NFS with SG

After thinking some more about the way this cluster is configured and Dave's ideas, I've come up with the following alternative plan.

In the current configuration, without the NFS Toolkit, there's no handling of the locks at all. The server daemons are simply stopped on the active node (at least in one package) and started on the adoptive node. It doesn't look too difficult to move the lock files, though, so I might as well handle them. (Though I forgot to bring the class pages into work with me this morning, I did find the nfs.flm script in the class material when I got home last night.)

Here's my second shot at a plan for dealing with these packages...

Action during package halt:

1. Copy the /var/statmon/sm files for clients holding locks on exports the package controls to shared disk.
2. Kill rpc.lockd, rpc.statd, nfsd, and rpc.mountd.
3. Unexport the file systems controlled by the package.
4. Delete the copied lock files from /var/statmon/sm.
5. Restart rpc.lockd, rpc.statd, nfsd, and rpc.mountd.

Action during package run:

1. Move the lock files from shared disk to /var/statmon/sm.
2. Export the file systems controlled by the package.
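
As a rough sketch, the steps above might look like this in the control script. The paths, the PID discovery, and the daemon restart arguments are all placeholders, and step 1 is cruder than the real nfs.flm logic, which grabs only the relevant clients:

    SM=/var/statmon/sm
    SAVE=/pkg1/sm_save        # directory on the package's shared disk
    FS=/pkg1/export

    nfs_halt()
    {
        # 1. Save sm entries to shared disk (crude: grabs every client).
        cp -p $SM/* $SAVE 2>/dev/null

        # 2. Kill the daemons (PID discovery simplified).
        kill `ps -e | awk '$NF ~ /rpc.lockd|rpc.statd|nfsd|rpc.mountd/ {print $1}'`

        # 3. Unexport this package's file systems.
        /usr/sbin/exportfs -u $FS

        # 4. Drop the copied entries so the restart below does not
        #    re-notify clients that are moving with the package.
        for f in $SAVE/*
        do
            rm -f $SM/`basename $f`
        done

        # 5. Restart the daemons for whatever exports stay on this node.
        /usr/sbin/rpc.statd
        /usr/sbin/rpc.lockd
        /usr/sbin/nfsd 16
        /usr/sbin/rpc.mountd
    }

    nfs_run()
    {
        # Move the saved sm entries in, then re-export.  Whether the
        # daemons also need a restart here is exactly my question below.
        mv $SAVE/* $SM 2>/dev/null
        /usr/sbin/exportfs -i $FS
    }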

Will the daemons need to be killed and restarted as the package starts, or will they be able to handle the new lock files dynamically? Do you think this reasonably handles the situation, or am I missing something important still?
--
Jeff Traigle
Dave Olker
Neighborhood Moderator

Re: NFS with SG

Jeff,

I guess I'm missing the point of why you're trying to reinvent this particular wheel. The SG NFS Toolkit with File Lock Migration seems to work very well for lots of customers and it is a tested/supported solution by HP. It seems like you're trying to implement a home-grown version of this and it's not clear to me why.

As for your plan about moving lock files around, you'd better factor in what happens if the primary server panics: the primary node is dead, so you never get the chance to cleanly shut down the file system and copy the /var/statmon/sm files.

Regards,

Dave


Dave Olker
Neighborhood Moderator

Re: NFS with SG

One other comment - the only time rpc.lockd and rpc.statd do crash recovery notification is at daemon start time. That's why we need to populate the sm directory and then terminate and restart the daemons. That's when they look at the sm entries, move them to sm.bak, and start sending crash recovery notification messages.

Also, take care not to disturb any existing sm entries on the adoptive node; otherwise you'll effectively blow away those clients' locks without notifying them to recover them when you terminate/restart the daemons.
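
For instance, a copy loop along these lines would skip names that already exist rather than overwrite them (a sketch only; the toolkit handles this properly):

    # Merge saved sm entries without clobbering the adoptive node's own.
    for f in /pkg1/sm_save/*
    do
        name=`basename $f`
        [ -f /var/statmon/sm/$name ] || cp -p $f /var/statmon/sm/$name
    done

Since sm entries are just files named after the client hosts, a name that's already present means that client will be notified anyway.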

Regards,

Dave



Kent Ostby
Honored Contributor

Re: NFS with SG

I would echo what Dave has said about the NFS toolkit.

In supporting ServiceGuard setups for HP, I have seen a lot of customers go through the headache of home-grown semi-NFS-toolkit solutions.

While I understand there is a cost involved in purchasing the toolkit, you also get script troubleshooting, patches for new versions of SG and the OS, and so on. If you write your own, you'll be doing all of the troubleshooting and upgrading on your own.
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
Jeff_Traigle
Honored Contributor

Re: NFS with SG

I'd love to use the toolkit. I don't get a thrill from recreating what others have done... especially when what they've done is supported. Hopefully we can convince the customer to pay for it, since it doesn't appear to be all that expensive, especially considering they have to pay for our time to cobble something together anyway. We shall see.
--
Jeff Traigle
Jeff_Traigle
Honored Contributor

Re: NFS with SG

Thanks for all the input. I ended up needing to write my own hanfs.sh script, even though I really didn't want to take that approach. I tested my implementation yesterday and it seemed to work well enough. (Certainly better than what was in place previously, anyway.)
--
Jeff Traigle