problem with lockd

 
Guettache
Advisor


Hi,
A Mac client (OS X 10.5, Leopard) of our HP server has random problems mounting the HP-UX shared folders over NFS. By random, I mean the client can sometimes mount them and sometimes cannot; in the latter case the following message appears in the Mac system log:
nfs server xxxx: /share : lockd is not responding.
The HP server is running HP-UX 11.0.
Any idea?

Thanks,
Narimane
skt_skt
Honored Contributor

Re: problem with lockd

Debug logging can be toggled on and off by sending the SIGUSR2 signal to the running rpc.lockd or rpc.statd process (kill -17 pid). By default, the debug information is logged to the /var/adm/rpc.lockd.log and /var/adm/rpc.statd.log files respectively.


My first recommendation would be as follows:

1) Get a listing of the nodes in /var/statmon/sm:

# ll /var/statmon/sm

2) Get a listing of nodes in /var/statmon/sm.bak:

# ll /var/statmon/sm.bak

3) Collect a debug logfile from rpc.statd and rpc.lockd

# ps -ef | grep rpc
# kill -17 <rpc.statd PID>
# kill -17 <rpc.lockd PID>

wait 30 seconds

# kill -17 <rpc.statd PID>
# kill -17 <rpc.lockd PID>

I'd want to examine the debug logfile from rpc.statd to see what it's doing before killing and restarting it, because simply killing and restarting it might not be enough to clear the race condition.
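
The toggle sequence in step 3 can be sketched as a small script. This is only a sketch: the function names `daemon_pids` and `toggle_debug` and the `WAIT_SECS` variable are made up for illustration, and it assumes that the command name is the last field of the `ps -e` output and that signal 17 is SIGUSR2, as it is on HP-UX.

```shell
#!/bin/sh
# Sketch: toggle debug logging on rpc.statd and rpc.lockd.
# Assumption: signal 17 is SIGUSR2 (true on HP-UX, not on every OS).

# Read `ps -e` style lines on stdin and print the PIDs of both daemons.
# Assumption: PID is the first field and the command name is the last.
daemon_pids() {
    awk '$NF == "rpc.statd" || $NF == "rpc.lockd" { print $1 }'
}

# Turn debug logging on, wait, then turn it off again.
# WAIT_SECS is a hypothetical knob; 30 matches the wait suggested above.
toggle_debug() {
    wait_secs=${WAIT_SECS:-30}
    pids=$(ps -e | daemon_pids)
    if [ -z "$pids" ]; then
        echo "rpc.statd/rpc.lockd not running" >&2
        return 1
    fi
    kill -17 $pids      # first signal: debug logging on
    sleep "$wait_secs"
    kill -17 $pids      # second signal: debug logging off
}
```

Running `toggle_debug` as root should leave about 30 seconds of debug output in the two logfiles. To keep logging enabled until the failure shows up, send only the first signal by hand instead.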

Let me know if you need any help interpreting the data.
======
If the servers in /var/statmon/sm.bak are permanently removed then here is what you should do:

1) Kill rpc.statd and rpc.lockd

# kill $(ps -e | egrep 'rpc.statd|rpc.lockd' | awk '{print $1}')

2) Remove any entries from /var/statmon/sm.bak for systems that are permanently gone from your environment

3) Restart rpc.statd and rpc.lockd

# /usr/sbin/rpc.statd
# /usr/sbin/rpc.lockd
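
For step 2, a helper like the one below could list which entries are candidates for removal. It is a sketch under assumptions: the function name `retired_entries` and the file of retired hostnames are hypothetical, and it assumes each entry in the directory is a file named after the host. It only prints paths and deletes nothing.

```shell
#!/bin/sh
# Sketch: show which statmon entries belong to hosts that are gone.
# Nothing is deleted here; the output is a list of paths to review.

# $1: statmon directory (for example /var/statmon/sm.bak)
# $2: file listing retired hostnames, one per line (hypothetical file)
retired_entries() {
    dir=$1
    list=$2
    while read -r host; do
        # Skip blank lines in the list.
        [ -n "$host" ] || continue
        if [ -f "$dir/$host" ]; then
            echo "$dir/$host"
        fi
    done < "$list"
}
```

With the daemons stopped, `retired_entries /var/statmon/sm.bak /tmp/retired_hosts` prints the matching files. Once the list looks right, those files can be removed by hand before restarting rpc.statd and rpc.lockd.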
==========
Guettache
Advisor

Re: problem with lockd

Hi,

thanks for your reply.

I did enable the logging. So far the Mac client has been able to mount the HP-UX NFS shared folders. As soon as the problem occurs, I will post the logged data.
I am keeping the logging enabled.

cheers,
David Nixon
Valued Contributor

Re: problem with lockd

Mac NFS locking is not compatible with an HP NFS server. Under Tiger it appeared to work, but didn't. For Leopard they came up with the NFS mount option 'locallocks', which does the locking in the Mac kernel. This "fudge" will fix problems caused by locks hanging, but the HP server won't know about such locks, so your Mac client applications cannot rely on NFS locking.
Dave Olker
Neighborhood Moderator

Re: problem with lockd

Hi David,

> Mac NFS locking is not compatible with an HP NFS server.

I hadn't heard of this one. Do you have any details on why locking is incompatible with HP-UX? Do you, or anyone else, have a reproduction environment where we can collect some data to understand why locking fails between these systems?

Thanks,

Dave


I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
David Nixon
Valued Contributor
Solution

Re: problem with lockd

Hi Dave,
To summarise my research of the matter:
Before OS X Panther there was no NFS locking.
The OS X NFS implementation is based on that of FreeBSD 5.1.
File locking is incompatible with "non-BSD" NFS servers, so the problems may also exist with Linux servers.
During tests with HP-UX 11.11 and an OS X Tiger client I found that locks could seemingly be obtained on files, but that didn't prevent another process from obtaining a lock on the same file.
Also, their new 'launchd' mechanism for starting the portmapper was unreliable.

The 'locallocks' workaround is new with Leopard. I could certainly look at what ensues without this mount option.

Dave.
Dave Olker
Neighborhood Moderator

Re: problem with lockd

Hi Dave,

Please forgive my ignorance, but what is the significance of "non-BSD" servers when it comes to locking? Is it that Mac clients need to use a specific type of lock, like synchronous locks vs. asynchronous locks? Are they expecting to find the lock daemon running on a specific port? Something else? I've never played with a Mac before so I apologize if I'm asking dumb questions.

Whatever the case, I'd be interested if there was a way to duplicate the problem so I could collect some network traces and rpc.lockd traces on the HP-UX side to see how it deals with whatever lock requests the Mac is issuing. I'd also be interested to see if an 11i v3 server behaves differently because we completely replaced the Network Lock Manager implementation on 11i v3 with a much newer one that might interoperate better with the Mac.

In addition, we're getting ready to release some new rpc.lockd functionality on our 11i v2 systems to help us interoperate better with Linux and Windows systems, so I wonder if those changes would also help us interoperate with the Mac.

If you have a reproduction environment where you can duplicate these lock failures I'd be interested in looking at some data.

Thanks,

Dave


David Nixon
Valued Contributor

Re: problem with lockd

I know that Mac OS X uses POSIX advisory locks, but I never found evidence of locking working with anything other than an OS X/BSD server.

I will shortly be configuring a Leopard client to use autofs (a new feature with Leopard), so I could collect data from its lock requests to an 11i v1 server.

Guettache
Advisor

Re: problem with lockd

Hi,

Thanks for the useful replies.

I attach the logfile of the HP-UX rpc.lockd daemon. I hope you can find useful info in it.

So far, when the Mac user fails to NFS mount the HP-UX shares, restarting the statd and lockd daemons seems to be a workaround.

Cheers,
David Nixon
Valued Contributor

Re: problem with lockd

Hi Guettache,
For a better workaround, add 'locallocks' to the line of mount options in /etc/autofs.conf
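
As a rough illustration, the edited line might look like the fragment below. This assumes the mount options line in /etc/autofs.conf is the `AUTOMOUNTD_MNTOPTS` setting and that its previous value was `nosuid,nodev`; check the file on your own client before changing anything.

```shell
# /etc/autofs.conf on the Mac client (sketch, values are assumptions)
# 'locallocks' is appended to whatever options the line already holds.
AUTOMOUNTD_MNTOPTS=nosuid,nodev,locallocks
```

The automounter has to re-read its configuration before the new option applies to new mounts, and existing mounts keep their old options until they are remounted.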
Dave Olker
Neighborhood Moderator

Re: problem with lockd

Hi Narimane,

I looked at the log file and it doesn't contain enough information to show me the problem. The logfile contains lots of successful lock requests followed by lots of successful unlock requests. Things appear mostly normal in the logfile.

At some points in the logfile I see the client sending in the same UNLOCK request over and over and over again. The logfile merely shows the server had already granted the UNLOCK and removed the lock from its queues, so every time it gets another UNLOCK request it simply sends back the same successful status to the client.

As for why the client doesn't accept these replies and continues to send the same UNLOCK requests over and over again - the logfile doesn't show that. I'd need to see a network trace taken on either the Mac client or the HP-UX server during the application failure to see why the client doesn't like the server's replies.

If you have the ability to collect a network trace during the failure, it would be helpful to have both a network trace *AND* a debug rpc.lockd logfile of the same duplication event. That way I can sync up what I see in the logfile with what I see in the network trace.

Regards,

Dave


Guettache
Advisor

Re: problem with lockd

Hi Dave,

What I can post now is only the statd and lockd logs.
Maybe this can help.

I will post the statd log first.


Narimane
Guettache
Advisor

Re: problem with lockd

Here is the lockd log.

Cheers,

Narimane
Dave Olker
Neighborhood Moderator

Re: problem with lockd

Hello Narimane,

I looked at the log files and this looks like a hostname resolution problem. From the rpc.statd log file:

03.18 11:13:38 hp_server pid=17366 /usr/sbin/rpc.statd
proc sm_prog_1: Program=100024, Procedure=2, Version=1
03.18 11:13:38 hp_server pid=17366 /usr/sbin/rpc.statd
proc insert_mon: called for host Geoff-Macbook-Pro.local
03.18 11:13:38 hp_server pid=17366 /usr/sbin/rpc.statd
proc insert_mon: gethostbyname() of Geoff-Macbook-Pro.local failed
03.18 11:13:38 hp_server pid=17366 /usr/sbin/rpc.statd
proc sm_mon_1: mon_name=Geoff-Macbook-Pro.local, res_state=1, state=-1


rpc.statd is trying to resolve the hostname "Geoff-Macbook-Pro.local" and cannot do so. It logs the error "gethostbyname() failed" and the request returns an error. Looking at the rpc.lockd log file:


03.18 11:15:18 hp_server pid=17372 /usr/sbin/rpc.lockd
call_udp[hp_server, 100024, 1, 2] returns 0
03.18 11:15:18 hp_server pid=17372 /usr/sbin/rpc.lockd
/usr/sbin/rpc.lockd: !site: Geoff-Macbook-Pro.local not subscribe to status monitor service
03.18 11:15:18 hp_server pid=17372 /usr/sbin/rpc.lockd
/usr/sbin/rpc.lockd: req discard due status monitor problem !


rpc.lockd is reporting that since it cannot register this host with rpc.statd it discards the lock request.

Can you please make sure the NFS server is able to successfully resolve the hostname "Geoff-Macbook-Pro.local" to the correct IP address and let me know if this resolves the problem.
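
To find every client name that rpc.statd failed to resolve, the debug log can be filtered as sketched below. The function name `failed_hosts` is made up for illustration, and the pattern assumes the log lines have exactly the "gethostbyname() of ... failed" form shown in the excerpt above.

```shell
#!/bin/sh
# Sketch: list the hostnames rpc.statd could not resolve.
# Reads the named logfiles, or stdin when no file is given.
failed_hosts() {
    # Keep only the hostname between "gethostbyname() of " and " failed",
    # then drop duplicates.
    sed -n 's/.*gethostbyname() of \(.*\) failed.*/\1/p' "$@" | sort -u
}
```

For example, `failed_hosts /var/adm/rpc.statd.log` would print one hostname per line. Each name it prints needs to resolve on the NFS server, for instance through an /etc/hosts entry or DNS, before lock requests from that client can be registered.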

Thanks,

Dave

