Operating System - HP-UX

Cause of rpc.lockd failing

 
Emerson Valley
Occasional Advisor

We have a problem with rpc.lockd failing. The process is still running and responds to rpcinfo commands but file locking does not work.

This leads to many problems. Dassault Systemes Catia (our biggest problem, with no workaround) locks up because it cannot lock the files it needs or has lost its locks. The ksh history file fails because we store it in each user's $HOME/.sh_history rather than in /var/tmp or some other local directory. Netscape Composer and dtmail both freeze because they cannot lock their temp files.

It is a simple matter to kill the processes (rpc.lockd and rpc.statd) and restart them. I have written a script to do this so the help desk can do it without waiting for me to get to it. I am attempting to write a script that will monitor and restart the daemons, but as stated above rpcinfo reports them as healthy, and any C program I write that tries to lock a file just hangs and therefore never returns an error code to my scripts.
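
One possible way around the hanging test program, sketched here under assumptions: run the lock attempt in a child process and let the parent give up after a timeout, so the caller always gets a return code even if the child stays stuck. The names probe_lock and try_lock and the 0/1/2 return convention are made up for illustration and are not part of any HP-UX tool, and the sketch has not been tried against a hung rpc.lockd.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: take and release a write lock on the whole file. */
static int try_lock(const char *path)
{
    struct flock fl;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return 1;
    memset(&fl, 0, sizeof fl);      /* l_start = 0, l_len = 0: whole file */
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fcntl(fd, F_SETLKW, &fl) != 0) {
        close(fd);
        return 1;
    }
    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}

/*
 * Hypothetical helper. Returns 0 if the lock was obtained, 1 if it was
 * refused or the file could not be opened, 2 if no answer arrived within
 * timeout_sec seconds.
 */
int probe_lock(const char *path, unsigned int timeout_sec)
{
    unsigned int waited;
    int status;
    pid_t pid, r;

    assert(path != NULL);
    pid = fork();
    if (pid < 0)
        return 1;
    if (pid == 0)
        _exit(try_lock(path));      /* child: may hang here if lockd is stuck */

    /* Parent: poll once per second instead of blocking in waitpid(). */
    for (waited = 0; ; waited++) {
        r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
        if (r < 0)
            return 1;
        if (waited >= timeout_sec)
            break;
        sleep(1);
    }

    /* Timed out: try to kill the child, but do not wait for it to die. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, WNOHANG);
    return 2;
}
```

A small main() that returns probe_lock(argv[1], 10) as its exit status would let a monitoring script restart the daemons when the status is 2. The path has to be on the NFS mount being checked; a local file only exercises local locking.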

We have patched up to PHNE_28102 - ONC/NFS General Release/Performance Patch and its dependencies.

Emerson Valley
Advanced System Administrator
EDS - Delphi Safety and Interior Systems
email: emerson.valley@eds.com
phone: (248) 655-0639 - fax: (248) 655-8285

4 REPLIES
Ajit Natarajan
Valued Contributor

Re: Cause of rpc.lockd failing

This problem occurs if the client loses the contents of /var/statmon/sm during a reboot or other event.

Take a look at clear_locks(1M). It is a command to be run on the client to fix this issue. The patch that you have installed has this command and its man page.

Thanks.

Ajit
HP Gigabit Ethernet
Emerson Valley
Occasional Advisor

Re: Cause of rpc.lockd failing

The problem is a server-side issue, because when the process fails, ALL clients immediately lose the ability to lock files, which of course cripples the apps stated above.

I will however try running the clear_locks and report my results.
Brian Hackley
Honored Contributor
Solution

Re: Cause of rpc.lockd failing

Emerson,

Suggestion: ensure any/all NFS clients are at latest lockd patch levels as well as the NFS Server.

Here are some tips for general NFS file locking debugging. Try these if clear_locks fails to get you any progress.

rpc.lockd and rpc.statd are subject to DNS lookup hangs. On the NFS Server, set the retrans and retry options in /etc/resolv.conf (see the resolver man page). I usually use:

retrans 1000 (or 2000)
retry 2

This cuts down the length of time waiting on resolver queries to map an IP to a hostname.

In addition, start some data collection on rpc.lockd on the NFS Server. Use kill -17 to turn on debug mode; it will log to /var/adm/rpc.lockd.log by default. Another kill -17 will turn it off. I recommend that you ensure plenty of space in /var/adm, and then start the logging BEFORE a problem appears. Even better, set up a nettl trace. As soon as the problem is detected for the first time from a client, stop the nettl trace and shut off the debug logging.
You'll have a boatload of data to swim through, but you do stand a good chance of capturing the problem.

Hope all this helps,

-> Brian Hackley
Ask me about telecommuting!
Emerson Valley
Occasional Advisor

Re: Cause of rpc.lockd failing

Thanks Brian. I will give the logging option a shot. I cannot do a nettl trace, however, because I do not know what triggers the crap-out and those trace files get huge very fast!

-Em