Cause of rpc.lockd failing

Emerson Valley · 04-01-2003

We have a problem with rpc.lockd failing. The process is still running and responds to rpcinfo commands but file locking does not work.

This leads to many problems. Dassault Systemes Catia (the most serious problem, with no workaround) locks up because it cannot lock the files it needs or has lost its locks. Ksh's history file fails because we store it in the user's $HOME/.sh_history rather than in /var/tmp or some other local directory. Netscape Composer and dtmail both freeze because they cannot lock their temp files.

It is a simple matter to kill the processes (rpc.lockd and rpc.statd) and restart them. I have written a script to do this so the help desk can do it without waiting for me to get to it. I am attempting to write a script that will monitor and restart the daemons, but as stated above rpcinfo reports them as healthy, and any C program I write that tries to lock a file just hangs and therefore never returns an error code to my scripts.
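One way around a probe that hangs is to run the lock attempt in a child process and give up on it after a timeout, so the monitoring script always gets an answer. The following is only a sketch in Python rather than C, under the assumption that a Python interpreter is available on the monitoring host. The function name probe_lock, the return strings, and the timeout value are all made up for illustration. A child stuck inside the kernel on an NFS lock call may not die when killed, so the parent deliberately does not wait for it after the timeout.

```python
import subprocess
import sys

# Child program: take an exclusive lock on the file, release it, exit 0.
_CHILD = """
import fcntl, sys
with open(sys.argv[1], "a") as f:
    fcntl.lockf(f, fcntl.LOCK_EX)
    fcntl.lockf(f, fcntl.LOCK_UN)
"""

def probe_lock(path, timeout=10.0):
    """Return "ok", "hung", or "error" for one lock attempt on path.

    probe_lock is a hypothetical helper, not part of any NFS tooling.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: do not wait again, the child may be unkillable.
        proc.kill()
        return "hung"
    return "ok" if returncode == 0 else "error"
```

A monitoring script could call probe_lock on a scratch file in an NFS-mounted directory and treat "hung" as the trigger for the restart script.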

We have patched up to PHNE_28102 - ONC/NFS General Release/Performance Patch and its dependencies.

Emerson Valley
Advanced System Administrator
EDS - Delphi Safety and Interior Systems
email: emerson.valley@eds.com
phone: (248) 655-0639 - fax: (248) 655-8285

Ajit Natarajan · 04-01-2003

This problem occurs if the client loses the contents of /var/statmon/sm during a reboot or other event.

Take a look at clear_locks(1M). It is a command to be run on the client to fix this issue. The patch that you have installed has this command and its man page.

Thanks.

Ajit
HP Gigabit Ethernet

Emerson Valley · 04-02-2003

This is a server-side issue, because when the process fails, ALL clients immediately stop being able to lock files, which of course cripples the apps stated above.

I will however try running the clear_locks and report my results.

Brian Hackley · 04-02-2003

Emerson,

Suggestion: ensure that all NFS clients, as well as the NFS Server, are at the latest lockd patch levels.

Here are some tips for general NFS file locking debugging. Try these if clear_locks fails to get you any progress.

rpc.lockd and rpc.statd are subject to DNS lookup hangs. On the NFS Server, set the retrans and retry options in /etc/resolv.conf (see the resolver man page). I usually use:

retrans 1000
retry 2

(or retrans 2000).

This cuts down the length of time waiting on resolver queries to map an IP to a hostname.
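To see whether slow IP-to-hostname lookups are a factor before and after changing the resolver settings, the lookups can be timed directly. This is a rough sketch in Python, assuming an interpreter is available on the server. The helper name time_reverse_lookup is invented for illustration, and the addresses to test would be those of your own NFS clients.

```python
import socket
import time

def time_reverse_lookup(ip):
    """Return (hostname or None, seconds elapsed) for a reverse lookup of ip.

    time_reverse_lookup is a hypothetical helper for measurement only.
    """
    start = time.monotonic()
    try:
        name = socket.gethostbyaddr(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        name = None
    return name, time.monotonic() - start
```

Calling it for each client address and printing the elapsed time should show any address whose lookup stalls for seconds rather than milliseconds.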

In addition, start some data collection on rpc.lockd on the NFS Server. Use kill -17 to turn on debug mode; it will log to /var/adm/rpc.lockd.log by default. Another kill -17 will turn it off. I recommend that you ensure plenty of space in /var/adm, and then start the logging BEFORE a problem appears. Even better, set up a nettl trace. As soon as the problem is detected for the first time from a client, stop the nettl trace and shut off the debug logging.
You'll have a boatload of data to swim through, but you do stand a good chance of capturing the problem.
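The "check the space first, then turn on logging" step could be wrapped in a small guard like the one below. It is a sketch only: the function names and the 500 MB threshold are assumptions, the caller has to supply the PID of rpc.lockd, and signal number 17 is simply the one quoted above for HP-UX (on other systems the same number means something else).

```python
import os
import shutil

DEBUG_TOGGLE_SIGNAL = 17  # from "kill -17" above; meaning is platform specific

def has_room(directory, min_free_bytes):
    """Return True if directory's filesystem has at least min_free_bytes free."""
    return shutil.disk_usage(directory).free >= min_free_bytes

def enable_lockd_debug(pid, log_dir="/var/adm", min_free_bytes=500 * 1024 * 1024):
    """Send the debug toggle signal to pid only if log_dir has enough space.

    Hypothetical helper: the threshold is an arbitrary assumption, and
    sending the signal a second time toggles logging off again.
    """
    if not has_room(log_dir, min_free_bytes):
        return False
    os.kill(pid, DEBUG_TOGGLE_SIGNAL)
    return True
```

Because the signal is a toggle, the script that calls this should record whether logging is currently on.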

Hope all this helps,

-> Brian Hackley

Ask me about telecommuting!

Emerson Valley · 04-07-2003

Thanks Brian. I will give the logging option a shot. I cannot do a nettl trace, however, because I do not know what triggers the crap-out and those trace files get huge very fast!

-Em
