Operating System - Tru64 Unix

NFS lock problems

 
David Lesny
Occasional Contributor

I recently started getting the following errors on an NFS client node called mx0. This node mounts our users' home disks, which reside on fx. Both nodes are running 5.1B PK4.

Jan 14 10:53:52 mx0 lockd[510]: Can't create client handle to fx.hep.uiuc.edu NLMv4: RPC: Port mapper failure - RPC: Timed out
Jan 14 10:53:52 mx0 lockd[510]: call_rpc to fx.hep.uiuc.edu NLMv4 failed: RPC: Port mapper failure

When this occurs, the processes accessing the files on fx hang in the U state. I assume they are waiting on the remote locking.

Are there any parameters or configuration changes that I can make to fix this problem, which only started in the last few weeks?

thanks, dave
Johan Brusche
Honored Contributor

Re: NFS lock problems


Do you have portmap, rpc.lockd and rpc.statd running on both sides? Is automount or autofs involved here?

Things to check:

rpcinfo -p fx | grep -e map -e lock
rpcinfo -p mx | grep -e map -e lock

rpcinfo -u localhost 100021 1
rpcinfo -u localhost 100021 3
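
To make the first two checks easier to compare between nodes, here is a minimal sketch of a filter that reads `rpcinfo -p` output and reports whether the portmapper (100000) and nlockmgr (100021, on both udp and tcp) are registered. The function name check_lock_registrations is made up for illustration, and I am assuming an awk that supports the `in` operator (nawk or later).

```shell
#!/bin/sh
# Hypothetical helper: reads `rpcinfo -p <host>` output on stdin and
# prints "OK" or one "MISSING: ..." line per absent registration.
# Returns non-zero if anything is missing.
check_lock_registrations() {
    awk '
        $1 == 100000                { have["portmapper"] = 1 }
        $1 == 100021 && $3 == "udp" { have["nlockmgr/udp"] = 1 }
        $1 == 100021 && $3 == "tcp" { have["nlockmgr/tcp"] = 1 }
        END {
            n = split("portmapper nlockmgr/udp nlockmgr/tcp", want, " ")
            missing = 0
            for (i = 1; i <= n; i++) {
                if (!(want[i] in have)) {
                    print "MISSING: " want[i]
                    missing = 1
                }
            }
            if (!missing) print "OK"
            exit missing
        }
    '
}

# Usage, with the host names from this thread:
#   rpcinfo -p fx  | check_lock_registrations
#   rpcinfo -p mx0 | check_lock_registrations
```

Note that a registration in the portmapper only shows that the daemon registered at some point. It does not prove the daemon still answers, which is what the `rpcinfo -u` calls test.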

__ Johan.


_JB_
David Lesny
Occasional Contributor

Re: NFS lock problems

Portmap, rpc.lockd and rpc.statd are running on both nodes when the error occurs. We use neither automount nor autofs. All file systems are mounted via fstab entries with the following options on MX:

rw,hard,intr,rsize=8192,wsize=8192


The requested output, captured when no error has occurred, is:

FX
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100021 1 tcp 1025 nlockmgr
100021 2 tcp 1025 nlockmgr
100021 3 tcp 1025 nlockmgr
100021 4 tcp 1025 nlockmgr
100020 3 tcp 1025 llockmgr
100021 1 udp 1204 nlockmgr
100021 2 udp 1204 nlockmgr
100021 3 udp 1204 nlockmgr
100021 4 udp 1204 nlockmgr
100020 3 udp 1204 llockmgr

MX
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100021 1 tcp 3024 nlockmgr
100021 2 tcp 3024 nlockmgr
100021 3 tcp 3024 nlockmgr
100021 4 tcp 3024 nlockmgr
100020 3 tcp 3024 llockmgr
100021 1 udp 1726 nlockmgr
100021 2 udp 1726 nlockmgr
100021 3 udp 1726 nlockmgr
100021 4 udp 1726 nlockmgr
100020 3 udp 1726 llockmgr

Localhost
program 100021 version 1 ready and waiting
program 100021 version 3 ready and waiting

The next time the error occurs, I will post the output from the same commands.

When I get the RPC error and reboot the node, the problem may recur in 5 minutes, 5 days, 5 weeks, etc. It is not something I can reproduce at will.
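
Since the failure cannot be reproduced at will, one option is to poll the local lock manager and log the moment it stops answering. The following is only a sketch: probe_once is a made-up helper name, the log path and 60-second interval are arbitrary, and it assumes rpcinfo exits non-zero when the program is not available.

```shell
#!/bin/sh
# Hypothetical helper: runs the given probe command, prints a
# timestamped "ok" or "FAIL" line, and returns the probe's success.
probe_once() {
    if "$@" >/dev/null 2>&1; then
        status=ok
    else
        status=FAIL
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') $status $*"
    [ "$status" = ok ]
}

# Example poll loop (commented out; log path and interval are assumptions).
# On failure it also records the server's registrations for later comparison.
# while :; do
#     probe_once rpcinfo -u localhost 100021 1 >> /var/tmp/lockd-probe.log ||
#         rpcinfo -p fx >> /var/tmp/lockd-probe.log 2>&1
#     sleep 60
# done
```

The timestamps from the log can then be matched against the lockd messages in syslog to see whether the local daemon stops responding before or after the port mapper timeouts to fx appear.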

thanks for your help,

dave


Ralf Puchner
Honored Contributor

Re: NFS lock problems

There could be different reasons:
- sporadic host resolution problem
- network outage

The only way to reach a conclusion about what is going on is to use a sniffer and check the communication. Or we can guess...

Help() { FirstReadManual(urgently); Go_to_it;; }
David Lesny
Occasional Contributor

Re: NFS lock problems

The network has been reliable and stable. The switch is an HP2848 and the NICs in each node are DE602. No network errors have been logged on either node or on the switch.

As for name resolution, to eliminate it as a possibility, today I changed the nameserver that the MX node uses for resolving.

thanks, dave
David Lesny
Occasional Contributor

Re: NFS lock problems

Today I have had numerous lock manager problems. When the MX node is in the messed-up state, rpcinfo on the local host returns a timeout:

rpcinfo -u localhost 100021 1
program 100021 version 1 is not available
rpcinfo -u localhost 100021 3
program 100021 version 3 is not available

A scan of the local host, however, does show the NFS lock manager running:

rpcinfo -p mx0.dmz.hep.uiuc.edu | grep -e map -e lock

100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100021 1 tcp 1025 nlockmgr
100021 2 tcp 1025 nlockmgr
100021 3 tcp 1025 nlockmgr
100021 4 tcp 1025 nlockmgr
100020 3 tcp 1025 llockmgr
100021 1 udp 1031 nlockmgr
100021 2 udp 1031 nlockmgr
100021 3 udp 1031 nlockmgr
100021 4 udp 1031 nlockmgr
100020 3 udp 1031 llockmgr

If I stop NFS, kill the hung jobs, remount the NFS file systems, and restart NFS, the lock manager will then work.

Is there anything I can look at to see why the local lock manager is hanging?

thanks, dave
Ralf Puchner
Honored Contributor

Re: NFS lock problems

Dave,

This indicates a network problem. But unless you troubleshoot it or use a network monitor, we can discuss this forever...