NFS locking problems - older 11 system

JeremyinNC
Occasional Visitor

I am having trouble starting an Oracle database whose files are accessed via an NFS mount from a NetApp filer. When I try to bring it up, it reports errors that Oracle support says point to a file locking problem. I am not an HP-UX guy, so I may be looking in the wrong place.

I've enabled debug logging and I see the following messages in my rpc.statd.log file, but I am not certain they are related. The host is reachable (I can mount it just fine), but I don't know how to find out what is calling statd and causing these errors. I've removed the entries from sm.bak, but they keep reappearing.

Here is what I'm seeing:
06.18 11:19:17 hpssv218 pid=925 rpc.statd
call_tcp[donerail, 100024, 1, 6] returns 5
06.18 11:19:17 hpssv218 pid=925 rpc.statd
statd_call_statd notifying system regret about hpssv218
06.18 11:19:17 hpssv218 pid=925 rpc.statd
Addr get response timedout:deleting host =>> regret
06.18 11:19:17 hpssv218 pid=925 rpc.statd
delete hash entry (400ef860), regret, 100024, 1
06.18 11:19:17 hpssv218 pid=925 rpc.statd
add hash entry (400ef860), regret, 186b8, 1
06.18 11:19:17 hpssv218 pid=925 rpc.statd
(400ef860):[regret, 100024, 1] is a new connection
06.18 11:19:17 hpssv218 pid=925 rpc.statd
call_tcp[regret, 100024, 1, 6] returns 5
06.18 11:19:32 hpssv218 pid=925 rpc.statd
enter sm_try: recovery_q = donerail
06.18 11:19:32 hpssv218 pid=925 rpc.statd
statd_call_statd notifying system donerail about hpssv218
06.18 11:19:32 hpssv218 pid=925 rpc.statd
Addr get response timedout:deleting host =>> donerail
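The call_tcp lines can be decoded with standard ONC RPC numbers: program 100024 is the NSM status monitor (rpc.statd), procedure 6 of version 1 is SM_NOTIFY, and a return value of 5 is RPC_TIMEDOUT. (The 186b8 in the "add hash entry" line is simply 100024 in hex.) Read that way, statd on hpssv218 appears to be trying to tell donerail and regret that it has restarted, and those calls are timing out. The Python 3 sketch below summarizes a log like this one. It assumes HP-UX uses the standard Sun RPC values and that the message wording matches the excerpt above exactly; the log can be copied to any machine with Python 3 to run it.

```python
#!/usr/bin/env python3
"""Summarize rpc.statd debug-log lines like the ones in this post (a sketch)."""
import re
import sys
from collections import Counter

# Standard ONC RPC / NSM numbers (ASSUMED to match HP-UX's rpc.statd).
NSM_PROG = 100024  # "status" program, i.e. rpc.statd
SM_PROCS = {0: "SM_NULL", 1: "SM_STAT", 2: "SM_MON", 3: "SM_UNMON",
            4: "SM_UNMON_ALL", 5: "SM_SIMU_CRASH", 6: "SM_NOTIFY"}
CLNT_STAT = {0: "RPC_SUCCESS", 1: "RPC_CANTENCODEARGS", 2: "RPC_CANTDECODERES",
             3: "RPC_CANTSEND", 4: "RPC_CANTRECV", 5: "RPC_TIMEDOUT"}

# Patterns copied from the message wording in the log excerpt above.
CALL_RE = re.compile(r"call_tcp\[(\S+), (\d+), (\d+), (\d+)\] returns (\d+)")
NOTIFY_RE = re.compile(r"notifying system (\S+) about")
TIMEOUT_RE = re.compile(r"timedout:deleting host =>> (\S+)")


def summarize(lines):
    """Return (decoded call_tcp calls, notify counts per host, timeout counts per host)."""
    calls, notified, timed_out = [], Counter(), Counter()
    for line in lines:
        m = CALL_RE.search(line)
        if m:
            host = m.group(1)
            prog, vers, proc, status = (int(g) for g in m.groups()[1:])
            # Only name the procedure when the call is to the NSM program.
            proc_name = SM_PROCS.get(proc, str(proc)) if prog == NSM_PROG else str(proc)
            calls.append((host, prog, vers, proc_name, CLNT_STAT.get(status, str(status))))
            continue
        m = NOTIFY_RE.search(line)
        if m:
            notified[m.group(1)] += 1
            continue
        m = TIMEOUT_RE.search(line)
        if m:
            timed_out[m.group(1)] += 1
    return calls, notified, timed_out


if __name__ == "__main__":
    with open(sys.argv[1]) as log:
        calls, notified, timed_out = summarize(log)
    for host, prog, vers, proc, status in calls:
        print(f"{host}: program {prog} v{vers} {proc} -> {status}")
    for host in sorted(set(notified) | set(timed_out)):
        print(f"{host}: notified {notified[host]}x, timed out {timed_out[host]}x")
```

Usage, with a hypothetical script name: `python3 statd_summary.py rpc.statd.log`. Hosts that show up repeatedly as notified and timed out are the ones worth checking for name resolution and reachability from the HP-UX side.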


6 REPLIES
Steven E. Protter
Exalted Contributor

Re: NFS locking problems - older 11 system

Shalom,

NFS version 4 was not introduced until HP-UX 11i v3 (11.31). The locking mechanism in earlier NFS versions is really not robust enough to handle this situation.

This is typical of how locking performs on older versions of NFS.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Dave Olker
HPE Pro

Re: NFS locking problems - older 11 system

Oh brother...

NFS v3 locking works just fine. We've got thousands of customers using it every day, and there's way more customers running NFS v3 than v4.

Yes, sometimes there's a configuration problem that causes file locking to hang, and it needs to be identified and resolved. Usually it has to do with hostname resolution on the client and/or server.
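One common hostname-resolution check is that each name resolves to an address and that the address resolves back to the same name. The Python 3 sketch below does that with the standard socket module. It compares only the short (domain-stripped) names, which is a deliberate simplification. The result only means something when run with the same resolver setup (hosts file, NIS, DNS) as the machine in question. hpssv218 comes from the log above; any filer name passed to it (for example a hypothetical `filer1`) is a placeholder.

```python
#!/usr/bin/env python3
"""Check that host names resolve forward and back consistently (a sketch)."""
import socket
import sys


def short(name):
    """Lowercase and drop any domain suffix, so 'Filer1.example.com' matches 'filer1'."""
    return name.split(".", 1)[0].lower()


def check_host(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of problems for `name`; an empty list means forward and reverse agree."""
    try:
        addr = forward(name)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        return [f"{name}: forward lookup failed ({exc})"]
    try:
        canonical, aliases, _addrs = reverse(addr)
    except OSError as exc:
        return [f"{name}: reverse lookup of {addr} failed ({exc})"]
    if short(name) not in {short(n) for n in [canonical, *aliases]}:
        return [f"{name}: {addr} reverse-resolves to {canonical}, not {name}"]
    return []


if __name__ == "__main__":
    for host in sys.argv[1:]:
        problems = check_host(host)
        print("\n".join(problems) if problems else f"{host}: OK")
```

The resolver functions are parameters so the comparison logic can be exercised without touching the network.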

The RC and WTEC groups have a cookbook to collect data for NFS file locking issues (I know because I wrote it). I'm sure they can help you collect the data you need to root-cause the file locking problem - and I guarantee the answer is not going to be move to NFS v4.

Regards,

Dave
JeremyinNC
Occasional Visitor

Re: NFS locking problems - older 11 system

I am pretty sure NFSv3 allows file locking, this system's been up since 2001 or so :)

Is there a checklist I can follow?

I've read a PDF on NFS performance tuning for HP-UX 11i that helped some, but this box has been heavily "customized" and I'm terrified of knocking down the house of cards.

I moved the files to another NFS server and re-mounted them there. If that does not work, maybe I'll give someone at HP a call.
Dave Olker
HPE Pro

Re: NFS locking problems - older 11 system

Ok, let's go back to the start.

> When I try to bring it up there are
> complaints which Oracle support says point
> to a file locking problem.

What "complaints" are you seeing? What Oracle documentation shows these complaints are caused by a file locking problem? What explanation are they offering?

Troubleshooting a file locking problem is usually something done on both client and server, which makes it a little more difficult with a NetApp filer because I'm not aware of how to enable debug rpc.lockd logging on a filer.

You obviously know how to enable debug logging on the HP-UX side since you attached a snippet of an rpc.statd debug log. What I'd suggest is enable debug logging on both rpc.lockd and rpc.statd, reproduce the problem, disable debug logging on both daemons and then provide the debug logfiles - either to HP support or post them here and I can have a look.
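Once both daemons have written debug logs, reading them side by side is easier if they are merged into one timeline. The Python 3 sketch below interleaves the records by timestamp. It assumes the rpc.lockd debug log uses the same header-then-message layout as the rpc.statd excerpt above. That layout is not confirmed in this thread, so check it against a real lockd log first. Timestamps carry no year, so logs spanning New Year would sort incorrectly.

```python
#!/usr/bin/env python3
"""Interleave rpc.statd and rpc.lockd debug logs by timestamp (a sketch)."""
import re
import sys

# Header layout seen in the rpc.statd log above, e.g.
#   "06.18 11:19:17 hpssv218 pid=925 rpc.statd"
# ASSUMPTION: the rpc.lockd debug log uses the same header-then-message layout.
HEADER_RE = re.compile(r"^(\d\d)\.(\d\d) (\d\d:\d\d:\d\d) (\S+) pid=(\d+) (\S+)\s*$")


def parse(lines):
    """Return (timestamp, daemon, message) records; message lines attach to the last header."""
    records, current = [], None
    for line in lines:
        line = line.strip()
        m = HEADER_RE.match(line)
        if m:
            month, day, hms, _host, _pid, daemon = m.groups()
            current = ((month, day, hms), daemon)
        elif line and current:
            records.append((current[0], current[1], line))
    return records


def merge(*logs):
    """Sort all records by (month, day, time). sorted() is stable, so same-second
    entries keep their original order and earlier-listed logs come first."""
    return sorted((rec for log in logs for rec in log), key=lambda rec: rec[0])


if __name__ == "__main__":
    logs = []
    for path in sys.argv[1:]:
        with open(path) as f:
            logs.append(parse(f))
    for (month, day, hms), daemon, message in merge(*logs):
        print(f"{month}.{day} {hms} {daemon:<10} {message}")
```

Usage, with a hypothetical script name and file names: `python3 merge_logs.py rpc.statd.log rpc.lockd.log`. Comparing zero-padded strings is enough to get chronological order within one year without parsing dates.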

Again, this may not tell us everything because you're only seeing one half of the needed information (like listening to one side of a phone conversation and trying to infer what's happening on the other end) but in some cases looking at the client's log files can tell us enough about what's going on.

Regards,

Dave
JeremyinNC
Occasional Visitor

Re: NFS locking problems - older 11 system

I actually copied all the files over to a Fedora 12 system and mounted them from there instead. The error I am seeing is:

ORA-00600: internal error code, arguments:[2806], [60]

It's still starting, though. I forgot to set up the Fedora box for NIS, and the HP-UX box is too old to speak LDAP (at least it's not configured to do so), so the two were not playing well permissions-wise. It's attempting to start now.
Yeow Yiew Choong
Occasional Visitor

Re: NFS locking problems - older 11 system

On your HP-UX, you can try this command:

Remove all locks held on the local NFS server for the client client1:

# clear_locks client1

Remove all locks created by the local client system and held by the remote NFS server, server1:

# clear_locks -s server1

Hope this helps. You can read the manual page with man clear_locks.