NFS lockd issue 11iv3 & redhat linux AS

SOLVED
wvsa
Regular Advisor

Good afternoon all;

Having a strange problem with NFS. We have an rx6600 sharing two directories to a Red Hat Linux server. Here are the dfstab share entries:

share -F nfs -o sec=sys,anon=0 /nsp_mart
share -F nfs -o sec=sys,anon=0 /sas_bi

On the redhat linux box we mount the directories as follows:

romans02:/nsp_mart /nsp_mart nfs bg,soft 0 0

romans02:/sas_bi /sas_bi nfs bg,soft 0 0


With these options I can log on as a user and read and write a file under the /nsp_mart directory.

However, when running an application (SAS), the application hangs when it attempts to open or write a file in directories under /nsp_mart.

The SAS application will work if it is run in the following manner:

sas -filelocks none

I should mention that SAS is run from the Red Hat Linux server and is attempting to read and write files in the /nsp_mart directory. At any rate, when the -filelocks none option is used all is well: SAS writes to and reads from the /nsp_mart directory.


My question is this: is there a problem with lockd on the HP-UX 11i v3 server?

I have cleared the locks using the clear_lock command. I have also set kctune klm_log_level=9 to see if there are any errors in syslog, but no messages appear in syslog.

Not sure how to proceed. I see that lockd is on port 4045, but when running lsof -i -P I have never seen this port being used.

Not sure what is causing this problem; I would greatly appreciate any input.

Thank you!

Norm

27 REPLIES
Dave Olker
HPE Pro
Solution

Re: NFS lockd issue 11iv3 & redhat linux AS

Hi Norm,

So are you saying that you enabled debug KLM logging and reproduced the hang but never saw any messages related to KLM? That tells me this may not be a locking problem. Have you tried collecting a network trace while reproducing the problem? I'd suggest collecting a nettl trace or Wireshark trace on the rx6600 while reproducing the problem. Once the hang occurs, let it hang for a few seconds then stop the trace. The trace will hopefully show what over-the-wire requests were happening when the hang happened.

Regards,

Dave
wvsa
Regular Advisor

Re: NFS lockd issue 11iv3 & redhat linux AS

Hello Dave;

Thank you for responding. Once I figure out how to get a nettl trace and/or Wireshark configured and running, I will let you know what I find.


Norm
Dave Olker
HPE Pro

Re: NFS lockd issue 11iv3 & redhat linux AS

Here's what I'd suggest on the HP-UX NFS server:

1) Turn on nettl tracing:
# /usr/sbin/nettl -tn pduin pduout loopback -e ns_ls_ip -f <trace-file-name>

2) Reproduce the hanging application

3) Turn off nettl tracing:
# /usr/sbin/nettl -tf -e all

This will create one or two files using the name you supplied in step 1 and the suffix ".TRC000".

Once you have this file you can load it into Wireshark (yes, Wireshark can directly read HP-UX nettl format) and filter on the IP address of the client, or NFS packets, or KLM packets, etc.
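As a rough sketch of that filtering step from the command line, assuming the Wireshark command-line tool tshark is installed on the machine where you analyze the trace. The helper name build_filter, the trace file path and the client address are placeholders for illustration, not values from this thread:

```shell
#!/bin/sh
# Build a Wireshark display filter that keeps only portmap, NLM and NFS
# traffic to or from one client. The client IP address is the only argument.
build_filter() {
    client_ip=$1
    printf 'ip.addr == %s && (portmap || nlm || nfs)\n' "$client_ip"
}

# Hypothetical usage (TRACE and CLIENT are placeholders):
# TRACE=/tmp/nfs_trace.TRC000
# CLIENT=192.0.2.10
# tshark -r "$TRACE" -Y "$(build_filter "$CLIENT")"
```

Keeping portmap in the filter matters because a client has to ask the server's portmapper for the lock manager's port before it can send any NLM request at all.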

Hope this helps,

Dave

wvsa
Regular Advisor

Re: NFS lockd issue 11iv3 & redhat linux AS

David;

Thank you for your responses. I will be running the trace this morning. Would you be willing to be another set of eyes and look at the trace? I will provide all the necessary background data.


Thanks again


Norm
Dave Olker
HPE Pro

Re: NFS lockd issue 11iv3 & redhat linux AS

Sure Norm.

I'd want the *raw* trace files, not any formatted stuff. I'd also want the IP addresses of the NFS client and server. You can either post the trace files to this thread, or if you're not comfortable with that (I wouldn't blame you) you can send the trace files to me directly: dave.olker@hp.com.

Regards,

Dave
Duncan Edmonstone
Honored Contributor

Re: NFS lockd issue 11iv3 & redhat linux AS

Dave,

Once you have the emails from Norm, you might want to ask a forums admin (such as Melvyn) to edit your post and take out your email address before the dreaded spam-bots strike...

Duncan

HTH

Duncan
Dave Olker
HPE Pro

Re: NFS lockd issue 11iv3 & redhat linux AS

Hi Duncan,

I'm a moderator of the forums so I can edit any post. I appreciate your concern, but I have no problem listing my HP email here. I encourage HP customers to contact me, and many do. :)

Dave
Dave Olker
HPE Pro

Re: NFS lockd issue 11iv3 & redhat linux AS

Hi Norm,

I looked at the first round of nettl traces you sent me and here's the concerning thing I see in the trace:


172.17.6.19 -> 172.17.6.29 Portmap V2 GETPORT Call NLM(100021) V:4 TCP
172.17.6.29 -> 172.17.6.19 Portmap V2 GETPORT Reply (Call In 27711) PROGRAM_NOT_AVAILABLE
172.17.6.19 -> 172.17.6.29 Portmap V2 GETPORT Call NLM(100021) V:4 TCP
172.17.6.29 -> 172.17.6.19 Portmap V2 GETPORT Reply (Call In 28515) PROGRAM_NOT_AVAILABLE
172.17.6.19 -> 172.17.6.29 Portmap V2 GETPORT Call NLM(100021) V:4 TCP
172.17.6.29 -> 172.17.6.19 Portmap V2 GETPORT Reply (Call In 28695) PROGRAM_NOT_AVAILABLE
172.17.6.19 -> 172.17.6.29 Portmap V2 GETPORT Call NLM(100021) V:4 TCP
172.17.6.29 -> 172.17.6.19 Portmap V2 GETPORT Reply (Call In 28720) PROGRAM_NOT_AVAILABLE
172.17.6.19 -> 172.17.6.29 Portmap V2 GETPORT Call NLM(100021) V:4 TCP
172.17.6.29 -> 172.17.6.19 Portmap V2 GETPORT Reply (Call In 28812) PROGRAM_NOT_AVAILABLE

The Linux client is attempting repeatedly to retrieve the port number of the NLM (Network Lock Manager) daemon running on the HP-UX system. The HP-UX box should reply with port number 4045. The fact that it doesn't reply with a port number tells me that rpc.lockd and rpc.statd may not be running on your 11i v3 system.

Can you verify that you have the LOCKMGR variable set to 1 in your /etc/rc.config.d/nfsconf file on the HP-UX system? Can you also confirm that the following command run on the HP-UX server returns the expected results:

# rpcinfo -t localhost 100021
program 100021 version 1 ready and waiting
program 100021 version 2 ready and waiting
program 100021 version 3 ready and waiting
program 100021 version 4 ready and waiting

You should get a "ready and waiting" reply for all 4 versions of program 100021, which is the Network Lock Manager. If you don't then we need to figure out why the lock manager is not getting started on your HP-UX system.
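A minimal sketch of how that check could be scripted, for example to run it on several servers. The function name nlm_all_versions_ready is made up for illustration; it only parses rpcinfo output in the format shown above and assumes a POSIX awk:

```shell
#!/bin/sh
# Read rpcinfo output on stdin and succeed only if program 100021 (NLM)
# reports "ready and waiting" for every version from 1 through 4.
nlm_all_versions_ready() {
    awk '
        $1 == "program" && $2 == 100021 && /ready and waiting/ { seen[$4] = 1 }
        END {
            for (v = 1; v <= 4; v++) if (!(v in seen)) exit 1
            exit 0
        }
    '
}

# Usage on the NFS server:
# if rpcinfo -t localhost 100021 2>&1 | nlm_all_versions_ready; then
#     echo "lock manager registered"
# else
#     echo "lock manager NOT fully registered"
# fi
```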

Regards,

Dave
wvsa
Regular Advisor

Re: NFS lockd issue 11iv3 & redhat linux AS

Hello Dave;

here is the info you requested:

#***************************************************************************
LOCKMGR=1
LOCKD_OPTIONS=""
STATD_OPTIONS=""

#***************************************************************************
# NFS client configuration variables
:q!
root@romans02:/etc/rc.config.d
# rpcinfo -t localhost 100021
rpcinfo: RPC: Program not registered
program 100021 is not available


The rpcinfo does not look good, so what is the next step?

Norm
Dave Olker
HPE Pro

Re: NFS lockd issue 11iv3 & redhat linux AS

Try this on the 11.31 system:

# /sbin/init.d/lockmgr stop
# /sbin/init.d/lockmgr start

Then re-try the rpcinfo command and see if you get the appropriate output.

Dave
wvsa
Regular Advisor

Re: NFS lockd issue 11iv3 & redhat linux AS

Hello Dave;

Ok got the following:

# rpcinfo -t localhost 100021
program 100021 version 1 ready and waiting
program 100021 version 2 ready and waiting
program 100021 version 3 ready and waiting
program 100021 version 4 ready and waiting

Interesting, all of our rx6600 servers have the same problem. Looks like the administrator (me) is doing something consistently wrong, how shocking.

So should I try the application now?

wvsa
Regular Advisor

Re: NFS lockd issue 11iv3 & redhat linux AS

Dave;

Any thoughts as to why lockd is not starting up? Guess the problem is in /etc/rc.config.d

Thank you for your help


Norm
Dave Olker
HPE Pro

Re: NFS lockd issue 11iv3 & redhat linux AS

No, the problem isn't in the config file. The only parameter in the config file is LOCKMGR=1 and you already confirmed that.

Problem sounds like rpc.lockd stops responding to requests after running awhile. No idea why that's happening. That would require a separate action plan to figure out.

What version of ONCplus are you running on your 11.31 systems?

# swlist ONCplus

Dave
wvsa
Regular Advisor

Re: NFS lockd issue 11iv3 & redhat linux AS

Hi Dave;

# ONCplus B.11.31.09.01 ONC+ 2.3
ONCplus.NFS B.11.31.09.01 ONC/NFS; Network-File System,Information Services,Utilities
Dave Olker
HPE Pro

Re: NFS lockd issue 11iv3 & redhat linux AS

Ok, so you're on the latest stuff. I was gonna suggest updating to the latest stuff to see if you're hitting a known/fixed problem.

Does the application work now?
wvsa
Regular Advisor

Re: NFS lockd issue 11iv3 & redhat linux AS

Dave;

Is there anything I should do before testing the application again?

Dave Olker
HPE Pro

Re: NFS lockd issue 11iv3 & redhat linux AS

No, please try the application
wvsa
Regular Advisor

Re: NFS lockd issue 11iv3 & redhat linux AS

Dave;

It works! Yes indeed. It sure would be nice to know why the lockmgr had stopped on all of our servers.


Thank you for your help!


Norm
wvsa
Regular Advisor

Re: NFS lockd issue 11iv3 & redhat linux AS

Hi Dave;

How should I go about finding out why lockmgr is not starting? It seems to be a problem on many of our HP-UX 11i v3 servers. Should we start a case?


Norm
Dave Olker
HPE Pro

Re: NFS lockd issue 11iv3 & redhat linux AS

Are you sure that's the problem - that the daemon is not starting? Or is the daemon running but not responding? Do you have any systems that are reproducing the problem at the moment? If so, are the rpc.lockd and rpc.statd daemons running on the system? Are they registered with rpcbind (rpcinfo -p)? If the problem is that the daemons are not starting properly during system boot, the first place I would look is the /etc/rc.log and syslog.log files for any errors.

If the daemons are getting started at boot time and then later stop responding, the question is are the daemons getting killed or are they still running but hung?
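One way to tell those cases apart is to compare the process list with the rpcbind registrations. The sketch below is only an illustration: the function name lockd_state is hypothetical, and it assumes the daemon shows up in the process listing as rpc.lockd and that the registration listing contains the NLM program number 100021.

```shell
#!/bin/sh
# Classify the lock manager state from two text inputs:
#   $1 = process listing (for example the output of "ps -e")
#   $2 = registration listing (for example the output of "rpcinfo -p")
lockd_state() {
    running=no
    registered=no
    printf '%s\n' "$1" | grep -q 'rpc\.lockd' && running=yes
    printf '%s\n' "$2" | grep -q '100021' && registered=yes
    case "$running/$registered" in
        yes/yes) echo "running and registered" ;;
        yes/no)  echo "running but not registered" ;;
        no/yes)  echo "registered but not running" ;;
        *)       echo "not running" ;;
    esac
}

# Usage on the HP-UX server:
# lockd_state "$(ps -e)" "$(rpcinfo -p 2>&1)"
```

A "running and registered" result does not prove the daemon is answering requests, so the rpcinfo -t check is still worth running afterwards.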
rochelle lauer
Occasional Contributor

Re: NFS lockd issue 11iv3 & redhat linux AS

Hello

I have a similar problem, but only with Redhat 5. Are you running Redhat 5?

My issue is that on the HP-UX server (11.31 rx2660) rpc.lockd starts using all the CPU time, and after a few minutes the server is useless! It looks like a hang, but it really is not.

My problem occurs when running Java 1.6; Java 1.5 has no issues. It appears that Java puts locks in $HOME/.java, which is an NFS mounted directory.

I have tried an HP-UX NFS server with an HP-UX client and it works fine.

I have tried an HP-UX NFS server with a Redhat 4 client and it works fine.

Note: I have had this problem for several months and HP support has spent many days looking at it. It appears that the client continues to send lock requests.

However, support did not have a Linux client to test this out.

Any ideas would be greatly appreciated.

Rochelle Lauer


Dave Olker
HPE Pro

Re: NFS lockd issue 11iv3 & redhat linux AS

> I have tried HP-UX nfs server to HP-UX
> client and it works fine.
>
> I have tried HP-UX nfs server and Redhat 4
> client and it works fine.
>
> Note: I have had this problem for several
> months and HP support has spent many
> days looking at it. It appears that the
> client continues to send lock requests.

What is the prognosis from HP Support? Do they think this is an HP problem? Has anyone from RedHat investigated the issue?

What do you mean by "the client continues to send lock requests"? Is the RedHat client sending lock requests it shouldn't? Regardless of HP Support's ability to reproduce the problem, have they looked at rpc.lockd logs and nettl traces from the working RH4 client and the failing RH5 client to see what's different?

Regards,

Dave

rochelle lauer
Occasional Contributor

Re: NFS lockd issue 11iv3 & redhat linux AS

Hello

I have just reviewed the case I had with support. It was actually about a year ago, and though it was thoroughly analyzed it was never resolved. I was hoping that if someone else had a similar problem it would help resolution. I am going to open a new case.

To answer your questions:

1. It was determined that the HP-UX lockd was spinning.

2. Java 1.6-07 and later caused the problem.

3. There were many traces, and it appears that the HP-UX machine was sending NLM_DENIED (cause is yet to be determined, see below).

4. The Java program received the denial and kept sending requests forever, hence the spinning.

5. It appears that the only quirk is that this version of Java is sending a lock request with a very large offset, and HP-UX was not responding properly.

As our issue occurred both with the Java plugin and with JavaWS, it is not a program that can be looked at to see why it wasn't returning an error after many lock denials.

However, the issue of why HP-UX cannot handle this lock was never resolved.

I was hoping someone who had a similar problem would help in resolving the issue.

This hang is still happening, and installing Java 1.6-20 is essential for our work, so I am going to open another support call to follow up.

If anyone has any ideas they are welcome!

Regards
Rochelle

Dave Olker
HPE Pro

Re: NFS lockd issue 11iv3 & redhat linux AS

Just to be sure, are you running the latest ONCplus code? We've made many fixes to rpc.lockd/KLM as newer versions were released. Latest version is 11.31.09.02.

Dave