Re: Bizarre lockd behaviour

Pyers Symon · ‎10-08-2009

We have an issue where we are getting very strange behaviour between an HP-UX 11.11 NFS client and a Solaris 8 server. In essence, the Solaris lockd seems to respond only intermittently to an rpcinfo call:

For example:

# rpcinfo -T tcp fid001 nlockmgr 1
rpcinfo: RPC: Timed out
program 100021 version 1 is not available

To complicate matters, this is part of a Veritas cluster...

However, a few seconds later:

# rpcinfo -T tcp fid001 status 1
program 100024 version 1 ready and waiting

Further investigation shows that the response is very slow when it does work, but it always arrives within rpcinfo's 10-second timeout. truss and netstat on the Solaris box indicate that a connection is made in every case, but for some reason the server side is not responding correctly at a lower level...
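To put numbers on "intermittent" and "very slow" before taking traces, a small loop around the same rpcinfo command can log each attempt's outcome and duration. This is a minimal Python sketch, not something from the thread: it assumes rpcinfo is on the PATH of the machine running it, and the default host fid001, the 10 attempts and the 1-second pause are arbitrary choices. The runner, clock and sleep parameters are hypothetical hooks so the loop can be exercised without a real server.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Probe:
    ok: bool        # True if rpcinfo said "ready and waiting"
    elapsed: float  # wall-clock seconds the rpcinfo call took
    detail: str     # rpcinfo output, kept for the log

def classify(output: str) -> bool:
    """Count the probe as a success only if rpcinfo reported the program ready."""
    return "ready and waiting" in output

def run_rpcinfo(cmd: Sequence[str]) -> str:
    """Run rpcinfo and merge stdout and stderr so no message is lost."""
    try:
        result = subprocess.run(list(cmd), capture_output=True, text=True, timeout=60)
    except subprocess.TimeoutExpired:
        return "local timeout: rpcinfo did not exit within 60 s"
    return result.stdout + result.stderr

def probe(host: str, count: int = 10, pause: float = 1.0,
          runner: Callable[[Sequence[str]], str] = run_rpcinfo,
          clock: Callable[[], float] = time.monotonic,
          sleep: Callable[[float], None] = time.sleep) -> List[Probe]:
    """Call nlockmgr version 1 over TCP `count` times; record outcome and latency."""
    cmd = ["rpcinfo", "-T", "tcp", host, "nlockmgr", "1"]
    results: List[Probe] = []
    for i in range(count):
        start = clock()
        out = runner(cmd)
        results.append(Probe(classify(out), clock() - start, out.strip()))
        if i < count - 1:
            sleep(pause)
    return results

if __name__ == "__main__":
    host = sys.argv[1] if len(sys.argv) > 1 else "fid001"
    for n, p in enumerate(probe(host), start=1):
        status = "ok" if p.ok else "FAILED"
        first_line = p.detail.splitlines()[0] if p.detail else ""
        print(f"{n:3d} {status:6s} {p.elapsed:6.2f}s  {first_line}")
```

A run where successes take several seconds while failures cluster near the timeout would support the "slow when it works" observation and give concrete times to look for in a trace.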

Any thoughts?

I suspect that this is at the Sun end...

Dave Olker · ‎10-08-2009

Your two examples were testing two different RPC programs. The first was lockd and the second was statd. Does the same RPC daemon respond differently at different times?

As for which system is at fault, the best way to determine that would be to collect a network trace on *BOTH* systems. A trace on the HP-UX system should show when the packet is sent and how long it takes to receive a reply. A trace on the Sun system should show when the request arrives and how long it takes for the server to respond. In many cases the client and server traces can be misleading on their own, so it helps to get a trace from both systems and compare them.
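Once both traces exist, the comparison comes down to four timestamps per call/reply pair, matched across the two captures by the RPC transaction ID (XID). Because each interval below is measured on a single machine's clock, the two systems' clocks do not need to be synchronized. This is a hypothetical Python helper for that arithmetic only; it assumes the timestamps have already been pulled out of the HP-UX and Solaris captures, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Exchange:
    """One RPC call and its reply, seen from both ends (seconds)."""
    client_sent: float       # request leaves HP-UX client (client clock)
    server_received: float   # request arrives at Solaris server (server clock)
    server_replied: float    # reply leaves server (server clock)
    client_received: float   # reply arrives at client (client clock)

def attribute_delay(x: Exchange) -> Dict[str, float]:
    """Split the client's round trip into server time and everything else."""
    round_trip = x.client_received - x.client_sent       # client clock only
    server_time = x.server_replied - x.server_received   # server clock only
    return {
        "round_trip": round_trip,
        "server": server_time,
        # whatever the server does not account for: wire, network stacks, client
        "network_and_client": round_trip - server_time,
    }

def likely_culprit(d: Dict[str, float]) -> str:
    """Crude verdict: which side accounts for more of the round trip?"""
    return "server" if d["server"] >= d["network_and_client"] else "network/client"

if __name__ == "__main__":
    # Made-up timestamps for a slow but successful call.
    d = attribute_delay(Exchange(100.0, 5000.25, 5008.75, 109.0))
    print(d, likely_culprit(d))
```

If the server-side interval accounts for nearly all of the client's round trip, the delay is being spent on the Sun box rather than on the wire.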

Regards,

Dave

I work at HPE
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]

Pyers Symon · ‎10-08-2009

Sorry Dave (hi again!)

That was my fault... I didn't check what I wrote. Both examples should have been from lockd (I was checking statd at the same time).

This is what it should have looked like:

fid001# rpcinfo -T tcp fid001 nlockmgr 1
program 100021 version 1 ready and waiting
fid001# rpcinfo -T tcp fid001 nlockmgr 1
rpcinfo: RPC: Timed out
program 100021 version 1 is not available

I will arrange a network trace for you ...

Pyers Symon · ‎10-12-2009

It turned out that the issue was the number of lockd threads started on the server. It was set at the default of 20 (decimal), which was being exceeded.
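A thread limit fits the alternating "ready and waiting" / "Timed out" pattern shown earlier. One way to reason about it is to treat the server's lockd threads as a fixed pool of workers in front of a queue: once every worker is busy, a new request (including the procedure 0 call that rpcinfo makes) waits for one to free up, and if that wait pushes the reply past the client's timeout the call fails, while a probe that arrives at a quieter moment succeeds. The Python sketch below is a deliberately simplified FIFO model under that assumption, not a description of how the Solaris kernel lockd actually schedules work; the workload numbers are invented for illustration.

```python
import heapq
from typing import List, Sequence

def response_times(arrivals: Sequence[float], service: Sequence[float],
                   nthreads: int) -> List[float]:
    """FIFO queue served by `nthreads` identical workers.

    arrivals must be in non-decreasing order; service[i] is how long request i
    occupies a worker. Returns the arrival-to-reply time of each request.
    """
    free_at = [0.0] * nthreads  # min-heap: time at which each worker is next idle
    heapq.heapify(free_at)
    latencies: List[float] = []
    for arrive, work in zip(arrivals, service):
        idle = heapq.heappop(free_at)      # earliest-available worker
        start = max(arrive, idle)          # wait if every worker is busy
        finish = start + work
        heapq.heappush(free_at, finish)
        latencies.append(finish - arrive)
    return latencies

def timed_out(latencies: Sequence[float], limit: float = 10.0) -> List[int]:
    """Indices of requests whose reply would arrive after `limit` seconds."""
    return [i for i, t in enumerate(latencies) if t > limit]

if __name__ == "__main__":
    # Hypothetical load: 25 lock requests at t=0, each holding a worker for 15 s,
    # then an rpcinfo-style probe at t=1 that needs 0.25 s of work.
    arrivals = [0.0] * 25 + [1.0]
    service = [15.0] * 25 + [0.25]
    for n in (20, 40):
        lat = response_times(arrivals, service, n)
        print(f"nthreads={n}: probe latency {lat[-1]:.2f}s, "
              f"probe timed out: {25 in timed_out(lat)}")
```

With 20 workers the probe queues behind the busy threads and misses the 10-second limit; with 40 it is served immediately. The thread does not say what value the limit was raised to, so the 40 here is only an example.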
