NFS issues - timeout

 
Patrice Rothrock
Occasional Advisor

NFS issues - timeout

I have increased NUM_NFSIOD from the default of 4 to 10, ensured everything is running at 100 Mbit full duplex (100FD), and set AUTOMOUNT_OPTIONS="-t 3600". What else could be causing the following errors in /var/adm/syslog/syslog.log?

Dec 2 07:33:56 'server_name' vmunix: NFS server 'server_name' not responding still trying.

Thanks! gwpfaff@tycoelectronics.com
Oracle/SAP
14 REPLIES
Jeff Schussele
Honored Contributor

Re: NFS issues - timeout

Hi Patrice,

Network and/or routing/firewall issues can come into play here.
When you get these problems can you ping & traceroute to the NFS server?
Can you telnet on ports 2049 & 1110 to the host?

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Patrice Rothrock
Occasional Advisor

Re: NFS issues - timeout

The connection is local, through a dedicated 8-port switch: the server has a GigE connection to one of the switch's two GBIC interfaces, and the clients are connected at 100BaseT. Note: I made sure the switch settings are correct as well.
Oracle/SAP
Patrice Rothrock
Occasional Advisor

Re: NFS issues - timeout

Sorry Jeff, I never answered your question. Yes, I can, for both ports indicated.
Oracle/SAP
Steven E. Protter
Exalted Contributor

Re: NFS issues - timeout

I'm confused by your last post, but I'll attempt to clarify.

Any machine in this mix that has a Gigabit Network card: Autonegotiate in configuration and auto negotiate on the switch.

Any machine with a 100 BaseT or slower: Hard code the settings both in config and on the switch.

Have you looked at the syslog.log on the NFS server? Use inetd -l to turn on enhanced logging. The messages there may be more helpful than what you've posted thus far.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Patrice Rothrock
Occasional Advisor

Re: NFS issues - timeout

Hi Steve,

The server ports are auto negotiate
Oracle/SAP
Patrice Rothrock
Occasional Advisor

Re: NFS issues - timeout

The switch ports are auto as well.
Oracle/SAP
Mark Greene_1
Honored Contributor

Re: NFS issues - timeout

Run the following:

UNIX95= ps -efH|head -1; UNIX95= ps -efH|egrep "inetd|rpc|mount|nfs"|grep -v grep; rpcinfo -p|egrep "service|nfs"; showmount -e; mount -p

And you should get output similar to this:

UID PID PPID C STIME TTY TIME CMD
root 811 0 0 Dec 13 ? 00:00 nfskd
root 907 1 0 Dec 13 ? 00:02 /usr/lib/netsvc/fs/automount/automount -f /etc/auto_master
root 1404 1 0 Dec 13 ? 02:11 /opt/dce/sbin/rpcd
root 806 1 0 Dec 13 ? 00:00 /usr/sbin/rpcbind
root 896 1 0 Dec 13 ? 00:00 /usr/sbin/rpc.lockd
root 890 1 0 Dec 13 ? 00:00 /usr/sbin/rpc.statd
root 1845 1 0 Dec 13 ? 00:00 /usr/sbin/rpc.mountd
root 1856 1 0 Dec 13 ? 00:00 /usr/sbin/nfsd 4
root 1873 1856 0 Dec 13 ? 00:00 /usr/sbin/nfsd 4
root 1857 1856 0 Dec 13 ? 00:00 /usr/sbin/nfsd 4
root 1874 1857 0 Dec 13 ? 00:00 /usr/sbin/nfsd 4
root 27644 20314 0 11:46:35 ttyp3 00:00 more /etc/rc.config.d/nfsconf
root 23330 1 0 Dec 21 ? 00:24 /usr/sbin/inetd
program vers proto port service
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs

If your results are different, then you are not running all of the necessary services, and/or rpc.lockd and rpc.statd are out of sync. These two processes have to be started together, or you'll have problems.
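As a rough sketch of that comparison, the function below reads ps output on stdin and prints any of the daemons from the sample listing that do not appear. The function name and the daemon list are my own assumptions based on the output above; a pure NFS client would not be expected to run nfsd or rpc.mountd, so trim the list to suit.

```shell
#!/bin/sh
# Sketch only: list expected NFS daemons that are absent from `ps -ef` output.
# The daemon list is taken from the sample output above (an assumption, not an
# official checklist); adjust it for client-only or server-only systems.
missing_nfs_daemons() {
    ps_output=$(cat)    # read the whole ps listing from stdin
    for daemon in rpcbind rpc.statd rpc.lockd rpc.mountd nfsd; do
        # Look for the daemon as the last path component of a command,
        # followed by a space (arguments) or the end of the line.
        if ! printf '%s\n' "$ps_output" | grep -E "/${daemon}( |\$)" >/dev/null; then
            echo "$daemon"
        fi
    done
}
```

Typical use would be something like `UNIX95= ps -ef | missing_nfs_daemons`; empty output means every listed daemon was found. It does not tell you whether rpc.lockd and rpc.statd were started together, only whether they are present.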

mark
the future will be a lot like now, only later
Patrice Rothrock
Occasional Advisor

Re: NFS issues - timeout

Should the ps command be run on the clients, on the server, or on both?
Oracle/SAP
Todd Whitcher
Esteemed Contributor

Re: NFS issues - timeout

Hello,

The message you are receiving simply means that the NFS server is not responding due to a timeout. The reasons vary, but you can do a few things to determine where to start diagnosing the problem. NFS performance troubleshooting can be time-consuming, so I've referenced a few documents below to help you out.

To start, the nfsd daemons are server-side only and don't affect the NFS clients. If you increase the number of nfsds, do so on the NFS server. The Automount option -t 3600 tells the Automounter how often to check whether the mount point is active, so that it can either unmount it or reset its timer. It's a good idea to set this higher than the default of 5 minutes, but it will not solve the timeout messages you are receiving. It may reduce how often you see them, but that's not going to resolve your root problem.

What you need to do is figure out if your problem is with the NFS server, the Network or the NFS client.

For two "Shorter Guides" to NFS Performance issues reference these ITRC documents, NETUXKBRC00006283 & KBAN00000261

For a very detailed white paper on NFS performance, follow this link.

http://www.docs.hp.com/hpux/onlinedocs/1435/NFSPerformanceTuninginHP-UX11.0and11iSystems.pdf

Some Things to consider:

TCP vs. UDP?
Are you using the Legacy Automounter or Autofs?

For TCP you need the Autofs client and not the legacy Automounter. You can tell which one you are using by checking the output of ps -ef|grep auto: if you have an autofs_proc process, you are using Autofs; if not, you are using the legacy Automounter.
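Here is a minimal sketch of that check as a function that reads ps output on stdin. The function name and the three result strings are made up for illustration; the legacy match assumes the process shows up with a path ending in /automount, as in the sample ps listing earlier in this thread.

```shell
#!/bin/sh
# Sketch only: decide which automounter is running from `ps -ef` output.
# Prints "autofs", "legacy", or "none" (labels chosen for this example).
automounter_type() {
    ps_output=$(cat)    # read the whole ps listing from stdin
    if printf '%s\n' "$ps_output" | grep autofs_proc >/dev/null; then
        # An autofs_proc process means Autofs is in use.
        echo autofs
    elif printf '%s\n' "$ps_output" | grep -E '/automount( |$)' >/dev/null; then
        # No autofs_proc, but an automount process: legacy Automounter.
        echo legacy
    else
        echo none
    fi
}
```

You would run it as `ps -ef | automounter_type`. Because the listing is piped in rather than grepped on the same command line, the grep process itself cannot show up as a false match.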

By default an 11.X server will attempt TCP and fall back to UDP if TCP is not available. The general idea is to use UDP on localized / clean networks and to use TCP between systems that traverse multiple devices. You can control which protocol to use via the mount option proto=; see man mount_nfs for details. If you're using the legacy Automounter, UDP is your only choice; if you use Autofs, you can specify TCP.
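To illustrate the proto= option, the sketch below only builds and prints a mount command so you can review it before running it by hand. The function name, the server name nfssrv, and both paths are hypothetical, and the `mount -F nfs -o ...` form is my assumption of the usual HP-UX syntax; check man mount_nfs on your release before relying on it.

```shell
#!/bin/sh
# Sketch only: print (do not run) an NFS mount command with an explicit protocol.
# Arguments: protocol (tcp or udp), remote filesystem, local mount point.
nfs_mount_cmd() {
    proto=$1
    remote=$2
    mountpoint=$3
    case $proto in
        tcp|udp) ;;    # the only two protocols discussed in this thread
        *) echo "proto must be tcp or udp" >&2; return 1 ;;
    esac
    echo "mount -F nfs -o proto=$proto $remote $mountpoint"
}
```

For example, `nfs_mount_cmd udp nfssrv:/export /mnt/export` prints a command that forces UDP for a single manual mount, which is a quick way to compare behaviour between the two protocols.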


From your client you can start to narrow down your issue with the nfsstat command.

( check those docs above for more details on the output of the nfsstat commands )
nfsstat -c


Client rpc:
calls      badcalls   retrans    badxid     timeout    wait       newcred
1081       0          0          10         8          10         0

Client nfs:
calls      badcalls   nclget     nclsleep
1074       0          1074       0
null       getattr    setattr    root       lookup     readlink   read
0 0%       240 22%    0 0%       0 0%       636 59%    0 0%       34 3%
wrcache    write      create     remove     rename     link       symlink
0 0%       0 0%       0 0%       1 0%       0 0%       0 0%       0 0%
mkdir      rmdir      readdir    statfs
0 0%       0 0%       160 14%    3 0%

For the above stats the items of note are the number of retransmissions, timeouts and badxid's. Where,

calls The total number of RPC calls made.

badcalls The total number of calls rejected by the RPC layer.

retrans The number of times a call had to be retransmitted due to a timeout while waiting for a reply from the server.

badxid The number of times a reply from a server was received which did not correspond to any outstanding call.

timeout The number of times a call timed out while waiting for a reply from the server.

wait The number of times a call had to wait because no client handle was available.

newcred The number of times authentication information had to be refreshed.


Since the number of badxids received is roughly the same as the number of timeouts, this indicates that the NFS server involved is simply late or slow in responding, perhaps due to load. The late reply will be seen as a transaction ID which is no longer valid. The read/write sizes and timeout values can be altered in the mount options to accommodate this on a per-server basis.

If the retransmit/timeout rates are an order of magnitude larger than the badxid count, it would indicate that the replies were simply dropped/lost somewhere in transit to or from the server. To isolate where in the path the packets are being dropped, a network trace using the client (nettl tracing in HP's case), server, or external analyzer is needed. Some switches/routers keep statistics on packet loss on a per port basis as well.
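The two rules of thumb above can be sketched as a small filter over nfsstat -c output. The function name, the result labels, and the exact thresholds are my assumptions: "roughly the same" is taken as within a factor of two, and "an order of magnitude" as ten times or more, comparing badxid against the larger of retrans and timeout.

```shell
#!/bin/sh
# Sketch only: classify the "Client rpc:" counters from `nfsstat -c` on stdin.
# Prints one of: no-timeouts, packets-lost, server-slow, inconclusive.
# Thresholds (factor of 2, factor of 10) are assumptions for this example.
classify_nfs_timeouts() {
    awk '
        /^Client rpc:/ { in_rpc = 1; next }
        # Header row: remember which column each counter name is in.
        in_rpc && /badxid/ {
            for (i = 1; i <= NF; i++) col[$i] = i
            want = 1
            next
        }
        # First row after the header holds the counter values.
        want {
            t = $(col["timeout"]) + 0
            r = $(col["retrans"]) + 0
            b = $(col["badxid"]) + 0
            m = (r > t) ? r : t          # larger of retrans and timeout
            if (m == 0)                      print "no-timeouts"
            else if (m >= 10 * b)            print "packets-lost"
            else if (2 * b >= m && b <= 2 * m) print "server-slow"
            else                             print "inconclusive"
            exit
        }
    '
}
```

Run it as `nfsstat -c | classify_nfs_timeouts`. With the sample counters shown above (badxid 10, timeout 8) it reports server-slow, which matches the reading given for that output; treat the result as a hint on where to look next, not a diagnosis.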


Also, don't discount software levels; the ONC (NFS) patches should be checked to see whether an update is appropriate.

The latest NFS patches are:

11.11 PHNE_29211 s700_800 11.11 ONC/NFS General Release/Performance Patch

11.0 PHNE_28982 s700_800 11.00 ONC/NFS General Release/Performance Patch

Hope this helps,

Todd

Brian Hackley
Honored Contributor

Re: NFS issues - timeout

Hello,

We have provided comprehensive troubleshooting tips for exactly this issue. Take a look at ITRC docs NETUXKBRC00006283 and KBAN00000261. If you can get to the bookstore, get "Optimizing NFS Performance" by Dave Olker (HP Press 2002), which adds many layers of detail to those two docs.

-> Brian Hackley

Ask me about telecommuting!
Jeff Schussele
Honored Contributor

Re: NFS issues - timeout

Todd/Brian,

Has KBAN00000261 been renamed? TKB Search by Doc ID returns no results.
Could you post the link or the new name please?

Thx,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Todd Whitcher
Esteemed Contributor

Re: NFS issues - timeout

Attached is the document I referenced, KBAN00000261. We have notified the ITRC support folks to find out why it's not available.
Todd Whitcher
Esteemed Contributor

Re: NFS issues - timeout

Maybe this time?
Jeff Schussele
Honored Contributor

Re: NFS issues - timeout

Thanks Todd

Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!