NFS issues - timeout

Patrice Rothrock · ‎12-02-2003

I have increased NUM_NFSIOD from the default of 4 to 10, ensured everything is set to 100FD, and set AUTOMOUNT_OPTIONS="-t 3600". Are there any other possible causes for the following errors, which I am still getting in /var/adm/syslog/syslog.log?

Dec 2 07:33:56 'server_name' vmunix: NFS server 'server_name' not responding still trying.

Thanks! gwpfaff@tycoelectronics.com

Oracle/SAP

Jeff Schussele · ‎12-02-2003

Hi Patrice,

Network and/or routing/firewall issues can come into play here.
When you get these problems can you ping & traceroute to the NFS server?
Can you telnet on ports 2049 & 1110 to the host?

Rgds,
Jeff

PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!

Patrice Rothrock · ‎12-02-2003

The connection is local, through a dedicated 8-port switch: the server has a GigE connection to one of the two GBIC interfaces on the switch, and the clients are connected at 100BaseT. Note: I made sure the switch settings are set correctly as well.

Oracle/SAP

Patrice Rothrock · ‎12-02-2003

Sorry Jeff, I never answered your question. Yes, I can, for both ports indicated.

Oracle/SAP

Steven E. Protter · ‎12-02-2003

I'm confused by your last post, but I'll attempt to clarify.

Any machine in this mix that has a Gigabit Network card: Autonegotiate in configuration and auto negotiate on the switch.

Any machine with a 100 BaseT or slower: Hard code the settings both in config and on the switch.

Have you looked at the syslog.log on the NFS server? Run inetd -l to turn on enhanced logging. The messages there may be more helpful than what you've posted thus far.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Patrice Rothrock · ‎12-02-2003

Hi Steve,

The server ports are set to auto-negotiate.

Oracle/SAP

Patrice Rothrock · ‎12-02-2003

The switch ports are auto as well.

Oracle/SAP

Mark Greene_1 · ‎12-02-2003

run the following:

UNIX95= ps -efH|head -1; UNIX95= ps -efH|egrep "inetd|rpc|mount|nfs"|grep -v grep; rpcinfo -p|egrep "service|nfs"; showmount -e; mount -p

And you should get output similar to this:

UID    PID  PPID C STIME    TTY   TIME  CMD
root   811     0 0 Dec 13   ?     00:00 nfskd
root   907     1 0 Dec 13   ?     00:02 /usr/lib/netsvc/fs/automount/automount -f /etc/auto_master
root  1404     1 0 Dec 13   ?     02:11 /opt/dce/sbin/rpcd
root   806     1 0 Dec 13   ?     00:00 /usr/sbin/rpcbind
root   896     1 0 Dec 13   ?     00:00 /usr/sbin/rpc.lockd
root   890     1 0 Dec 13   ?     00:00 /usr/sbin/rpc.statd
root  1845     1 0 Dec 13   ?     00:00 /usr/sbin/rpc.mountd
root  1856     1 0 Dec 13   ?     00:00 /usr/sbin/nfsd 4
root  1873  1856 0 Dec 13   ?     00:00 /usr/sbin/nfsd 4
root  1857  1856 0 Dec 13   ?     00:00 /usr/sbin/nfsd 4
root  1874  1857 0 Dec 13   ?     00:00 /usr/sbin/nfsd 4
root 27644 20314 0 11:46:35 ttyp3 00:00 more /etc/rc.config.d/nfsconf
root 23330     1 0 Dec 21   ?     00:24 /usr/sbin/inetd

program vers proto port service
100003  2    udp   2049 nfs
100003  3    udp   2049 nfs

If your results are different, then you are not running all of the necessary services, and/or rpc.lockd and rpc.statd are out of sync. These processes have to be started together, or you'll have problems.
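
As a rough way to compare your own ps output against the list above, the sketch below reads ps -ef output on standard input and prints any expected daemon that does not appear in it. The function name missing_nfs_daemons and the daemon list are assumptions for illustration, taken from the sample output. It also assumes each daemon shows up with its full path, as it does in that sample.

```shell
# Hypothetical helper, not an HP-UX tool: reads `ps -ef` style output on
# stdin and prints every daemon from the list below that is not present.
# The list is an assumption based on the sample output in this post.
missing_nfs_daemons() {
    ps_output=$(cat)
    for daemon in rpcbind rpc.lockd rpc.statd rpc.mountd nfsd inetd; do
        case "$ps_output" in
            *"/$daemon"*) ;;              # found, e.g. /usr/sbin/nfsd
            *) echo "$daemon" ;;          # not found in the ps output
        esac
    done
}
```

Usage would be along the lines of: UNIX95= ps -ef | missing_nfs_daemons. No output means every daemon in the list was seen. A machine that is only an NFS client would not normally be expected to run nfsd or rpc.mountd, so those two showing up as missing there is not by itself a problem.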

mark

the future will be a lot like now, only later

Patrice Rothrock · ‎12-02-2003

Is the ps command meant to be run on the clients and the server, or...

Oracle/SAP

Todd Whitcher · ‎12-03-2003

Hello,

The message you are receiving simply means that the NFS server is not responding due to a timeout. The reasons vary, but you can do a few things to determine where to start diagnosing the problem. NFS performance troubleshooting can become time consuming, so I've referenced a few documents below to help you out.

To start, the NFSD daemons are server side only and don't affect the NFS clients. If you increase the number of NFSDs, do so on the NFS server. The Automount option -t 3600 tells the Automounter how often to check whether the mount point is active, so that it can either unmount it or reset its timer. It's a good idea to set this higher than the default of 5 minutes, but it will not solve the timeout messages you are receiving. It may reduce the frequency at which you see them, but that's not going to resolve your root problem.

What you need to do is figure out if your problem is with the NFS server, the Network or the NFS client.

For two "Shorter Guides" to NFS performance issues, see these ITRC documents: NETUXKBRC00006283 and KBAN00000261.

For a very detailed white paper on NFS performance, follow this link:

http://www.docs.hp.com/hpux/onlinedocs/1435/NFSPerformanceTuninginHP-UX11.0and11iSystems.pdf

Some Things to consider:

TCP vs. UDP?
Are you using the Legacy Automounter or Autofs?

For TCP you need the Autofs client and not the Legacy Automounter. You can tell which you are using by checking the output of ps -ef|grep auto: if you have an autofs_proc process, you are using Autofs; if not, you are using the Legacy Automounter.
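
The check just described can be sketched as a small shell function. This is only an illustration: the name automounter_type is made up, and the only rule it applies is the autofs_proc test from the paragraph above, plus a fallback that looks for any process with "automount" in its name.

```shell
# Hypothetical helper: reads `ps -ef` output on stdin and reports which
# automounter appears to be running. autofs_proc is checked first because
# its presence is the test described in the post.
automounter_type() {
    ps_output=$(cat)
    case "$ps_output" in
        *autofs_proc*) echo "autofs" ;;
        *automount*)   echo "legacy automounter" ;;
        *)             echo "none found" ;;
    esac
}
```

For example: ps -ef | automounter_type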

By default an 11.X server will attempt TCP and fall back to UDP if TCP is not available. The general idea is to use UDP on localized / clean networks and to use TCP between systems that traverse multiple devices. You can control which protocol is used via the mount option proto=; see man mount_nfs for details. If you're using the Legacy Automounter, UDP is your only choice; if you use Autofs, you can specify TCP.

From your client you can start to narrow down your issue with the nfsstat command.

( check those docs above for more details on the output of the nfsstat commands )
nfsstat -c

Client rpc:
calls    badcalls retrans  badxid   timeout  wait     newcred
1081     0        0        10       8        10       0

Client nfs:
calls    badcalls nclget   nclsleep
1074     0        1074     0
null     getattr  setattr  root     lookup   readlink read
0 0%     240 22%  0 0%     0 0%     636 59%  0 0%     34 3%
wrcache  write    create   remove   rename   link     symlink
0 0%     0 0%     0 0%     1 0%     0 0%     0 0%     0 0%
mkdir    rmdir    readdir  statfs
0 0%     0 0%     160 14%  3 0%

In the above stats, the items of note are the numbers of retransmissions, timeouts and badxids, where:

calls     The total number of RPC calls made.
badcalls  The total number of calls rejected by the RPC layer.
retrans   The number of times a call had to be retransmitted due to a timeout while waiting for a reply from the server.
badxid    The number of times a reply from a server was received which did not correspond to any outstanding call.
timeout   The number of times a call timed out while waiting for a reply from the server.
wait      The number of times a call had to wait because no client handle was available.
newcred   The number of times authentication information had to be refreshed.

Since the number of badxids received is roughly the same as the number of timeouts, this indicates that the NFS server involved is simply late or slow in responding, perhaps due to load. The late reply will be seen as a transaction ID which is no longer valid. The read/write sizes and timeout values can be altered in the mount options to accommodate this on a per-server basis.

If the retransmit/timeout rates are an order of magnitude larger than the badxid count, it would indicate that the replies were simply dropped/lost somewhere in transit to or from the server. To isolate where in the path the packets are being dropped, a network trace using the client (nettl tracing in HP's case), server, or external analyzer is needed. Some switches/routers keep statistics on packet loss on a per port basis as well.
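
The rule of thumb in the last two paragraphs can be written down as a rough sketch. The function names are hypothetical, and the thresholds (a factor of 10 for "an order of magnitude", a factor of 2 for "roughly the same") are my own assumptions, not documented values. For simplicity it compares only the timeout and badxid counters and assumes the Client rpc column order shown in the sample above.

```shell
# Hypothetical helper: reads `nfsstat -c` output on stdin and prints
# "badxid timeout" from the first Client rpc row. Assumes the column
# order calls/badcalls/retrans/badxid/timeout shown in the sample.
rpc_badxid_timeout() {
    awk '/badxid/ { getline; print $4, $5; exit }'
}

# Hypothetical helper: applies the rule of thumb to the two counters.
# The factors 10 and 2 are illustrative assumptions.
classify_nfs_timeouts() {
    badxid=$1
    timeout=$2
    if [ "$timeout" -eq 0 ]; then
        echo "no timeouts recorded"
    elif [ "$timeout" -ge $((badxid * 10)) ]; then
        # far more timeouts than late replies: packets are going missing
        echo "replies probably lost in transit"
    elif [ $((badxid * 2)) -ge "$timeout" ] && [ "$badxid" -le $((timeout * 2)) ]; then
        # late replies roughly match timeouts: the server answers, but slowly
        echo "server probably slow to respond"
    else
        echo "inconclusive"
    fi
}
```

A possible way to use it on a client: set -- $(nfsstat -c | rpc_badxid_timeout); classify_nfs_timeouts "$1" "$2". Treat the result as a starting point for where to look, not as a diagnosis.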

Also, don't discount software levels: the ONC (NFS) patches should be checked to see if an update is appropriate.

The latest NFS patches are:

11.11 PHNE_29211 s700_800 11.11 ONC/NFS General Release/Performance Patch

11.0 PHNE_28982 s700_800 11.00 ONC/NFS General Release/Performance Patch

Hope this helps,

Todd

Brian Hackley · ‎12-03-2003

Hello,

We have provided comprehensive troubleshooting tips for exactly this issue. Take a look at ITRC docs NETUXKBRC00006283 and KBAN00000261. If you can get to the bookstore, get "Optimizing NFS Performance" by Dave Olker (HP Press 2002), which adds many layers of detail to those two docs.

-> Brian Hackley

Ask me about telecommuting!

Jeff Schussele · ‎12-03-2003

Todd/Brian,

Has KBAN00000261 been renamed? TKB Search by Doc ID returns no results.
Could you post the link or the new name please?

Thx,
Jeff

PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!

Todd Whitcher · ‎12-03-2003

Attached is the document I referenced, KBAN00000261. We have notified the ITRC support folks to see why it's not available.

Todd Whitcher · ‎12-03-2003

Maybe this time?

Jeff Schussele · ‎12-03-2003

Thanks Todd

Jeff

PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
