
Stale NFS mount

Lars-Olof Fermvall
Frequent Advisor

I write to a file on an NFS-mounted device for a couple of hours, but most of the time the write fails with a 'stale NFS file handle' error (see the ...log file attached). What am I doing wrong?
Nemo enim saltat sobrius, nisi forte insanit (No one dances sober, unless perhaps he is insane.)
Steven E. Protter
Exalted Contributor

Re: Stale NFS mount

NFS mounts can go stale if the machine you are mounting from goes offline.

It may have been rebooted, or network connectivity may have been lost.

Assuming the machine was not rebooted, let's look at network issues.

At both ends of the connection if possible:

lanadmin -x 0

Replace 0 with the actual number of the active NIC card.

Is it 100BaseT Full Duplex, as you expected?

If you see autonegotiation, take steps to switch to manual configuration.

/etc/rc.config.d/hpbtlanconf (which includes an example) lets you hard-code these settings.

Talk to the admin of your switch. Especially if it's a Cisco, go to explicit, manual port configuration. You should only use autonegotiation on HP 1000BaseT cards and up.
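If you want to run this check on several boxes, a small filter over the lanadmin output can help. This is only a sketch: check_link is a made-up name, and the exact wording lanadmin prints differs between NIC drivers and OS releases, so the patterns below are assumptions you should adjust to what your system actually shows.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `lanadmin -x <PPA>` output on stdin and flag
# settings worth a closer look.
# ASSUMPTION: the output names the duplex mode ("Full-Duplex"/"Half-Duplex")
# and, where supported, mentions autonegotiation with an On/Off value.
check_link() {
  local text
  text=$(cat)
  if grep -qi 'half' <<<"$text"; then
    echo "WARN: half duplex"
  elif grep -qi 'auto' <<<"$text" && ! grep -qiE 'auto[- ]?off|auto[a-z]* *= *off' <<<"$text"; then
    echo "WARN: autonegotiation appears to be on"
  elif grep -qi 'full' <<<"$text"; then
    echo "OK: full duplex, no autonegotiation reported"
  else
    echo "UNKNOWN: check the output by hand"
  fi
}

# Usage on the HP-UX box (PPA 0 assumed):
#   lanadmin -x 0 | check_link
```

Run it at both ends; the switch port has to be checked on the switch itself.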

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Pete Randall
Outstanding Contributor

Re: Stale NFS mount

A stale NFS file handle occurs when you lose the connection to the server that contains the "NFS mounted device". Either the server has gone down or there are issues with the LAN connection. Make sure you are up to date on NFS patches, and make sure that both the server and its network connection are reliable.


Pete
Elena Leontieva
Esteemed Contributor

Re: Stale NFS mount

Try the following:

mount -v                  (to get the full path of the NFS mount point)
fuser -ck /mount_point
umount /mount_point

If this helps, run mount -a; if not, reboot.
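The steps above can be wrapped in a couple of shell functions. This is a sketch under assumptions: the function names are hypothetical, and the parser assumes `mount -v` prints lines of the form "server:/export on /mount_point type nfs <options> on <date>". Note that fuser -k kills the processes it finds without asking.

```bash
#!/usr/bin/env bash
# ASSUMPTION: `mount -v` lines look like
#   server:/export on /mount_point type nfs <options> on <date>

# Print the mount point of every NFS entry in `mount -v` output (stdin).
nfs_mount_points() {
  awk '$2 == "on" && $4 == "type" && $5 == "nfs" { print $3 }'
}

# Kill processes using one NFS mount point, unmount it, then remount
# everything listed in /etc/fstab. WARNING: fuser -k is not interactive.
reset_nfs_mount() {
  local mp=$1
  fuser -ck "$mp"
  umount "$mp" || return 1
  mount -a
}

# Usage (as root):
#   mount -v | nfs_mount_points        # find the NFS mount points
#   reset_nfs_mount /mount_point
```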
Bill Hassell
Honored Contributor

Re: Stale NFS mount

This is very common with NFS... it is not a good protocol to use on a LAN that is overloaded or having problems due to misconfigured PCs or other network devices. The NFS server may be too slow to respond (it may need a lot of patches and/or NFS export changes). You may find that NFS is too unstable for a production environment, and that simply using FTP is not only much faster but much more reliable. Stale NFS mount points will also cause your local machines to hang (commands like bdf, and even logins). NFS servers (including the network) must be MORE reliable than the clients for NFS to be successful.


Bill Hassell, sysadmin
Mark Greene_1
Honored Contributor

Re: Stale NFS mount

Have you run traceroute between the two? If the servers are more than 3 hops away, or if there are static routes in either server or in any of the intervening routers or switches, you are not likely to have much success with this.
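Counting the hops can be scripted. A minimal sketch, assuming the usual traceroute layout where each hop line starts with its hop number (the header line does not); hop_count is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report the last hop number seen in traceroute
# output (stdin). Lines that do not start with a number are ignored.
hop_count() {
  awk '$1 ~ /^[0-9]+$/ { n = $1 } END { print n + 0 }'
}

# Usage:
#   hops=$(traceroute nas | hop_count)
#   [ "$hops" -gt 3 ] && echo "NAS is $hops hops away; expect NFS trouble"
```

In this thread both machines sit on the same switch, so the answer should be a single hop.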

mark
the future will be a lot like now, only later
Lars-Olof Fermvall
Frequent Advisor

Re: Stale NFS mount

Hi guys!

Thanks for all your comments! When I get back to site (I am in my hotel room - it is morning in Australia) I will try the things that are applicable. A bit more info: The device that is NFS mounted is a NAS 2000 device, so I may also take up the issue with its supplier (it has two LAN interfaces, in 'fault tolerant' mode). Both the HP B2600 and the NAS are connected to the same 100 Mbit/s switched hub (I'll have to check the exact model).
Bill Hassell
Honored Contributor

Re: Stale NFS mount

Aha, now the puzzle gets more interesting. NAS devices are generally designed for a Windows environment and have had little testing in the Unix area. I would strongly recommend the NFS performance book by Dave Olker. Also check the lanadmin numbers for the interface. It is VERY common to have a failed autonegotiation on a 100 Mbit/full-duplex connection, which causes VERY bad performance and lots of collisions, FCS errors and other nasty LAN problems. Lock the HP-UX NIC *and* the switch port to 100 full duplex, with no autonegotiation. The lanadmin errors should disappear and performance should go way up.


Lars-Olof Fermvall
Frequent Advisor

Re: Stale NFS mount

Hi!
Again, thanks for all your suggestions. This is where I am at right now:
The LAN seems very healthy: I have Glance running all the time, and I have yet to see any errors or even a collision on the network graph. I may have missed one, but things seem steady at 100 Mbit/s (though I did not run lanadmin while I was still on site, so I may be guessing; I'll see if I can get someone else to do it). I have not (yet) reconfigured the (built-in?) NIC on the B2600 to hard-code the speed. Also, after each 'stale handle' problem, access to the NFS-mounted file system works again without any further intervention. The switched hub is a HORIZON VH-2402S2.
Before each use of the connection I ran umount twice to reset everything (afterwards nothing was left in /etc/mnttab for the mount).
However, since I can now get my hands on a large enough disk partition, I am testing a workaround: I send the output of vxdump|gzip to another disk instead of to the NAS, and the script then moves the resulting file to the NAS when the backup is finished. This has (so far) been successful. If nothing else, it means the file is now accessed via NFS intensively for about five minutes instead of less intensively over two hours or so, so the odds of avoiding a network problem should be better.
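For anyone wanting to do the same, here is a rough sketch of that "stage locally, then copy to the NAS" approach. It is not the poster's script: the function name is hypothetical, and the dump command is passed in as a string because the actual vxdump arguments were not given in the thread. Copying under a temporary name and renaming at the end means a half-written file on the NAS is never mistaken for a finished backup.

```bash
#!/usr/bin/env bash
# ASSUMPTIONS: $1 = dump command as a string, $2 = local staging directory
# with enough free space, $3 = NFS-mounted directory, $4 = base file name.
stage_then_copy() (
  set -o pipefail                 # fail if the dump itself fails, not just gzip
  producer=$1 stage_dir=$2 dest_dir=$3 name=$4
  stage="$stage_dir/$name.gz"

  # Long-running part: write only to local disk, no NFS involved.
  if ! sh -c "$producer" | gzip -c > "$stage"; then
    rm -f "$stage"
    exit 1
  fi

  # Short, intensive NFS part: copy under a temporary name, rename when
  # complete, then remove the local copy.
  cp "$stage" "$dest_dir/$name.gz.tmp" &&
    mv "$dest_dir/$name.gz.tmp" "$dest_dir/$name.gz" &&
    rm -f "$stage"
)

# Usage (the dump arguments are placeholders):
#   stage_then_copy 'vxdump 0f - /home' /var/stage /nas home_full
```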

Points? No, I will not forget to hand them out!
Steven E. Protter
Exalted Contributor

Re: Stale NFS mount

Some more advice on tracking down NFS issues caused by network connectivity.

traceroute from client to host and back.

Why?

What happens if some router is unhappy with life? This happened to me on a WAN link: the router needed an IOS upgrade, a new power supply and a good thorough cleaning.

The only way I got network support to even LOOK at it was to show them where the traceroutes were hanging.
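Finding where a traceroute hangs can be automated too. A minimal sketch, assuming the default of three probes per hop so that a silent hop prints as "* * *"; first_silent_hop is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first hop at which every probe timed out
# (a "N  * * *" line in traceroute output on stdin). Prints nothing if all
# hops answered. ASSUMPTION: three probes per hop (the usual default).
first_silent_hop() {
  awk '$1 ~ /^[0-9]+$/ && NF == 4 && $2 == "*" && $3 == "*" && $4 == "*" { print $1; exit }'
}

# Usage: run it in both directions and compare.
#   traceroute nas | first_silent_hop        (on the client)
#   traceroute b2600 | first_silent_hop      (from the other end, if possible)
```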

SEP
Ollie R
Respected Contributor

Re: Stale NFS mount

Hi,

For your reference, the Dave Olker document that Bill refers to can be found at:
http://h21007.www2.hp.com/dspp/files/unprotected/devresource/Docs/Presentations/NFSperf.pdf

It's a fantastic document and any recommendations it offers should be seriously considered.

Ollie.
To err is human but to not award points is unforgivable
Lars-Olof Fermvall
Frequent Advisor

Re: Stale NFS mount

Thanks again! Things now appear to be working on site (the workaround helped; maybe the idea of keeping a file open over an NFS mount for hours was a bad one to start with). I have attached the lanadmin statistics (the system was rebooted just a few days ago). I'll follow the link to the document.
Lars-Olof Fermvall
Frequent Advisor

Re: Stale NFS mount

Just one thing I forgot to mention: The HP system is running 10.20...
Ollie R
Respected Contributor

Re: Stale NFS mount

Just another thing to mention - assign points!
Lars-Olof Fermvall
Frequent Advisor

Re: Stale NFS mount

Trust me, I HAVE assigned points! - but maybe it takes time to update (I can't see my point assignments either)? If they are not up on Monday, I'll have another go...
Bill Hassell
Honored Contributor

Re: Stale NFS mount

Just a note: Dave Olker has an excellent paper as mentioned, but I was referring to the book:

"Optimizing NFS Performance: Tuning and Troubleshooting NFS on HP-UX Systems" -- David Olker

Your lanadmin statistics look just fine (all zeros in the error section). Glance/gpm show fairly broad values; for instance, on any full-duplex connection there will NEVER be a collision, so a clean collision graph proves little. The real issues revolve around NFS statistics (i.e., nfsstat), so you'll need to run nfsstat and look at the client values. You may need additional NFS client daemons (biod). The default number is 4, which is likely much too small for heavy NFS traffic. And then there are kernel parameters such as ninode, ncsize, etc. (all covered in depth in Dave's book).
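One quick number to pull from the client statistics is the retransmission ratio: a retransmitted call means the client timed out waiting for a reply and sent the request again. The sketch below is an assumption-laden helper (retrans_pct is a made-up name): it expects a header line and the value line beneath it, and looks the columns up by name because the nfsstat layout differs between releases.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given a header line and the value line beneath it
# (e.g. the client RPC counters from `nfsstat -c`), print retrans as a
# percentage of calls, or "n/a" if either column is missing or calls is 0.
retrans_pct() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
       NR == 2 {
         if (!("calls" in col) || !("retrans" in col)) { print "n/a"; exit }
         c = $(col["calls"]); r = $(col["retrans"])
         if (c > 0) printf "%.1f\n", 100 * r / c; else print "n/a"
         exit
       }'
}

# Usage: feed it the header line that contains "retrans" plus the numbers
# under it, e.g. copied from `nfsstat -c` output.
```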


Lars-Olof Fermvall
Frequent Advisor

Re: Stale NFS mount

Hi Bill!
I looked at the presentation (not the full book) over the weekend and found it VERY interesting. Next time I have an opportunity I'll check some kernel parameters to see if I can gain some performance when copying files to the NAS (a 1.8 Gbyte file takes 10 minutes). In particular, I'll turn off direct reporting for disks, since all disks are either (512 Mbyte cached) RAID systems or mirrored using HP's disk mirroring software (root disk). I'll also see if increasing the buffer cache helps (currently it is a static 10% in a system with 768 Mbyte of memory). I have some unused memory right now, but no applications are running on the system yet. I have already bumped many of the kernel parameters mentioned in the paper to make sure the four Oracle instances run well. For the kernel resources Glance shows, I have lots of headroom.

One thing that has eluded me is getting read/write block sizes larger than 8 kbyte. Maybe the 32 kbyte setting is not supported by the 10.20 NFS PV3 implementation?
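To put the 10-minute copy in perspective, here is the arithmetic as a one-line awk helper (mbit_per_s is a hypothetical name). It assumes "1.8 Gbyte" means 1.8 × 10^9 bytes; counting 1.8 GiB instead gives about 25.8 Mbit/s.

```bash
#!/usr/bin/env bash
# Convert "bytes copied in N seconds" into Mbit/s.
# ASSUMPTION: 1.8 Gbyte = 1.8 * 10^9 bytes (decimal units).
mbit_per_s() {   # $1 = bytes copied, $2 = seconds taken
  awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b * 8 / s / 1000000 }'
}

mbit_per_s 1800000000 600    # 1.8 GB in 10 minutes -> 24.0
```

That is roughly a quarter of a 100 Mbit/s link, which suggests the wire itself is not what limits the copy.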
Lars-Olof Fermvall
Frequent Advisor

Re: Stale NFS mount

Bill,

I forgot to mention that I was initially running 8 biod, but increased that to 16 last week. NFS statistics are attached to an earlier post.
Massimo Bianchi
Honored Contributor

Re: Stale NFS mount

Hi,
In my environments, the following settings helped a lot:

-o rsize=32768,wsize=32768,hard

added to your existing options. They completely eliminated the error.

Be sure to use the "hard" option if you also run executables from that NFS share.
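A minimal sketch of what the resulting mount looks like, using HP-UX syntax (-F nfs). The helper only prints the command, the name nfs_mount_cmd is hypothetical, and nas:/backup and /nas are placeholders. Given the 8 kbyte limit seen on 10.20 earlier in the thread, the size is a parameter so you can fall back to 8192.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (not run) an NFS mount command with the
# options suggested above. Server path and mount point are placeholders.
nfs_mount_cmd() {
  local src=$1 mp=$2 size=${3:-32768}   # rsize/wsize in bytes, default 32 KB
  echo "mount -F nfs -o rsize=$size,wsize=$size,hard $src $mp"
}

nfs_mount_cmd nas:/backup /nas          # 32 KB transfers
nfs_mount_cmd nas:/backup /nas 8192     # fall back to 8 KB if the client caps it
# A matching /etc/fstab entry would be along the lines of:
#   nas:/backup /nas nfs rsize=32768,wsize=32768,hard 0 0
```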

HTH,
Massimo