Occasional I/O errors when copying to NFS shares

SOLVED
Paul Maglinger
Regular Advisor

We're running HP-UX 11.23 servers that have mounts to remote NFS shares on a Windows 2003 Storage Server machine. Not on every file, but occasionally we get an error similar to this:

128:cp: cannot create /source/recship.lis: I/O error

When we recopy the file, it works okay. I have downloaded several documents on tuning NFS and made the registry changes on the Windows server, which helped quite a bit, but we are still getting the errors about 10 times a day. The entries in the fstab look similar to this:

server.domain.com:/Source /source nfs retrans=10,timeo=14,rw,suid,hard,intr 0 0

Any other suggestions on where to look or what to try to eliminate these errors?
15 REPLIES
Steven Schweda
Honored Contributor

Re: Occasional I/O errors when copying to NFS shares

> [...] have made the registry changes on the
> Windows server [...]

If you think that the problem is on the
Windows system (which I'd guess is likely),
then why ask in an HP-UX forum?
Paul Maglinger
Regular Advisor

Re: Occasional I/O errors when copying to NFS shares

Hi Steven. Somehow I knew you'd be the first one to respond with a less than useful and witty response.

The error is coming from the HP-UX servers. I have made all the changes I know to make on the Windows server side and was wondering if I missed something on the client side.
Paul Maglinger
Regular Advisor

Re: Occasional I/O errors when copying to NFS shares

Sorry Steven. I've been working on this for some time and am worn thin.

I feel like I've gone through everything I can on the Windows side, from networking to processes to disk I/O. I can't find a bottleneck anywhere on that side. You say you believe it's still a problem on the Windows side. What leads you to believe so?

TwoProc
Honored Contributor
Solution

Re: Occasional I/O errors when copying to NFS shares

I think you need to run:
nfsstat -rc

then read this link on what the results tell you:

http://docs.hp.com/en/B1031-90070/ch05s03.html#cbdbcceb

And the reference page to the various arguments for the mount

http://docs.hp.com/en/B2355-60130/mount_nfs.1M.html
We are the people our parents warned us about --Jimmy Buffett
Steven Schweda
Honored Contributor

Re: Occasional I/O errors when copying to NFS shares

> [...] Windows side. What leads you to
> believe so?

You wrote, "[...] have made the registry
changes on the Windows server and it helped
quite a bit [...]". That, combined with my
never having seen anything like this on any
AIX, HP-UX, Solaris, or Tru64 system, was
what did it.
Bill Hassell
Honored Contributor

Re: Occasional I/O errors when copying to NFS shares

> 128:cp: cannot create /source/recship.lis: I/O error

I/O error on write. So the HP-UX code is trying to write but the status back from the Windows box is [invalid:corrupted:timed-out:unknown] or something similar. There is no reason to assume that Unix NFS will be problem free when connecting to Windows where NFS is a very new protocol. Just like Unix systems, the first step is to display the NFS stats as well as bring the Windows NFS and networking code up to the latest revisions. And I am assuming that lanadmin -g reports no errors like FCS or collisions.

And if the Windows NFS has any logging capability, certainly turn that on. Otherwise, you'll have to use Wireshark to trace the error and start digging into the protocol.


Bill Hassell, sysadmin
Dave Olker
HPE Pro

Re: Occasional I/O errors when copying to NFS shares

Hi Paul,

Are you using TCP or UDP for this mount? If you're using UDP, my first suggestion would be to try again with TCP.

If you're using TCP, I'd suggest mounting the filesystem without the "timeo=14" option. I always recommend leaving "timeo" set to default on TCP filesystems.

Hope this helps,

Dave
Paul Maglinger
Regular Advisor

Re: Occasional I/O errors when copying to NFS shares

Thanks for all the replies.

John - Great information. The first link you sent went right to the spot. The nfsstat -rc output shows that the timeout and badxid values are nearly the same. The link you sent says:

If the timeout and badxid values displayed by nfsstat -rc are of the same magnitude, your server is probably slow. Client RPC requests are timing out and being retransmitted before the NFS server has a chance to respond to them. Try doubling the value of the timeo mount option on the NFS clients.
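As a rough illustration of that rule of thumb, here is a minimal shell sketch that compares the two counters. The function name same_magnitude is hypothetical, and the "within a factor of 10" threshold is an assumption made for this example; the HP documentation quoted above does not define "same magnitude" precisely. The counters are passed in by hand (read them off your own nfsstat -rc output) rather than parsed automatically.

```shell
# Hypothetical helper: succeed if two counters are "of the same magnitude".
# Assumption: that means the larger value is less than 10x the smaller one.
same_magnitude() {
    timeouts=$1
    badxids=$2

    # If either counter is zero there is nothing meaningful to compare.
    if [ "$timeouts" -eq 0 ] || [ "$badxids" -eq 0 ]; then
        return 1
    fi

    # Order the two values so the comparison is symmetric.
    if [ "$timeouts" -ge "$badxids" ]; then
        big=$timeouts
        small=$badxids
    else
        big=$badxids
        small=$timeouts
    fi

    [ "$big" -lt $((small * 10)) ]
}
```

For example, with made-up counter values, `same_magnitude 120 115 && echo "server is probably slow"` prints the message, while `same_magnitude 1000 3` does not.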

I have already increased the timeout value to 14, which is double the default of 7. At what point is the timeout value too much?

Steve - Thanks, it looks like you were on the right track.

Bill - Thanks for the explanation and suggestion. That clears up a few things. I had not looked at lanadmin results.

Dave - I'm using TCP. As mentioned above, John's link suggested doubling the value of timeo. I know I didn't submit the results of nfsstat -rc earlier, but with this additional knowledge do you still suggest returning the timeo value to the default? What other options would you use to address the timeout and badxid counts?
Dave Olker
HPE Pro

Re: Occasional I/O errors when copying to NFS shares

When using NFS/TCP, I *always* recommend using the default value for timeo.

Dave
Dave Olker
HPE Pro

Re: Occasional I/O errors when copying to NFS shares

One point of clarification. The default value of 7 for timeo was determined back in the UDP days, not TCP. The default value for timeo on TCP is 60, not 7. By using 14 you're INCREASING the chances of a timeout.
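To make the comparison concrete: the mount_nfs man page gives timeo in tenths of a second, so the values discussed here translate into seconds as in the sketch below. The helper name timeo_to_seconds is made up for illustration, and it assumes a non-negative whole number as input.

```shell
# Hypothetical helper: convert a timeo value (tenths of a second) to seconds.
# Uses integer arithmetic only, so it works in any POSIX shell.
timeo_to_seconds() {
    echo "$(( $1 / 10 )).$(( $1 % 10 ))"
}

timeo_to_seconds 7    # UDP-era default: 0.7
timeo_to_seconds 14   # value in the fstab entry: 1.4
timeo_to_seconds 60   # TCP default: 6.0
```

So timeo=14 on a TCP mount means giving up on a request after 1.4 seconds instead of 6.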

Dave
Paul Maglinger
Regular Advisor

Re: Occasional I/O errors when copying to NFS shares

Dave - I just referred back to your NFS performance tuning for hp-ux 11.0 and 11i systems. I will remove the timeo entries and see what happens. What about the retrans value?
Dave Olker
HPE Pro

Re: Occasional I/O errors when copying to NFS shares

> What about the retrans value?

I'd remove it as well, though it shouldn't matter. If you look at the retrans information on the mount_nfs man page you'll see: "For connection-oriented transports, this option has no effect because it is assumed that the transport performs retransmissions on behalf of NFS."

In any case, I'd yank both timeo and retrans.
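If there are several fstab entries to clean up, something along these lines could strip the two options from an option string. This is only a sketch: strip_timeo_retrans is a hypothetical helper name, it works on the comma-separated options field alone (not the whole fstab line), and you should check the result by hand before editing /etc/fstab.

```shell
# Hypothetical helper: drop timeo= and retrans= from a comma-separated
# NFS mount option string, leaving the other options in their original order.
strip_timeo_retrans() {
    printf '%s\n' "$1" |
        tr ',' '\n' |                               # one option per line
        grep -v -e '^timeo=' -e '^retrans=' |       # discard the two options
        paste -s -d ',' -                           # rejoin with commas
}

strip_timeo_retrans 'retrans=10,timeo=14,rw,suid,hard,intr'
```

Applied to the options from the original post, this leaves rw,suid,hard,intr, so the fstab entry would become:

server.domain.com:/Source /source nfs rw,suid,hard,intr 0 0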

Dave


Steven E. Protter
Exalted Contributor

Re: Occasional I/O errors when copying to NFS shares

Shalom,

Windows does have an event logging system, and it should be checked at the same time you are getting those errors.

As Bill says, these are write errors, and the response you are getting back may well be connected to what's going on with the Windows box.

Please post the text of related events.

Dave is the master of NFS, and besides checking the event logs, do everything he says.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Paul Maglinger
Regular Advisor

Re: Occasional I/O errors when copying to NFS shares

Yep. At first I didn't make the connection, and then I realized that his name was on my copy of "nfs performance tuning for hp-ux 11.0 and 11i systems". There were a couple of things that initially led me to believe that it was running UDP, which set me off down a dead-end road. I've made the changes as recommended and am going to see how things run this evening. Thanks again everyone.
Paul Maglinger
Regular Advisor

Re: Occasional I/O errors when copying to NFS shares

It appears that the problem has been solved. I installed the Server for NFS Authentication component on all of the domain controllers and we have had no errors for 3 days. Looking back through the records, we found that a DC was taken down permanently around the time the errors became more frequent. Now we are not getting the errors at all. Thanks to all.