NFS data corruption issue 11.31 Itanium

We just started using a new Itanium RX6600 running HP-UX 11.31, connected to a NetApp FAS-3050c NAS device. We have an NFS-mounted directory which is acting very strangely. After a while, you can't do wildcard file searches with the ls command anymore. So if you had a collection of files that started with tr, doing an ls tr* finds nothing, but if you do an ls on a specific file, it shows up just fine. Unmounting the NFS directory and remounting it fixes the problem, but then the problem comes back after a while.

This is a really bad issue for us because we have scripts which rely on being able to use wildcards to cat files together. This really smacks of some kind of NFS bug. We did not have any of this with our old 11.11 system on PA-RISC, so I don't think it has anything to do with our network.

The NFS mount entry I'm using in fstab is:
tec-storage2.tec.clinitech.net:/vol/ops/gp01 /gp05 nfs rw,hard,vers=3,proto=tcp 0 0
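
The symptom described above (a wildcard matches nothing while a file named explicitly lists fine) can be checked from a script before the cat jobs run. This is only a minimal sketch in POSIX sh: the function name check_glob and its three arguments (directory, prefix, and the name of a file known to exist) are hypothetical, not something from the original setup.

```shell
#!/bin/sh
# Sketch: detect the case where DIR/PREFIX* expands to nothing even though
# a known file in DIR can still be reached by its exact name.
# check_glob DIR PREFIX KNOWN_FILE   (all three are illustrative inputs)
check_glob() {
    dir=$1
    prefix=$2
    known=$3
    matched=0
    # An unmatched pattern is left as a literal string by the shell,
    # so each expansion result is tested for existence before counting it.
    for f in "$dir"/"$prefix"*; do
        if [ -e "$f" ]; then
            matched=$((matched + 1))
        fi
    done
    if [ -e "$dir/$known" ] && [ "$matched" -eq 0 ]; then
        echo "MISMATCH: $known exists but $prefix* matched nothing"
        return 1
    fi
    echo "OK: $prefix* matched $matched file(s)"
}
```

A script could call something like check_glob /gp05 tr some_known_tr_file (file name made up here) and refuse to continue when it returns non-zero, rather than silently concatenating nothing.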
19 REPLIES
Michael Steele_2
Honored Contributor

Re: NFS data corruption issue 11.31 Itanium

What versions of NFS are running on the NetApp and on the servers?

Re: NFS data corruption issue 11.31 Itanium

They support version 3 and version 4. Right now I have version 4 support turned off. I also discovered I may not have the latest ONCplus patch bundle installed. The version I show loaded is B.11.31.06, but the version I see out on the web link is B.11.31.08. The web URL is https://software.hp.com/portal/swdepot/displayProductInfo.do?productNumber=ONCplus

The description of a possible bug that might be affecting me is:

Directory related operations on NFS client with ONCplus B.11.31.06 or B.11.31.07 installed and with file system mounted with read/write size greater than 8192 bytes, may result in system panic or data corruption.
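
To make that description concrete, here is a rough sketch of how the two conditions (ONCplus version and transfer size) could be checked together. The function name is_affected is made up for illustration; it expects the ONCplus version string and the comma-separated flags as printed on the "Flags:" line of nfsstat -m, and it encodes nothing beyond the wording of the description quoted above.

```shell
#!/bin/sh
# Sketch: flag the combination named in the bug description, i.e.
# ONCplus B.11.31.06 or B.11.31.07 with rsize or wsize above 8192 bytes.
# is_affected VERSION FLAGS   (hypothetical helper, not an HP tool)
is_affected() {
    version=$1
    flags=$2
    case $version in
        B.11.31.06|B.11.31.07) ;;
        *) echo "not affected: ONCplus $version"; return 1 ;;
    esac
    for size in rsize wsize; do
        # Split the flags on commas and pull out the value of rsize= / wsize=.
        value=$(printf '%s\n' "$flags" | tr ',' '\n' | sed -n "s/^${size}=//p")
        if [ -n "$value" ] && [ "$value" -gt 8192 ]; then
            echo "possibly affected: ONCplus $version with $size=$value"
            return 0
        fi
    done
    echo "not affected: transfer sizes are 8192 or below"
    return 1
}
```

The flags should be taken from nfsstat -m rather than from fstab, since fstab does not show the default sizes that actually get applied.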

Unfortunately, I can't reboot the server during the production day to try it, so I'm going to copy the NFS directory to a local drive for now to avoid the data corruption.
Dave Olker
HPE Pro

Re: NFS data corruption issue 11.31 Itanium

Hi Patrick,

Yes, the 11.31.08 ONCplus bundle will likely fix your problem. I was involved with that issue when it was first reported and the systems we used to reproduce it in-house were NetApp filers.

Regards,

Dave

Re: NFS data corruption issue 11.31 Itanium

I have installed the new 11.31.08 bundle on our machine. The next issue we're experiencing is that doing a directory listing seems really slow. You can run an ll command, go away and have a cup of coffee, and it still isn't finished by the time you get back to your desk. This is the complaint I'm getting from my programmers, whose home directories are stored on our NetApp filer. Is there any reason you can think of for why this is slow? The server has two gigabit NICs tied together in link aggregation, so I doubt it's because of a busy network.
Dave Olker
HPE Pro

Re: NFS data corruption issue 11.31 Itanium

Please clarify for me - when did this directory listing performance issue start? Has it always been there? Did it get worse after installing 11.31.08? Does every NFS client see the same performance, or do only certain NFS clients see this problem?

Dave

Re: NFS data corruption issue 11.31 Itanium

Dave,

The directory performance issue started when we went to the new Itanium server running 11.31, even before I installed the .08 version of ONC. My programmer says it used to take a few moments to list the contents of the directory on the old 11.11 system on PA-RISC, but it's significantly worse on this new system.

Just to give you an idea, there are approximately 27,000 files inside that source code directory. So you could make an argument that it has a ton of files, but it didn't use to be as much of a problem with the other 11.11 box we moved off of.

So to summarize, not worse after .08 was installed. Only seems to be this client and no others. I do have two other 11.31 Itanium systems we're moving to in the near future, but they have .02 ONC on them now and I need to update them.

Also, another observation: once you've gotten through the pain of running an ll command on the directory, subsequent runs of the command are faster, but I have to guess that is because the contents are being cached. It is slow again after the cache has aged out.

Patrick

Dave Olker
HPE Pro

Re: NFS data corruption issue 11.31 Itanium

One point of clarification:

> Only seems to be this client and no
> others. I do have two other 11.31 Itanium
> systems we're moving to in the near
> future, but they have .02 ONC on them now
> and I need to update them.

Are you saying these other two 11.31 systems do not show this problem? Or do all the 11.31 systems behave the same?

Re: NFS data corruption issue 11.31 Itanium

There hasn't been much activity on the two other machines yet to be certain. They do have a much older version of ONC installed, .02, so they're definitely going to have problems until I update it.
Dave Olker
HPE Pro

Re: NFS data corruption issue 11.31 Itanium

I understand they'll hit the known data corruption issue until they're updated. I'm focusing on the performance issue now. Do those two systems see the same performance to retrieve the 27K directory listings as the 3rd 11i v3 system running 11.31.08 or are they the same speed as the 11.11 systems?

I'm trying to understand if this is a "class" problem with all 11.31 systems in your environment or only with one specific system.

Re: NFS data corruption issue 11.31 Itanium

So far in my testing here, it took about 60 seconds before the contents listed on the screen using a different 11.31 server than the one discussed here. I don't know if 60 seconds is normal for a directory with 27K items in it, but our programmers seem to think it takes longer than it used to on the 11.11 PA-RISC servers.

It seems to take even more time if you do a "ll *" wildcard type search than if you do a plain ll without anything else on the command line. Once the command finally returns something, subsequent runs are fast for a while until it hasn't been done in some time, then it's back to slow for the 1st attempt.

All of our 11.31 servers have the same configuration as far as number of CPUs, memory, and OS patch level, with the exception of the ONC version.

Another question: the ONC version never cropped up when I ran a patch analysis on HP's ITRC website. Should ONC patches be covered as part of the normal patch bundle, or do I need to periodically check the web URL for the ONC package to see if there are updates and apply those instead?
Dennis Handly
Acclaimed Contributor

Re: NFS data corruption issue 11.31 Itanium

>the ONC version never cropped up when I ran a patch analysis on HP's ITRC website. Should ONC patches be covered as part of the normal patch bundle

This is a new ONC version, not a patch, so it doesn't show up with a patch analysis. (Similar to new compiler versions.)

>do I need to periodically check the web URL for the ONC package to see if there are updates and apply those instead?

Yes, for now. I don't know if Bob has plans for swa and non-patches.

Re: NFS data corruption issue 11.31 Itanium

Is NFS version 4 supported with this ONC and would it offer any performance benefits to my NetApp if I tried to use it?
Dave Olker
HPE Pro

Re: NFS data corruption issue 11.31 Itanium

I took my 11.31 system running ONCplus B.11.31.08 and mounted a filesystem from my NetApp filer (F825c running OnTAP 7.1.3) and created two different directory structures: one with 27 sub-directories each with 1,000 files, and the other containing 27,000 files in a single directory.

I mounted the filesystem using default options:

# nfsstat -m
/netapp-1 from atcfiler1:/vol/vol1
Flags: vers=3,proto=tcp,sec=sys,hard,intr,link,symlink,acl,devs,rsize=32768,wsize=32768,retrans=5,timeo=600
Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60


I then timed how long it takes to use "ll" in each of these directory structures:

# timex ll -R 1000dirs | wc -l

real 1.48
user 0.25
sys 0.58

27107

# timex ll -R 27000dirs | wc -l

real 1.44
user 0.26
sys 0.41

27001


In both cases it took under 1.5 seconds to complete the operation. I'm not displaying the contents, merely passing the results to "wc -l" so there is no lag time waiting to display all 27,000 files on my screen.

So I agree, 60 seconds sounds like a long time to return 27,000 directory entries, but in my tests I'm getting very good response times so this doesn't appear to be a class problem with all 11.31 systems.


As for NFS v4, yes it is available on 11.31 systems. You can certainly try it and see if it provides better performance. I tried it on my systems and here are the numbers I got with NFS v4:

# timex ll -R 1000dirs | wc -l

real 15.76
user 0.25
sys 4.91

27107

# timex ll -R 27000dirs | wc -l

real 27.48
user 0.27
sys 0.66

27001


Looks like things take a lot longer with NFS v4 on my systems. Your mileage may vary.

Regards,

Dave

Re: NFS data corruption issue 11.31 Itanium

Here are my mount options:

/gp05 from tec-storage2.tec.clinitech.net:/vol/ops/gp01
Flags: vers=3,proto=tcp,sec=sys,hard,intr,link,symlink,acl,devs,rsize=32768,wsize=32768,retrans=5,timeo=600
Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

timex ll | wc -l

real 1:12.72
user 0.27
sys 0.34

28177

Also, I'm seeing this message in the syslog every once in a while...

Synchronous Page I/O error occurred while paging to/from NFS server tec-storage2.tec.clinitech.net
file system is /gp05

What could this error indicate? What sorts of things should I check?
Dave Olker
HPE Pro
Ashish_33
Occasional Advisor

Re: NFS data corruption issue 11.31 Itanium

Patrick, can you please let me know if the NFS performance problem you were facing has been resolved?
I also have an RX6600 server with HP-UX 11.31 and I am facing the same NFS performance issue. The call has been with HP since December 2009 and HP has still not been able to resolve it.

Thanks.

Re: NFS data corruption issue 11.31 Itanium

After installing the ONCplus B.11.31.08 package, the major issues I had seemed to be solved. The only issue I've had lately is that an ll command on a directory with a lot of files seems to take a really long time. One possible fix is to turn on the bigendian hash directory NFS option on our NetApp, but it will cause all the mount points to go stale. I don't know if I want to do that right away. I'd have to find a time when nothing is running on the system, which is becoming more difficult with a 24x7 shop.
Dave Olker
HPE Pro

Re: NFS data corruption issue 11.31 Itanium

Hi Patrick,

An interesting test for your long "ll" times would be to mount the filesystem with the "readdir" option:

     readdir   Disable the READDIRPLUS functionality, which is
               used by default on an NFS Version 3 mount point,
               and use the NFS Version 2 READDIR functionality
               instead. The performance of applications that
               read huge directories over NFS will vary between
               NFS Version 2 and NFS Version 3 depending on the
               type of information that the applications need.
               The find command will be faster using NFS Version
               3 READDIRPLUS while the ls command will be faster
               using NFS Version 2 READDIR. The readdir option
               must be used on a case by case basis depending
               upon the application. There is no effect on an
               NFS Version 2 mount point.


If you can, give that a try and see if the ll times change for the better.
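
As a rough sketch of what that test could look like with the mount from earlier in this thread: the server path, mount point and base options are the ones already posted, while the RUN variable and the readdir_test function are hypothetical additions that make the script print the commands by default instead of running them.

```shell
#!/bin/sh
# Sketch: remount the file system with the "readdir" option and time a listing.
# RUN is an illustrative switch: by default it is "echo", so the commands are
# only printed. Run with RUN= (empty) as root to actually execute them.
SERVER_PATH=tec-storage2.tec.clinitech.net:/vol/ops/gp01
MOUNT_POINT=/gp05
RUN=${RUN-echo}

readdir_test() {
    # umount will fail if anything still has files open under the mount point.
    $RUN umount "$MOUNT_POINT"
    $RUN mount -F nfs -o rw,hard,vers=3,proto=tcp,readdir "$SERVER_PATH" "$MOUNT_POINT"
    # Compare this timing against the earlier run without "readdir".
    $RUN timex ls -l "$MOUNT_POINT"
}
```

When running it for real, piping the listing to "wc -l" as in the earlier timings keeps terminal output out of the measurement. If the numbers improve, the same readdir keyword could be appended to the options field of the fstab entry.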

Regards,

Dave
Ashish_33
Occasional Advisor

Re: NFS data corruption issue 11.31 Itanium

Hi Patrick/Dave,

Thanks for your reply.
My problem was resolved by doing the following:
I exported the filesystem on the NFS server with the ASYNC option.
On the NFS client I set the kernel parameter nfs_enable_write_behind=1
and mounted the filesystem on the client side using the forcedirectio option.
In short, I used ASYNC mode on both the server and the client.