
NFS performance problems - large file writes slow system severely

 
Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

Still more information (reading around a little more turned up some other numbers that seem to come up often in tuning discussions)...

On "R" - the following Kernel Tunable params are set to the following values:

dbc_max_pct = 50
dbc_min_pct = 5
nbuf = 0
bufpages = 0


On "H" - the following param values are shown:

dbc_max_pct = 50
dbc_min_pct = 5
nbuf = 0
bufpages = 0


From "F" (11.23 system) - following are shown:

dbc_max_pct = 20 (dynamic = yes)
dbc_min_pct = 20 (dynamic = yes)

nbuf and bufpages do not appear in the list of tunables (because it's an 11.23 system, I assume)
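
(For reference, a minimal sketch of querying these tunables directly -- kmtune on 11.11, kctune on 11.23; verify the option spelling against the man pages on your release:)

# kmtune -q dbc_max_pct -q dbc_min_pct
# kctune dbc_max_pct dbc_min_pct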
Alzhy
Honored Contributor

Re: NFS performance problems - large file writes slow system severely

Your system "H", what NFS patch are you on? I have 11.11 ranging from a K Class to a rp8420/SD and my NFS benchmarks versus most NFS servers (no SFU based Windows NFS Server though) are all fine.

I have the following 11.11 NFS Patches:

PHNE_34662 ONC/NFS General Release/Performance Patch


Also, which NFS mount options do you use on your 11.11 system? Have you explicitly told it to use V3, and have you tried UDP instead of TCP?
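
(For illustration, a mount forcing V3 over UDP might look like this -- the server and path here are placeholders, not from this thread:)

# mount -F nfs -o vers=3,proto=udp server:/export /mnt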

Can you post for System H:

lanadmin -x 1 (or whatever your lanN PPA is)
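
(If you're unsure of the PPA number, lanscan lists one per card; a quick sketch:)

# lanscan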

Hakuna Matata.
Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

Here are the results of the requested command run against system "H" (by the way, the other systems seem to show the same results):

--

# lanadmin -x 1
Current Config = 100 Full-Duplex MANUAL
Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

Stupid-question time (from me) -- how can I tell which versions/patches are applied? (As in, how can I tell which NFS patches are installed?)

--

Also, I see references (related to locking) about having to use the -C option for rpc.lockd -- how do I make certain that is set on the various machines?

--

Finally, another follow-up on the machines in question here - machine "F" (the 11.23 system) is running Oracle.

Machine "H" has been used for some Java development work and I believe may have MKS or other similar applications installed. Machine "H" also serves up a NFS export to the other hosts ("F", "R" and "B" (a 10.20 box) all mount to "H" for that exported resource).

Machine "R" is the NIS server for the group.


Hope this information helps clarify things more.
Alzhy
Honored Contributor

Re: NFS performance problems - large file writes slow system severely

Barry,

Check the version of your NFS patch as follows:

/usr/contrib/bin/show_patches | grep NFS
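
(If show_patches isn't present on a box, swlist should give similar information -- a hedged alternative; check the exact options on your release:)

# swlist -l product | grep -i nfs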


Hakuna Matata.
V. Nyga
Honored Contributor

Re: NFS performance problems - large file writes slow system severely

Hi,

you have copied only the first page of the landiag output.

What about the second page?
Any errors?

Press <Return> to continue -- then
you'll see the 'Ethernet-like Statistics'.

Also: your 11i systems have the speed set to MANUAL.
So have the network staff check that the switches are also running 100 Mbit full duplex.

Volkmar
*** Say 'Thanks' with Kudos ***
Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

Checking some more, machine "R" reports:

# /usr/contrib/bin/show_patches | grep NFS
PHKL_25238 11.00 NFS nfsd deadlock
PHKL_25652 thread nostop for NFS, rlimit max value fix
PHKL_28185 Tunable;vxportal;vx_maxlink;DMAPI NFS hang
PHKL_29335 vx_nospace on NFS write.
PHKL_30151 NFS binary overwrite hang
PHNE_32477 ONC/NFS General Release/Performance Patch
Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

Machine "H" reports:

--

# /usr/contrib/bin/show_patches | grep NFS
PHKL_25238 11.00 NFS nfsd deadlock
PHKL_25652 thread nostop for NFS, rlimit max value fix
PHKL_28185 Tunable;vxportal;vx_maxlink;DMAPI NFS hang
PHKL_29335 vx_nospace on NFS write.
PHKL_30151 NFS binary overwrite hang
PHNE_31097 ONC/NFS General Release/Performance Patch
Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

Machine "F" (the 11.23 machine) reports
--

# /usr/contrib/bin/show_patches | grep NFS
PHCO_31546 quota(1) on an NFS client
PHCO_31645 Japanese NFS/LIBNSL manpages
PHCO_31647 Japanese NFS manpages
PHCO_31661 Japanese NFS/RCMDS manpages
PHCO_31662 Japanese NFS/NIS manpages
PHCO_31666 Japanese core NFS manpages
Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

Thanks for the followup V. Nyga

Here's the second page of information from the landiag -> lan -> dis screen (sorry, I had missed the <Return> to continue before):

--

System "R"

--

Ethernet-like Statistics Group

Index = 1
Alignment Errors = 0
FCS Errors = 0
Single Collision Frames = 0
Multiple Collision Frames = 0
Deferred Transmissions = 0
Late Collisions = 0
Excessive Collisions = 0
Internal MAC Transmit Errors = 0
Carrier Sense Errors = 0
Frames Too Long = 0
Internal MAC Receive Errors = 0
Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

System "H" (second screen from landiag) report:

--

Press <Return> to continue


Ethernet-like Statistics Group

Index = 1
Alignment Errors = 0
FCS Errors = 0
Single Collision Frames = 0
Multiple Collision Frames = 0
Deferred Transmissions = 0
Late Collisions = 0
Excessive Collisions = 0
Internal MAC Transmit Errors = 0
Carrier Sense Errors = 0
Frames Too Long = 0
Internal MAC Receive Errors = 0
Alzhy
Honored Contributor

Re: NFS performance problems - large file writes slow system severely

My 11.11 Systems all have:

PHNE_34662 ONC/NFS General Release/Performance Patch

We're heavy NFS users -- both as clients (to various NFS servers, though no Windows NFS servers) and as servers (serving fellow UNIX machines) -- and we've never seen this behaviour.

Assuming the network for your problem "H" machine is fine and dandy (switch also set to 100 Full/Manual) -- then I guess you can try applying the latest Gold patches, which contain the above NFS mega-patch, and see if that fixes the problem.



Hakuna Matata.
Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

Updating yet again.

I pulled down the patches for system "R" and applied them all. Confirmed after the fact that the following NFS-related patches are installed:

# /usr/contrib/bin/show_patches | grep NFS
PHKL_25238 11.00 NFS nfsd deadlock
PHKL_25652 thread nostop for NFS, rlimit max value fix
PHKL_28185 Tunable;vxportal;vx_maxlink;DMAPI NFS hang
PHKL_29335 vx_nospace on NFS write.
PHKL_34595 VM NFS umount fix
PHNE_34662 ONC/NFS General Release/Performance Patch


Ran the following test:

# time dd if=/dev/zero of=/home/bdowell/test.fil bs=32k count=100
100+0 records in
100+0 records out

real 1:47.3
user 0.0
sys 0.0
#
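
(For scale: 100 records x 32 KB = 3,200 KB written in 107.3 seconds, or roughly 30 KB/s -- dismal for a 100 Mbit full-duplex link.)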


Watching the write operations in GlancePlus (Disk Report), I see no (zero) Logl Wts showing for Remote disk in the Cum byte field. The total number of bytes the test above would have generated shows up in the Phys Wts column.


So, patching the system still hasn't resolved the problems we're having with the speed of access (reading and writing) to the NFS space provided by the Windows box.
Dave Olker
Neighborhood Moderator

Re: NFS performance problems - large file writes slow system severely

Hi Barry,

I'm just back from a week of vacation and I see you're still having this problem. Bummer.

Can you please send me the "nfsstat -m" output from all three systems - the working and the failing systems - so I can see exactly how the NFS filesystems are mounted on these clients?

Thanks,

Dave


Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

Hi again Dave - Here's the nfsstat -m output from the 3 systems.


System "Y" is our Windows based (Windows Server 2003 R2) NFS server.

--

nfsstat -m from System "H" (IP addresses deliberately munged here for security, as is the host name)

# nfsstat -m
/officegrp from y:/officegrp (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/home from y:/home (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/devtst from y:/devtst (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/public from y:/public (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/projects from y:/projects (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/missions from y:/missions (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)



nfsstat -m from System "R" (again, IP addresses purposely munged....)

# nfsstat -m
/devtst from y:/devtst (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/usr/local from y:/hp11.11 (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/public from y:/public (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/projects from y:/projects (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/missions from y:/missions (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/officegrp from y:/officegrp (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/home from y:/home (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)



nfsstat -m from System "F" (again, IP addresses purposely munged.... )

# nfsstat -m
/usr/local from y:/hp11.11 (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/public from y:/public (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/home from y:/home (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/officegrp from y:/officegrp (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/projects from y:/projects (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/missions from y:/missions (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)

/devtst from y:/devtst (Addr 1.1.1.2)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)
Dave Olker
Neighborhood Moderator

Re: NFS performance problems - large file writes slow system severely

Barry,

Do you get the same behavior regardless of which NFS-mounted filesystem you are writing to? It looks like you have 7 different filesystems mounted from server "Y". Do they all exhibit this behavior, or only one of them?

This behavior leads me to believe that the working client is using buffer cache for its writes but the failing systems are not. The main reasons I can think of for that are:

1) The failing systems are already using all their buffer cache resources so there is none left for NFS to use

2) The client is purposely not using buffer cache effectively for NFS writes

There is no easy way to determine which pages in buffer cache are in use by which kernel subsystem. As for why NFS wouldn't use buffer cache effectively, there are several possibilities:

1) You're locking the file you're writing to
2) You're using mmap() against the file
3) There are no biods running on the client

There are others, but those are the most likely candidates. However, since you're reproducing this behavior with the cp command (assuming it's the cp that ships with HP-UX and not some GNU version that behaves differently), my best guess would be that there are no biods running on these systems.

Have you made sure biods are running on the "failing" clients? If there are no biods running, the client is forced to do all of its writing in its own context, which is extremely inefficient compared to writing with biods enabled.

For example:

_________________________________________________________


atcux7(/hp-1) -> model -D
ia64 hp server rx4640 (no description)

atcux7(/hp-1) -> uname -a
HP-UX atcux7 B.11.23 U ia64 4195776772 unlimited-user license

atcux7(/hp-1) -> nfsstat -m
/hp-1 from atcux13:/tmp (Addr 15.43.209.147)
Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,devs,rsize=32768,wsize=32768,retrans=5
All: srtt= 0 ( 0ms), dev= 0 ( 0ms), cur= 0 ( 0ms)


atcux7(/hp-1) -> ps -ef | grep biods

atcux7(/hp-1) -> iozone -i 0 -r 32 -s 100m -w -+n
Iozone: Performance Test of File I/O
Version $Revision: 3.263 $
Compiled for 32 bit mode.
Build: hpuxs-11.0

Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million,
Jean-Marc Zucconi, Jeff Blomberg,
Erik Habbinga, Kris Strecker, Walter Wong.

Run began: Tue Nov 28 11:58:31 2006

Record Size 32 KB
File size set to 102400 KB
Setting no_unlink
No retest option selected
Command line used: iozone -i 0 -s 100m -w -+n
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
                                          random  random    bkwd  record  stride
      KB  reclen   write rewrite    read  reread    read   write    read rewrite    read  fwrite frewrite   fread freread
  102400      32    8782       0

iozone test complete.

atcux7(/hp-1) -> /usr/sbin/biod 16

atcux7(/hp-1) -> iozone -i 0 -r 32 -s 100m -w -+n
Iozone: Performance Test of File I/O
Version $Revision: 3.263 $
Compiled for 32 bit mode.
Build: hpuxs-11.0

Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million,
Jean-Marc Zucconi, Jeff Blomberg,
Erik Habbinga, Kris Strecker, Walter Wong.

Run began: Tue Nov 28 12:03:55 2006

Record Size 32 KB
File size set to 102400 KB
Setting no_unlink
No retest option selected
Command line used: iozone -i 0 -r 32 -s 100m -w -+n
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
                                          random  random    bkwd  record  stride
      KB  reclen   write rewrite    read  reread    read   write    read rewrite    read  fwrite frewrite   fread freread
  102400      32  479185       0
_________________________________________________________


On my 11.23 system I mounted an NFS filesystem from a remote system using the same (default) mount options you're using. I killed all the biods and then started an iozone test to write 100MB to the NFS filesystem. With no biods running the test yielded 8782 Kbytes/second. I then started the biods and ran the same test and it yielded 479185 Kbytes/second.
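
(That works out to roughly a 55x difference: 479185 / 8782 is about 54.6.)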

Can you please verify that biods are running on all the clients?

Regards,

Dave


Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

Hi again Dave - sorry, I still don't have IOzone installed on these boxes (for multiple reasons, one being space: box "H" is basically full on its available local drive space, and "R" is a bit better; beyond that, I don't quite know how to find proper pre-compiled versions or how to build versions that will run on our systems).

Anyway, noticing some discrepancies here though:

Box "R" shows:

# ps -ef | grep biods
root 9273 5880 0 15:41:02 pts/3 0:00 grep biods
# ps -ef | grep biod
root 3222 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 3220 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 3223 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 3221 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 3224 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 3225 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 3226 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 3227 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 3228 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 3229 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 3230 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 3231 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 3232 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 3233 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 3234 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 3235 1 0 11:13:35 ? 0:00 /usr/sbin/biod 16
root 9279 5880 0 15:41:05 pts/3 0:00 grep biod
# uname -a
HP-UX -- hostname: R-- B.11.11 U 9000/785 2014004612 unlimited-user license
#


Box "H" shows:

# uname -a
HP-UX -- hostname: H -- B.11.11 U 9000/785 2016167044 unlimited-user license
# ps -ef | grep biods
root 19987 16513 1 15:40:35 pts/14 0:00 grep biods
# ps -ef | grep biod
root 19993 16513 1 15:41:05 pts/14 0:00 grep biod
#


So, it would seem that box "H" is not running biods, while box "R" is.

"H" definitely shows (from memory) the worst performance of the bunch, but that's anecdotal evidence only. I could be wrong and don't want to mis-speak.

"H" has not been patched up yet, but will be later tonight.


"R" is a box that is much more limited on memory (see details much further above). "H" currently has 2 GB of RAM installed. (I believe "R" has 512 MB.)


Something else of note -- when I bring up GlancePlus on host "H" and look at the Disk Report, the Mem Util line shows Current, Avg, and High all at 75%, with half of the bar showing "B" (buffer cache?), approximately 15% showing "S", and approximately 10% showing "U".

On host "R", in the same place, the Mem Util line shows Current, Avg, and High stuck at 93%, and I believe I've seen them as high as 94% (if not higher). Again, that box is much more memory-limited. "B" still accounts for approximately 50% of the bar, with "S" and "U" at fairly equal amounts.


Again, hopefully some of this information helps along the way.
Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

Dave Olker asked:
Do you get the same behavior regardless of which NFS mounted filesystem you are writing to? It looks like you have 7 different filesystems mounted from serverY. Do they all exhibit this behavior or only one of them?

--

In answer to this: yes, there are several different mount points on server "Y" (the Windows box serving NFS).

On the Windows box there is a structure similar to this:

Unix Shares (folder)
|--- Projects
|--- Missions
|--- Home

etc.

All on the same physical disk. Permissions are all UNIX-based. We're using Microsoft's Server for NFS (from Windows Server 2003 R2). Server properties are pretty much defaults, though we have audit logging mostly turned down because it was believed the logging might be slowing things down.

We run McAfee Enterprise 8.0i on the system as the virus checker. We've *excluded* the entire Unix Shares structure and all subdirectories so we don't have to worry about the A/V program slowing things down by scanning every file.


Performance for any of these share points is bad.


Now, you've brought up something else that I suppose could be involved here, and that has been reported as an "issue" by one of the developers who uses the box. He complained of locking problems. It's hard to get more details, but it sent me down the path of finding that I should probably be including the -C option for lockd (on the HP-UX clients).
At least, that's what I think we should be doing, since the server is NOT an HP-UX box.
The user reported the locking issue (locks not functioning correctly?!) on host "F", which is the box that seems to be working the best. I killed rpc.statd and rpc.lockd and restarted them on that box, using -C at the end of the command line when restarting rpc.lockd.

Host "R" has been rebooted several times since adding -C into the options in /etc/rc.config.d/nfsconf for LOCKD_OPTIONS (so that line now looks like this:
LOCKD_OPTIONS="-C"


Host "H" is a pain-in-the-butt to get reset or to restart the NFS Client on because the box is fairly constantly used by our development team/testers. I'm slated to restart that box at approx. 5pm tonite to get patches put into place.


Just to make sure things are started properly on host "F", I'm also slated to knock the users off that box at 5 pm and restart the NFS client so that it starts up with the -C option for sure.


One more thing -- NUM_NFSIOD=16 is the setting on each of the boxes.

I had actually stepped up NUM_NFSD, so it's at NUM_NFSD=64 on these boxes. The only box that normally exports to (is mounted by) the other boxes is "H": all the boxes mount "Y", and "R" and "F" each mount *one* share from "H".
Dave Olker
Neighborhood Moderator

Re: NFS performance problems - large file writes slow system severely

Barry,

I would leave NUM_NFSIOD set to 16 on these systems; I wouldn't recommend increasing it past 16 until we understand this problem better. My concern is: if NUM_NFSIOD is set to 16 on all the systems and yet some of them are not running biods, why? Were they terminated?

Keep in mind, you can terminate and start the biods at any time while the system is running in production. If you have systems acting as NFS clients that aren't running biods, I'd recommend issuing the command "/usr/sbin/biod 16" and then trying the test again.

Again, there's no harm in starting the biods while users are on the system. If you plan to terminate the biods on a system while users are using it, please make sure you terminate all of them. Don't terminate some of them, leave some behind, and then try starting more -- that can lead to problems.
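
(A quick way to verify the count afterwards -- the bracketed pattern keeps grep from matching itself:)

# ps -ef | grep '[b]iod' | wc -l

That should report 16 once all the biods are up.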

Regards,

Dave



Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

More follow-up.

I patched up system H and went through the necessary reboot. Confirmed after reboot that biods are running (16 of them are).

Ran simple test:

time dd if=/dev/zero of=/home/bdowell/test.fil bs=32k count=100

Resulting in a real time of 47 seconds (assuming it's seconds being reported).

Copied that output file back to local disk to check on read performance:

time cp /home/bdowell/test.fil /localpath/test.fil

Time result: 1:06.6

I figure roughly 50 seconds of that was the read, with the remainder the resulting write operation.


It definitely seems that HP-UX 11.11 (as we have it here, at least) does not want to use any caching on NFS mounts from non-HP systems.

Now, I can run the same test against a different mount point, mounted from another HP-UX 11.11 system, and see better results:

time dd if=/dev/zero of=/mountonanotherhp11.11box/test2.fil bs=32k count=100

100+0 records in
100+0 records out

real 29.5


For the record, the mount point in this case is on system "R", which is the box I'd call "memory constrained" (but which doesn't currently have any users doing anything on it).

The results get worse (I think) if I raise the record count.
Dave Olker
Neighborhood Moderator

Re: NFS performance problems - large file writes slow system severely

Barry,

I'd suggest you collect two nettl traces on the failing NFS client: one writing to the Windows server and one writing to the HP-UX server. I'd like to see what kinds of WRITE requests are going over the wire to these servers, and what kind of WRITE replies are coming back, in both the HP-UX and Windows cases.
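
(A sketch of one way to capture such a trace -- entity and option names from memory, so check man nettl and man netfmt before running:)

# nettl -traceon pduin pduout -entity ns_ls_ip -file /tmp/nfstrace
  (run the dd test against each server while the trace is on)
# nettl -traceoff -entity ns_ls_ip
# netfmt -f /tmp/nfstrace.TRC0 > /tmp/nfstrace.txt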

Let me know if you need help collecting these traces.

Regards,

Dave


Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

Thanks again for your help, Dave. It's nigh on quittin' time here for the day, but I'll work with the rest of the team on the traces and see what we can come up with on Wednesday (11/29/06). Hopefully I'll have something to get back to you with soon after.

Note that I've also turned up auditing for *everything* on the NFS server side, so I can try getting information from there too.
Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

Ok, another follow-up to this saga (for now). We're discussing other options for the NFS server, i.e., switching to a Linux box instead of Windows.

Dave - if you don't mind, I'd still like to work on the traces and may need some additional assistance to get them done, but at this point I'd be better off working through e-mail, if that's OK, since I have to be careful with the hostname/IP address information I put out in the public eye.

This is all still a curious and frustrating issue. Why 10.20 is happy, and 11.23 too, when 11.11 isn't -- that's what makes it so puzzling.

I've seen some indications in MS-related areas that Microsoft's NFS server does *no* caching unless the client asks for it. That has me thinking that perhaps the 11.11 clients are not asking for or allowing caching by default while the 11.23 box does, but that's just a guess.

Anyway, I hope to continue towards figuring this issue out, but probably am best to work mostly offline until a resolution is reached (offline being more one-on-one e-mails).

Thanks all.
Alzhy
Honored Contributor

Re: NFS performance problems - large file writes slow system severely

Maybe it is your 11.11 box. Just too slow... just too low on memory... just too many users on when you performed the tests? Have you tried mounting just one NFS share from your Windows NFS server? Have you checked whether a patch is needed on Windows for a known 11.11 interaction? Is your Windows NFS service part of Windows Services for UNIX (SFU)? I've used Windows SFU in quite a number of NAS builds with flying colours -- mostly with Solaris clients, though, plus a sprinkling of Linux machines and two 11.11 clients (faster machines, though, and GigE).

Hakuna Matata.
Barry C. Dowell
Frequent Advisor

Re: NFS performance problems - large file writes slow system severely

Finally, I think we (with much help from Dave Olker { tips cap towards Dave Olker }) have gotten this performance issue resolved. Or at least I hope we have.

It looks like the problem all along has been that these mount points were trying to work with TCP as the protocol.

Switching to UDP instead of TCP has made orders of magnitude difference in the speed of these mount points.

Speculating here: for some reason, HP-UX NFS client traffic to other systems over TCP is problematic performance-wise, and when using TCP the buffering that would normally happen through the biods either isn't happening, or is happening but is severely slowed by the time required to physically write the data and acknowledge all of the writes, etc.

The weird thing is that HP-UX to other HP-UX systems is fine with TCP as the protocol for the NFS mounts. HP-UX 11.23 to non-HP systems is fine, and HP-UX 10.20 to non-HP systems is also fine.

In any case, we seem to have gotten back to normal performance levels, and our development team (the users of these systems) should be much happier now that things are working normally for them.

Thanks to everyone for their assistance, and I hope the tip (thanks again, Dave Olker) about using UDP instead of TCP (the default protocol in the mount options) is helpful to others who encounter this problem in the future.
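
For readers who land here later: the fix amounts to adding proto=udp to the NFS mount options. An illustrative /etc/fstab entry (the server and path are placeholders, not taken from this thread):

y:/projects /projects nfs rw,vers=3,proto=udp 0 0

followed by an umount/mount of each NFS filesystem (or a restart of the NFS client) so the new options take effect.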