Operating System - HP-UX

NFS slow due to in different subnet??

 
kholikt
Super Advisor


Hi,

I was trying to create a make_net_recovery archive of our DNS server on our Ignite-UX server. The two servers are in different subnets.

My problem here is that make_net_recovery takes several days to finish, even though the archive itself is less than 400MB.

After some investigation, I found the following messages in our syslog.

Mar 8 08:35:49 adns02 vmunix: NFS server esc_ux not responding still trying
Mar 8 08:35:49 adns02 vmunix: NFS server esc_ux ok
Mar 8 08:36:04 adns02 above message repeats 4 times

It seems that NFS cannot maintain a consistent connection between these two servers.

I have tried using FTP to transfer a 14MB file between these two servers, and it took less than two minutes to finish the transfer.

Is there any way to increase the NFS performance? Both servers are in the same building; they are just in different subnets.

Any help will be appreciated...
abc
4 REPLIES
kholikt
Super Advisor

Re: NFS slow due to in different subnet??

One more point I want to add: on the client server I ran nfsstat and got the following result.

Client rpc:
Connection oriented:
N/A
Connectionless oriented:
calls     badcalls  retrans   badxids   timeouts  waits     newcreds
89593     28763     196532    1         225285    0         0
badverfs  timers    toobig    nomem     cantsend  bufulocks
0         43747     0         0         0         0

Client nfs:
calls     badcalls  clgets    cltoomany
60840     10        60840     0

Version 2: (441 calls)
null      getattr   setattr   root      lookup    readlink  read
0 0%      384 87%   0 0%      0 0%      0 0%      0 0%      0 0%
wrcache   write     create    remove    rename    link      symlink
0 0%      0 0%      0 0%      0 0%      0 0%      0 0%      0 0%
mkdir     rmdir     readdir   statfs
0 0%      0 0%      2 0%      55 12%

Version 3: (60399 calls)
null      getattr   setattr   lookup    access    readlink  read
0 0%      196 0%    107 0%    263 0%    118 0%    0 0%      4355 7%
write     create    mkdir     symlink   mknod     remove    rmdir
48188 79% 31 0%     13 0%     17 0%     0 0%      21 0%     0 0%
rename    link      readdir   readdir+  fsstat    fsinfo    pathconf
1 0%      0 0%      0 0%      12 0%     62 0%     10 0%     4 0%
abc
Brian Hackley
Honored Contributor

Re: NFS slow due to in different subnet??

Hi,
A quick test would be to try the NFS mount with a very small read/write size of 1024. If you're still stuck with slow performance, check out the NFS Performance Assessment application note, support document #KBAN00000261, for additional tips.
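As a rough sketch, such a mount on HP-UX might look like the following. The export path and mount point here are examples only (the actual Ignite-UX paths on your systems may differ); esc_ux is the server name from the syslog messages above.

```shell
# Mount the Ignite-UX share with 1K read/write sizes so each NFS
# request fits in a single UDP datagram without IP fragmentation.
# Paths are illustrative - substitute your actual export and mount point.
mount -F nfs -o rsize=1024,wsize=1024 \
    esc_ux:/var/opt/ignite/recovery /var/opt/ignite/recovery
```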
Hope this helps,
-> Brian Hackley
Ask me about telecommuting!
MARTINACHE
Respected Contributor

Re: NFS slow due to in different subnet??

Hi,

Did you apply the latest NFS performance patch?

Here is the patch for HP-UX 11.00:

Patch Name: PHNE_22125
Patch Description: s700_800 11.00 ONC/NFS General Release/Performance Patch
Creation Date: 00/11/13
Post Date: 00/11/15
Hardware Platforms - OS Releases:
s700: 11.00
s800: 11.00

Regards,

Patrice.
Patrice MARTINACHE
rick jones
Honored Contributor

Re: NFS slow due to in different subnet??

Much of the time, NFS transfers are done with 8192 byte blocks. Across most networks, that becomes 6 IP datagram fragments.

All of the fragments of the IP datagram must make it to the destination. If any one fragment is lost, the entire IP datagram is discarded. IP does not retransmit, nor does UDP, so it is up to NFS - the client in this case - to rerequest the operation.

The NFS retransmission timer is something like 700 milliseconds.

If the average packet loss probability is p (eg .01 for 1% packet loss), the probability of any one fragment getting through is 1-p. The probability then of the entire datagram getting through is (1-p)^numfrag or in this example .99^6 or ~.94. This means that 6% of the requests will have to be retransmitted.
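The arithmetic above can be checked with a one-liner, using the example figures from the text (1% per-packet loss, 6 fragments per 8K request):

```shell
# Probability that an 8K NFS/UDP request survives when each of its
# 6 IP fragments independently faces a 1% loss rate.
awk -v p=0.01 -v n=6 'BEGIN {
    ok = (1 - p) ^ n                  # all n fragments must arrive
    printf "P(datagram arrives) = %.3f\n", ok
    printf "retransmit fraction = %.1f%%\n", (1 - ok) * 100
}'
```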

If we assume that the response time for an NFS request that is not lost is 10 milliseconds, we can calculate the average response time with retransmissions included (ignoring the possibility that the retransmit is lost...)

RSPTnoloss * Pnoloss + RSPTloss * Ploss

or in this case

10 * .94 + 710 * .06
9.4 + 42.6

or 52 milliseconds per transaction, which means ~19.23 transactions per second at 8192 bytes per transaction or ~153KByte/s or about 1.2 megabits per second.
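The same response-time and throughput arithmetic, scripted with the assumed values from above (10 ms no-loss response time, 700 ms retransmission timer, 6% of requests retransmitted):

```shell
# Average NFS response time once retransmissions are factored in,
# and the resulting throughput at 8192 bytes per transaction.
awk 'BEGIN {
    rspt  = 10                        # ms, response time without loss
    timer = 700                       # ms, NFS retransmission timer
    ploss = 0.06                      # fraction of requests retransmitted
    avg = rspt * (1 - ploss) + (timer + rspt) * ploss
    tps = 1000 / avg                  # transactions per second
    printf "avg response = %.1f ms\n", avg
    printf "throughput   = %.1f KB/s\n", tps * 8192 / 1024
}'
```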

If you have messages saying that the NFS server is not responding, it means that several retransmissions of a request took place, which probably means that the average packet loss rate is even higher.

FTP is more robust in this area because it uses TCP. First, TCP avoids sending segments that would have to be fragmented. So, if one TCP segment out of 6 is lost, only that one segment has to be retransmitted. Also, TCP has a minimum retransmission timer of 500 milliseconds. Further, TCP has the ability, through something called fast retransmit, to detect packet loss in less than one round-trip time, which would mean retransmitting in much less than 500 milliseconds.

The suggestion to go to a 1024 byte mount size helps generally because it means that the NFS requests will no longer be carried in fragmented IP/UDP datagrams, so that multiplicative (or is it exponential?) increase of the loss rate does not happen. It still incurs the same retransmission timeout when a packet is lost.

I would suggest you try to find the source of the packet losses. A network with that much loss is probably broken. Starting at the side sending data (the client, I suspect), check the nfsstat output as you have done and note the retransmissions there. Then check the netstat statistics for UDP and IP, though on the sending side those are generally rather boring. Next, examine the lanadmin statistics on the client. In particular, look for Outbound Discards and Errors. While you are there, check the duplex setting of your interface and make sure it matches that of your switch port.

Next, check the similar link and IP stats on the router(s) joining the two subnets. Finally look at the stats on the server - starting with lanadmin and Inbound * and working your way up through IP, and UDP. In IP in particular check for fragments being dropped due to dup or out of space, and in UDP check for checksum failures and socket overflows.
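As a hedged checklist, the checks described above map to roughly these HP-UX commands. The interface instance number (0 here) is an example; exact option spellings can vary by HP-UX release, so consult the man pages on your systems.

```shell
# 1. Client side: RPC retransmissions and timeouts
nfsstat -rc

# 2. Client IP/UDP counters: look for fragment drops,
#    checksum failures, and socket overflows
netstat -s

# 3. Link-level statistics: look for Outbound Discards and Errors
#    on the client (Inbound counterparts on the server), and verify
#    the duplex setting matches the switch port
lanadmin -g mibstats 0
lanadmin -x 0

# 4. Repeat the netstat and lanadmin checks on the server and on
#    the router(s) joining the two subnets
```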
there is no rest for the wicked yet the virtuous have no pillows