
Re: Slow NFS with Redhat 4 update 1

 
Steven E. Protter
Exalted Contributor

Slow NFS with Redhat 4 update 1

Got a problem with a RedHat 4 update 1 machine. Most of our RH 4 machines are running update 2. This one has been in production a while and still runs update 1.

NFS file copies are very slow on this machine, even though it has the same NIC connection speed and NFS rights as the update 2 machines.

A copy takes about 8 times as long as it does on an update 2 machine.

Obviously, you can see I'm inclined to upgrade the machine. What I need, and will reward handsomely, is reasoning: a story along the lines of "I had a machine with the same issue and upgrading such-and-such fixed it."

Or some other idea that solved the problem.

Worklog:
/etc/resolv.conf, nsswitch.conf and networking checked. We are getting 100 BaseT full duplex on our LAN connection, same as the comparison servers.

I upgraded nfs-utils to release 65 last night along with required dependencies.

Here is the rub. If you track the copy job on a good machine versus a bad, the actual CPU time expended is exactly the same. It seems like the problem machine just isn't giving the I/O appropriate priority.
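
One way to gather the evidence described above is to time a copy and record wall-clock time next to CPU time for the copying process. The following is only a minimal sketch: the function name `timed_copy` and the scratch files are hypothetical, and the demo copies between temporary directories rather than onto an NFS mount.

```python
import os
import shutil
import tempfile
import time


def timed_copy(src, dst):
    """Copy src to dst and return (wall_seconds, cpu_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    shutil.copyfile(src, dst)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


if __name__ == "__main__":
    # Hypothetical demo: copy a 1 MiB scratch file between temp paths.
    # To test NFS, point dst at a path on the NFS mount instead.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(1024 * 1024))
        wall, cpu = timed_copy(src, dst)
        print(f"wall={wall:.3f}s cpu={cpu:.3f}s")
```

If the CPU figure is about the same on the good and bad machines while the wall-clock figure differs by a large factor, the extra time is being spent waiting rather than computing.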

Just need a little evidence, or perhaps a sar job to run.

TIA

You'll be happy with the points you get out of this.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
13 REPLIES
Steven E. Protter
Exalted Contributor

Re: Slow NFS with Redhat 4 update 1

sar -u output.


12:00:01 AM CPU %user %nice %system %iowait %idle
12:10:01 AM all 0.07 0.00 0.02 0.05 99.87
12:20:01 AM all 0.05 0.00 0.01 0.04 99.90
12:30:02 AM all 0.06 0.00 0.02 0.04 99.88
12:40:01 AM all 0.05 0.00 0.01 0.04 99.90
12:50:01 AM all 0.05 0.00 0.01 0.04 99.90
01:00:01 AM all 0.05 0.00 0.01 0.03 99.91
01:10:01 AM all 0.05 0.00 0.01 0.04 99.90
01:20:01 AM all 0.05 0.00 0.01 0.03 99.90
01:30:01 AM all 0.05 0.00 0.01 0.03 99.91
01:40:01 AM all 0.06 0.00 0.01 0.03 99.90
01:50:01 AM all 0.05 0.00 0.01 0.03 99.91
02:00:01 AM all 0.05 0.00 0.01 0.03 99.91
02:10:02 AM all 0.05 0.00 0.01 0.04 99.90
02:20:01 AM all 0.05 0.00 0.01 0.04 99.91
02:30:01 AM all 0.05 0.00 0.01 0.03 99.91
02:40:01 AM all 0.06 0.00 0.01 0.03 99.91
02:50:01 AM all 0.05 0.00 0.01 0.03 99.91
03:00:01 AM all 0.05 0.00 0.01 0.03 99.91
03:10:01 AM all 0.05 0.00 0.01 0.04 99.90
03:20:01 AM all 0.05 0.00 0.01 0.03 99.91
03:30:01 AM all 0.05 0.00 0.01 0.04 99.91
03:40:01 AM all 0.05 0.00 0.01 0.04 99.90

03:40:01 AM CPU %user %nice %system %iowait %idle
03:50:01 AM all 0.05 0.00 0.01 0.03 99.91
04:00:01 AM all 0.05 0.00 0.01 0.04 99.90
04:10:01 AM all 0.51 0.03 0.27 2.71 96.48
04:20:01 AM all 0.04 0.00 0.01 0.03 99.91
04:30:01 AM all 6.35 0.00 6.63 2.59 84.44
04:40:01 AM all 0.13 0.00 0.10 0.20 99.58
04:50:01 AM all 0.05 0.00 0.01 0.03 99.91
05:00:01 AM all 0.05 0.00 0.01 0.03 99.91
05:10:01 AM all 0.05 0.00 0.01 0.04 99.90
05:20:01 AM all 0.05 0.00 0.01 0.03 99.91
05:30:02 AM all 0.05 0.00 0.01 0.03 99.91
05:40:01 AM all 0.05 0.00 0.01 0.03 99.91
05:50:01 AM all 0.05 0.00 0.01 0.03 99.91
06:00:04 AM all 0.07 0.00 0.01 0.04 99.88
06:10:01 AM all 0.03 0.00 0.01 0.04 99.93
06:20:01 AM all 0.05 0.00 0.01 0.04 99.90
06:30:01 AM all 0.05 0.00 0.01 0.03 99.91
06:40:01 AM all 0.05 0.00 0.01 0.03 99.91
06:50:01 AM all 0.05 0.00 0.01 0.04 99.90
07:00:01 AM all 0.05 0.00 0.01 0.03 99.91
07:10:01 AM all 0.05 0.00 0.01 0.04 99.90
07:20:01 AM all 0.05 0.00 0.01 0.04 99.91

07:20:01 AM CPU %user %nice %system %iowait %idle
07:30:02 AM all 0.05 0.00 0.01 0.03 99.91
07:40:01 AM all 0.06 0.00 0.01 0.03 99.90
07:50:01 AM all 0.05 0.00 0.01 0.03 99.91
08:00:01 AM all 0.05 0.00 0.01 0.04 99.91
08:10:01 AM all 0.05 0.00 0.06 0.16 99.72
08:20:01 AM all 0.05 0.00 0.01 0.07 99.87
08:30:01 AM all 0.05 0.00 0.01 0.04 99.90
08:40:01 AM all 0.05 0.00 0.01 0.03 99.91
08:50:01 AM all 0.05 0.00 0.01 0.06 99.87
09:00:01 AM all 0.19 0.00 0.02 0.08 99.71
09:10:01 AM all 0.05 0.00 0.01 0.06 99.88
09:20:01 AM all 0.05 0.00 0.12 1.09 98.74
09:30:01 AM all 0.10 0.00 1.29 36.74 61.87
09:40:01 AM all 0.31 0.00 0.24 0.44 99.01
09:50:02 AM all 38.84 0.00 1.37 2.36 57.43
10:00:01 AM all 11.61 0.00 1.48 2.75 84.16
10:10:01 AM all 0.21 0.00 0.11 0.46 99.22
10:20:01 AM all 0.13 0.00 0.17 1.27 98.43
10:30:02 AM all 0.07 0.00 0.13 1.62 98.18
10:40:01 AM all 0.05 0.00 0.07 0.35 99.53
10:50:01 AM all 0.37 0.04 0.23 1.21 98.15
11:00:01 AM all 0.36 0.00 0.31 0.73 98.60

11:00:01 AM CPU %user %nice %system %iowait %idle
11:10:01 AM all 2.07 0.00 2.92 4.70 90.31
11:20:01 AM all 0.65 0.00 0.63 0.25 98.48
11:30:01 AM all 0.41 0.00 0.73 16.44 82.42
11:40:01 AM all 0.93 0.00 1.12 0.70 97.25
11:50:01 AM all 1.91 0.00 2.05 1.96 94.09
12:00:02 PM all 0.57 0.00 1.90 38.60 58.93
12:10:01 PM all 3.58 0.08 3.86 4.40 88.08
12:20:01 PM all 2.50 0.38 2.42 1.09 93.60
12:30:01 PM all 0.88 0.42 2.29 1.94 94.48
Average: all 1.01 0.01 0.41 1.69 96.88

The high I/O wait times correspond with our tests.
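
For anyone who wants to pick those intervals out automatically, here is a rough sketch that scans `sar -u` text in the layout shown above and lists rows whose %iowait exceeds a threshold. The function name `high_iowait` and the 10% threshold are assumptions made for illustration; the sample rows are copied from the listing.

```python
def high_iowait(sar_text, threshold=10.0):
    """Return (timestamp, iowait) pairs whose %iowait exceeds threshold."""
    hits = []
    for line in sar_text.splitlines():
        fields = line.split()
        # Data rows look like: 09:30:01 AM all 0.10 0.00 1.29 36.74 61.87
        # Header rows have "CPU" in the third column and are skipped,
        # as is the shorter "Average:" row.
        if len(fields) != 8 or fields[2] != "all":
            continue
        try:
            iowait = float(fields[6])
        except ValueError:
            continue
        if iowait > threshold:
            hits.append((f"{fields[0]} {fields[1]}", iowait))
    return hits


sample = """\
09:20:01 AM CPU %user %nice %system %iowait %idle
09:20:01 AM all 0.05 0.00 0.12 1.09 98.74
09:30:01 AM all 0.10 0.00 1.29 36.74 61.87
11:30:01 AM all 0.41 0.00 0.73 16.44 82.42
12:00:02 PM all 0.57 0.00 1.90 38.60 58.93
Average: all 1.01 0.01 0.41 1.69 96.88
"""

for stamp, iowait in high_iowait(sample):
    print(stamp, iowait)
```

With a 10% threshold, the full listing above yields the same three intervals: 09:30, 11:30 and 12:00.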

SEP
Steven E. Protter
Exalted Contributor

Re: Slow NFS with Redhat 4 update 1

Further detail.

The problem also reproduces on an IBM x336 server using the Broadcom bcm5700 driver.

There may be an issue concerning support for this hardware with RH 4.X.

I have compiled and installed the latest Broadcom drivers to no effect.

SEP
Ivan Ferreira
Honored Contributor

Re: Slow NFS with Redhat 4 update 1

Have you verified the netstat -ni output for errors/collisions?

Are you using TCP or UDP?

nfsstat -c is the most valuable tool for troubleshooting nfs:

http://publib.boulder.ibm.com/infocenter/pseries/index.jsp?topic=/com.ibm.aix.doc/aixbman/prftungd/nfscliprfmon.htm

Also, the NFS HOWTO has good performance tips:

http://www.faqs.org/docs/Linux-HOWTO/NFS-HOWTO.html#PERFORMANCE

Of course, it's very probable that you have already checked that.
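
As a rough illustration of the error/collision check, the sketch below pulls the per-interface error, drop and collision counters out of /proc/net/dev, which on Linux carries the same kind of counters that netstat -ni reports. The function name `interface_errors` and the sample numbers are hypothetical, not taken from the machines in this thread.

```python
def interface_errors(proc_net_dev_text):
    """Map interface name -> dict of error, drop and collision counters."""
    result = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, counters = line.split(":", 1)
        values = counters.split()
        if len(values) < 16:
            continue
        # Column order: 8 receive counters, then 8 transmit counters.
        result[name.strip()] = {
            "rx_errs": int(values[2]),
            "rx_drop": int(values[3]),
            "tx_errs": int(values[10]),
            "tx_drop": int(values[11]),
            "collisions": int(values[13]),
        }
    return result


# Hypothetical sample in the /proc/net/dev layout.
sample = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:   10240     100    0    0    0     0          0         0    10240     100    0    0    0     0       0          0
  eth0: 9876543   65432   12    3    0     7          0         0  1234567   54321    0    0    0    42       0          0
"""

for name, stats in interface_errors(sample).items():
    if any(stats.values()):
        print(name, stats)
```

On a live system the text would come from `open("/proc/net/dev").read()`. Non-zero and growing counters on the slow client, but not on the good ones, would point at the link or the driver rather than at NFS itself.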
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Steven E. Protter
Exalted Contributor

Re: Slow NFS with Redhat 4 update 1

The problems are with this box as a client.

We're pretty sure now it's the network driver. The problem is finding a network driver that's compatible.

SEP
Pratyush Paul_1
Valued Contributor

Re: Slow NFS with Redhat 4 update 1

SEP -

This looks like a driver issue with the NIC in the system that is having problems. Could you probe the drivers there and compare them with a system that is fine? I am certain some driver or patch issue is limiting I/O throughput. I would suggest comparing the lsmod output of the two systems, and please do a sanity check on the /etc/modules.conf file too.
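
A minimal sketch of that lsmod comparison follows, assuming the output of each machine has been saved as text. The function names and the sample module lists (including the sizes) are made up for illustration.

```python
def loaded_modules(lsmod_text):
    """Return {module_name: size} from lsmod output, skipping the header row."""
    modules = {}
    for line in lsmod_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "Module" or not fields[1].isdigit():
            continue
        modules[fields[0]] = int(fields[1])
    return modules


def compare_modules(good_text, bad_text):
    """Return modules unique to each side and modules whose size differs."""
    good = loaded_modules(good_text)
    bad = loaded_modules(bad_text)
    only_good = sorted(set(good) - set(bad))
    only_bad = sorted(set(bad) - set(good))
    size_differs = sorted(m for m in set(good) & set(bad) if good[m] != bad[m])
    return only_good, only_bad, size_differs


# Hypothetical lsmod captures from a good and a bad machine.
good_sample = """\
Module                  Size  Used by
nfs                   229000  1
tg3                    91000  0
"""
bad_sample = """\
Module                  Size  Used by
nfs                   228500  1
bcm5700               110000  0
bonding                65000  0
"""

print(compare_modules(good_sample, bad_sample))
```

A module present on only one side, or the same module with a different size, is a hint that the two machines are not running the same driver build.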

Please let me know how it goes.

regards

Pratyush
Die Hard
Steven E. Protter
Exalted Contributor

Re: Slow NFS with Redhat 4 update 1

With RH 4 it's /etc/modprobe.conf:

alias bond0 bonding
alias eth0 bcm5700
alias eth1 bcm5700
options bond0 miimon=100 mode=active-backup
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptscsih
alias scsi_hostadapter2 ata_piix
alias usb-controller ehci-hcd
alias usb-controller1 uhci-hcd


Both cards are active. I think perhaps I need a better bonding driver.
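
To compare this file across machines, something like the sketch below could summarize which driver each interface is aliased to and which bonding options are set. The function name `parse_modprobe_conf` is hypothetical; the sample text is the configuration quoted above. Note that the kernel bonding documentation describes active-backup mode as using one slave at a time, with the other held in reserve.

```python
def parse_modprobe_conf(text):
    """Return (aliases, options) parsed from modprobe.conf-style text."""
    aliases = {}
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        if parts[0] == "alias" and len(parts) >= 3:
            aliases[parts[1]] = parts[2]
        elif parts[0] == "options" and len(parts) >= 3:
            options[parts[1]] = dict(
                p.split("=", 1) for p in parts[2:] if "=" in p
            )
    return aliases, options


conf = """\
alias bond0 bonding
alias eth0 bcm5700
alias eth1 bcm5700
options bond0 miimon=100 mode=active-backup
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptscsih
alias scsi_hostadapter2 ata_piix
alias usb-controller ehci-hcd
alias usb-controller1 uhci-hcd
"""

aliases, options = parse_modprobe_conf(conf)
for name in ("bond0", "eth0", "eth1"):
    print(name, "->", aliases[name], options.get(name, {}))
```

Running the same function over the file from a good machine and diffing the two results would show quickly whether the NIC driver or the bonding settings differ.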

Regards,

SEP
rick jones
Honored Contributor

Re: Slow NFS with Redhat 4 update 1

So, are the "good" systems using Broadcom NICs?

Are _all_ the stats from ethtool on up through NFS stats completely clean, or are there any retransmissions?

Are the good systems using bonding also?

How are the interrupt coalescing settings set on the good vs bad machines (ethtool -c, IIRC)?

Do the different machines have different performance on a netperf TCP_RR or UDP_RR test?
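
For the retransmission question, one place to look on a Linux NFS client is the `rpc` line of /proc/net/rpc/nfs, whose first two numbers are total RPC calls and retransmissions (the counters nfsstat -c reports). The sketch below is only an illustration; the function name `rpc_retrans` and the sample counters are hypothetical.

```python
def rpc_retrans(proc_net_rpc_nfs_text):
    """Return (calls, retrans, ratio) from the client 'rpc' line, or None."""
    for line in proc_net_rpc_nfs_text.splitlines():
        fields = line.split()
        # Expected layout: rpc <calls> <retrans> <authrefrsh>
        if len(fields) >= 3 and fields[0] == "rpc":
            calls, retrans = int(fields[1]), int(fields[2])
            ratio = retrans / calls if calls else 0.0
            return calls, retrans, ratio
    return None


# Hypothetical sample; real data would come from /proc/net/rpc/nfs.
sample = """\
net 0 0 0 0
rpc 200000 1500 0
"""

calls, retrans, ratio = rpc_retrans(sample)
print(f"calls={calls} retrans={retrans} ratio={ratio:.2%}")
```

Taking a reading before and after a test copy and comparing the deltas on a good and a bad client would show whether the slow machine is retransmitting.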
there is no rest for the wicked yet the virtuous have no pillows
Steven E. Protter
Exalted Contributor

Re: Slow NFS with Redhat 4 update 1

Yes, the good RH 3 systems are using Broadcom NICs.

The alternative servers from HP also use the bcm5700 NICs, much as I hate them.

I have run scp tests and such and only have issues with RH 4. It would appear that neither IBM nor HP is providing good drivers.

Since our problem children are using IBM servers, IBM is going to be asked to fix it or provide an alternative solution.

Still, one would think that someone else has experienced this problem.

SEP
Steven E. Protter
Exalted Contributor

Re: Slow NFS with Redhat 4 update 1

We are going to drop an Intel NIC in the server and see if it solves the problem.

SEP
Steven E. Protter
Exalted Contributor

Re: Slow NFS with Redhat 4 update 1

Intel NIC did not help.

Switched from the SMP to the stock kernel.

Helped A LOT!

Anyone else have the same results?

SEP
Steven E. Protter
Exalted Contributor

Re: Slow NFS with Redhat 4 update 1

There is a newer SMP kernel; I am going to try installing that.

SEP
rick jones
Honored Contributor

Re: Slow NFS with Redhat 4 update 1

FWIW, the reason I asked about the NICs (Intel vs Broadcom) is that the two drivers seem to set up the NICs rather differently. The e1000 driver appears to set interrupt coalescing such that it rather strongly favors throughput over latency (by default about 8K netperf TCP_RR transactions per second, versus 16K when one sets InterruptThrottleRate to 0), whereas tg3 (not sure about the bcm5700 driver) seems to have better "out of the box" latency, although one can improve it slightly by tweaking the coalescing parameters.

That, and NFS is as sensitive to latency as it is to throughput.
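
To make the latency point concrete, here is a small worked calculation. It assumes "8K" and "16K" mean 8000 and 16000 transactions per second, that a TCP_RR transaction is one request/response round trip, and that a client issues requests strictly one at a time. The request count of 100000 is a made-up figure for illustration.

```python
def round_trip_us(transactions_per_second):
    """Average time per request/response round trip, in microseconds."""
    return 1_000_000 / transactions_per_second


def serialized_seconds(requests, transactions_per_second):
    """Time to complete `requests` round trips issued one after another."""
    return requests / transactions_per_second


REQUESTS = 100_000  # hypothetical number of synchronous operations

for label, rate in (
    ("default coalescing", 8000),
    ("InterruptThrottleRate=0", 16000),
):
    print(
        f"{label}: {round_trip_us(rate):.1f} us per round trip, "
        f"{serialized_seconds(REQUESTS, rate):.2f} s for {REQUESTS} requests"
    )
```

Under those assumptions the round trip drops from 125 to 62.5 microseconds, so a workload dominated by small synchronous operations would take half as long even though the link speed is unchanged.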
Steven E. Protter
Exalted Contributor

Re: Slow NFS with Redhat 4 update 1

It's running the Broadcom tg3 driver on a 100 BaseT switch.

It seems that NFS problems in our shop made the situation seem worse than it was.

It seems to be limited to the RH 4 update 1 SMP kernel. We can't test with the EL kernel because we don't have any machines at that level.

Switching to bcm5700 is possible the next time we boot the box.

So is an upgrade, once the box gets back from a trip to a show in the USA.

SEP