ProLiant Servers (ML,DL,SL)

DL160 G5 Poor Network Throughput?

David_108
Occasional Visitor

DL160 G5 Poor Network Throughput?

Hello,

I am trying to set up a DL160 G5 as a high-performance squid server. It has 12 GB of memory and an Intel PCIe NIC in addition to the motherboard NICs.

I installed SuSE Linux Enterprise Server 10 SP2 on it, with the squid package, but noticed that throughput is very slow. Squid does not seem to be the problem: with squid not running, a bandwidth test from the server OS itself gets only about 10 MB/s maximum, whereas another server on the same segment can get 46 MB/s or more. Data seems to transfer in a burst / hold / burst fashion. I do not have a second DL160 to test identical hardware; the other server is a DL380 G5, also with SLES 10 SP2.

I have tried all the obvious things so far but I am stumped (checked for a duplex mismatch, etc.). No errors or dropped frames are reported on the switch or in ifconfig.
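For anyone checking the same thing: negotiated speed and duplex can be read straight from sysfs, without ethtool. A minimal sketch (interface names vary per system; down interfaces may refuse the read):

```shell
# Print negotiated speed/duplex for every interface from sysfs.
# ethtool reports the same information; the 2>/dev/null covers
# interfaces (like lo, or links that are down) that reject the read.
for dev in /sys/class/net/*; do
    [ -e "$dev/speed" ] || continue
    echo "$(basename "$dev"): $(cat "$dev/speed" 2>/dev/null) Mb/s, $(cat "$dev/duplex" 2>/dev/null)"
done
```

Both ends of each link should report the same speed and full duplex.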

I do have all three NICs on the same subnet. I modified the ARP parameters to avoid the ARP flux problem, but the issue did not go away.
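For reference, the usual ARP flux mitigation has to be applied both to "all" and to each interface, since the kernel combines the two values. A minimal sketch (eth0 is a placeholder; the writes need root, so they are shown as comments):

```shell
# Read the current arp-flux settings directly from /proc (works even
# without the sysctl binary). The kernel uses the stricter of the
# "all" value and the per-interface value, so both must be checked.
for key in arp_ignore arp_announce; do
    for dev in all default eth0; do
        f=/proc/sys/net/ipv4/conf/$dev/$key
        [ -r "$f" ] && echo "$dev/$key = $(cat "$f")"
    done
done

# The usual mitigation (run as root; eth0 is a placeholder):
#   sysctl -w net.ipv4.conf.all.arp_ignore=1
#   sysctl -w net.ipv4.conf.all.arp_announce=2
#   sysctl -w net.ipv4.conf.eth0.arp_ignore=1
#   sysctl -w net.ipv4.conf.eth0.arp_announce=2
# ...then flush stale entries on both ends:
#   ip neigh flush all
```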

Any ideas?
7 REPLIES
rick jones
Honored Contributor

Re: DL160 G5 Poor Network Throughput?

How about netstat stats? Any drops reported up there?

What are you using to check the throughput? Are you setting socket buffer sizes?

Do the two systems have the same sysctl settings?

Do the two systems strap the NICs with the same settings (ethtool -i etc)?

Are you in a position to disable two of the three interfaces, just to rule out anything else related to multiple NICs configured on the same IP subnet?

Silly question, but when you modified the ARP parms, did you just set "default" or did you set for each of the interfaces? Did you flush the ARP caches on either end?
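The netstat counters can also be pulled straight from /proc/net/snmp, which makes it easy to diff the fast box against the slow one. A sketch (field names come from the kernel's /proc interface):

```shell
# Rough equivalent of the TCP counters in "netstat -s", read straight
# from /proc/net/snmp. The first Tcp: line is the header, the second
# holds the values; match them by name rather than by column position.
awk '/^Tcp:/ {
    if (hdr == "") { hdr = $0; split(hdr, name) }
    else for (i = 2; i <= NF; i++)
        if (name[i] ~ /^(InSegs|OutSegs|RetransSegs|InErrs)$/)
            printf "%-12s %s\n", name[i], $i
}' /proc/net/snmp
```

A RetransSegs count climbing much faster on the slow box than on the fast one would point at packet loss on that path.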
there is no rest for the wicked yet the virtuous have no pillows
David_108
Occasional Visitor

Re: DL160 G5 Poor Network Throughput?

Hi, Rick,

Thanks for the suggestions. When I get back in the office on Monday I will look into them. I did a bunch of troubleshooting today and I am scratching my head. Let me give you more info.

While testing the squid proxy I noticed downloads were slow. I tested the bandwidth using an online tester, bandwidth.com. On my client, through squid, I was getting around 10 MB. We have a 100 MB connection.

If I direct my client to the old proxy (BorderManager) I get 46 MB. If I set up workstations (one Mac OSX the other Windows) to access the internet through the new firewall (no proxy) I get the 46 MB.

If I go to the squid server and test directly from a browser there, without squid running, I get 10 MB. However, testing a file transfer internally to the server runs at wire speed! QCheck also shows 1 GB internally, with a warning that the transfer is too fast! Accessing the internet through the old proxy from the new proxy gets me 46 MB! I took Ethereal traces during both downloads; they have different profiles in throughput and timing. I am not as conversant with sniffers as I'd like to be, but there is something *different* about the connection between squid and the firewall.

I did disconnect and deactivate the extra NICs. I also removed the Intel card and set the two built-ins to different subnets... no difference. I rebuilt the server with the 32-bit version of SLES 10 SP2, rather than 64-bit, with the NICs on different subnets... no difference.

I tested from other machines on the subnet. Get this: on a virtual machine (SLES 10 SP1 running on a XEN host, an HP DL380 G5) I got the 46 MB transfer on the internet. However, on a DL360 G5 server running SLES 10 SP1 directly, I also got the 10 MB speed on the internet and fast file transfers locally.

So, the problem seems to be between a SLES machine running directly on HP hardware (or at least a DL360 and DL160 G5, although the problem was the same with the Intel NIC) and its interaction with the firewall, which is running pfSense 1.2 (FreeBSD) on an HP DL140 G3. Other computers can get the full throughput through the firewall while these can't? It is very strange. And in order for SLES to get the full throughput, it has to run under XEN with a paravirtualized NIC? Something esoteric is going on here..

rick jones
Honored Contributor

Re: DL160 G5 Poor Network Throughput?

Do you really mean B as in Bytes, or did you mean to say b as in bits?
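The factor of eight matters here: quoted in megabits, a nominal "100 Mb" line tops out around 12.5 megabytes per second, so 46 MByte/s would already exceed it. A quick conversion of the numbers in this thread:

```shell
# Megabits per second to megabytes per second: divide by eight.
awk 'BEGIN {
    n = split("100 46 10", v)
    for (i = 1; i <= n; i++)
        printf "%3d Mbit/s = %5.2f MByte/s\n", v[i], v[i] / 8
}'
```

which prints 12.50, 5.75, and 1.25 MByte/s respectively.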

So, a plain transfer over the internet on this SLES system is slow, while local tests are fast. It will be very interesting when you can get onto both systems, the slow and the fast, and compare their sysctl settings, particularly for TCP and sockets.

Are either or both using AppArmor? How about the internal firewalls on each?
David_108
Occasional Visitor

Re: DL160 G5 Poor Network Throughput?

Hi, Rick,

Actually it is bits. I'm looking at the sysctl parameters of two servers, both SLES but one with OES, which is fast... and there is a difference:

The slow server has the following items which the other does not:
net.ipv4.tcp_slow_start_after_idle = 1
net.core.netdev_timeout_action = 0
net.core.xfrm_larval_drop = 0

Otherwise, the net settings look the same.
Any of those ring a bell?

Thanks again for your help!
David_108
Occasional Visitor

Re: DL160 G5 Poor Network Throughput?

Answers to your other questions:

I believe AppArmor is installed, if it comes by default, on both the OES and SLES servers. I have not done anything to configure it.

I disabled the SUSE firewall on both.

Thanks
rick jones
Honored Contributor

Re: DL160 G5 Poor Network Throughput?

Personally I've never been fond of slow start after idle - I think it is _too_ conservative in what it sends. That could affect squid - forcing TCP to go through slow-start for each retrieved URL if the inter-request gap were large enough. It wouldn't affect (IIRC) something like say a netperf TCP_STREAM test. Not sure if it would affect a TCP_RR test or not.

If nothing else, toggling that one and trying things again wouldn't hurt. I just cannot call it a smoking gun. Probably best to restart apps of interest after the change.
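A sketch of the check-and-toggle (the write needs root, so it is shown as a comment; the path is the standard procfs location for this tunable):

```shell
# 1 = TCP re-enters slow start after an idle period (the default);
# 0 = keep the congestion window across idle gaps.
cat /proc/sys/net/ipv4/tcp_slow_start_after_idle

# To disable it for a test run (as root), then restart squid:
#   sysctl -w net.ipv4.tcp_slow_start_after_idle=0
# Revert afterwards with:
#   sysctl -w net.ipv4.tcp_slow_start_after_idle=1
```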

After that, actual packet traces of a "fast" vs "slow" transfer would be in order.
Douw G Steyn
Occasional Contributor

Re: DL160 G5 Poor Network Throughput?

Hi David,

I just posted about a network problem on a DL160 G5 as well. Is your server configured with two quad-core processors? My network problems, outbound packets being dropped and increasing response times, only appear when two CPUs are installed. If I reduce my system to a single-CPU configuration the network responds fine. Just curious whether your configuration is the same. My problem exists with the internal network cards as well as when I add a PCIe NIC to the system.

Regards,