Community Home > Servers and Operating Systems > Operating Systems > Operating System - Linux > NC510F + Fedora Core 8 - LSA = very bad perfs
05-19-2008 05:35 AM
NC510F + Fedora Core 8 - LSA = very bad perfs
I'm using two NC510F boards on Fedora Core 8 (kernel 2.6.23.1-42.fc8) and I'm seeing very poor performance (460 Mbit/s) without HP Linux Socket Acceleration.
The boards are connected directly by a fiber optic cable (1 m, 62.5/125) and each has a static IP address.
I've installed nx_nic-3.4.336-1 and nx_tools-3.4.336-1 but not nxlsa_3.4.336-1 (it does not compile).
Network performance is measured with "ping -q -i 0 -s 65507 IP_ADDRx" on each interface.
I only get 460 Mbit/s of network bandwidth, whereas on another system I get 722 Mbit/s on a 1-gig network (with nVidia chips).
What's wrong?
Is LSA absolutely required to obtain acceptable performance? Or is LSA "just" needed to decrease CPU usage?
Regards,
Steve
05-20-2008 09:10 AM
Re: NC510F + Fedora Core 8 - LSA = very bad perfs
In what sort of system(s) are the NC510Fs installed?
Into which slot(s)?
Are they _electrically_ x8 slots?
To which CPU(s) are interrupts from the NIC(s) being sent? (grep /proc/interrupts)
Next - when running this ping test is there any one CPU on either end at 100% CPU utilization?
After that, what do you get with a netperf TCP_STREAM test between the two endpoints?
What are the settings for tcp_rmem and tcp_wmem on each side?
sysctl -a | grep rmem
sysctl -a | grep wmem
Does any one CPU saturate when running the netperf TCP_STREAM test?
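A quick diagnostic sketch for gathering the CPU-side answers to the questions above; the grep pattern and peer address are assumptions, so substitute your own NIC/driver name and IP:

```shell
# Which CPU(s) take the NIC's interrupts? ("nx" assumes the nx_nic driver)
grep -i nx /proc/interrupts

# Kick off the large-packet flood ping in the background (needs root)...
ping -q -i 0 -s 65507 192.168.1.2 &
PING_PID=$!

# ...and watch per-CPU utilization for ten seconds
# (mpstat is part of the sysstat package).
mpstat -P ALL 1 10

kill "$PING_PID"
```

Watching per-CPU rather than aggregate utilization matters here: a single saturated core on a four-core box shows up as only 25% overall.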
05-22-2008 12:47 AM
Re: NC510F + Fedora Core 8 - LSA = very bad perfs
My answers are below:
> In what sort of system(s) are the NC510Fs
> installed?
CPU: two dual-core Intel Xeon 5138 @ 2.13 GHz (32-bit)
FSB: 1033 MHz
RAM: 4 GB DDR2-667
OS : Fedora Core 8 (kernel 2.6.23.1-42.fc8)
CPU0/CPU1 + CPU2/CPU3
> Into which slot(s)?
> Are they _electrically_ x8 slots?
PCIe x8 (please see lspci logfile in attachment)
> To which CPU(s) are interrupts from the
> NIC(s) being sent? grep
> /proc/interrupts
CPU 3
> Next - when running this ping test is there
> any one CPU on either end at 100% CPU
> utilization?
[Sender]
No, one CPU at 10%, the others are idle.
[Receiver]
No, one CPU at 20%, the others are idle.
> After that, what do you get with a netperf
> TCP_STREAM test between the two endpoints?
netperf is not installed, so I've grabbed netperf-2.4.4 from ftp://ftp.netperf.org/netperf.
./configure succeeds, but make fails in netlib.c (undefined references to __CPU_ZERO and __CPU_SET).
> What are the settings for tcp_rmem and
> tcp_wmem on each side?
The same values on both sides:
net.core.rmem_max = 131071
net.core.rmem_default = 111616
net.ipv4.tcp_rmem = 4096 87380 4194304
net.core.wmem_max = 131071
net.core.wmem_default = 111616
net.ipv4.tcp_wmem = 4096 16384 4194304
vm.lowmem_reserve_ratio = 256 32 32
> Does any one CPU saturate when running the
> netperf TCP_STREAM test?
Not performed.
Regards,
Steve
05-22-2008 09:16 AM
Re: NC510F + Fedora Core 8 - LSA = very bad perfs
None of the CPUs are saturated during the test - also good.
Is CPU 3 the only CPU assigned interrupts from the NIC? IIRC the 336 bits (re)enable MSI-X support and should give you four or five IRQs associated with the interface. What sort of host system model is this, and are there any messages about MSI (dmesg | grep -i msi) in dmesg? There are some platforms on which MSI-X won't be enabled - I don't know the list myself though.
Being "Mr. Netperf" I'm always leery of using ping for bandwidth measurements :) so getting netperf going would be goodness. WRT those compile errors, 2.4.4 got hit by another change in the API for sched_setaffinity(). That is fixed in the top-of-trunk version. If you have a subversion client on your system(s), you can point it at:
http://www.netperf.org/svn/netperf2/trunk/
or you can just look at src/netlib.c there and back-port the change to your 2.4.4 bits. The other option is to simply comment out HAVE_SCHED_SET_AFFINITY (IIRC) in config.h and recompile. Netperf/netserver will lose the ability to bind to a specific CPU, but if need be we can work around that with taskset or numactl.
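With the internal affinity code compiled out, the binding can be done externally along these lines (a sketch; the CPU numbers and peer address are illustrative - pick a core other than the one taking the NIC's interrupts):

```shell
# On the receiving host: pin netserver to CPU 1
taskset -c 1 netserver

# On the sending host: pin netperf to CPU 1 and run against the receiver
taskset -c 1 netperf -t TCP_STREAM -l 30 -H 192.168.1.2
```

numactl --physcpubind would serve the same purpose on NUMA-aware setups.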
When we have netperf/netserver compiled and running, if using explicit setsockopt() calls (e.g. the -s and -S test-specific options) it will be necessary to tweak net.core.rmem_max and net.core.wmem_max to something like 2 MB, otherwise the setsockopt() calls will be clipped.
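A minimal sketch of that tweak, run as root on both hosts (2 MB per the suggestion above; these ceilings cap what -s/-S can actually request):

```shell
# Raise the per-socket buffer ceilings so explicit setsockopt() requests
# up to 2 MB are honored rather than silently clipped.
sysctl -w net.core.rmem_max=2097152
sysctl -w net.core.wmem_max=2097152

# Verify the new values took effect.
sysctl net.core.rmem_max net.core.wmem_max
```

These settings don't survive a reboot; /etc/sysctl.conf is the place to make them persistent.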
My "canonical" netperf tests for a first pass would be a unidirectional TCP_STREAM test, a single-connection bidirectional bulk TCP_RR test, and a single-connection, single-byte TCP_RR test:
netperf -c -C -t TCP_STREAM -l 30 -i 30,3 -H
netperf -c -C -t TCP_RR -r m -l 30 -i 30,3 -H
netperf -c -C -t TCP_RR -l 20 -i 30,3 -H
after having ./configure'd netperf with --enable-burst to enable the test-specific -b option above. The -i 30,3 tells it to run at least three iterations and no more than 30 in an attempt to be (by default, unless a -I option is present) 99% confident the result reported is within +/- 2.5% of the actual mean. So, each of those commands will run anywhere between 90 and 900 seconds. You can omit the -i option if you are pressed for time.
Single-stream performance is of course not _everything_, and once we get past the single-stream stuff we can discuss using netperf to measure aggregate performance. We can do that here, or it may be better suited to the netperf-talk mailing list hosted on netperf.org. Up to you.
05-26-2008 12:05 AM
Re: NC510F + Fedora Core 8 - LSA = very bad perfs
I'll definitely stop using ping for testing network performance. netperf is simply awesome - thanks for this great utility. I'm now above a gig
(but not at 10 gig yet ;)). Please find the results of the 3 tests in attachment.
Other things I've noted about my platform:
1. ACPI is not turned on, because DMI is not present.
2. When I launch nxudiag -a, only the interrupt test fails. Do I have MSI/MSI-X issues with my platform?
3. I'm using the cfq I/O scheduler.
Regards,
Steve
05-27-2008 08:29 AM
Re: NC510F + Fedora Core 8 - LSA = very bad perfs
Being a four-core system, the 25% CPU util reported by netperf for the receiver suggests that one of the CPUs, probably the one taking interrupts, was saturated during the TCP_STREAM test.
The "single-connection, bidirectional" TCP_RR test was missing the -f m - I'm not sure what it would have done with -r m as a global option :) It may be part of why the result appears to be so low, assuming I did my sums correctly.
When I last installed "336" onto a system, one thing I forgot was to flash the NIC with nx_tools - as such there was a mismatch between firmware and driver which precluded using MSI-X. I'd remembered to install nx_tools, but didn't remember that I still had to run the flash utility. In my defense :) "ethtool" was told (IIRC by the driver), and so was telling me, that the NIC firmware was the same rev as the driver :( The way I found out there was a mismatch was to troll through dmesg output for strings with "nx" in them:
dmesg | grep -i nx | less
I have been told that as the firmware gets rev'ed, the NIC's performance does improve - probably not "and then a miracle occurred" but probably still worthwhile.
WRT no DMI etc - just what model system is this again? I promise I won't stop discussion if it isn't HP, I simply want to know more about the system we are dealing with.
05-27-2008 02:33 PM
Re: NC510F + Fedora Core 8 - LSA = very bad perfs
If _all_ the traffic is TCP, and the systems with the NC510s aren't to be used as routers or bridges, then you can get away with enabling the 9000-byte MTU on just the other kit - the TCP MSS exchange at connection-establishment time will paper over the MTU difference automagically.
However, if the systems with the NC510s are to be routers/bridges, or the comms will include UDP or anything else that doesn't exchange MSSes the way TCP does, everything needs to use the same MTU end to end.
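A sketch of the MTU change itself (the interface name is a placeholder; run as root on each end, and every device in the path - switches included - must accept the larger frames):

```shell
# Enable a 9000-byte MTU on the 10G interface (eth2 is illustrative).
ip link set dev eth2 mtu 9000

# Verify the setting took effect.
ip link show dev eth2 | grep -o 'mtu [0-9]*'

# Sanity-check end to end with a ping that forbids fragmentation:
# an 8972-byte payload + 28 bytes of IP/ICMP headers = 9000 bytes on the wire.
ping -M do -s 8972 -c 3 192.168.1.2
```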
That will likely significantly increase the results you get on the TCP_STREAM test and the "single-connection bidirectional bulk transfer" TCP_RR test. It is unlikely to have much effect on the single-byte TCP_RR test.
At the root of much of this is a simple observation - as far as the de jure Ethernet standards are concerned, it takes just as many CPU cycles to exchange an Ethernet frame on 10G as it did on 1G as it did on 100BT as it did on 10BT. So, if you had a system with a 1G NIC that could get 1G using 1/2 of a CPU, in broad handwaving terms, you should not expect much more than 2 gig from that system with a 10G NIC.
Now, as NICs have progressed from 10BT to 100BT to 1G to 10G, the _implementations_ have added things to make it easier on the host - features like Checksum Offload (CKO), Transport Segmentation Offload (TSO), Large Receive Offload (LRO), and multiple-queue support. Jumbo frames are one of those as well, since they are an implementation detail, not something provided by the IEEE specs.
Those first three - CKO, TSO and LRO (and JF too) - are things that will improve the performance of a single connection. The last - multiple-queue support - is something that really only kicks in with multiple concurrent connections. NICs have also included interrupt avoidance and/or coalescing mechanisms. Those can be two-edged - improving CPU utilization for bulk transfer, but sometimes at the cost of increased latency (lower single-byte TCP_RR performance).
And as if I have not digressed enough... :) If you got netperf working in a way that includes a working sched_setaffinity() call, you can try affinitizing netperf/netserver to a CPU other than the one taking interrupts from the NC510. In the TCP_STREAM case that may increase the performance you see with standard 1500-byte frames, as it will have two CPUs working the problem. Normally the Linux stack will tend to cause the receiving process to run on the same CPU that took the interrupt from the NIC. With multi-core processors it is probably best to bind netperf/netserver to a core in the same processor rather than a core in another processor. Which cores are in which processor is a task left to the reader :) and can perhaps be deduced by looking at the output of /proc/cpuinfo and the various IDs therein.
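One possible sketch for that reader exercise - mapping logical CPUs to physical packages via /proc/cpuinfo, then binding to a core in the same package as the interrupt CPU (the CPU numbers in the taskset line are illustrative, and the "physical id"/"core id" fields may be absent on some kernels):

```shell
# Print which package and core each logical CPU belongs to.
awk -F': *' '
  /^processor/   { cpu = $2 }
  /^physical id/ { pkg = $2 }
  /^core id/     { print "CPU " cpu " -> package " pkg ", core " $2 }
' /proc/cpuinfo

# If the NIC interrupts land on CPU 3 and CPU 2 shares its package,
# something like this keeps both cores of that package on the problem:
taskset -c 2 netserver
```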
12-04-2008 05:45 PM