
NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

Udi Karni
Occasional Advisor

NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

We have 2 servers with NC522SFP 10Gbit cards running Windows 2003 SP2.

The 2 cards on the 2 servers are connected via crossover cables - so no network, latency, etc.

We can't seem to push more than 2-3 Gbit/sec even though it appears we are using the correct 64-bit drivers, etc.

??????????????
18 REPLIES
rick jones
Honored Contributor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

A litany of the usual sorts of boilerplate questions:

*) What sort of server hardware are you running?

*) are you using the latest firmware and drivers from www.hp.com? I think the driver is up to 4.0.505-1 - at least in the Linux space.

*) what are you using to measure transfer rate?

*) is any one CPU in your server(s) saturated when you are measuring transfer rates?

*) are there any packet losses reported for the protocol you are using for transfer? (e.g. TCP)

*) before you switched from 1 Gig to 10G, how saturated was your most heavily saturated CPU while doing transfers over the 1Gig link?
there is no rest for the wicked yet the virtuous have no pillows
Udi Karni
Occasional Advisor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

Thanks for picking this up.

The servers are DL785 running Win-2003 SP2 x64. They are running the latest firmware and drivers - 4.0.505.0 from 8/20/09.

We have Oracle 11g on both machines and we copy tables across an Oracle db-link. This is a CPU intensive process that maxes out one of the 32 CPUs while pushing about 35 MByte/sec. So - 3-4 of these max out a 1 Gbit connection, and since we want to run many of them concurrently - we are testing 10 Gbit. However - no matter how many we pile on - the curve turns horizontal past 2 Gbit and doesn't increase by much beyond that. We check the speed via perfmon - both for network and disk write speed. The disks on the receiving side can write well over 1 Gbyte/sec - so that's not a problem.

Also - every card has 2 ports, and the card in total won't exceed 2-3 Gbit. With both ports in use, the throughput just gets divided in half.

I don't think there are packets getting lost.

Before we switched we would run 4 concurrent jobs over 1 Gbit and see around 120 Mbyte/sec transfer rates - so that works flawlessly.

One thing to note - the card is on an X4 instead of the recommended X8 - we have no X8 available. However - we were led to believe that with an X4 we should still expect around 5 Gbit, but 2 Gbit is a little low.

Increasing the jumbo frame size to 9000 added maybe 10%. Increasing the max number of jumbo frames and the number of receive and send buffers, and setting TCP parameters in the registry (Tcp1323Opts=1 and a window size as high as 1/4 GB), didn't help at all.

help !
rick jones
Honored Contributor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

I've not had occasion to play with an NC522SFP in a DL785, but I have been playing with one recently on a DL585 G6 (which G is your 785 and which frequency?) using netperf. I am able to achieve rather more than 2-3 Gbit/s - 8 concurrent netperf TCP_STREAM tests have been getting 9+ Gbit/s using RHEL5.3 and the "505" driver from www.hp.com. I am using a x16 (PCIe1.1 IIRC) slot even though it is a x8 (PCIe 2.0) card. I'm using a 1MB requested socket buffer size and 64KB sends. The destinations are spread across a pair of other DL585 G6's with HP AD386A Single-port PCIe 10G NICs (those are only "supported" on Integrity servers).

PCIe1.1 x4 is "the same bandwidth" as a PCI-X 133 MHz slot. The very best you can expect from such a slot is ~7 Gbit/s.
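To put rough numbers on the slot comparison above, here is a back-of-the-envelope sketch in Python. It assumes PCIe 1.x signalling at 2.5 GT/s per lane with 8b/10b encoding, and a 64-bit PCI-X bus at a nominal 133 MHz; the function names are made up for illustration. These are raw link rates per direction, before any packet or descriptor overhead, so what a NIC actually achieves in the slot will be lower.

```python
def pcie_gbit_per_s(lanes, gt_per_s=2.5, encoding_efficiency=0.8):
    """Raw one-direction data rate of a PCIe link in Gbit/s.

    Defaults assume PCIe 1.x: 2.5 GT/s per lane, 8b/10b encoding
    (8 data bits carried per 10 bits on the wire).
    """
    return lanes * gt_per_s * encoding_efficiency


def pcix_gbit_per_s(bus_width_bits=64, clock_mhz=133):
    """Raw data rate of a parallel PCI-X bus in Gbit/s (nominal clock)."""
    return bus_width_bits * clock_mhz / 1000.0


if __name__ == "__main__":
    print("PCIe 1.x x4 :", pcie_gbit_per_s(4), "Gbit/s")
    print("PCIe 1.x x8 :", pcie_gbit_per_s(8), "Gbit/s")
    print("PCI-X 133   :", pcix_gbit_per_s(), "Gbit/s")
```

Under these assumptions an x4 gen1 slot and a PCI-X 133 slot both come out at a little over 8 Gbit/s raw, which is why an observed ceiling of around 7 Gbit/s is a plausible best case for either.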

Can you tell how many CPUs are taking interrupts from the NIC? Does Windows 2003 SP2 have support for MSI-X functionality to make that happen on more than one core? If not, then all the interrupts will be handled by one core and it won't matter at that point how many more concurrent connections you are running - that core will become the bottleneck.
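On the Linux side, one way to see which cores are taking the NIC's interrupts is to read /proc/interrupts. The sketch below sums the per-CPU counts for every interrupt line whose description mentions a given interface name. The sample text and the interface name eth2 are invented for illustration, and the parser assumes the usual layout (a header row of CPU names, then one row per interrupt source). On Windows the closest equivalent is probably the per-processor "% Interrupt Time" counter in perfmon.

```python
# Made-up sample in the usual /proc/interrupts layout (4 CPUs).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 16:        120          0          0          0   IO-APIC-fasteoi   eth0
 55:     900000          0          0          0   PCI-MSI-edge      eth2-rx-0
 56:     450000         10          0          0   PCI-MSI-edge      eth2-tx-0
NMI:          0          0          0          0   Non-maskable interrupts
ERR:          0
"""


def nic_interrupts_per_cpu(proc_interrupts_text, device_substring):
    """Sum interrupt counts per CPU for rows mentioning device_substring."""
    lines = proc_interrupts_text.splitlines()
    cpus = lines[0].split()              # e.g. ["CPU0", "CPU1", ...]
    totals = [0] * len(cpus)
    for line in lines[1:]:
        fields = line.split()
        # Need: IRQ label + one count per CPU + at least one description word.
        if len(fields) <= len(cpus) + 1:
            continue
        counts = fields[1:1 + len(cpus)]
        description = " ".join(fields[1 + len(cpus):])
        if device_substring not in description:
            continue
        if not all(c.isdigit() for c in counts):
            continue
        for i, c in enumerate(counts):
            totals[i] += int(c)
    return dict(zip(cpus, totals))


if __name__ == "__main__":
    # On a real system, replace SAMPLE with open("/proc/interrupts").read().
    for cpu, count in nic_interrupts_per_cpu(SAMPLE, "eth2").items():
        print(cpu, count)
```

If nearly all of the counts land on one CPU, as in the sample, that core is the likely bottleneck regardless of how many connections are running.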

In broad handwaving terms, within the context of the "technology" and not specifics of implementation, 10 GbE does little to nothing to make it easier to transfer data than it was for 1 GbE, or for that matter than it was for 100 BT or 10 BT. It still takes just as many CPU cycles to transfer a KB of data with each. Now, there certainly have been implementation details, outside the scope of the IEEE specs, that mean the 100 BT cards might have had better programming models than the 10 BT cards, and the 1 Gig cards have added CKO and TSO to make them "easier" than 100BT. The main thing that 10G NICs add beyond 1 Gig NICs is the ability to spread interrupts across more than one core. Hence the interest in 2003 SP2's support for "MSI-X" and whether or not interrupts from the NIC are being spread - especially on the receiver.

There is still latency when connected back to back - it should though be small :) It likely doesn't matter here since you have several connections, but even back to back, a TCP window size that could hit link-rate over 1 GbE may not be sufficient to hit link-rate over 10 GbE... but we aren't there yet with this situation :)
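The window-size point is the usual bandwidth-delay product: a single connection needs at least link rate times round-trip time of data in flight. A minimal Python sketch follows; the 100 microsecond round-trip time is a hypothetical figure for a back-to-back link, not something measured in this thread, and the function name is made up.

```python
def min_window_bytes(link_mbit_per_s, rtt_microseconds):
    """Smallest TCP window (bytes) that keeps one connection at link rate.

    Mbit/s multiplied by microseconds gives bits in flight; divide by 8
    for bytes. Integer arithmetic keeps the result exact.
    """
    return link_mbit_per_s * rtt_microseconds // 8


if __name__ == "__main__":
    rtt_us = 100  # hypothetical back-to-back round-trip time
    for name, rate in (("1 GbE", 1000), ("10 GbE", 10000)):
        print(name, min_window_bytes(rate, rtt_us), "bytes")
```

With that assumed RTT, a classic 64 KB window without window scaling covers 1 GbE comfortably but falls short for a single stream at 10 GbE.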
there is no rest for the wicked yet the virtuous have no pillows
rick jones
Honored Contributor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

Correction - my netperf testing thus far has been with the NC522SFP in an x8 slot in the DL585 G6.
there is no rest for the wicked yet the virtuous have no pillows
Udi Karni
Occasional Advisor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

Rick - thanks. It's a DL785G5, and the slot is an X4 gen1.

I am kicking off 14 transfers - and each one saturates a processor - so on a 32-way - each fully used processor equals 3% CPU - so I do in fact see around 42%-43% CPU utilization. However - I am not seeing the 35 MByte/sec per process I would normally see if I kicked off only 1 or 2 or 3 of them.

All that doesn't really answer your question of whether the interrupts are handled by many CPUs or there's one bottleneck CPU that's taking all the hits. I'm still scratching my head how I look that up.

Are you an HP employee? We routinely have HP employees RDP to our servers so they can see for themselves - using the HP virtual room software. If you have the time or inclination - that would be fabulous. If you can't - I still appreciate any and all suggestions you have - and I will try and pursue them.
rick jones
Honored Contributor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

Yes, I am "HP" although almost all of what I've been saying thus far has been in the guise of my alter-ego "Mr. Netperf." Your post caught my eye because I happen to be measuring NC522SFP performance on the DL585 G6 under Linux.

I don't think I am sufficiently Windows-savvy to make my RDP'ing into your system worthwhile, sorry.

Meanwhile, your mentioning having used an x4 slot rather than an x8 slot got me curious, so I've shifted my NC522SFP from an x8 slot (#5) of the DL585 G6 to an x4 slot (#6) and will see what comes of it. Sending to two separate systems seems to peak at about 5400 Mbit/s, the receive figures aren't measured yet. Since we have different HW and very different OSes YMMV :)
there is no rest for the wicked yet the virtuous have no pillows
Udi Karni
Occasional Advisor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

So shifting to an X4 cuts performance in half to 5Gbit? Before you were mentioning 7Gbit for X4 - so I'm trying to make sure I am not getting all confused with all the different numbers.

Also - since your server is a G6 - are these X4 gen 2? Those are really like X8 gen 1, right? If so - that's even more curious because it would mean that the "4" after the "X" trumps the "gen". Am I getting all confused? I thought that an X8 gen 1 should give you full performance and by extension - X4 gen 2 should as well.

?????????????????????????
Udi Karni
Occasional Advisor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

If this is in fact our card -

http://forums.techarena.in/web-news-trends/1161416.htm

Then it would appear that it supports MSI-X and spreads the interrupt load around. No?
rick jones
Honored Contributor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

There is what the slot can do, and then what a card does in the slot.

My PCIe 1.1 x4 ~= PCI-X 133 handwaving arriving at 7 Gbit/s was based on a different PCI-X 10G card getting 7 Gbit/s in an Integrity rx4640 and that maxing-out the PCI-X 133 MHz slot on that system.

The first couple inbound tests on my DL585 G6 RHEL 5.3 setup have completed, and they have hit ~6200 Mbit/s. That is getting closer to the 7 Gbit/s handwaving I did.

Also, depending on where things are running on your DL785, and which slot is in use by the NIC, the NC522SFP may have to reach as far as 4 HT hops away to fetch the data to transmit (one from the HT2PCIe chip to the first CPU and as many as three to reach memory hanging from another CPU). In the DL585 G6, since it is only a 4P configuration, the most it has to reach-out is three HT hops. The farther away the memory, well you get the idea. That Integrity rx4640 was a bus-based system so there was only ever one "hop" essentially.

As for G6 being PCIe 2.0. It is my understanding that the Intel-based G6 systems have PCIe 2.0 slots (the DL380 G6 at least, I have one on internal order for just that very reason) I *do* know that a DL785 G6 is still PCIe 1.1. I ass-u-me but do not know a DL585 G6 also remains PCIe 1.1, the DL585 G6 Quickspecs seem to be mum on the topic.
there is no rest for the wicked yet the virtuous have no pillows
rick jones
Honored Contributor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

Yes, the *card* supports MSI-X. While that is necessary, it is not sufficient - the OS must also support MSI-X. That was the basis for my asking if Windows 2003 SP2 had MSI-X support.
there is no rest for the wicked yet the virtuous have no pillows
Udi Karni
Occasional Advisor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

Oh... the HT Hops. Groan. This seems to be the Achilles' heel of the DL785 G5. It seems to get in the way of database reads as well.

We've been unable to read Oracle sequential at more than 3.5 Gbyte/sec on the DL785 G5, even though synthetic "disktest" can read at 8 Gbyte/sec, and the DL580G5 that's only 16-way can read more than 4 Gbyte/sec. It all seems to stem from "the hops" while on the DL580 G5 the setup is more lean and mean.

I didn't read your previous post carefully enough. I thought you had a DL580G6 - the one with the PCIe gen2 slots. Now I understand - you have a DL585 G6 which is still PCIe gen1. So - yes - we are both waiting anxiously for the snazzy new Intel model.

So - back to our predicament. It sounds like you are certainly getting > 5 Gbit from an X4. Realistically - that's all we are looking for. If each of our processes can move 30-35 Mbyte/sec, and each process maxes out a CPU - on a 16-way server like the DL580 we would be able to run no more than 16 transfers maximum, and 35MByte/sec * 16 processes = 5 Gbit. So on the DL580 we don't have cycles for more than 5 Gbit anyway and so that's our target. True, on the DL785 there are more cycles - but most of our servers are 16-way DL580s - so we are just looking for the lowest common denominator.
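The arithmetic behind that target can be sketched in a few lines of Python. It assumes decimal units (1 MByte = 10^6 bytes, 1 Gbit = 10^9 bits) and that each transfer process sustains its per-process rate independently; the function names are made up for illustration.

```python
import math


def aggregate_gbit_per_s(processes, mbyte_per_s_each):
    """Total throughput in Gbit/s for N processes at a given MByte/s each."""
    return processes * mbyte_per_s_each * 8 / 1000.0


def processes_needed(target_gbit_per_s, mbyte_per_s_each):
    """Number of processes required to reach a target rate in Gbit/s."""
    return math.ceil(target_gbit_per_s * 1000 / (mbyte_per_s_each * 8))


if __name__ == "__main__":
    print("16 x 35 MByte/s =", aggregate_gbit_per_s(16, 35), "Gbit/s")
    print("processes to fill 1 Gbit :", processes_needed(1, 35))
    print("processes to fill 10 Gbit:", processes_needed(10, 35))
```

At 35 MByte/sec per process, 16 processes come to roughly 4.5 Gbit/s, so a 16-way server that dedicates one CPU per transfer runs out of cores at about the 5 Gbit mark.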

So - having set our sights on the more modest goal of 5 Gbit - which you have proven is doable on Linux - the question is - why are we only getting 1/2 as much on Windows, and is there anything else we can try?

Next - I was going to try registry parms GlobalMaxTcpWindowSize and TcpAckFrequency although I can't say I'm happy I have to go down the registry route.
Udi Karni
Occasional Advisor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

I will also continue to check whether Win-2003 SP2 supports MSI-X.

Contrary to popular belief, Windows 2003 SP2 and above is actually quite agile and performant, and is not at all the oft-maligned OS it would appear to be in the blogosphere. A lot of people are still thinking about the clunkiness of NT and win-2000, but win-2003 SP2 is actually very nice.

You do, however, need to push a few concealed buttons, or set things up just so to make this come out.

I will expand my 10Gbit PhD to this topic as well. Naive me, I thought we would just put the card in and everything would just work.
rick jones
Honored Contributor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

HT hops can be an issue for *any* Opteron-based system (well above single-socket I guess). It may be close to epsilon for a two socket, but it can increase from there. Now that Intel have gone to "links" instead of busses we will see the extent to which the same thing happens there.
there is no rest for the wicked yet the virtuous have no pillows
Udi Karni
Occasional Advisor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

Rick - does this mean that win-2003 does not support msi-x? Only Win-2008?

http://download.microsoft.com/download/D/1/D/D1DD7745-426B-4CC3-A269-ABBBE427C0EF/NET-T717_DDC08.pptx
rick jones
Honored Contributor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

I presume you are referring to slide 5? Certainly that would be my interpretation of that slide.
there is no rest for the wicked yet the virtuous have no pillows
Günter Weber
Occasional Visitor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

Hello Udi,

can you tell me what cable you use to connect the two cards directly without a switch?

Greetings from Germany Günter Weber
Udi Karni
Occasional Advisor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

Hi Gunter,

Here are the details for the cable from our Quote:

Item # - 0473060
Manufacturer Part # - 221692-B23
Description - HP - Fibre Channel cable - LC (M) - LC (M) - 49 ft


By the way - we have switched to testing this card with Windows 2008 and we are getting more like 400-500 Mbyte/sec - about double what we were getting with Windows 2003, not too far from the Linux numbers, and about what can be expected when placing a 10 Gbit card in an X4 slot instead of the X8 it is meant for.

So - Windows 2003 is definitely the problem, although we still haven't gotten a definite answer as to whether it can be made to work - or you simply shouldn't bother with 10Gbit with Windows 2003.
Michael H. Jenkins
Occasional Visitor

Re: NC522SFP 10Gbit card only pushes 2-3 Gbit - Win-2003 x64

I had the same issue using Redhat Linux and was able to get more throughput using these options:

1. Need the latest firmware and driver from HP. The driver install in Linux will load the firmware.

2. Set the Cisco switch and NIC configuration to jumbo frames, MTU=9000. Cisco switches can be set higher.

3. Tuning the transmission queue length can sometimes help.

4. Also, sometimes you get better performance by pinning the process to a single CPU.
Linux command examples when running iperf (taskset takes the CPU mask followed by the command to launch; -p is only for changing an already running PID):
iperf Server: taskset 0x00000001 iperf -s -l 256k -w 1024k -p 9099 -r
iperf Client: taskset 0x00000001 iperf -c <server address> -l 256k -i 2 -w 1024k -p 9099 -r

I don't follow forums very well, so if I don't respond to a question, I probably did not see it. :)

Cheers,
Mike