Procurve 1800s

Just wondering if anyone has ever experienced this situation:

We have six 1800 switches, with 2 ports trunked together between each switch. Three ML350 servers hang off one switch, all three with Gb cards. Approximately two months ago, users started complaining that the network was "slow" for all server resources. Not all users were complaining, only certain ones. Investigating, we found the only users with problems were those running at 100 Mb (link utilization of 1% - 2% max). Not all 100 Mb users were experiencing trouble, but no Gb clients experienced problems at all. After trying about 10 different fixes (disabling SMB signing, etc.), we started isolating the switch configuration. We moved one problem PC to the same switch as the servers, and everything worked perfectly. After playing with various configurations, we finally set static trunks between the switches and enabled flow control on both ends of each trunk.

My main question is "why did this happen?" If flow control is critical, why do I not need to enable flow control on every 100 Mb port? Will flow control cause any adverse effects for my Gb users?

Thanks for all your help!
Matt Hobbs
Honored Contributor

Re: Procurve 1800s

Was it setting the static trunk or the flow control that seemed to make the real difference?

Since 100Mbit clients connected to the same switch as the servers seem to be okay, it doesn't look like a switch buffer limitation, which is where flow control would usually come in useful.

The other thing I would try is just using a single port between the switches to see if it's in any way related to the trunking.

Re: Procurve 1800s

It seemed to be the flow control that really helped. I tried connecting the switches with only one port (configured as a single-port trunk, as one port of a two-port trunk, or as a plain port with no trunking) - no dice. Same results in every case until I enabled flow control.
Matt Hobbs
Honored Contributor

Re: Procurve 1800s

I would also be concerned about having to enable flow-control between the two switches. With flow-control enabled, a device sends pause frames when either end of the link runs out of buffers.

Maybe the 10/100 side of the switch has flow control automatically enabled internally (I'm really just hypothesising here), so when the Gigabit-connected server on one switch sends traffic through to the other switch (also connected at gigabit), and the receiving switch then has to buffer it down to 10/100, that's the point where it runs out of buffers and sends a pause frame back through the uplink.

The problem with this is that it will pause all traffic across that link, possibly degrading the performance of those gigabit clients. It may be imperceptible though, so I guess the trade-off is up to you.

What I don't understand at the moment is why this only affects 100Mbit clients on the other switch and not on the same switch as the servers.

What if you just enable flow-control on the 10/100 ports instead, does that make any difference?

What sort of performance degradation were you seeing? Can you give a before and after in Mbit/s? What type of performance testing were you running? I'm guessing file sharing, primarily TCP based.

I'm not sure if the 1800 supports it, but what I would do is try to snmpwalk the switches - in particular, RFC2665.mib contains the 802.3x information, so you can see which ports are pausing traffic inbound and outbound. I've never had a need to do this myself, but it's where I'd start. If you need a MIB browser, I'd recommend the free version of the iReasoning MIB browser, and you can find the MIB file here: http://www.hp.com/rnd/software/MIBs.htm
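
If you'd rather script it than click through a MIB browser, something like this would do a quick walk of the pause-frame counters. Just a sketch: it assumes net-snmp's snmpwalk is on the path, SNMP v2c with community 'public', and that the switch actually answers these OIDs (the address is a placeholder).

    import subprocess

    # dot3PauseTable counters from RFC 2665 (EtherLike-MIB); one row per
    # port, so a rising dot3OutPauseFrames means that port is sending
    # pause frames, and dot3InPauseFrames means it is receiving them.
    PAUSE_OIDS = {
        'dot3InPauseFrames':  '1.3.6.1.2.1.10.7.10.1.3',
        'dot3OutPauseFrames': '1.3.6.1.2.1.10.7.10.1.4',
    }

    def walk_pause_counters(host, community='public'):
        for name, oid in PAUSE_OIDS.items():
            result = subprocess.run(
                ['snmpwalk', '-v2c', '-c', community, host, oid],
                capture_output=True, text=True)
            print(name)
            print(result.stdout or result.stderr)

    walk_pause_counters('10.0.0.1')  # placeholder switch address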

Re: Procurve 1800s

Matt,

Thanks for all the responses, and I'm glad that someone else is as baffled as I am.

Your concern about flow control is exactly what I was wondering - I'm potentially seriously degrading the link between the two switches, and in turn, all the clients connected to the secondary switch, not just the single client requesting 1 particular data stream.

I attempted to enable flow control only on the ports running at 100 Mbit - no dice, same slow throughput. For performance testing, I simply zipped the i386 directory of a Windows XP disc; it's about a 700 MB file. With degraded performance, Windows was estimating between 93 and 125 minutes to copy it - roughly 1 Mbit/s, or 0.125 MByte/s. Hardly "performance." With flow control enabled on the trunks, I'm seeing ~37 Mbit/s, or 4.7 MByte/s. Not spectacular, but WAAAAY faster than without flow control.
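
To show my working on those numbers, a quick sketch (the second line assumes the flow-control copy finished in about two and a half minutes):

    # Back-of-the-envelope throughput check for the 700 MB copy above.
    def throughput_mbit_s(mbytes, minutes):
        return mbytes * 8 / (minutes * 60)

    print(throughput_mbit_s(700, 100))  # no flow control: ~0.9 Mbit/s (~0.12 MB/s)
    print(throughput_mbit_s(700, 2.5))  # flow control on: ~37 Mbit/s (~4.7 MB/s)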

I will download the MIB browser and see if I can look into the 1800 a little closer.

Thanks again for all your help!
Matt Hobbs
Honored Contributor

Re: Procurve 1800s

Alan,

This seems easy enough to reproduce; if you can upload a basic network map of your setup, I'll see if I can reproduce it next week.

It's quite possible that this is 'expected' behaviour due to the most likely very small buffers on these switches. The 2800 and the 4100's 16-port gigabit module also suffered from this type of issue, but HP was able to provide a 'qos-passthrough-mode' command which optimised the buffers for 100-to-1000 transfers. Given the basic capabilities of the 1800, I'd be surprised if this type of feature could be implemented.

Having said that, that's only if it is a buffering issue, which at this point I'm not entirely convinced of, given the strange detail that it only happens on switches that do not have the servers on them.

These 1800s would be considered optimised for gigabit, so if you needed an excuse to upgrade the 100Mbit machines, this would be a perfect opportunity.

Re: Procurve 1800s

Matt,

I've attached a very crude Visio diagram of our network. Let me know if there are any additional questions or if I am unclear on some points. The small buffers may indeed be an issue, and I guess it's a decent reason to upgrade the 100 Mb holdouts. I am a little disappointed in the performance, but perhaps you will uncover something I cannot. I did look into the iReasoning MIB browser; however, it does not appear that HP has a MIB for the 1800 series switches.
Matt Hobbs
Honored Contributor

Re: Procurve 1800s

For the purpose of checking flow-control, all you should need to load is the RFC2665.mib. I'll let you know how I go next week.
Matt Hobbs
Honored Contributor

Re: Procurve 1800s

Hi Alan,

I ran a few quick tests today and was unable to reproduce it. For my tests I used FTP and iperf to measure performance. I was unable to test with SMB, as the machines were in different domains and it didn't seem to like that.

I also needed to use a 100Mbit switch connected to the second 1800, as I only had gigabit clients.

With FTP and iperf, though, I was getting 100Mbit performance consistently between the gigabit and 100Mbit devices.
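
If it helps to take SMB out of the equation on your end, here's roughly the kind of raw TCP test iperf runs, as a minimal Python sketch (port 5001 and the 10-second run are arbitrary choices, and it needs Python 3.8+ on both machines):

    import socket, sys, time

    PORT, CHUNK, SECONDS = 5001, 64 * 1024, 10

    def server():
        # Receive until the client disconnects, then report throughput.
        with socket.create_server(('', PORT)) as srv:
            conn, addr = srv.accept()
            total, start = 0, time.time()
            while (data := conn.recv(CHUNK)):
                total += len(data)
            secs = time.time() - start
            print(f'{total * 8 / secs / 1e6:.1f} Mbit/s from {addr[0]}')

    def client(host):
        # Blast zero-filled buffers at the server for SECONDS seconds.
        with socket.create_connection((host, PORT)) as conn:
            payload = b'\x00' * CHUNK
            end = time.time() + SECONDS
            while time.time() < end:
                conn.sendall(payload)

    if __name__ == '__main__':
        client(sys.argv[2]) if sys.argv[1] == 'client' else server()

Run it with 'server' on the receiving machine and 'client <server-ip>' on the sending one, then swap the roles to test the other direction.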

I was only using one link between the 1800s, and I have not logged into the web interface to check the configurations.

If I get some more time later this week, I'll see if I can set it up with a 2-port trunk, and I'll also make sure to get SMB file sharing working.

Matt Hobbs
Honored Contributor

Re: Procurve 1800s

Hi Alan,

I tried it again today, this time with two 1800s only, a 100Mbit device and a gigabit server, and I still couldn't reproduce this (both single link and 2-port trunk between the switches - no flow control).

If you can reproduce this easily and provide some more detail on exactly how you're testing I'll run a few more tests here.

Matt

Re: Procurve 1800s

Matt,

I really appreciate you looking into this. Sorry for the long delay in responding - yesterday I could not log in to the HP forums.

The testing I have done has been file copies and program usage. In all cases - dual trunk between switches, servers plugged into switch 1, clients on switch 2 (through 6 in our case) - any and all 100 Mb clients experienced slowdowns. This was also the case when running through a 10/100 switch or hub downstream (off switch 2).

Do you think it could be something wonky with our firmware version? I will get the version information tomorrow and see a) if an update exists for the 1800, and b) if this may explain our dissimilar results.

All 10/100 clients have reported a big boost in speed this week; I'd just like to be able to run without flow control. Thanks again for all your support. I will let you know what I find tomorrow.

Alan S.
Matt Hobbs
Honored Contributor

Re: Procurve 1800s

Hi Alan,

Although I couldn't see the issue, I'm still perplexed about it. I have a few remaining ideas which I'd like to test tomorrow.

1. Firmware versions: today I tested with an 1800-8G and an 1800-24G. I did have an updated firmware version from HP on one of them for a separate issue (there are no updates on the website, though, so you need to contact HP support to get it). The firmware versions didn't seem to make any difference, but it would still be good to note what versions you're running.

2. Since you only have the 24-port versions, I thought maybe I should be testing with 24-port switches only. Which ports are you using as uplinks?

3. I need to check how fast the DL380 I was using can really transfer data; going gig-to-gig I was only getting about 230Mbit, which may not be enough to overwhelm the switch when going to a 100Mbit device.

4. Back to flow-control: I have a feeling that maybe the switch can receive its flow control settings from the client when auto-negotiating with a 100Mbit client. E.g., if the client has flow control enabled and is set to Auto, the switch will automatically enable flow-control on its port.

This could explain why you also need to enable flow control on the trunk, which would give you end-to-end flow control all the way back to the server, which would in turn be told to pause its frames. It also explains why you do not need to enable flow control when the 100Mbit client is connected directly to the same switch as the server.

I've found a few OIDs in the RFC that seem to hint at this so I'll check this theory out tomorrow.

One question for you: on the 100Mbit client NICs, do you know what their flow control is currently set to?
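
For any Linux machines in the mix, you could script the check with ethtool; on Windows NICs it's usually under the adapter driver's advanced properties. A rough sketch, assuming ethtool is installed ('eth0' is just a placeholder interface name):

    import subprocess

    # Show 802.3x pause settings (autonegotiation / RX / TX) for a NIC.
    def pause_settings(iface):
        result = subprocess.run(['ethtool', '-a', iface],
                                capture_output=True, text=True)
        print(result.stdout or result.stderr)

    pause_settings('eth0')  # placeholder interface name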

Re: Procurve 1800s

Matt,

I'm using ports 23 and 24 on Sw #1 to ports 23 and 24 on Sw #2, ports 21 and 22 on Sw #1 to ports 23 and 24 on Sw #3, and so on. Trunking is set up as T1 = Sw #1 to Sw #2, T2 = Sw #1 to Sw #3, and so on.

As far as overwhelming the switch goes, I would think that almost anything above 100 Mb should kill it. I was seeing link utilization peaking at 5% with no flow control, averaging 1% - 2%.

I'm betting you may be right about the flow control being negotiated, which would explain not needing to set it. I am not sure of the flow control settings, but I will check tomorrow. I'll also report the firmware.

Alan S.
Matt Hobbs
Honored Contributor

Re: Procurve 1800s

Alan,

I tried working some more on my auto-negotiation flow-control theory this morning but seem to have hit a brick wall: it turns out the 1800 does not reply to the OIDs in RFC2665.mib that I was hoping to probe it for.

However, I did test the theory on a 3400 - and it seemed to go against what I was hoping. If flow-control was disabled on the switch port, it stayed disabled no matter what the client was set to. There is a chance that the 3400 is not adhering to the standard, but I'd say that's a long shot, and I'm probably just not deciphering the RFC properly.

I'd still like to be able to reproduce the issue, so anything you can do on your side - possibly trying iperf or FTP transfers instead - to make it easier for me to reproduce in a lab environment would be appreciated.
Jordan D
Occasional Contributor

Re: Procurve 1800s

I have a similar issue with a customer being worked on at the NSC as I type. The issue there was very slow file transfers coming from a server: a workstation could copy a 50 MB file to the server in about 6-10 seconds, but pulling down the same size file took somewhere around 10 minutes.

At the moment, the only time the engineer was able to reproduce the slow file transfers from the server to the users was when one side of the link was at auto and the other was forced.

He did determine that when forcing both the switch port and the workstation to 100FDx, the problem seemed to vanish.

Re: Procurve 1800s

I will try to test with an FTP client tonight. The symptoms were seen loading database information for software used by one company on the network. Prior to implementing flow control on the trunks, the 100 Mb clients were seeing average report response times of 25-30 seconds for a basic report. With flow control enabled, they are getting 2-3 second refreshes at most.

I checked the switches for Hardware version / Software version.

Hardware: R01
Software: PB.02.03

Just verified that all switches are running the same software version and hardware version.

Re: Procurve 1800s

Jordan,

Were your problems occurring when the server was on switch 1, and a client was on a different switch? My problems went away completely if I put a 100 Mb client on the same switch as the server. Once I had the client on a different switch, the problem presented itself.
Victor Baidez
Occasional Visitor

Re: Procurve 1800s

We are in exactly the same situation as you.

We have six 1800-24G J9028B switches. The servers are on switch 1, and we have uplinks from switch 1 ports 20 to 24 out to switches 2 through 6.

S= Switch
P= Port

S1P20 <-> S2P24
S1P21 <-> S3P24
S1P22 <-> S4P24
S1P23 <-> S5P24
S1P24 <-> S6P24

When we connect any 100FDX PC to switch 2 (or 3, 4, 5, or 6), whether auto-sensing or forced, the network from that PC is extremely slow: 40 MB takes 6 minutes. Gb users do not have any problems.

I tried to increase bandwidth by creating a 2-port trunk to test:

S1P1 <--> S4P1
S1P20 <--> S4P24

But the result is the same slow performance: 40 MB in 6 minutes (about 0.9 Mbit/s - the same ballpark as Alan's numbers).

Network traffic is very light on our network.

Can anyone help me?

Switch information:

Procurve 1800 24-G J9028B
Number of Ports 24
Hardware Version R01
Software Version PB.03.02

Thanks in advance, and sorry for my poor English ;)
Gary Gemmell
Occasional Visitor

Re: Procurve 1800s

You need to disable jumbo frames.

Had the exact same problem, but solved it by updating all switches to the same firmware and turning off jumbo frames.

Everything hinges on this, as these switches don't seem to like jumbo frames and can't handle the bandwidth.

"Buy cheap, pay twice" was my gran's motto, and it always seems to hold true.

Buy quality and you will never, or very rarely, have to go through wasted hours, days, and nights fixing the minutiae.

I'll stick with the Cisco 3550 from now on - I've never had these problems with Cisco gear.

And yes, I hear a lot of people ragging on IOS as an outdated system, but the proof of the pudding is in the eating - Cisco gear is rock solid, and as long as you are good with IOS you should rarely encounter wasted time like this!