
Scott_111
Advisor

Troubleshooting Drops Rx

I have 4 HP proliant servers running a clustered Oracle database on Red Hat Linux. These 4 servers each have a gigabit connection to a gigabit port on a 20 port gigabit module in a 4108gl modular switch (whew!). These 4 ports are used for inter-server communications and are on a private network (VLAN 2).

The problem is that we are having slowness issues with these inter-server communications. The Status and Counters for these ports show some "Drops Rx" at the same time we are experiencing slowness. All other error counters remain unchanged at 0.

How do I go about troubleshooting this? Does this mean that the receive buffers of the receiving port in the switch are full and therefore the sending machine has to drop packets? Any help would be much appreciated.

Thanks,

Scott
OLARU Dan
Trusted Contributor

Re: Troubleshooting Drops Rx

Hi Scott.

"Drops RX" in the "Status and Counters - Port Counters" screen is an obvious mistake that HP has not yet corrected in their firmware. It should read "Drops Tx", which represents the total number of frames that were dropped because the outbound buffers of that switch port were full. If you request "Help" from the "Status and Counters - Port Counters" screen, you'll see this definition.

What I know for sure is that "Late Colln Tx" counter is an indication of duplex mode mismatch between the switch port and the attached NIC.

Also, you might want to set the 4 switch ports AND the 4 server NICs to 1000FDx, to get rid of the troubled "Autonegotiation" feature.

Hope this helps,
Dan
Scott_111
Advisor

Re: Troubleshooting Drops Rx

Thanks for your response. There is no option to force the port to "1000FDx"; the only option is "Auto-1000", and that is what it is set to. Also, if I had a duplex mode mismatch, wouldn't I be seeing other errors (collisions)?

What causes the outbound buffers of a switch port to fill up, and what can I do about it? Is this a limitation of the switch? Is there a way to increase the buffers?

Scott
Ralph Bean_2
Trusted Contributor

Re: Troubleshooting Drops Rx

Hello Scott -

I suggest you fix the problem causing the late collisions then see whether you are still getting Drops.

Dan is right: the late collisions are likely a symptom of a duplex mismatch.

Regards,
Ralph
Scott_111
Advisor

Re: Troubleshooting Drops Rx

Thanks for your response, however, I do not have any Late Collisions. Only Drops Rx.
Ralph Bean_2
Trusted Contributor

Re: Troubleshooting Drops Rx

Scott -

My mistake. I mis-read Dan's reply to you.

Regards,
Ralph
Kevin Richter_1
Valued Contributor

Re: Troubleshooting Drops Rx

"Drops Rx" is NOT a typo or documentation error. It correctly describes how the counters on the HP Procurve 4108gl switch function. If there is congestion on an outbound port (including the interface from this module to the switch's backplane), the outbound port buffers may fill. Once the outbound buffers for a given port are full, any additional traffic destined for that port (occurring before the congestion can be relieved) will be dropped on the incoming port. As it is dropped on the port where it was received, the counters properly reflect this as a "Drop Rx."

While the counter naming may be correct, the troubleshooting remains difficult. Can you identify any ports which may be experiencing high utilization? Remember, this includes the link to the backplane, which cannot be directly observed. You may be able to "eliminate" it from your testing if you can run a test or two with only these servers connected to this module and no other modules installed in the chassis.
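To make the mechanism described above concrete, here is a toy Python model of several ingress ports feeding one congested egress queue. It is only a sketch: the function name, the frames-per-tick rates, and the buffer size are all made up for illustration and say nothing about how the 4108gl actually sizes or schedules its buffers. The point it shows is that when the egress queue is full, the drop is charged to the port the frame arrived on.

```python
from collections import Counter


def simulate_egress_congestion(senders, ticks, egress_rate, buffer_frames):
    """Toy model of one egress queue shared by several ingress ports.

    senders:       dict of ingress port name -> frames offered per tick
    ticks:         number of time steps to simulate
    egress_rate:   frames the egress port can transmit per tick
    buffer_frames: egress queue capacity in frames (hypothetical value)

    Returns (frames_sent, drops_rx), where drops_rx maps each ingress
    port to the number of its frames dropped because the queue was full.
    """
    ports = list(senders)
    if not ports:
        return 0, {}

    queue = 0
    sent = 0
    drops_rx = Counter()

    for tick in range(ticks):
        # Rotate the service order so no single port is always last.
        start = tick % len(ports)
        for port in ports[start:] + ports[:start]:
            for _ in range(senders[port]):
                if queue < buffer_frames:
                    queue += 1            # frame accepted into egress queue
                else:
                    drops_rx[port] += 1   # counted against the ingress port
        # The egress port drains what it can this tick.
        drained = min(queue, egress_rate)
        queue -= drained
        sent += drained

    return sent, dict(drops_rx)


if __name__ == "__main__":
    # Two senders at "line rate" into one receiver at the same line rate.
    sent, drops = simulate_egress_congestion(
        {"A1": 1, "A2": 1}, ticks=1000, egress_rate=1, buffer_frames=64
    )
    print("frames sent:", sent)
    print("Drops Rx per ingress port:", drops)
```

In this model the receiving server's own port never shows drops; only the sending ports do, which is at least consistent with the counter naming explained above.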

Be sure to contact HP Procurve support if you don't make progress on this. (1-800-HPINVENT - speak keyword "Procurve" at the voice prompt for your product). They can be quite helpful with Procurve issues.
Check the cabling. Next, check the cabling again.
Scott_111
Advisor

Re: Troubleshooting Drops Rx

Why would the backplane even be an issue if these four ports are on the same module? Other ports in the switch have a few drops but these ports have thousands.

What do I do if I run a test with only these servers and I still see Drops?

Thanks,

Scott
Scott_111
Advisor

Re: Troubleshooting Drops Rx

These four servers are running an Oracle Real Application Clusters (RAC) database.

Whenever slowness is experienced, as in when a query takes a long time to return data, the switch reports "dropped frames." While the slow query is running, the "Drops Rx" counter increments; during a fast query, no drops are reported by the switch.

I have also noticed that the "Drops Rx" counter remains unchanged at zero for the server having the slow response.

Therefore, could it be that during a slow query the other three servers try to send a large amount of data, at the same time, to the server running the query? This large amount of data directed at one port would cause the buffers to fill and frames to be dropped.

Thanks again,

Scott
Gonzo Granello
Valued Contributor

Re: Troubleshooting Drops Rx

Scott,

Well, a lot of opinions; maybe I can contribute to the confusion :-)

As far as the duplex mismatch goes, Gigabit always operates at FULL duplex per the IEEE specification. The only thing that is autonegotiated is flow control, as compared to 10/100, where half/full duplex is another option. The link speed is NEVER autonegotiated, because that is a different process called speed sense, which happens before autonegotiation ever starts.

However, if one side of the link is not set to auto, the other side will fall back to the lowest common denominator: on 1000 that is NO flow control; on 10/100 it is always half duplex with no flow control. That, by the way, is where most issues start: if you turn autonegotiation off on one side, you have to turn it off on the other and configure both ends the same. Clearly, autonegotiation is NOT the issue; configuring one end not to autonegotiate and leaving the other side on auto will be.

As far as your servers go, are you using any kind of load balancing between the 4 links, or just different IPs? Turning autonegotiation off on both ends to make sure there is no mismatch would be one option. To troubleshoot further, I would need some more info.
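The fallback rules as stated in this post can be written down as a small lookup, which may make the mismatch case easier to see. This is only a sketch of the rules as the post describes them, not a rendering of the IEEE 802.3 text; the function names and return values are invented for illustration.

```python
def auto_side_result(speed_mbps, peer_autonegotiates):
    """What the autonegotiating side ends up with, per the post's rules.

    speed_mbps:          10, 100 or 1000
    peer_autonegotiates: True if the other end is also set to auto
    """
    if speed_mbps not in (10, 100, 1000):
        raise ValueError("speed_mbps must be 10, 100 or 1000")

    if peer_autonegotiates:
        # Both ends on auto: gigabit is full duplex, 10/100 negotiate duplex,
        # and flow control is negotiated in either case.
        duplex = "full" if speed_mbps == 1000 else "negotiated"
        return {"duplex": duplex, "flow_control": "negotiated"}

    # Peer is forced: the auto side falls back to the lowest common setting.
    duplex = "full" if speed_mbps == 1000 else "half"
    return {"duplex": duplex, "flow_control": "off"}


def duplex_mismatch_risk(speed_mbps, peer_autonegotiates, peer_forced_duplex="full"):
    """True if the auto side's fallback duplex differs from the forced peer."""
    if peer_autonegotiates:
        return False
    result = auto_side_result(speed_mbps, peer_autonegotiates)
    return result["duplex"] != peer_forced_duplex


if __name__ == "__main__":
    # Forced 100 full on one end, auto on the other: the classic mismatch.
    print(duplex_mismatch_risk(100, peer_autonegotiates=False))
    # At gigabit only flow control is lost, by the rules above.
    print(auto_side_result(1000, peer_autonegotiates=False))
```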

Andreas
most time the day i have to mask my contempt for the a-holes in charge......

Re: Troubleshooting Drops Rx

This looks as if this could be a server issue.

If the server is being heavily utilised, then it will not be able to process the incoming packets.
This will cause the buffers to fill up, therefore the inbound packets will back up in the switch, hence the switch will drop the packets from the other inbound servers.

Can you check the server performance and receive buffers on the offending server during your slow database queries?
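Since the servers run Linux, one way to look at the receive side is the kernel's per-interface counters in /proc/net/dev. The sketch below parses that file's standard layout (two header lines, then one line per interface with eight receive fields followed by eight transmit fields). The function names are my own, and which interface carries the interconnect traffic is something you would need to fill in.

```python
def parse_proc_net_dev(text):
    """Parse the contents of /proc/net/dev into per-interface RX counters."""
    stats = {}
    for line in text.splitlines()[2:]:      # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:                # 8 RX fields + 8 TX fields expected
            continue
        stats[name.strip()] = {
            "rx_packets": int(fields[1]),
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "rx_fifo": int(fields[4]),      # receive FIFO overruns
        }
    return stats


def read_rx_counters(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_proc_net_dev(f.read())


def rx_delta(before, after):
    """Difference between two snapshots, per interface and counter."""
    return {
        iface: {key: after[iface][key] - before[iface][key] for key in after[iface]}
        for iface in after
        if iface in before
    }
```

Taking one snapshot with `read_rx_counters()` before a slow query and another after it, then passing both to `rx_delta`, would show whether `rx_drop` or `rx_fifo` moved on the server during the slowdown.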
Scott_111
Advisor

Re: Troubleshooting Drops Rx

Ok, we installed an old 3Com 3300 10/100 switch and the slowness goes away. All the queries are now fast.

Before this, we fixed the port speeds on the switch and the NICs to 100FDx. The queries still ran slow.

Now I need to figure out whether there is a problem with the HP switch itself, or whether my clients have a problem with the HP switch.
Scott_111
Advisor

Re: Troubleshooting Drops Rx

Here is an update if anyone cares...

I am able to duplicate this using FTP. With all four machines connected to Gig ports, I FTP a large file from two machines to one machine. During the file transfer the Drops Rx counter on the "Status and Counters" screen increments for each sending port.

With all four machines connected to 10/100 ports and running at 100FDx, I FTP a large file from two machines to one machine. This time, during the file transfer, the Drops Rx counter does NOT increment.
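For a rough feel of the numbers in this two-into-one test, the arithmetic below estimates how long a port buffer survives when two senders at line rate feed one receiver at the same line rate. The buffer size is a pure guess for illustration (I do not know the 4108gl's real per-port buffering), and real TCP senders back off rather than holding line rate, so treat this as back-of-the-envelope only and not as an explanation of the difference seen.

```python
def seconds_until_buffer_full(offered_bps, drain_bps, buffer_bytes):
    """Time for a queue to fill when offered load exceeds the drain rate.

    Returns None when the offered load does not exceed the drain rate,
    because the queue never fills in that case.
    """
    excess_bps = offered_bps - drain_bps
    if excess_bps <= 0:
        return None
    return buffer_bytes * 8 / excess_bps


# Hypothetical buffer size, chosen only to have a number to work with.
BUFFER_BYTES = 256 * 1024

GIGABIT = 1_000_000_000
FAST_ETHERNET = 100_000_000

# Two senders at line rate into one receiver at line rate.
gig_fill = seconds_until_buffer_full(2 * GIGABIT, GIGABIT, BUFFER_BYTES)
fast_fill = seconds_until_buffer_full(2 * FAST_ETHERNET, FAST_ETHERNET, BUFFER_BYTES)

if __name__ == "__main__":
    print(f"1000 Mb/s: buffer full after {gig_fill * 1000:.2f} ms")
    print(f" 100 Mb/s: buffer full after {fast_fill * 1000:.2f} ms")
```

Whatever the real buffer size is, the same buffer lasts ten times longer at 100 Mb/s than at 1000 Mb/s under this model, which gives the senders more time to slow down before frames are lost.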
Ralph Bean_2
Trusted Contributor

Re: Troubleshooting Drops Rx

Hello Scott -

You wrote "These 4 servers each have a gigabit connection to a gigabit port on a 20 port gigabit module in a 4108gl modular switch."

Are the 4 servers connected to the same 20-port module or to more than one 20-port module?

Regards,
Ralph
Scott_111
Advisor

Re: Troubleshooting Drops Rx

Ralph,

They are all connected to the same 20 port module. I have tried a different 20 port module and a 6 port module. They all produced the same results, drops.

Scott
Ralph Bean_2
Trusted Contributor

Re: Troubleshooting Drops Rx

Scott -

OK. And apart from the inter-server communications, is there traffic to or from these servers from other modules/blades on the Switch 4108gl?

Regards,
Ralph
Scott_111
Advisor

Re: Troubleshooting Drops Rx

Ralph,

These 4 ports/servers are only communicating to each other on the same module.

Scott
Chris McFarling
Advisor

Re: Troubleshooting Drops Rx

I've been experiencing similar problems. I have a 4108gl with 2 6-port Gb blades and 2 24-port 10/100 Mb blades. I have 2 Win2K Dell 1650 servers on Gb. I also have 2 Macintosh G4 (OS X 10.2.8) desktops on Gb. My problem is between these Mac G4s and the Win2K servers. Originally everything was 10/100. A few months ago I added the Gb blades. Since then, the G4s that are connected via Gb drop off the network fairly often. It always happens when data is being transmitted from the desktop to the server (i.e. a file is being saved or a file is being copied). One of these G4s is on port B4. Digging around in the switch, I came up with a couple of points of interest. Here is the first screen of output from the 'show interfaces' command (not sure if this will format correctly on this message board):

Status and Counters - Port Counters

Port  Total Bytes   Total Frames  Errors Rx  Drops Rx  Flow Ctrl
----  ------------  ------------  ---------  --------  ---------
A1    1,292,325...  3,332,360...  0          51,065    off
A2    3,030,993...  3,974,839...  0          5876      off
A3    3,469,065...  884,390,170   0          685,398   off
A4    2,032,024...  697,005,582   0          47        off
A5    43,325,123    1,503,198...  0          40,278    off
A6    2,936,419...  3,182,775...  0          718,075   off
B1    2,513,429...  567,435,320   0          210       off
B2    2,777,789...  156,003,895   0          836       off
B3    382,901,385   766,868,003   0          1037      off
B4    1,668,981...  753,116,057   0          1319      off
B5    717,844,295   731,313,675   0          590       off
B6    3,695,501...  816,902,831   0          601       off
G1    1,567,762     112,676,629   0          0         off
G2    3,083,163...  80,530,933    8          0         off
G3    2,682,579...  3,745,235...  0          0         off
G4    1,761,043...  1,291,843...  10         0         off
G5    4,253,146...  398,535,719   2          0         off
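If you capture this screen repeatedly, a few lines of Python can pull out the Drops Rx column and rank the ports. This is only a sketch that assumes the six whitespace-separated columns shown above (port, total bytes, total frames, Errors Rx, Drops Rx, flow control); the function names are made up, and other firmware versions may lay the screen out differently.

```python
import re

# One data row: port name, two totals (possibly truncated with "..."),
# Errors Rx, Drops Rx, and the flow-control setting.
ROW = re.compile(
    r"^\s*([A-Z]\d+)\s+(\S+)\s+(\S+)\s+([\d,]+)\s+([\d,]+)\s+(\S+)\s*$"
)


def drops_by_port(table_text):
    """Map each port in the pasted counter screen to its Drops Rx value."""
    result = {}
    for line in table_text.splitlines():
        match = ROW.match(line)
        if match:
            result[match.group(1)] = int(match.group(5).replace(",", ""))
    return result


def worst_ports(table_text, limit=5):
    """Ports with non-zero Drops Rx, highest first."""
    nonzero = [(port, n) for port, n in drops_by_port(table_text).items() if n > 0]
    return sorted(nonzero, key=lambda item: item[1], reverse=True)[:limit]
```

Comparing two captures taken a few minutes apart would show which ports are still accumulating drops, which is more useful than the lifetime totals.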

Basically, all of my Gb ports have some Drops Rx packets. A6 is up to 718,075. B4, where a problem Mac G4 is, is at 1349. All of the 10/100 ports are at zero. Additionally, I'm getting log entries like this:

I 01/17/91 22:20:11 ports: port B4 is now off-line
I 01/17/91 22:20:14 ports: port B4 is Blocked by LACP
I 01/17/91 22:20:17 ports: port B4 is now on-line
I 01/17/91 22:20:26 ports: port B4 is now off-line
I 01/17/91 22:20:56 ports: port B4 is Blocked by LACP
I 01/17/91 22:20:59 ports: port B4 is now on-line

I'm not sure what's going on here. I'm not using any trunking on this switch. All ports are set to LACP Passive.

Any ideas?

SCOOTER
Esteemed Contributor

Re: Troubleshooting Drops Rx

Chris,

I have seen problems with Macs and speeds (10/100/1000) before; turning off LACP and fixing the speed-duplex of the port and the NIC to 1000FDx fixed the problem with the network drop-out.
Depending on which Mac you have, you can fix the speed-duplex in the OS; if not, you need a speed-duplex tool (not supported by Mac, but it works like a charm). If you need the tool, send me an email (remove the * and NOSPAM ;)). Hope this helps a little,

NOSPAM*m*i*k*2*2*0*0*1*@hotmail.com

Regards,

SCOOTER