Switches, Hubs, and Modems

Scott_111
Advisor

Troubleshooting Drops Rx

I have 4 HP ProLiant servers running a clustered Oracle database on Red Hat Linux. Each of these 4 servers has a gigabit connection to a gigabit port on a 20-port gigabit module in a 4108gl modular switch (whew!). These 4 ports are used for inter-server communications and are on a private network (VLAN 2).

The problem is that we are having slowness issues with these inter-server communications. The Status and Counters for these ports show some "Drops Rx" at the same time we are experiencing slowness. All other error counters remain unchanged at 0.

How do I go about troubleshooting this? Does this mean that the receive buffers of the receiving port in the switch are full and therefore the sending machine has to drop packets? Any help would be much appreciated.

Thanks,

Scott
18 REPLIES
OLARU Dan
Trusted Contributor

Re: Troubleshooting Drops Rx

Hi Scott.

"Drops Rx" in the "Status and Counters - Port Counters" screen is an obvious mistake that HP has not yet corrected in their firmware. It should read "Drops Tx", which represents the total number of frames that were dropped due to full outbound buffers on that switch port. If you request "Help" from the "Status and Counters - Port Counters" screen you'll see this definition.

What I know for sure is that "Late Colln Tx" counter is an indication of duplex mode mismatch between the switch port and the attached NIC.

Also, you might want to set the 4 switch ports AND the 4 server NICs to 1000FDx, to get rid of the troublesome "Autonegotiation" feature.

Hope this helps,
Dan
Scott_111
Advisor

Re: Troubleshooting Drops Rx

Thanks for your response. There is no option to force the port to "1000FDx", only "Auto-1000", and that is what it is set to. Also, if I had a duplex mode mismatch, wouldn't I be seeing other errors (collisions)?

What causes the outbound buffers of a switch port to fill up, and what can I do about it? Is this a limitation of the switch? Is there a way to increase the buffers?

Scott
Ralph Bean_2
Trusted Contributor

Re: Troubleshooting Drops Rx

Hello Scott -

I suggest you fix the problem causing the late collisions then see whether you are still getting Drops.

Dan is right: the late collisions are likely a symptom of a duplex mismatch.

Regards,
Ralph
Scott_111
Advisor

Re: Troubleshooting Drops Rx

Thanks for your response, however, I do not have any Late Collisions. Only Drops Rx.
Ralph Bean_2
Trusted Contributor

Re: Troubleshooting Drops Rx

Scott -

My mistake. I mis-read Dan's reply to you.

Regards,
Ralph
Kevin Richter_1
Valued Contributor

Re: Troubleshooting Drops Rx

"Drops Rx" is NOT a typo or documentation error. It correctly describes how the counters on the HP Procurve 4108gl switch function. If there is congestion on an outbound port (including the interface from this module to the switch's backplane), the outbound port buffers may fill. Once the outbound buffers for a given port are full, any additional traffic destined for that port (occurring before the congestion can be relieved) will be dropped on the incoming port. As it is dropped on the port where it was received, the counters properly reflect this as a "Drop Rx."
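To illustrate the counting behavior described above, here is a minimal toy model in Python. It is only a sketch: the class name `ToySwitch`, the per-port frame-count buffer, and the buffer size are assumptions made for illustration and do not reflect the 4108gl's real buffer architecture or firmware.

```python
from collections import deque


class ToySwitch:
    """Hypothetical model: each port has a bounded outbound queue.

    A frame that cannot be queued on the destination port is counted
    as a drop on the port where it was received ("Drops Rx").
    """

    def __init__(self, num_ports, out_buffer_frames):
        self.out_queues = [deque() for _ in range(num_ports)]
        self.out_buffer_frames = out_buffer_frames  # assumed fixed size
        self.drops_rx = [0] * num_ports

    def receive(self, in_port, out_port, frame):
        """Accept a frame on in_port destined for out_port."""
        queue = self.out_queues[out_port]
        if len(queue) >= self.out_buffer_frames:
            # Outbound buffer is full: the drop is charged to the ingress port.
            self.drops_rx[in_port] += 1
            return False
        queue.append(frame)
        return True

    def transmit(self, out_port):
        """Send the oldest queued frame out of out_port, if any."""
        queue = self.out_queues[out_port]
        return queue.popleft() if queue else None


switch = ToySwitch(num_ports=4, out_buffer_frames=2)
# Port 1 sends three frames to port 0 before port 0 can transmit any.
results = [switch.receive(in_port=1, out_port=0, frame=f"frame-{i}") for i in range(3)]
print(results)          # [True, True, False]
print(switch.drops_rx)  # [0, 1, 0, 0]
```

In this model the congested port (port 0) shows no drops of its own; the counter increments on the port that received the frame, which is the naming logic described above.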

While the counter naming may be correct, the troubleshooting remains difficult. Can you identify any ports which may be experiencing high utilization? Remember, this includes the link to the backplane, which cannot be directly observed. You may be able to rule it out if you can run a test or two with only these servers connected to this module and no other modules installed in the chassis.

Be sure to contact HP Procurve support if you don't make progress on this. (1-800-HPINVENT - speak keyword "Procurve" at the voice prompt for your product). They can be quite helpful with Procurve issues.
Check the cabling. Next, check the cabling again.
Scott_111
Advisor

Re: Troubleshooting Drops Rx

Why would the backplane even be an issue if these four ports are on the same module? Other ports in the switch have a few drops but these ports have thousands.

What do I do if I run a test with only these servers and I still see Drops?

Thanks,

Scott
Scott_111
Advisor

Re: Troubleshooting Drops Rx

These four servers are running an Oracle Real Application Clusters (RAC) database.

Whenever slowness is experienced, as in when a query takes a long time to return data, the switch reports "dropped frames." As the slow query is running the "Drops Rx" counter increments and during a fast query no drops are reported by the switch.


I have also noticed that the "Drops Rx" counter remains unchanged at zero for the server having the slow response.

Therefore, could it be that during a slow query the other three servers try to send a large amount of data, all at the same time, to the server running the query? Would this large amount of data directed at one port cause its buffers to fill and frames to drop?
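The scenario in this question can be sketched with a small simulation. This is a hypothetical model, not a measurement: the function name `simulate_incast`, the one-frame-per-tick rates, and the buffer size are all assumptions, and drops are charged to the sending port's ingress counter as described earlier in the thread.

```python
def simulate_incast(sender_ports, ticks, buffer_frames, num_ports=4):
    """Toy model of several ports sending to one congested egress port.

    Each tick, every sender offers one frame and the egress port drains
    one frame. Frames that find the buffer full are counted as Rx drops
    on the sender's port. Senders are served in a fixed order, so how the
    drops are split between senders is an artifact of this sketch.
    """
    queued = 0
    drops_rx = [0] * num_ports
    for _ in range(ticks):
        for port in sender_ports:
            if queued < buffer_frames:
                queued += 1
            else:
                drops_rx[port] += 1
        if queued > 0:
            queued -= 1  # egress port transmits one frame per tick
    return drops_rx


# Three servers (ports 1-3) send at line rate to the server on port 0.
drops = simulate_incast(sender_ports=[1, 2, 3], ticks=10, buffer_frames=4)
print("three senders:", drops)

# A single sender at the same rate never overruns the buffer in this model.
single = simulate_incast(sender_ports=[1], ticks=10, buffer_frames=4)
print("one sender:", single)
```

In this sketch the receiving port (index 0) never records a drop while the sending ports do, which matches the observation that the counter stays at zero for the server with the slow response.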

Thanks again,

Scott
Gonzo Granello
Valued Contributor

Re: Troubleshooting Drops Rx

Scott,

Well, a lot of opinions; maybe I can contribute to the confusion :-)

As far as the duplex mismatch goes, Gigabit in practice always operates at FULL duplex. The only thing that will be auto-negotiated is flow control, as compared to 10/100 where half/full duplex is another option. The link speed is NEVER auto-negotiated because that is a different process called speed sense, which happens before auto-negotiation ever starts.

However, if one side of the link is not set to auto, the other side will fall back to the lowest common denominator: on 1000 that is NO flow control, and on 10/100 it is always half duplex with no flow control. That, by the way, is where most issues start. If you turn auto-negotiation off on one side, you have to turn it off on the other and configure both ends the same. Clearly, auto-negotiation itself is NOT the issue; configuring one end not to auto-negotiate and leaving the other side on auto will be.

As far as your servers go, are you using any kind of load balancing between the 4 links, or just different IPs? Turning auto-negotiation off on both ends to make sure there is no mismatch would be one option. To troubleshoot further I would need some more info.

Andreas
most time the day i have to mask my contempt for the a-holes in charge......