Switches, Hubs, and Modems

HP Procurve 2900-48 - High collision or drop rate

vlad manea
Occasional Advisor

Hi,

I have a Linux cluster based on Rocks 5.1, and the switch is an HP ProCurve 2900 with 48 ports. On one of the ports (3), where I have plugged in a compute node, I got this warning: "High collision or drop rate". The switch status displayed is Non-critical. I reinstalled the compute node, and I see that all my nodes carry roughly the same load in a parallel job (including the node on port 3).
What should I do in this case? It looks like the warning message is still there...

Thanks,
Vlad
_______
11 REPLIES
RonniDK
Advisor

Re: HP Procurve 2900-48 - High collision or drop rate

Have you checked speed/duplex?

# show interfaces brief

vlad manea
Occasional Advisor

Re: HP Procurve 2900-48 - High collision or drop rate

Yes, I checked and this is the output:

...
3 100/1000T | No Yes Up 1000FDx MDIX on 0
...
However, I see 83,778,975 Tx drops on this port, compared with ~2,000,000 on the other ports.
Any idea how to solve this problem?

Thanks,
Vlad
RonniDK
Advisor

Re: HP Procurve 2900-48 - High collision or drop rate

Do you have drops on all your ports?

I have a similar case open with HP regarding 7 x 2900-48G switches, all of which have drops.

I would very much like to hear about your setup if you do have drops on all your ports.
vlad manea
Occasional Advisor

Re: HP Procurve 2900-48 - High collision or drop rate

Well, I had drops on all ports, but on port 3 they were 40 times higher. I had to reboot the switch today because I was starting to get a lot of errors. I then started a new MPI job on all the compute nodes, and after 24 hours the switch is OK.
I looked at # show interfaces and see that Drops Tx is 0 on all ports. I have no idea what caused the previous behavior; maybe it was a very large parallel job with lots of MPI communication between the nodes.
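As a quick sanity check on the "40 times higher" figure, the ratio can be worked out from the two counts quoted earlier in the thread. This is only a rough sketch: the ~2,000,000 value is the approximate figure given for the other ports, so the result is approximate too.

```python
# Tx drop counts quoted earlier in the thread.
port3_drops = 83_778_975       # port 3
other_port_drops = 2_000_000   # approximate figure for the other ports

# How many times more drops port 3 saw than a typical port.
ratio = port3_drops / other_port_drops
print(f"Port 3 saw about {ratio:.0f}x the Tx drops of a typical port")
```

The result comes out a little above 40, which is consistent with the estimate in the post.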
vlad manea
Occasional Advisor

Re: HP Procurve 2900-48 - High collision or drop rate

The HPC setup I have is the following:
5 compute nodes (Dell PE SC1425, 2x2 processors, 2.6 GHz, 8 GB RAM)
1 head node (Dell PE 2970, 2x2 processors, 2.8 GHz, 8 GB RAM)
1 switch (HP ProCurve 2900-48)
All the compute nodes and the head node are connected to the switch (I am using only one NIC on each machine, so for the moment I am using only 6 ports out of 48).
RonniDK
Advisor

Re: HP Procurve 2900-48 - High collision or drop rate

Okay...

We have a setup in which each computer is using bonding, and nearly every switch is full.

Can you send me a "show interfaces"?

vlad manea
Occasional Advisor

Re: HP Procurve 2900-48 - High collision or drop rate

There you go:

ProCurve Switch 2900-48G# show interfaces

Status and Counters - Port Counters

                                                              Flow  Bcast
Port  Total Bytes    Total Frames   Errors Rx    Drops Tx     Ctrl  Limit
----- -------------- -------------- ------------ ------------ ----- ------
1     558,337,029    18,584,801     0            0            on    0
2     493,880,984    2,951,289,212  0            0            on    0
3     322,075,451    2,948,338,084  0            7            on    0
4     4,126,074,570  2,941,405,689  0            1            on    0
5     3,394,545,464  1,483,089,562  0            0            on    0
6     3,282,434,817  1,480,859,094  0            0            on    0
7     0              0              0            0            off   0
8     0              0              0            0            off   0
9     0              0              0            0            off   0
10    0              0              0            0            off   0
11    0              0              0            0            off   0
12    0              0              0            0            off   0
13    0              0              0            0            off   0
14    0              0              0            0            off   0
15    0              0              0            0            off   0
16    0              0              0            0            off   0
17    0              0              0            0            off   0
18    0              0              0            0            off   0
19    0              0              0            0            off   0
20    0              0              0            0            off   0
21    0              0              0            0            off   0
22    0              0              0            0            off   0
23    0              0              0            0            off   0
24    0              0              0            0            off   0
25    0              0              0            0            off   0
26    0              0              0            0            off   0
27    0              0              0            0            off   0
28    0              0              0            0            off   0
29    0              0              0            0            off   0
30    0              0              0            0            off   0
31    0              0              0            0            off   0
32    0              0              0            0            off   0
33    0              0              0            0            off   0
34    0              0              0            0            off   0
35    0              0              0            0            off   0
36    0              0              0            0            off   0
37    0              0              0            0            off   0
38    0              0              0            0            off   0
39    0              0              0            0            off   0
40    0              0              0            0            off   0
41    0              0              0            0            off   0
42    0              0              0            0            off   0
43    0              0              0            0            off   0
44    0              0              0            0            off   0
45    0              0              0            0            off   0
46    0              0              0            0            off   0
47    0              0              0            0            off   0
48    383,366,496    4,161,559      0            0            off   0
A1    0              0              0            0            off   0
A2    0              0              0            0            off   0
A3    0              0              0            0            off   0
A4    0              0              0            0            off   0

ProCurve Switch 2900-48G#
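For anyone who wants to scan output like this without reading 52 rows by eye, here is a minimal Python sketch that parses the counter rows and lists the ports with non-zero Drops Tx. It assumes the seven-column layout shown above (port, total bytes, total frames, errors Rx, drops Tx, flow control, broadcast limit); the function and variable names are made up for illustration, and other firmware versions may print the table differently.

```python
def parse_port_counters(text):
    """Parse the data rows of a 'show interfaces' port-counter table into a dict."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has 7 fields: port, bytes, frames, errors, drops, flow ctrl, bcast limit.
        if len(fields) != 7 or fields[5] not in ("on", "off"):
            continue
        try:
            total_bytes, total_frames, errors_rx, drops_tx = (
                int(value.replace(",", "")) for value in fields[1:5]
            )
        except ValueError:
            continue  # not a numeric row, e.g. a header line
        counters[fields[0]] = {
            "total_bytes": total_bytes,
            "total_frames": total_frames,
            "errors_rx": errors_rx,
            "drops_tx": drops_tx,
            "flow_ctrl": fields[5],
        }
    return counters


# Rows for ports 1-6, copied from the output posted above.
SAMPLE = """\
Port Total Bytes Total Frames Errors Rx Drops Tx Ctrl Limit
----- -------------- -------------- ------------ ------------ ----- ------
1 558,337,029 18,584,801 0 0 on 0
2 493,880,984 2,951,289,212 0 0 on 0
3 322,075,451 2,948,338,084 0 7 on 0
4 4,126,074,570 2,941,405,689 0 1 on 0
5 3,394,545,464 1,483,089,562 0 0 on 0
6 3,282,434,817 1,480,859,094 0 0 on 0
"""

counters = parse_port_counters(SAMPLE)
ports_with_drops = {
    port: row["drops_tx"] for port, row in counters.items() if row["drops_tx"] > 0
}
print(ports_with_drops)  # {'3': 7, '4': 1}
```

Splitting on whitespace means the parser does not care how the columns are padded, only that each row has the same seven fields.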
RonniDK
Advisor

Re: HP Procurve 2900-48 - High collision or drop rate

Have you reset the statistics on the switch? According to the "show interfaces" you don't have many drops.

As I understand your previous post ("However, I see 83,778,975 Tx drops on this port, compared with ~2,000,000 on the other ports."), you had over 83 million drops on port 3 and around 2 million drops on every other port. Is that correct?
Matt Hobbs
Honored Contributor

Re: HP Procurve 2900-48 - High collision or drop rate

It looks healthy now. Drops simply occur when a buffer overflows. E.g., you had 2 Gig coming in to a 1 Gig port and the switch simply couldn't buffer it, so packets were dropped. All you can really do is keep an eye on it. If it keeps occurring, maybe you need to set up an aggregated port group for the particular servers that are affected.
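The "2 Gig into a 1 Gig port" case can be put into rough numbers with an idealized model in which traffic arrives and leaves at constant rates. This is a sketch under stated assumptions, not a description of how the 2900 actually manages its buffers: the 1 MB buffer size below is a hypothetical value chosen for illustration, and real traffic is bursty rather than constant.

```python
def time_to_fill_s(ingress_bps, egress_bps, buffer_bytes):
    """Seconds until an empty egress buffer overflows under constant load."""
    excess_bps = ingress_bps - egress_bps
    if excess_bps <= 0:
        return float("inf")  # the port keeps up, so the buffer never fills
    return buffer_bytes * 8 / excess_bps


def steady_state_drop_fraction(ingress_bps, egress_bps):
    """Fraction of offered traffic dropped once the buffer is full."""
    if ingress_bps <= egress_bps:
        return 0.0
    return (ingress_bps - egress_bps) / ingress_bps


INGRESS_BPS = 2e9          # 2 Gbit/s arriving for the port
EGRESS_BPS = 1e9           # 1 Gbit/s port speed
BUFFER_BYTES = 1_000_000   # hypothetical buffer size, for illustration only

fill_time = time_to_fill_s(INGRESS_BPS, EGRESS_BPS, BUFFER_BYTES)
drop_fraction = steady_state_drop_fraction(INGRESS_BPS, EGRESS_BPS)
print(f"buffer full after {fill_time * 1000:.1f} ms, then {drop_fraction:.0%} dropped")
```

Under these assumptions the buffer absorbs only a few milliseconds of overload, after which half of the offered traffic has nowhere to go. A bigger buffer delays the drops but does not prevent them while the overload lasts.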
vlad manea
Occasional Advisor

Re: HP Procurve 2900-48 - High collision or drop rate

Hi RonniDK,

I simply rebooted the switch from a telnet session, and then I noticed that the statistics had been cleared. But yes, I did have those huge drops before. At that time I was running the Linpack benchmark on my cluster to measure its maximum real performance, and this probably caused the high drop rate. I actually pushed the matrix size to the maximum permitted by the RAM on each server (8 GB). I am going to rerun Linpack next week and monitor the ports again.

Cheers
Vlad
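When monitoring the ports across runs, it helps to compare two readings of the Drops Tx counter rather than look at the raw totals, and to allow for the counters being cleared by a reboot as happened here. A minimal sketch, assuming the readings have already been collected into dictionaries keyed by port; the function name is made up for illustration.

```python
def drops_since(previous, current):
    """Return the Tx drops added per port between two counter readings."""
    deltas = {}
    for port, now in current.items():
        before = previous.get(port, 0)
        # A counter that went backwards was presumably cleared (e.g. by a reboot),
        # so everything it shows now has accumulated since the reset.
        deltas[port] = now if now < before else now - before
    return deltas


# Port 3 as reported in this thread: before the reboot and in the later output.
before_reboot = {"3": 83_778_975}
after_reboot = {"3": 7}
print(drops_since(before_reboot, after_reboot))  # {'3': 7}
```

Dividing each delta by the time between readings gives a drop rate, which is easier to compare between a Linpack run and an idle period than the totals are.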
ctgan
Occasional Visitor

Re: HP Procurve 2900-48 - High collision or drop rate

Hi, I have the same problem, "High collision or drop rate", on the switch. It started after I configured load balancing on one of the servers, so that server now has 2 Gb coming into the switch. Is this the root cause? How can I solve it? Please advise.

Thank you