Operating System - OpenVMS

SOLVED
Art Wiens
Respected Contributor

MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Alpha ES47 with two DE602-BB cards, using only one port on each card; VMS v7.3-2, all patched up.

I am noticing "choppy" response in a Telnet session (TCPware v5.7-2, all patched up). When I issue the above command, the only "problem" I see is Data overruns. Data overruns are also logged against the NCP lines, but nothing against the circuits.

Absolutely confirmed that both the switch ports and the SRM variables are set to 100/Full.

Can anyone provide some insight as to what might be causing "Data overruns"?

Cheers,
Art
23 REPLIES
EdgarZamora
Trusted Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

This is usually a speed/duplex mismatch between the switch port and the NIC. You may have checked the switch port and the console variable, but did you check LANCP? Can you post the output of LANCP SHOW DEV /CH?

Art Wiens
Respected Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Here you go. I did not include EIB0 and EID0 because they are not plugged in. It sure seems like "everyone" is in agreement that the interfaces are 100/Full. I sat with the network guy and watched him show the switch ports: 100/Full. I have also included the SRM variables for the two ports in the attachment.

Cheers
Art
EdgarZamora
Trusted Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Sure seems like it, although your LANCP output looks a little different from what I get on a fully patched 7.3-2 system (I can't put my finger on it yet because I don't have access right now). What changed? Did this use to work fine? Is the network heavily utilized? The buffers may need increasing.

Bill Hall
Honored Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Art,

I believe data overruns are a count of the number of times the NIC lost an incoming packet. That would lead to one or more retransmits, which could look choppy.

I've heard the DE602 did not perform as well as the DE500; maybe it can't keep up with the switch port. Are any other LANCP counters increasing on these NICs? Does MONITOR MODES show an extremely high interrupt rate?

I'm running some ES47s with four DEGX2-TA (dual-port Gigabit Ethernet) cards in each, no jumbo frames, and 5 of the 8 ports active (3 LL devices and 2 EW devices dedicated to SCA traffic). I see no data overruns.

Bill
Bill Hall
Art Wiens
Respected Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

This system had been running in test mode for a few months. I didn't notice a performance issue, so I can't say that I ever looked at the LANCP counters.

This week I blew it all away, since the SAN allocations had to be redone, and started the production build "for real".

The system itself is idle; I see the data overruns start incrementing immediately after a reboot. The switches are "busy", but not with this system.

BTW, the DE602's are plugged into:

"Cisco WS-C3750G-24TS-S

Software version: c3750-ipbasek9-mz.122-25.SEB4

2 Switches in the stack (cluster)"

One NIC is plugged into each switch in the stack.

Art
Volker Halle
Honored Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Art,

did you have a look at LANCP> SHOW DEVICE/INTERNAL_COUNTERS? Maybe there is some more information available. Could you post the counters in a .TXT attachment?

Volker.
Art Wiens
Respected Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Edgar:
"Is the network heavily utilized? The buffers may need increasing."

It's a busy network and it's plugged into a busy switch, but the ES47 is idle. I looked this morning and each interface has ~10,000 data overruns.

Which buffers do you propose increasing?

Bill:
"I believe data overruns are a count of the number of times the NIC lost an incoming packet."

If we can believe the LANCP counters, both interfaces say:

1 Link up transitions ( 7-NOV-2007 13:02:30.40)
0 Link down transitions

NCP circuit says:

0 Circuit down
0 Initialization failure
0 Adjacency down
1 Peak adjacencies

"Does MONITOR MODES show an extremely high interrupt rate?"

No, the system is "idle". I'm only at the point of having upgraded VMS and TCPware so far.

Cheers,
Art
Art Wiens
Respected Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Attached are the /internal_counters.

Art
Volker Halle
Honored Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Art,

this may be related to:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1129303

What's the image ident and link time of your SYS$EIDRIVER ?

Volker.
Art Wiens
Respected Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Volker, thanks for the other thread; I don't recall reading that one. I'll try bumping up the LANCP buffers.

The driver is:

image name: "SYS$EIDRIVER"
image file identification: "X-42"
image file build identification: "XA99-0060111012"
link date/time: 3-AUG-2005 15:20:02.50
linker identification: "A11-50"

which looks to be from VMS732_LAN-V0400.

I started with my existing VMS v7.2-2, upgraded to v7.3-2, applied VMS732_UPDATE-V1300 and then the 6 or so patches since.

I'll try the buffers.

Cheers,
Art
Volker Halle
Honored Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Art,

there seem to be data overruns at a rate of 1% of all incoming frames. Is there an extreme amount of multicast/broadcast traffic on the switch?
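To make the 1% figure concrete, the overrun rate is the data-overrun count divided by the total number of incoming frames. The Python sketch below is a hypothetical illustration: the counter values are invented (chosen to match the "~10,000 overruns" per interface mentioned earlier, not read from the attachment), and it assumes that frames lost to an overrun are not also counted as received.

```python
def overrun_percent(frames_received: int, data_overruns: int) -> float:
    """Percentage of incoming frames lost to data overruns.

    Assumption: frames lost to an overrun are NOT included in
    frames_received, so total incoming = received + overruns.
    """
    incoming = frames_received + data_overruns
    if incoming == 0:
        return 0.0
    return 100.0 * data_overruns / incoming


# Hypothetical counter values for one interface.
print(f"{overrun_percent(990_000, 10_000):.2f}%")  # 1.00%
```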

Volker.
EdgarZamora
Trusted Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

<< Edgar:
"Is the network heavily utilized? The buffers may need increasing."

It's a busy network and it's plugged into a busy switch, but the ES47 is idle. I looked this morning and each interface has ~10,000 data overruns.

Which buffers do you propose increasing? >>

Sorry for the delayed response; I was stuck in a meeting. I meant the receive buffers:

LANCP> SET/DEFINE DEV /MIN /MAX

Good luck.
Art Wiens
Respected Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

I'm not sure how to confirm or deny that there is an extreme amount of multicast traffic. I spoke with the network folks again and they don't seem to think it's excessive.

I have attached the switch port counters ... clean.

Art
Volker Halle
Honored Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Art,

LANCP> SHOW DEV/COUNT will show the counters for the LAN devices. Counters will be shown for total and multicast traffic.

An ES47 should have no problem handling about 30-50 packets/sec, which is what the switch data seems to show. However, those figures are averaged over 5 minutes, so there could well be spikes.
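The averaging caveat is easy to underestimate. This Python sketch, using invented numbers, compares the 5-minute average the switch would report with the rate during one short burst; the 40 pkt/s background and the 500-packet, 10 ms burst are hypothetical and only show how little such a burst moves the average.

```python
WINDOW_S = 300  # the switch counters are averaged over 5 minutes


def window_average_pps(steady_pps: float, burst_packets: int) -> float:
    """Average rate over the window: steady traffic plus one short burst."""
    return (steady_pps * WINDOW_S + burst_packets) / WINDOW_S


def burst_pps(burst_packets: int, burst_seconds: float) -> float:
    """Instantaneous rate while the burst is arriving."""
    return burst_packets / burst_seconds


# Hypothetical: 40 pkt/s of background traffic plus one 500-packet burst in 10 ms.
print(f"5-minute average:  {window_average_pps(40, 500):8.1f} pkt/s")  # 41.7
print(f"rate during burst: {burst_pps(500, 0.010):8,.0f} pkt/s")       # 50,000
```

The burst raises the 5-minute average by under 2 pkt/s, even though the NIC briefly sees traffic roughly a thousand times higher than that average.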

Volker.
Art Wiens
Respected Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

/min=512 /max=1024 doesn't seem to have done anything.

How high a value is "reasonable"?

Any downside to setting these buffers "very high"?

Art
Volker Halle
Honored Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Art,

try to determine the average incoming packet rate from the LANCP counters. Do the Data overrun counters increase steadily or in bursts? Try to record the counters every 10 seconds and determine when they increment.
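One way to analyze the 10-second sampling suggested above is to record the receive and data-overrun counters at each interval and compare consecutive readings. The Python sketch below assumes the readings have already been collected (for example by running LANCP SHOW DEVICE/COUNTERS in a loop and copying out the values); it does not parse LANCP output, and the `Sample` fields and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Sample:
    t: float        # seconds since the first reading
    received: int   # frames-received counter (hypothetical reading)
    overruns: int   # data-overrun counter (hypothetical reading)


def interval_report(samples: List[Sample]) -> Iterator[Tuple[float, float, float, int]]:
    """Yield (start, end, receive rate in pkt/s, new overruns) per sampling interval."""
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        yield prev.t, cur.t, (cur.received - prev.received) / dt, cur.overruns - prev.overruns


# Hypothetical readings taken every 10 seconds.
samples = [
    Sample(0, 100_000, 500),
    Sample(10, 100_400, 500),
    Sample(20, 101_200, 540),
    Sample(30, 101_600, 540),
]

for start, end, pps, new in interval_report(samples):
    marker = "  <-- overruns" if new else ""
    print(f"{start:4.0f}-{end:<4.0f}s {pps:7.1f} pkt/s  +{new}{marker}")

# Share of intervals in which the overrun counter moved at all.
reports = list(interval_report(samples))
share = sum(1 for r in reports if r[3]) / len(reports)
print(f"intervals with overruns: {share:.0%}")  # 33%
```

If the overrun count jumps only in isolated intervals while the average receive rate stays low, that points to short bursts rather than a sustained overload.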

Specifying too many buffers would just waste nonpaged pool if they are not being used.

Volker.

Volker Halle
Honored Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Art,

could you please provide the following counters:

$ MC LANCP SHO DEV/INT/DEBUG

Volker.
Art Wiens
Respected Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Here you go.

BTW, I have logged a call w/HP and I'm working with Tony A. on this issue.

Thanks,
Art
Volker Halle
Honored Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Art,


BTW, I have logged a call w/HP and I'm working with Tony A. on this issue


I know ;-)

Good luck,

Volker.
Richard Stockdale
Frequent Advisor
Solution

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

I did some experiments. I was a bit concerned that these might be full-size VLAN-tagged packets, and that the driver gives the device 1518-byte packet buffers, which wouldn't have enough room for the 4-byte VLAN tag (never mind that the driver does not support VLANs on this device, so nothing would have worked anyway).

Then Tony Abdella sent me a packet trace, and the problem was immediately obvious. As Bill Hall said in one of the replies, the DE600 is not as good as the DE500 in some regards. The DE600 is actually much better than the DE500 for large segmented or chained transmit packets, but much worse than the DE500 for small packets.

The packet trace showed an ARP request followed by a flood of minimum-size packets. The DE600 just can't cope with minimum-size packets arriving at line rate, so you see data overruns and lost packets.

The only choice is to get a more capable NIC, such as a DEGXA, which can run at line rate with any packet size at 100 Mbit/s.

Note that the DEGXA, like most gigabit NICs, can't run at line rate at 1000 Mbit/s for packet sizes below a few hundred bytes (at least with a conventional interrupt-based system implementation), but it does quite well and has enough buffering to withstand large bursts of line-rate minimum-size packets at 1000 Mbit/s.
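For a sense of scale, the packet rate at "line rate" follows from standard Ethernet framing: a 64-byte minimum frame (including FCS) plus 8 bytes of preamble/SFD and a 12-byte interframe gap occupies 84 bytes, or 672 bit times, on the wire. The Python sketch below computes the resulting frame rates for minimum-size and full-size (1518-byte) frames; it is arithmetic on IEEE 802.3 framing, not a measurement of the DE600 or the DEGXA.

```python
# Standard IEEE 802.3 framing constants (bytes).
MIN_FRAME_BYTES = 64        # minimum frame, including the 4-byte FCS
MAX_FRAME_BYTES = 1518      # maximum untagged frame, including FCS
PREAMBLE_SFD_BYTES = 8      # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle time between frames


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Frames per second a link carries when frames are sent back to back."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


for label, bps in (("100 Mbit/s", 100e6), ("1000 Mbit/s", 1e9)):
    for size in (MIN_FRAME_BYTES, MAX_FRAME_BYTES):
        pps = line_rate_pps(bps, size)
        print(f"{label}, {size:>4}-byte frames: {pps:>11,.0f} frames/s "
              f"(one every {1e6 / pps:.2f} us)")
```

At 100 Mbit/s this works out to roughly 148,810 minimum-size frames per second, about one every 6.72 microseconds, versus roughly 8,127 frames per second for full-size frames. The per-packet work a NIC must do is therefore far higher for a flood of small packets than for the same bandwidth in large ones.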

-Dick
Bill Hall
Honored Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Art,

If you are interested in taking Dick's advice and would like to "upgrade" to the DEGXA but need the dual port density of the DE602, I highly recommend Nemonix Engineering's DEGX2-TA. We have both ES40s and ES47s with four DEGX2-TA in each.

If I recall correctly, for VMS 7.3-2 the version of the SYS$EW5700 driver that supports both the HP DEGXA-TA and the Nemonix DEGX2-TA is included in one of the VMS LAN ECOs. I don't recall which one.

The price on the Nemonix DEGX2 is very good also.

Bill
Bill Hall
Art Wiens
Respected Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

Based on the findings of HP support/engineering, and on testing the DEGXA 1Gb card HP loaned us, we have 12 DEGXAs on order.

I still find it hard to believe that the DE60x (we also tested a single-port DE600) can be overrun; it was only ~400 multicast packets. But in the end, going "faster" is always desirable! ;-)

Cheers, and thanks HP!
Art
Art Wiens
Respected Contributor

Re: MCR LANCP SHOW DEVICE /ALL /COUNT -> Data overruns

DE60x 100 Mb NICs were being overrun by "bursts" of multicast packets when connected to Cisco 3750 switches.

The DEGXA 1 Gb NIC does not exhibit this issue.

Art