Operating System - OpenVMS

VMS keeps disabling the ethernet port

 
SOLVED
Mark Berryman
Occasional Advisor

VMS keeps disabling the ethernet port

I have installed VMS V8.3 on an RX2620 (I do not have -1H1 at this time). I can perform DECnet transactions all day long without issue. However, once a TCP/IP stack has been started and a small amount of traffic has been exchanged, VMS issues the error message:
"%EIA0, Receive CRC validation failure, device now unusable"
The link is then shut down and all network traffic stops. This happens on both Ethernet ports and with multiple IP stacks. No CRC errors are logged in the LAN counters, and the Ethernet connection works fine when plugged into any other device, including the ILO board. I have also verified that speed and duplex are being negotiated correctly. I'm hoping someone here can explain what is happening or, more importantly, knows a way to tell VMS to stop shutting down the port.
15 REPLIES
Ian Miller.
Honored Contributor

Re: VMS keeps disabling the ethernet port

What patches do you have?
____________________
Purely Personal Opinion
Hein van den Heuvel
Honored Contributor

Re: VMS keeps disabling the ethernet port

What's on the other side of the wire?
Can you see the speed/duplex selected?
Maybe the TCP/IP stacks use Jumbo-Frames and that fails?

Hein
Volker Halle
Honored Contributor

Re: VMS keeps disabling the ethernet port

Mark,

the most detailed information would be reported by LANCP SHOW DEV/INT EIA0

This error message does not seem to have been publicly reported before. Consider asking HP what this message is meant to indicate.

Did this interface work before on this RX2620 ?

Volker.
Mark Berryman
Occasional Advisor

Re: VMS keeps disabling the ethernet port

The system is up-to-date on all patches.

The connection is only running at 100 Mb/s and neither side has jumbo frames enabled. Both sides agree on speed and duplex.

I just obtained this system. The previous owner indicates he never ran an IP stack on it.

I will get a copy of the LANCP output and post it here.
Volker Halle
Honored Contributor

Re: VMS keeps disabling the ethernet port

Mark,

here is an answer from an authoritative source:

'What it means is that the driver is doing a validation of the first 50000 packets for that interface and the calculated CRC is different from the actual CRC. These are packets that passed device validation so no CRC errors are recorded for the device. The CRC error most likely was introduced on each packet during DMA to host memory. The driver is detecting undetected data corruption and turning off the device.

So the device should be replaced.'

IMHO this looks like a hardware problem, if we believe that the driver is coded correctly. This looks like an unusual problem.

Volker.
Jon Pinkley
Honored Contributor

Re: VMS keeps disabling the ethernet port

Volker, that seems like a reasonable explanation, but if the same ethernet driver is being used by the DECnet and IP stacks, why doesn't DECnet traffic trigger the same error?

Jon
it depends
Mark Berryman
Occasional Advisor

Re: VMS keeps disabling the ethernet port

That is my question as well. With this information describing what the system is doing, I booted the system and generated DECnet traffic until the counter showed that 50000 CRC validations had been done. Then I started Multinet. So far, everything is working and the Ethernet interface has not been turned off. I also used SCP to copy a large file onto the RX2620 and then used checksum to compare it to the source. Both ends matched, which would indicate that packets are not really being corrupted between the Ethernet interface and memory. This is looking more like a bug in the CRC validation code in the driver, although why it would care about the difference between DECnet packets and IP packets is a mystery, unless it has something to do with the framing difference between 802.3 packets (DECnet V5) and Ethernet V2 packets (IP).
Richard Stockdale
Frequent Advisor
Solution

Re: VMS keeps disabling the ethernet port

The receive CRC check is pretty simple-minded. It doesn't matter what type of packet it is; the driver just does a CRC calculation of the entire received packet and fails if the result doesn't match the CRC that was received from the device.
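
For readers who want to see what such a whole-packet check amounts to, here is a minimal Python sketch. It is not the driver's code, which is not shown in this thread. It assumes the standard IEEE 802.3 frame check sequence (the same CRC-32 that zlib computes, stored least-significant byte first at the end of the frame), and the function names are made up for illustration.

```python
import struct
import zlib


def frame_with_fcs(frame_without_fcs: bytes) -> bytes:
    """Append a CRC-32 frame check sequence, least-significant byte first."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)


def crc_matches(frame: bytes) -> bool:
    """Recompute the CRC over everything except the last 4 bytes and compare."""
    if len(frame) < 4:
        return False
    body = frame[:-4]
    (received_fcs,) = struct.unpack("<I", frame[-4:])
    return (zlib.crc32(body) & 0xFFFFFFFF) == received_fcs
```

A frame that is altered anywhere after the device accepted it, for example by a single flipped bit on its way into host memory, would no longer pass `crc_matches`, whatever protocol it carries.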

That doesn't mean things won't work as upper layers may be doing checksum calculations and discarding bad packets. Or the corrupted data is in some part of the packet that isn't critical.
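
As a general illustration of why an upper-layer checksum and a CRC can disagree (not a diagnosis of this particular system), the sketch below implements the 16-bit one's-complement Internet checksum described in RFC 1071, which IP, TCP and UDP use. Because it is a plain sum of 16-bit words, it cannot see two words changing places, whereas CRC-32 can. The sample byte strings are arbitrary.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"\x12\x34\x56\x78"
swapped = b"\x56\x78\x12\x34"  # same 16-bit words, different order

same_sum = internet_checksum(original) == internet_checksum(swapped)
same_crc = zlib.crc32(original) == zlib.crc32(swapped)
```

Here `same_sum` is true while `same_crc` is false, so a clean set of TCP or UDP counters is weaker evidence than a clean CRC.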

As to why DECnet might work and TCP/IP doesn't - it could be the packet size. Or the data in the packet might induce a hardware failure for that packet and not another with a different data pattern. Or the contents of other packets might affect the hardware and cause a bad DMA operation. I'd guess you were just lucky.

As far as running all day long without issue, the test only runs for the first 50000 packets. You could increase this to do the calculation on every packet and see if some DECnet traffic induced the failure.

This check was put into the driver to detect failing AB290A cards and it has been working successfully for several years. I don't think there is a problem with the code.

If you are willing to accept undetected data corruption, you can disable the check using a device-specific function:

$ mc lancp set dev eia/dev=(func="RCRC",value=(low,high))

low,high is a 64-bit number of remaining CRC checks to do, so you could set it to zero and it would do no more checks. The default is value=(50000,0).
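
To work out the pair for a count that does not fit in the low half, a small Python sketch follows. It assumes that low and high are the two 32-bit halves of the 64-bit count, which is the natural reading of the description above but is not spelled out in it. The helper names are hypothetical.

```python
def rcrc_count(low: int, high: int) -> int:
    """Combine two 32-bit halves into the 64-bit number of remaining checks."""
    if not (0 <= low < 2**32 and 0 <= high < 2**32):
        raise ValueError("low and high must each fit in 32 bits")
    return (high << 32) | low


def split_count(count: int) -> tuple:
    """Split a 64-bit count into the (low, high) pair."""
    if not 0 <= count < 2**64:
        raise ValueError("count must fit in 64 bits")
    return count & 0xFFFFFFFF, count >> 32
```

Under that assumption the default (50000,0) is simply 50000 checks, and (0,0) means no further checks.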

- Dick
Volker Halle
Honored Contributor

Re: VMS keeps disabling the ethernet port

Mark,

as the CRC checking code in the driver seems to exist for a reason, you might not want to bet your data on 'luck'. If there is some data corruption in the DMA operation between the interface and host memory, all things are possible: caused by certain bit patterns, message length etc.

Volker.
Richard Brodie_1
Honored Contributor

Re: VMS keeps disabling the ethernet port

"Both ends matched which would indicate that packets are not really being corrupted between the ethernet interface and memory."

You're likely to see transport retries more often than file corruption.
Volker Halle
Honored Contributor

Re: VMS keeps disabling the ethernet port

Mark,

can you imagine any operating system other than OpenVMS that would go to such lengths to try to protect your data? Can you imagine the amount of analysis that might have gone into detecting this problem in the first place? OpenVMS LAN engineering is really there to help you protect your data. Consider following their advice.

If you want to see the problem yourself, why not try some DECnet MOP LOOP tests with varying bit patterns? Or try massive DECnet file copies. Remember: FAL has its own end-to-end CRC checks built in.

A historical note: when X.25 networks were in use, you could typically only identify data corruption happening inside the PTT's X.25 network by running DLM (Data Link Mapping = DECnet over X.25) circuits across the X.25 'cloud', because DLM added its own CRC check into each DECnet packet before transmitting it to the X.25 switch, checked it after reception and reported CRC failures. OpenVMS at work trying to protect your data ...

Volker.
Mark Berryman
Occasional Advisor

Re: VMS keeps disabling the ethernet port

I have seen instances where a particular data pattern can cause an interface to misbehave. However, this is rare, and having both interfaces fail the exact same way is even more rare (although still possible when they share some electronics). How many packets does the driver need to see with bad checksums before it declares the port dead?

I have just finished a massive test suite. It consisted of the following protocols:
Native DECnet Phase V
Native DECnet Phase IV
TCP-based traffic
UDP-based traffic
DECnet-over-IP traffic
Both encrypted and non-encrypted traffic (encryption significantly affects data patterns)

The test suite generates packets that run the range from 64 to 1500 bytes with a very wide range of data patterns. Data is verified at both ends of the connection and the protocol counters are monitored. Data is exchanged in both directions. The transmitting end had no packets re-transmitted and neither end had packets with checksum errors. No TCP checksum errors, no UDP, no FAL, no DECnet.
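
The test suite itself is not reproduced here. Purely as an illustration of the idea of sweeping sizes and data patterns, a generator along these lines could feed whatever sends and verifies the data. Everything in it (the function name, the four patterns, treating 64 to 1500 as payload lengths) is an assumption for the sketch, not a description of the actual suite.

```python
import random


def pattern_payloads(sizes=range(64, 1501), seed=0):
    """Yield (label, payload) pairs covering fixed and random bit patterns."""
    rng = random.Random(seed)  # seeded so a failing payload can be regenerated
    for size in sizes:
        yield "zeros", bytes(size)
        yield "ones", b"\xff" * size
        yield "alternating", (b"\xaa\x55" * size)[:size]
        yield "random", bytes(rng.getrandbits(8) for _ in range(size))
```

Each payload would be sent, read back at the far end and compared byte for byte, with the label and length recorded for any mismatch.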

Whatever the driver is complaining about, I cannot see that it is impacting either my data or my throughput.

Is there any way to tell the driver to log the packets that it finds checksum errors in? I'd really like to know why the driver complains but nothing else does.
Volker Halle
Honored Contributor

Re: VMS keeps disabling the ethernet port

Mark,

have a look at SDA> LAN HELP and especially at the SDA> LAN TRACE/CONTEXT output. You may be able to specify tracing special events for this interface.

Otherwise let's wait for Dick Stockdale to respond to this question.

Volker.
Richard W Hunt
Valued Contributor

Re: VMS keeps disabling the ethernet port

This is a shot in the dark, but ...

By any chance is there some sort of compression interface running between the Ethernet I/F and its local destination?

Also, if there is twisted pair involved, I would check for abrasion on the cables.

Finally, is there a time-of-day or event-of-day correlation with something like a janitorial visit with a big floor buffer?
Sr. Systems Janitor
Mark Berryman
Occasional Advisor

Re: VMS keeps disabling the ethernet port

Final report:

As part of testing to discover the root problem, I changed the port speed from 100 Mb/s to 1 Gb/s (I had it set at 100 Mb/s because all the other devices on the network are at that speed). At this point the problem went away. I set the CRC counter to a very high number and have been running in full "production" for over a week, exchanging millions of packets, without a single CRC failure.

I don't know enough about the internal workings of this particular hardware interface to figure out why changing the port speed would make a difference to this issue. However, as I am now able to run with CRC checking on full time without any errors, I am satisfied I have a working configuration. Should this problem resurface, I have added a DE-504 to the system, which will provide all of the spare Ethernet ports I may need.

If possible, I would still like to get answers to the following questions:

1. How many CRC failures must occur before the device is shut down?
2. Is it possible to log the failing packet(s)? For example, are they copied into the trace buffer?
3. Is the CRC check done while the packet is still in the ring buffer or elsewhere?

Thanks for everyone's feedback.