Data Access Protocol error detected; DAP code = 00019008

 
Art Wiens
Respected Contributor

Data Access Protocol error detected; DAP code = 00019008

We are seeing the above message occasionally when using different commands over a DECnet link between two systems, an AS800 running v7.2-2 and a DS10 running v8.3. Both run DECnet Phase IV at 100Mb/full duplex (locked down), and both are plugged into the same Cisco 2950 switch. We have seen it with either side initiating the activity. DIFFERENCES, COPY and DIRECTORY have all produced the error.

I logged a call with HP, but I'm having a hard time convincing our network folks that, according to HP, the cause is a faulty network switch 100% of the time. The switch is apparently silently allowing data corruption, but DECnet does an additional CRC check and detects the inconsistency. I can almost believe that, but a DIRECTORY command? What does it CRC check?

There is "lots of traffic" to/from these two systems (both are test/dev environments). I'm regularly refreshing test areas with 10-20M blocks worth of stuff with both DECnet and FTP ... I've personally never had the problem but others have.

The v8.3 system was a copy of the v7.2-2 system, upgraded quite some time ago (May-2008).

I hate to second guess VMS support (sorry guys), but does anyone else have experience with this error code? There is precious little to be found with Mr. Google on the subject. The network guys are drunk on Cisco Kool-aid ;-).

Cheers,
Art
13 REPLIES
Richard Brodie_1
Honored Contributor

Re: Data Access Protocol error detected; DAP code = 00019008

What does mc lancp show device/count say about the hardware?
Volker Halle
Honored Contributor
Solution

Re: Data Access Protocol error detected; DAP code = 00019008

Art,

could you please show the REAL COMPLETE error message in the correct context ? I assume it's a %RMS-F-BUG_DAP error.

The DAP code of 00019008 consists of the RMS facility code (RMS$_FACILITY = 0001) in the high 16 bits and the MAC:MIC code field in the low 16 bits (a 4-bit MAC code in the top 4 of those bits, then a 12-bit MIC code in the low 12 bits).

MAC:MIC = %o11 : %o10 decodes as:

MAC 11: Invalid - Field of message is invalid (e.g., bits that are meant to be mutually exclusive are set, an undefined bit is set, a field value is out of range or an illegal string is in a field).

MIC 00 10: DAP message type field (TYPE) error

This seems to indicate that the remote FAL process, when parsing a received DAP message, found an error in the TYPE field of that message.
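For anyone who wants to repeat the decode on another BUG_DAP value, here is a minimal Python sketch of the bit-splitting described above. The function name is made up for illustration, and the 6-bit type / 6-bit field split of the MIC is my reading of the "00 10" notation rather than something confirmed in this thread.

```python
def decode_dap_code(code: int) -> dict:
    """Split a 32-bit BUG_DAP value into facility, MAC and MIC fields."""
    facility = (code >> 16) & 0xFFFF  # high 16 bits: facility code (0001 = RMS)
    mac = (code >> 12) & 0xF          # 4-bit MAC code, top of the low 16 bits
    mic = code & 0xFFF                # 12-bit MIC code, low 12 bits
    return {
        "facility": facility,
        "mac_octal": format(mac, "o"),
        "mic_octal": format(mic, "o"),
        # Assumption: MIC read as two 6-bit halves (message type, field number)
        "mic_type_octal": format(mic >> 6, "02o"),
        "mic_field_octal": format(mic & 0x3F, "02o"),
    }


if __name__ == "__main__":
    for name, value in decode_dap_code(0x00019008).items():
        print(f"{name}: {value}")
```

Run against 00019008 this gives facility 1, MAC 11 (octal) and MIC 10 (octal), matching the values decoded by hand.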

The data 'on the wire' should be protected by the Ethernet Frame level CRC check, which the LAN interfaces generate on transmit and check on receive. I don't believe a switch will re-assemble the Ethernet packet.

But the data is NOT protected between memory and the LAN interface, except by the overall DECnet DAP level CRC, which will be checked when closing the file/link - and you didn't get that far here ! Otherwise you would have received %RMS-F-CRC, network DAP level CRC check failed

Diagnosing this kind of intermittent failure can be very hard. You would have to capture all the DECnet (RMS<->FAL) traffic on the wire (i.e. in the switch) and then, once the problem has shown up, diagnose the messages at the DAP protocol level.

You may also try to enable FAL logging to capture the data in the NET*SERVER.LOG file and diagnose them, if the problem has shown up again:

$ DEFINE/SYSTEM FAL$LOG FFFF (on the system running FAL). This may - of course - create big .LOG files.

Volker.
Hoff
Honored Contributor

Re: Data Access Protocol error detected; DAP code = 00019008

I've seen similar technical and organizational sequences. Variously triggered by a faulty card, or down-revision firmware in the Cisco gear, or occasionally by a bad patch cable.

The preferred response from an experienced networking team is a replacement network path. That alternative path then typically clearly either exonerates or incriminates the networking path as the culprit. This then (also) leads to an investigation of the switch and switch port, and of testing the cables.

If the boxes are in reasonable physical proximity as might be inferred here and if your networking team is not offering an official alternative path, then supplant the (potentially faulty) switch gear with a commodity unmanaged gigabit switch.

This switch is US$50 to US$75 or so from various and sundry reputable vendors, if you don't have a spare around and actually have to go buy one.

Also replace the patch cables while you're at it. Those are a good source of the occasional flaky link.

With a typical gigabit switch (or with an older switch with an uplink port), you can also then uplink into the existing switch if and as needed.
Art Wiens
Respected Contributor

Re: Data Access Protocol error detected; DAP code = 00019008

Richard:

From the switch side, the port counters are "clean". In LANCP, both systems show a large number of Receive data length errors (80M), but that is relative to 59G and 898G bytes received (the v7.2-2 system - 898G - has been up for 476 days).

Volker:

As a result of a DIFF command:
%DIFF-F-OPENIN, ...
-RMS-F-BUG_DAP, Data Access Protocol error detected; DAP code = 00019008

As a result of a DIR command:
%DIRECT-E-OPENIN, ...
-RMS-F-BUG_DAP, Data Access Protocol error detected; DAP code = 00019008

As a result of a COPY command:
%COPY-E-OPENIN, ...
-RMS-F-BUG_DAP, Data Access Protocol error detected; DAP code = 00019008

Hoff:

If I supplant their network switch with my own, I think I can guess what would get supplanted next ;-) I will get them to set up two other ports on another switch as a step in trying to resolve this.

Thanks,
Art
Volker Halle
Honored Contributor

Re: Data Access Protocol error detected; DAP code = 00019008

Art,

in case this is NOT caused by the switch, do you have a 3rd OpenVMS node around, from which you could do DIR node:: to those 2 nodes involved in this problem, to see if this can be pinned to one node or the other ?

Volker.
Art Wiens
Respected Contributor

Re: Data Access Protocol error detected; DAP code = 00019008

Like I said, both of these systems are recipients of fairly regular test area refreshes. The sources are other v8.3 systems, other v7.2-2 systems and sometimes even the occasional VAX v6.2 system! The time required to do these data transfers is usually 20 - 30 minutes (i.e. steady traffic flow) ... I have never had it happen to me, which is confusing - why does something as simple as a DIR fail, yet 30 minutes of sustained activity goes by fine? It almost seemed like it only happens between the two test systems, but that was also disproved yesterday when I had to do a tape restore to the v7.2-2 system and then used DECnet to copy two 7M block savesets over to the v8.3 system - no issue.

We'll go with different switch ports first. I'll take the opportunity to reboot (maybe 476 days is too long ... don't frown, power down ;-)

Cheers,
Art
Volker Halle
Honored Contributor

Re: Data Access Protocol error detected; DAP code = 00019008

Art,

according to the error messages posted, the BUG_DAP error seems to be (mostly ?) happening while opening the remote input file and not during a file/data transfer while reading or writing the data over the DECnet link.

I assume these errors show up nearly immediately after the command has been issued. So these errors may be triggered by certain data patterns in the received DAP message, which only show up during the early link establishment.

Volker.
Richard Brodie_1
Honored Contributor

Re: Data Access Protocol error detected; DAP code = 00019008

"both systems show a large number of Receive data length errors."

The doc says that's only logged with 802.3 format packets, which is slightly odd. My guess would be that it's flagging some passing traffic on the network and unrelated.

I suppose it could be that the switch is corrupting packets in memory and forwarding them with a new, good checksum. That's sounding less likely, especially as you don't see it on bulk transfers.
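To make that scenario concrete, here is a small Python sketch of why a per-hop frame check cannot see corruption that happens before the checksum is regenerated, while an end-to-end check can. It is only an illustration: zlib.crc32 stands in for both the Ethernet frame check and the higher-level DAP CRC (which is a different, shorter CRC), the frame layout is simplified, and the message bytes are invented.

```python
import zlib


def add_fcs(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32, roughly what a LAN interface does on transmit."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare, as done on receive."""
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == fcs


def flip_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with the lowest bit of one byte inverted."""
    corrupted = bytearray(data)
    corrupted[index] ^= 0x01
    return bytes(corrupted)


message = b"hypothetical DAP message bytes"  # invented payload
end_to_end_crc = zlib.crc32(message)         # stand-in for a sender-side check

# Case 1: a bit flips on the wire; the receiving interface rejects the frame.
damaged_on_wire = flip_bit(add_fcs(message), 0)
wire_error_detected = not fcs_ok(damaged_on_wire)

# Case 2: a device corrupts the payload in its memory, then forwards the
# frame with a freshly computed checksum. The per-hop check passes.
forwarded = add_fcs(flip_bit(message, 0))
hop_check_passes = fcs_ok(forwarded)
end_to_end_detects = zlib.crc32(forwarded[:-4]) != end_to_end_crc

if __name__ == "__main__":
    print(wire_error_detected, hop_check_passes, end_to_end_detects)
```

In the second case only a check computed by the original sender over the original data notices the change, which is the role the DECnet-level checks play in this discussion.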
Art Wiens
Respected Contributor

Re: Data Access Protocol error detected; DAP code = 00019008

This may or may not be a duplicate post, it looked like it accepted my reply, but presented a blank page. Anyways ...

FYI, attached are the switch port counters. Looks clean to me. Port 22 is the v7.2-2 system and 18 is the v8.3 system.

Cheers,
Art
Hoff
Honored Contributor

Re: Data Access Protocol error detected; DAP code = 00019008

>If I supplant their network switch with my own, I think I can guess what would get supplanted next ;-)

Why, the networking team, of course.

> I will get them to setup two other ports on another switch as a step in trying to resolve this.

Work with your boss here, too.
Art Wiens
Respected Contributor

Re: Data Access Protocol error detected; DAP code = 00019008

FYI, on Sunday I moved the two systems to a completely different Cisco 3750 switch with different Ethernet cables, and the problem has just recurred.

I have logged another call with HP.

Cheers,
Art
Volker Halle
Honored Contributor

Re: Data Access Protocol error detected; DAP code = 00019008

Art,

as you seem to have a non-intrusive method to reproduce/show this problem (with DIR node::), why not run a couple of batch jobs from different nodes and/or between these 2, doing DIR node:: in a loop, watching the number of failures and recording a failure matrix: how many failures between which pairs of nodes ? One would hope that this should point to some common path ...
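Once the batch jobs have produced results, the matrix itself is easy to tally. Below is a rough Python sketch, assuming each attempt has been reduced to a (source node, target node, succeeded) record; that record format, the function names and the node names NODEA/NODEB/NODEC are all hypothetical, and the sample data is made up purely to show the shape of the output.

```python
from collections import Counter


def failure_matrix(results):
    """Map each (source, target) pair to (failures, attempts)."""
    attempts = Counter()
    failures = Counter()
    for source, target, succeeded in results:
        attempts[(source, target)] += 1
        if not succeeded:
            failures[(source, target)] += 1
    return {pair: (failures[pair], attempts[pair]) for pair in attempts}


def failures_by_node(matrix):
    """Count failures per node, whichever end of the link it was on."""
    per_node = Counter()
    for (source, target), (failed, _attempts) in matrix.items():
        per_node[source] += failed
        per_node[target] += failed
    return per_node


if __name__ == "__main__":
    # Invented sample records, not real measurements.
    sample = [
        ("NODEA", "NODEB", True),
        ("NODEA", "NODEB", False),
        ("NODEC", "NODEA", True),
        ("NODEC", "NODEB", False),
    ]
    matrix = failure_matrix(sample)
    for (source, target), (failed, total) in sorted(matrix.items()):
        print(f"{source} -> {target}: {failed} failures in {total} attempts")
    print(failures_by_node(matrix).most_common())
```

A node that dominates the per-node count on both the sending and receiving side would be the first place to look for a common path.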

Volker.
Volker Halle
Honored Contributor

Re: Data Access Protocol error detected; DAP code = 00019008

Art,

did you try to provoke the error by using NCP LOOP NODE WITH {mixed,ones,zeros} ? Or with NCP LOOP CIRCUIT ?

Volker.