LAT Port Counter Bugs OpenVMS/VAX v6.2 - fixed in later releases?

 
Mark_Corcoran
Frequent Advisor

Last week, I was looking at reports of a label not having been printed on one of our industrial label printers, and ended up looking at the counters for the associated LTAnnnn: port that drives the printer.

[Most of our DECservers run a version of the NAS software that is too old to include per-port bytes TXed/RXed counts, so I had been relying on the LAT port counters to confirm whether or not the application delivered the messages to LAT for onward delivery to the DECserver, i.e. that it wasn't an application fault per se.

We do have two later versions of NAS that support per-port bytes transmitted/received ("Input Characters" and "Output Characters") on PCMCIA cards in two different models of DECserver, but I haven't worked out a way (never mind the legitimacy) of getting that software as a file onto our OpenVMS systems for downline-loading to DECservers without PCMCIA cards]

As I checked counters and waited for another broadcast to the printer, I was surprised to see that the Bytes Transmitted count reported for the LAT port was far less than I expected.

After some experimentation and then use of Wireshark to capture and examine the associated Ethernet frames, I've found two problems...

1) If the payload of a message exceeds the maximum number of bytes that can be specified in a Data_a slot, then it is split across multiple slots and (potentially) across multiple Ethernet frames.

For any slots that have a Slot Byte Count of the maximum number of bytes, LAT does not add that slot's SBC to the port's Bytes Transmitted count.

[On Page 4-57 of the LAT Specification (AA-NL26A-TE, June 1989), for section 4.4.1.3 (Start Slot), it indicates that:
The minimum slot size queued to receive Data_a and Data_b slots (not including the slot header).
The system receiving this message must limit transmitted Data_a and Data_b slots to this size.

It indicates that the size of the Minimum Data Slot Size field is 1 byte, but doesn't indicate whether or not it is signed or unsigned.

The Slot Byte Count is indicated as being an 8-bit unsigned integer and could theoretically permit 255 characters, but traces show that the Minimum Data Slot Size in the Start Slot is being set to 250]

2) If the payload of a message includes the Carriage Return character, LAT (presumably LTDRIVER.EXE, but it could be LATACP.EXE) splits the message across two or more Ethernet frames, and only the Slot Byte Count from the first slot including the (first) Carriage Return is added to the port's Bytes Transmitted count.

[In actual fact, it might be worse than that - if you have, say, a 260-byte message and the 255th character is a Carriage Return, I'm not sure what gets counted - this wasn't a scenario that I tested]
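
To make problem 1 concrete, here is a minimal Python sketch. It is only a model of the behaviour described above, not anything taken from LTDRIVER or LATACP: the function names are hypothetical, and the 250-byte limit is assumed from the Minimum Data Slot Size seen in the traces.

```python
MAX_SLOT = 250  # assumed limit: Minimum Data Slot Size seen in the Start Slot traces


def split_into_slots(payload: bytes, max_slot: int = MAX_SLOT) -> list[bytes]:
    """Split a payload into consecutive Data_a-style slots of at most max_slot bytes."""
    return [payload[i:i + max_slot] for i in range(0, len(payload), max_slot)]


def counted_bytes_problem_1(slots: list[bytes], max_slot: int = MAX_SLOT) -> int:
    """Model of the suspected bug: slots filled to the maximum size are not counted."""
    return sum(len(s) for s in slots if len(s) < max_slot)


payload = b"x" * 263  # hypothetical 263-byte message
slots = split_into_slots(payload)
print([len(s) for s in slots])         # slot sizes on the wire
print(counted_bytes_problem_1(slots))  # what the port counter would show under this model
```

Under this model a 263-byte message is carried as slots of 250 and 13 bytes, and only the 13 are added to Bytes Transmitted.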

 

I was wondering whether anyone with access to release notes for versions after OVMS v6.2 has seen any indication that such bug(s) have been fixed, or whether anyone with a more recent version of OVMS who is still using DECservers could try the reproducer below and see if they have the same problem?

For what it's worth, ANALYZE /IMAGE of...

SYS$COMMON:[SYSEXE]LATACP.EXE gives an image file ID of "V2.0-181A", and a link date/time of 22-APR-1995 00:45:09.51

SYS$SYSDEVICE:[SYSE.SYSCOMMON.SYS$LDR]LTDRIVER.EXE gives an image file ID of "V6.1-525A" and a link date/time of 22-APR-1995 00:43:54.63

 

Reproducer

Set up a DECserver port as follows:

Port 16:                               Server: DS700A

Character Size:            8           Input Speed:               9600
Flow Control:            XON           Output Speed:              9600
Parity:                 None           Signal Control:        Disabled
Stop Bits:                 1           Signal Select:  CTS-DSR-RTS-DTR

Access:               Remote           Local Switch:              None
Backwards Switch:       None           Name:                   PORT_16
Break:              Disabled           Session Limit:                4
Forwards Switch:        None           Type:                      Soft
Default Protocol:        LAT

Preferred Service: None

Authorized Groups:   0
(Current)  Groups:   0

Enabled Characteristics:
Failover,  Input Flow Control,  Lock,  Output Flow Control,  Verification

LAT port is created as follows:
$ MC LATCP CREATE PORT LTA1016:
$ MC LATCP SET PORT LTA1016: /NODE=DS700A /PORT=PORT_16 /APPLICATION /NOQUEUED

Terminal characteristics (SET TERMINAL /PERMANENT) as follows:
Terminal: _LTA1016:   Device_Type: Unknown       Owner: No Owner

   Input:    9600     LFfill:  0      Width: 511      Parity: None
   Output:   9600     CRfill:  0      Page:   72

Terminal Characteristics:
   Interactive        Echo               Type_ahead         No Escape
   No Hostsync        TTsync             Lowercase          No Tab
   No Wrap            Scope              No Remote          Eightbit
   Broadcast          No Readsync        Form               Fulldup
   No Modem           No Local_echo      No Autobaud        Hangup
   No Brdcstmbx       No DMA             No Altypeahd       Set_speed
   No Commsync        Line Editing       Overstrike editing No Fallback
   No Dialup          No Secure server   No Disconnect      Pasthru
   No Syspassword     No SIXEL Graphics  No Soft Characters No Printer Port
   Numeric Keypad     No ANSI_CRT        No Regis           No Block_mode
   No Advanced_video  No Edit_mode       No DEC_CRT         No DEC_CRT2
   No DEC_CRT3        No DEC_CRT4        No DEC_CRT5        No Ansi_Color
   VMS Style Input

Test #1
$ COPY TT: TEST.TXT
01 34567890123456789
02 3456789
03 3456789
04 3456789
05 3456789
06 3456789
07 3456789
08 3456789
09 3456789
10 3456789
^Z
$! The above creates a file with Variable-Length (max 255 bytes) Record Format
$! and Carriage Return Carriage Control Record Attributes
$ MC LATCP ZERO COUNTERS /PORT=LTA1016:
$ COPY TEST.TXT LTA1016:
$ MC LATCP SHOW PORT LTA1016:/COUNTERS

The COPY of TEST.TXT to the LAT port resulted in 2 Ethernet frames, with a Data_a slot in each. The first slot contained 23 bytes: the COPY results in a <CR><LF> being sent immediately prior to the file, and the remaining 21 bytes are the string "01 34567890123456789" followed by the implicit <CR> at the end of that line, which is present in the file and also sent. The second slot contained 117 bytes (the rest of the data); conspicuously, no new data slot is created at each <CR> instance, seemingly only at the first. However, LATCP SHOW PORT /COUNTERS indicates that only the Data_a slot containing the 23 bytes was counted.
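
The description above accounts for the 23 bytes, but 117 bytes for the remaining nine 10-character lines only adds up if each of those lines also carries three extra bytes. The following is a sketch under the assumption that every record is framed as <CR><LF> + text + <CR>. That framing is an inference from the numbers, not something read off the trace, and `frame` is a hypothetical helper.

```python
CR, LF = b"\r", b"\n"

# The ten records typed into TEST.TXT in Test #1.
records = [b"01 34567890123456789"] + [b"%02d 3456789" % n for n in range(2, 11)]


def frame(record: bytes) -> bytes:
    """Assumed framing: <CR><LF> before the record and <CR> after it."""
    return CR + LF + record + CR


framed = [frame(r) for r in records]
first_slot = len(framed[0])                    # first record only
second_slot = sum(len(f) for f in framed[1:])  # remaining nine records
total = first_slot + second_slot

print(first_slot, second_slot, total)
```

Under that assumption the sketch gives 23 and 117 bytes for the two slots, matching the Slot Byte Counts reported above, for a total of 140 bytes.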

 

Test #2
Create a file called 240_CHARS.TXT using EDIT /TPU, containing a single line with 24 instances of the string 1234567890 one after the other.

Create a file called 260_CHARS.TXT using EDIT /TPU (EDT won't allow more than 255), containing a single line with 26 instances of the string 1234567890 one after the other.

$ MC LATCP ZERO COUNTERS /PORT=LTA1016:
$ COPY 240_CHARS.TXT LTA1016:
$ MC LATCP SHOW PORT LTA1016: /COUNTERS

The COPY of 240_CHARS.TXT to the LAT port resulted in 1 Ethernet frame, with a single Data_a slot, of 243 bytes (the COPY results in a <CR><LF> being sent immediately prior to the file, and although you entered 240 characters, there is an implicit <CR> at the end of the line which is present in the file and also sent, hence 243 bytes).

The LATCP SHOW PORT /COUNTERS shows that 243 bytes were Transmitted, matching the Slot Byte Count in the Data_a slot.

$ MC LATCP ZERO COUNTERS /PORT=LTA1016:
$ COPY 260_CHARS.TXT LTA1016:
$ MC LATCP SHOW PORT LTA1016: /COUNTERS

The COPY of 260_CHARS.TXT to the LAT port resulted in 2 Ethernet frames, with a Data_a slot in each. The first slot contained 250 bytes: the <CR><LF> sent immediately prior to the file, followed by the first 248 characters of the line (up to and including the '8' of the 25th instance of "1234567890"). The second slot contained 13 bytes: the remaining "901234567890" and the implicit <CR> at the end of the line, which is present in the file and also sent. However, LATCP SHOW PORT /COUNTERS indicates that only the second Data_a slot, containing the 13 bytes, was counted.
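
As a cross-check of where the split falls, here is a small Python sketch that rebuilds the byte stream for 260_CHARS.TXT as described above (a <CR><LF> prefix, the 260 characters, and a trailing <CR>) and cuts it at 250 bytes. The variable names are illustrative only, and the 250-byte cut is the slot limit observed in the traces.

```python
MAX_SLOT = 250  # slot limit observed in the traces

line = b"1234567890" * 26        # the single 260-character line
stream = b"\r\n" + line + b"\r"  # <CR><LF> prefix and trailing <CR>, as described above

first, second = stream[:MAX_SLOT], stream[MAX_SLOT:]

print(len(stream), len(first), len(second))
print(first[-1:], second)  # last byte of the first slot, and the whole second slot
```

The first slot ends on the '8' of the 25th "1234567890", and the second slot is the 13 bytes "901234567890" plus <CR>, consistent with the trace.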

[Formerly appearing as woeisme]
3 REPLIES
Volker Halle
Honored Contributor

Re: LAT Port Counter Bugs OpenVMS/VAX v6.2 - fixed in later releases?

Mark,

although I don't have access to a DECserver, I've repeated your test #1 using SET HOST/LAT from an OpenVMS VAX V7.3 system to another OpenVMS node. I've copied TEST.TXT to the 'outgoing' LTA device on the source node.

CHARON $ mc latcp sho port

Port Name  Port Type    Status         Remote Target (Node/Port/Service)
---------  -----------  -------------  -----------------------------------
_LTA5007:  Fwd. (NonQ)  Active         //VAXVMS

...

CHARON $ mc latcp sho port lta5007/count

Port Name:  _LTA5007:

Seconds Since Zeroed:                 2
Remote Accesses:                      0   Framing Errors:             0
Local Accesses:                       0   Parity Errors:              0
Bytes Transmitted:                    0   Data Overruns:              0

...

CHARON $ copy test.txt lta5007:
CHARON $ mc latcp sho port lta5007/count

Port Name:  _LTA5007:

Seconds Since Zeroed:                13
Remote Accesses:                      0   Framing Errors:             0
Local Accesses:                       0   Parity Errors:              0
Bytes Transmitted:                  140   Data Overruns:              0

...

So it seems that the LAT port counter bug has been fixed (at least in V7.3).
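
For anyone repeating this across several systems, here is a minimal Python sketch for pulling the Bytes Transmitted value out of captured LATCP SHOW PORT /COUNTERS text so it can be compared with the expected count. It assumes the output layout shown above, and `bytes_transmitted` is a hypothetical helper, not part of any OpenVMS tool.

```python
import re

# Sample text copied from the SHOW PORT /COUNTERS output above.
sample = """\
Port Name:  _LTA5007:

Seconds Since Zeroed:                13
Remote Accesses:                      0   Framing Errors:             0
Local Accesses:                       0   Parity Errors:              0
Bytes Transmitted:                  140   Data Overruns:              0
"""


def bytes_transmitted(text: str) -> int:
    """Return the Bytes Transmitted counter from captured LATCP output."""
    match = re.search(r"Bytes Transmitted:\s+(\d+)", text)
    if match is None:
        raise ValueError("no Bytes Transmitted line found")
    return int(match.group(1))


print(bytes_transmitted(sample))
```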

Volker.

Mark_Corcoran
Frequent Advisor

Re: LAT Port Counter Bugs OpenVMS/VAX v6.2 - fixed in later releases?

Hi Volker, thanks for taking the time to try this - it wasn't something I'd considered.

I've not normally used LATCP for much more than SHOW commands or creating LAT ports;  an attempt to SET HOST /LAT to the other test node on the same VLAN failed with:

%LAT-F-SRVDIS, outgoing connections are disabled

I worked out that I needed to do MC LATCP SET NODE /CONN=BOTH (it was just set to INCOMING, probably as part of security lockdown years ago in relation to Sarbanes-Oxley), so tried again, and got:

%LAT-F-NOSRVC, service nodename not known

At that point, I tried SET HOST /LAT own_nodename (i.e. the service name is the same as SCSNODE and the DECnet executor node name, as would most commonly be the case).

This allowed the connection, and when I copied the file to the LAT port, the Bytes Transmitted count increased correctly, just as it did for you.

I would guess that the %LAT-F-NOSRVC might require me to do an MC LATCP SET NODE /SERVICE_RESPONDER on the outgoing node, and wait long enough for the multicast messages to be broadcast and for the service name of the target node to be added to the service database on the outgoing node.

I'm nearing the end of shift now, so I will try that tomorrow, just to replicate your test as closely as possible. It does look, though, as if SET HOST /LAT uses a different path through the code and doesn't hit the bug (unless there is something "special" about doing SET HOST /LAT to yourself; my recollection is that in DECnet (Phase IV at least), the behaviour for SET HOST 0 is not the same as SET HOST another_node).

I'll report back tomorrow.

[Formerly appearing as woeisme]
Mark_Corcoran
Frequent Advisor

Re: LAT Port Counter Bugs OpenVMS/VAX v6.2 - fixed in later releases?

Follow-up to my post on Thursday of last week - I did perform the test on Friday that I had intended (after executing MC LATCP SET NODE /SERVICE_RESPONDER on the outgoing node, and waiting for the LAT service announcement multicast/broadcast message from the target node to be added to the service database on the outgoing node), but had to leave work early, and didn't get time to post an update.

Copying the TEST.TXT file to the outgoing LAT port on the outgoing node (after having SET HOST /LAT to the target node) similarly resulted in the Bytes Transmitted count for the port reported by MC LATCP SHOW PORT LTAnnnn /COUNTERS increasing by 140.

So, this only proves that this particular path through either LTDRIVER or LATACP (whichever is responsible for updating the per-port counters) does not induce the bug.

At the moment, there's no indication that the bug is fixed in later versions of OpenVMS (be it VAX, AXP or Itanium), unless anyone else pipes up.

Suggestions on how to extract the NAS software from the PCMCIA flash card of a DECserver 700 into a suitable MOM$SYSTEM:WWENG2.SYS would be gratefully received (all I'm trying to achieve is a reliable way of detecting the number of bytes sent to/received from a DECserver port; NAS 2.3A supports per-port byte count statistics on the DS700 itself).

[I'm not even sure we have any laptops with PCMCIA slots, much less whether or not the card from a DS700 would be readable;  but I'll take a look.

It's possible to enable crash dumps on a DECserver, but on the handful that have had (most likely) hardware faults that induced software exceptions, the resulting dump files are - as you might expect - the same size as the amount of memory in the DS700.

So, even if I could somehow induce a software exception on a DS700 with a recent version of NAS software, I doubt I'd be able to extract it from the dump file]

Mark

[Formerly appearing as woeisme]