
Significance of TTY$V_ST_CTRLS bit in UCB$L_TT_STATE1 longword in the UCB structure?

 
Mark_Corcoran
Frequent Advisor


On Monday, following issues with a serially-driven line printer, I was trying to help our operations team make sure it was printing properly by getting one of our applications to re-issue print jobs to it.

The operations colleague said nothing was coming out of the printer (or even spooling up), and a quick check with an application command revealed that it still had I/Os that had not yet completed.

The printer is connected to a DECserver 700-16, and normally when this kind of issue occurs, it's because the DECserver port (with XON/XOFF flow control enabled) is in a state of Output XOFFed: Yes.

SHOW PORT n STATUS on the DECserver indicated that it wasn't in an XOFF state, but a LOGOUT of the port caused the application to start sending data, until later on when it got into the same state whilst my colleague was making further adjustments to the printer (again, a LOGOUT of the port "fixed" the problem).

Each time, before issuing the LOGOUT PORT command, I ran a command procedure I wrote a while ago, to get all available information relating to a port:

SHOW TERMINAL LTAnnnn:
MC LATCP SHOW PORT LTAnnnn:
MC LATCP SHOW PORT LTAnnnn: /COUNTERS
MC LATCP SHOW LINK /COUNTERS
SDA> SHOW DEVICE LTAnnnn:
SDA> READ SYS$SYSTEM:SYSDEF.STB
SDA> FORMAT ucb_address /TYPE=UCB
TSM> USE SERVER xxxxxx
TSM> SHOW SERVER STATUS
TSM> SHOW SERVER COUNTERS
TSM> SHOW PORT nn CHARACTERISTICS
TSM> SHOW PORT nn STATUS
TSM> SHOW PORT nn COUNTERS

When I got time to look at the information recorded, the only unusual thing was the values in the UCB structure.

The application makes and keeps open a connection to the LAT port, regardless of whether or not there is anything to send to the port.


In an idle state, the device looks like this in SDA:

I/O data structures
-------------------
LTA5106 Unknown UCB address: 879BBE00

Device status: 00000010 online
Characteristics: 0C040007 rec,ccl,trm,avl,idv,odv
00000200 nnm

Owner UIC [000001,000004] Operation count 212 ORB address 878D9700
PID 0001005D Error count 0 DDB address 878A1C00
Class/Type 42/00 Reference count 2 DDT address 87A5B140
Def. buf. size 132 BOFF 02CC CRB address 878A1C80
DEVDEPEND 480810A0 Byte count 0000 AMB address 8794A200
DEVDEPND2 00001004 SVAPTE 00000000 I/O wait queue empty
FLCK index 34 DEVSTS 000C
DLCK address 86F9F800

*** I/O request queue is empty ***


And the pertinent fields from the UCB are:
879BBE78 UCB$L_STS 00000010
UCB$W_STS
879BBE7C UCB$W_DEVSTS 000C
879BBECC UCB$L_TT_STATE1 00000000


When the application's IO$_WRITEVBLK got stuck the first time:
The SHOW DEVICE in SDA reported BOFF as 0000, Byte Count as 0010 and DEVSTS as 0004, otherwise it was the same (other than Operation count).

The pertinent fields from the UCB were:
879BBE78 UCB$L_STS 00000010
UCB$W_STS
879BBE7C UCB$W_DEVSTS 0004
879BBECC UCB$L_TT_STATE1 00000402


When IO$_WRITEVBLK got stuck the second time:
The SHOW DEVICE in SDA reported Byte Count as 0000 and DEVSTS as 000C, otherwise it was the same (other than Operation count).

The pertinent fields from the UCB were:
879BBE78 UCB$L_STS 00000010
UCB$W_STS
879BBE7C UCB$W_DEVSTS 000C
879BBECC UCB$L_TT_STATE1 00000402


Whilst conducting testing with a different LAT port today, where I had a VT terminal connected to the DS700-16 and Hold Screen pressed:
The SHOW DEVICE in SDA reported Byte Count as 0010 and DEVSTS as 0004, otherwise it was essentially the same (other than Type, DEVDEPEND, DEVDEPND2).
The pertinent fields from the UCB were:
879D19F8 UCB$L_STS 00000012
UCB$W_STS
879D19FC UCB$W_DEVSTS 0004
879D1A4C UCB$L_TT_STATE1 00000400


Compared to an idle state, when the two IO$_WRITEVBLK requests were stuck until a LOGOUT PORT was issued, they differed by virtue of bits 1 and 10 being set in UCB$L_TT_STATE1.

Compared to an idle state, when I had specifically XOFFed a similar port (and SHOW PORT STATUS confirmed it) then tried to send a print job to it, it differs by virtue of bit 10 being set in UCB$L_TT_STATE1.

My understanding of $TTYUCBDEF.MAR from SYS$LIBRARY:LIB.MLB is that:
UCB$L_TT_STATE1 bit 1 = TTY$V_ST_CTRLS
UCB$L_TT_STATE1 bit 10 = TTY$V_ST_WRITE
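To make the comparison between the three snapshots explicit, here is a small Python sketch that decodes a UCB$L_TT_STATE1 value into bit names. It only knows the two bit positions quoted from $TTYUCBDEF.MAR above (bit 1 = TTY$V_ST_CTRLS, bit 10 = TTY$V_ST_WRITE); the dictionary and function names are hypothetical, and any other set bit is simply reported by number rather than guessed at.

```python
# Bit positions as read from $TTYUCBDEF.MAR in this post; only these two are
# decoded, anything else is reported by bit number.
TT_STATE1_BITS = {
    1: "TTY$V_ST_CTRLS",
    10: "TTY$V_ST_WRITE",
}


def decode_tt_state1(value: int) -> list:
    """Return the names of the bits that are set in a UCB$L_TT_STATE1 value."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(TT_STATE1_BITS.get(bit, f"bit {bit} (not decoded)"))
    return names


if __name__ == "__main__":
    # Values copied from the SDA FORMAT output in this post.
    samples = [
        ("idle", "00000000"),
        ("stuck write", "00000402"),
        ("Hold Screen", "00000400"),
    ]
    for label, hex_text in samples:
        print(f"{label:12} {hex_text} -> {decode_tt_state1(int(hex_text, 16))}")
```

The only difference between the "stuck write" and "Hold Screen" cases is bit 1, which is what prompts the question below.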

The TTY$V_ST_WRITE would be consistent with a pending IO$_WRITEVBLK, and I don't have any issue with that.

The only reference I can find to TTY$V_ST_CTRLS is in the OpenVMS AXP Device Support: Reference manual (AA-Q28PA-TE, MAR-1994), which "describes" it as "Class Output".

Given that the IO$_WRITEVBLK was still pending - in a state reminiscent of an XOFFed port - my suspicion is that "CTRLS" actually refers to CTRL-S, and that "Class Output" is a documentation error.

Does anyone have any more information on this bit of the UCB$L_TT_STATE1 longword (particularly under what circumstances it gets set)?

[The DECserver port in question has got Flow Control set to XON, so it is locally managed by the DECserver - if the printer sends an XOFF to the port, the DECserver implements flow control by reducing the credit count in the LAT slot to 0 (or not incrementing it from 0 back up to 1) - thus preventing LAT on OpenVMS from sending anything more for that circuit.

Tracing via Wireshark showed that Run messages would be sent periodically, and the circuit was never torn down;  from a quick glance at the LAT specification, even if the network switches happened to drop an Ethernet frame, it looks like LAT would recover from this, and not induce some weird state for the LTA device]
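The credit mechanism described in the bracketed note can be illustrated with a toy Python model. It is a simplification under stated assumptions, not the LAT specification: the class and method names are hypothetical, credits are counted per session, and each data slot the host sends consumes exactly one credit. It shows why a write can sit pending indefinitely without any error being raised: if the server never extends another credit, the host simply has nothing it is allowed to send.

```python
class LatSessionModel:
    """Toy model of credit-based flow control for one LAT session (hypothetical)."""

    def __init__(self, initial_credits: int = 1):
        self.host_credits = initial_credits  # slots the host may still send
        self.pending = []                    # data queued by the host application
        self.delivered = []                  # data that reached the server

    def host_queue(self, data: str) -> None:
        """Host application issues a write; it is sent only if a credit is available."""
        self.pending.append(data)
        self._try_send()

    def server_grant_credits(self, n: int = 1) -> None:
        """Server extends credits, e.g. once its serial port is no longer XOFFed."""
        self.host_credits += n
        self._try_send()

    def _try_send(self) -> None:
        # Each data slot consumes one credit; with zero credits nothing moves.
        while self.pending and self.host_credits > 0:
            self.host_credits -= 1
            self.delivered.append(self.pending.pop(0))


if __name__ == "__main__":
    s = LatSessionModel()
    s.host_queue("job 1")   # uses the only credit
    s.host_queue("job 2")   # port XOFFed: server withholds credits, job 2 waits
    print(s.delivered, s.pending)
    s.server_grant_credits()
    print(s.delivered, s.pending)
```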

Whilst I can log a DECserver port out when a LAT device gets into this state - thereby "fixing" the problem (by virtue of the code dealing with the SS$_HANGUP that gets delivered to it) - I'd rather prevent it from getting into this state in the first place.

Or at least find a means of detecting it (although since the IO$_WRITEVBLK isn't completing, this would probably mean periodically processing the output of SDA - far from ideal).
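If polling SDA does turn out to be the only option, the parsing side is at least simple. A minimal Python sketch, assuming some separate job periodically captures the FORMAT output for the LTA device to text (how that capture is done is not shown and is left open). The function names and the three-poll threshold are hypothetical choices, intended to avoid flagging a momentary CTRL-S that is promptly followed by a CTRL-Q.

```python
import re
from typing import Iterable, Optional

CTRLS_MASK = 1 << 1  # bit 1 of UCB$L_TT_STATE1 (TTY$V_ST_CTRLS) per this post

# Matches a line like "879BBECC UCB$L_TT_STATE1 00000402" from SDA FORMAT output.
_STATE1_RE = re.compile(r"UCB\$L_TT_STATE1\s+([0-9A-Fa-f]{8})")


def tt_state1_from_sda(text: str) -> Optional[int]:
    """Extract the UCB$L_TT_STATE1 value from captured SDA output, if present."""
    match = _STATE1_RE.search(text)
    return int(match.group(1), 16) if match else None


def looks_stuck(samples: Iterable[str], consecutive: int = 3) -> bool:
    """True if CTRLS is set in `consecutive` successive polls (threshold is arbitrary)."""
    run = 0
    for text in samples:
        value = tt_state1_from_sda(text)
        if value is not None and value & CTRLS_MASK:
            run += 1
            if run >= consecutive:
                return True
        else:
            run = 0
    return False
```

A poll that cannot find the field (for example, if the device has been deleted) resets the count rather than raising an alarm.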

FWIW, OpenVMS VAX V6.2 on CharonVAX

 

Any (helpful) suggestions gratefully received.

Mark

[Formerly appearing as woeisme]
Volker Halle
Honored Contributor


Mark,

TTY$V_ST_CTRLS (or TTY$M_ST_CTRLS) actually indicates that the TT class driver is to 'STOP OUTPUT TO THE TERMINAL LINE' (see routine TTY$STOP in module [TTDRVR]TTYSUB).

The routine TTY$STOP is called from the CONTROLS handler in module [TTDRVR]TTYCHARI, when a CTRL-S has been received on the port and TTSYNC is not set. It's the only place in the [TTDRVR] facility, where this bit is set.

The TTY$V_ST_CTRLS bit is cleared in routines TTY$ABORT (Abort current output activity) and TTY$RESUME  (CONTINUE OUTPUT ON A LINE).
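That description amounts to a small state machine. The Python sketch below models it under stated assumptions: it is not driver code, the class and method names are hypothetical, and the driver routine names appear only as comments. In particular, what calls TTY$RESUME is not stated above; the sketch assumes, consistent with the lost CTRL-Q theory, that it runs when a CTRL-Q arrives.

```python
CTRLS = 1 << 1  # TTY$M_ST_CTRLS: mask value for bit 1 of UCB$L_TT_STATE1


class TtyLineModel:
    """Toy model of when the CTRLS bit is set and cleared (not driver code)."""

    def __init__(self, ttsync: bool = False, state1: int = 0):
        self.ttsync = ttsync
        self.state1 = state1

    def receive_ctrl_s(self) -> None:
        # CONTROLS handler -> TTY$STOP, only when TTSYNC is not set.
        if not self.ttsync:
            self.state1 |= CTRLS

    def receive_ctrl_q(self) -> None:
        # Assumed path to TTY$RESUME (continue output on the line).
        self.state1 &= ~CTRLS

    def abort_output(self) -> None:
        # TTY$ABORT (abort current output activity) also clears the bit.
        self.state1 &= ~CTRLS

    def output_stopped(self) -> bool:
        return bool(self.state1 & CTRLS)


if __name__ == "__main__":
    line = TtyLineModel(state1=0x400)   # a write is pending (bit 10)
    line.receive_ctrl_s()
    print(hex(line.state1))             # 0x402, as seen in SDA
    # If the CTRL-Q never arrives, nothing clears the bit and output stays stopped.
```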

Based on this, I would assume that a CTRL-Q is getting lost somewhere between the physical printer, the LAT port driver and the terminal class driver...

Volker.

Mark_Corcoran
Frequent Advisor


Hi Volker, thanks for your reply.

I did see it last Sunday, and have spent all week (barring two days for an eye complaint that involved a hospital visit) testing/trying to replicate the problem.

 

>TTY$V_ST_CTRLS (or TTY$M_ST_CTRLS) actually indicate
Of course, the TTY$V_ST_CTRLS that I mentioned is the bit number;  TTY$M_ST_CTRLS is the bit value - thanks for the correction


>(see routine TTY$STOP in module [TTDRVR]TTYSUB)
Sounds suspiciously like you have access to a(n historic) copy of sources.

Unfortunately, none of my employers in the last 32 years have had cause to pay for them, so that's sadly not an option for me.

However, given the amount of time I spend looking at some issues, it makes me wonder if it would justify the personal expense of purchasing a set (if this was even still possible, and how relevant any such version of sources would be to the version of OVMS I'm currently using or likely to use in the future).


>Based on this, I would assume that a CTRL-Q is getting lost somewhere between the physical printer, the LAT port driver and the terminal class driver...
During my testing at the end of the previous week, and also the start of the week just gone, I have been seeing distinctly odd behaviour, where XOFF from a VT terminal seemed to behave differently from the XOFF sent by the printer.

During that testing, I had thought that SHOW PORT n STATUS on the DS700 indicated that "Input Signals" did not show RXD when the VT terminal was connected, whereas it was continually asserted by the printer (by virtue of the printer TXD being crossed to the DECserver RXD).

However, testing yesterday (which may not have involved the same Heath Robinson assortment of cables and connectors/converters (from MJ8 to DB9 to DB25 and back down again)) seemed to show that RXD was always asserted (even when the VT terminal was not transmitting).

My testing yesterday culminated in 16 combinations of TTSYNC/NOTTSYNC and TYPEAHEAD/NOTYPEAHEAD on the LTA terminal device, Remote Modification enabled/disabled on the DS700 port, and power-cycling the printer versus taking it offline and putting it online again.

I need to review the notes that I made for each test, along with the Wireshark traces (the notes were in NOTEPAD, with a helping of copy/paste/edit, and it looks like on at least one of them, I may not have updated part of the text duplicated from a previous test).

Thanks to Wireshark, I found decidedly odd behaviour on the part of the DECserver when the port was XOFFed:

  • Furiously sending Data_b slots of type "Report" (at a rate of 12 per second - the NIC runs at 10Mbps, so even on a lightly loaded LAT host (typically, but not exclusively OpenVMS), it wouldn't be much quicker)
  • Occasionally sending Data_a slots to the OVMS host, containing spurious characters which had not been received on the serial port (two separate serial analysers saw nothing)

Depending on the terminal TYPEAHEAD and DS700 port Remote Modification settings, the latter would/could cause (what I presume is) the terminal driver to send ASCII <BEL> characters to the DECserver for onward delivery to the serial port (this is by virtue of the Bell-on-Discard setting in the Data_b message of type Request that OVMS sends to the DECserver). That then started a deadly embrace when the <BEL> characters were received by the printer, which was in an Offline state (it sent further XOFFs in response).

The TYPEAHEAD and Remote Modification settings could also cause the DS700 itself to send ASCII <BEL> characters to the printer instead of OVMS sending it.

[This was certainly the case on power-up, where one serial analyser saw 1000s of <NUL> characters being "sent" to the DS700 serial port (probably an artefact of the initialisation of the printer, whilst the serial port is in an indeterminate state, or the printer is conducting POST checks)]

I also observed the DS700 not returning the port from an XOFF to an XON state (following receipt of an XON from the printer) in all but one case*, unless the DS700 received a SHOW PORT n STATUS or SHOW SERVER STATUS command whilst the port was in an XOFF state and before the XON was received (other commands may also "correct" the logic state model within the DECserver).

*Unless this was a failure on my part to edit a copy/paste duplication of earlier test notes.

 

Beyond reviewing my notes and the wireshark traces, I need to do some more testing with the VT terminal, try a couple more cable combinations, and confirm the behaviour across a few DS700s (though this behaviour appeared to occur across three versions of DNAS (1.1A, 1.5 and 2.something) on two different DS700-16s).

The JUN-1989 copy of the LAT Specification that I have doesn't really indicate that a receiving system should do anything with a Data_b slot of type Report other than send an acknowledgement message.

From a design point of view, it makes no sense to me for a DECserver to keep spitting this message out ~12 times per second, particularly since the values reported in the message do not change.

[If bits flipped between each message, or if the DECserver was waiting for some kind of response other than a general LAT acknowledgement, that would be a different matter]


Much of this I wouldn't have caught without my network colleagues SPANning the DECserver port so I could see the unicast traffic to it.

It's only from the captured network traffic that I now know that at the time of the problem, I should have issued repeated SHOW SERVER STATUS and SHOW SERVER COUNT commands (I think the TTY$V_ST_CTRLS might have been an artefact of the odd DECserver behaviour, and it's likely something that I won't actually be able to reproduce).

[The production DECserver that had the port issue is using a DNAS version that doesn't record/display Input/Output characters on a per-port basis (which I would have seen accelerate at a great rate of knots).

So, the only way to detect it would have been higher than normal LAT traffic (if you happen to know what your normal level is) from SHOW SERVER COUNT, or the fact that the DS700 was running at 17% CPU (from SHOW SERVER STATUS) because of these Data_b messages for a single port (and the figure-of-eight/snake/rotating pattern on the 7-segment LED was noticeably slower)]

 

Once I've completed my analysis, I'll report back with a summary of my findings - though they may come as a surprise to those of us still using DECservers (and explain problems that others have seen and couldn't explain).

I've also recently managed to acquire a Lantronix ETS16 (which Jan-Erik Söderholm mentioned in an 04-FEB-2020 response to David Turner's "Anyone know which NON DEC Terminal Servers support LAT" posting on c.o.v), that I eventually plan to bring in to work and test as a replacement for our aged DECserver estate (yes, ETS16s are also EoL, but they're relatively young compared to the DS700s), so I may find that it at least behaves as one might expect/hope.

I hadn't forgotten you or your response, and I'm very grateful for the time you provide responding to me and others here.

 

Mark

[Formerly appearing as woeisme]
Volker Halle
Honored Contributor


Mark,

thanks for posting your 'research data'. You will probably agree that this is mostly an 'academic exercise' on your part, which may help you better understand the problem and develop a better detection mechanism and workaround.

You'll never be able to escalate and change this odd behaviour against these old software and hardware versions.

Still interesting to read.

Volker.

Mark_Corcoran
Frequent Advisor


I've finally made time to provide an update from my last post on this topic.

For the avoidance of doubt, when "DECserver port" is mentioned, this refers to the serial port, not the AUI or 10BASE-T port.

The undesirable behaviour was observed on DS700-16s, using DNAS v1.1A;  the most recent DNAS we have is v2.3A (though it may be greater;  I recently found that the rebuild procedure for spares forces the same software version of the defunct DECserver, thus potentially downgrading it and forcing a MOP downline-load if it has a more recent version on PCMCIA).

I have a recollection that I did manage to avoid downgrading a spare at v2.3A, and the same behaviour was observed - that seems quite likely, as it appears to be an intrinsic/base part of the DNAS code that is going wrong.

I mention below "OpenVMS host (other hosts may behave differently)" - we are using DECservers with OpenVMS systems, and from Wireshark traces,  I can see that OpenVMS sets the Bell-on-discard preference parameter to 1.

Anything that speaks LAT could theoretically use a DECserver (and the LAT protocol spec certainly provides the names of a few (defunct) systems/manufacturers), but I have no idea what they set (and whether or not that might be configurable).

I've encountered a total of seven DECserver "bugs" - I use quotation marks because the <BEL> and <NUL> characters are actually documented in the LAT protocol specification, but application-level developers (who simply see DECservers as a means to an end for delivering characters over the network to a serial device) may not have had cause to look at that low a level.

Four of the bugs are characterised/most readily noticeable by the fact that the figure-of-8 / snake / rotating pattern on the 7-segment LED noticeably slows down, and that a corresponding SHOW SERVER STATUS shows a far higher CPU usage percentage than might ordinarily be observed (depending on how many DECserver ports are in use, the volume of traffic and baud rates).

1. Slow figure-of-8 / snake / rotating pattern #1
The first bug encountered was where a Telnet Listener Port is configured and enabled in the DECserver, the corresponding DECserver port to which it points is in an Output XOFFed: Yes state, there is output pending to the DECserver port, and the Telnet connection is dropped.

In these circumstances, the DNAS software gets stuck in some very tight infinite loop, and SHOW SERVER STATUS reports 99% CPU usage - the solution is either to have the attached serial device send an XON to the port, or to LOGOUT the port.

The remaining six bugs all stem from having conducted testing with a Printronix P8000 printer using a miswired "crossed" serial cable, and relate to the DECserver port having Signal Select set to CTS-DSR-RTS-DTR (the same behaviour may occur with a differently miswired cable and Signal Select being set to RI-DCD-DSR-DTR;  I haven't tested this).

[I had always previously been of the belief that if Signal Control for a port was Disabled, then the Signal Select setting was immaterial, but it's not - the two are unrelated.

If Signal Select was capable of being disabled, then a miswired cable (in conjunction with certain signals being asserted by the attached device) would not cause a problem.

Unlike a Lantronix ETS terminal server, Signal Select CAN'T be disabled]

 

The cable has previously been used in tests where it was connected to a VT terminal (either connected to a port that was enabled for Local access and for use as an actual terminal, or connected to a port that was enabled for Remote access, where I was using keyboard input from the VT terminal to simulate a PLC).

The cable had RXD/TXD crossed (which allowed my VT testing to work), RXD GND/TXD GND crossed, but DTR, DSR, CTS and RTS were straight-through.

The P8000 had a factory default setting for the HOST INTERFACE / Serial / Request To Send set to "On Line and BNF [Buffer Not Full]".

The issues came when the printer was taken offline, or when it was powered up and before it had finished initialisation, causing the P8000 to deassert the RTS signal (whilst the Signal Select setting of the DECserver Port causes the DECserver to attempt to assert the RTS signal).

[The behaviour of the VT terminal in relation to asserting of hardware signals is different compared to the (factory default) behaviour of the P8000, hence why I had never encountered this behaviour before]

 

2. Slow figure-of-8 / snake / rotating pattern #2
DECserver port is set to Access: Local, attached device deasserts RTS signal.
or
DECserver port is set to Access: Dynamic or Remote, has a Status of Idle (i.e. there is no LAT connection established to a remote host), attached device deasserts RTS signal.

In these circumstances, the DECserver treats the deassertion of the RTS signal as a framing error (but Framing Errors count for the port does not increment).

 

3. Slow figure-of-8 / snake / rotating pattern #3
DECserver port is set to Access: Dynamic or Remote, and has Remote Modification enabled, and has Output XOFF: Yes (i.e. attached device has sent XOFF to the DECserver port), and has a Status of Connected (i.e. there is a LAT connection established to a remote host), and remote host is OpenVMS (other hosts may behave differently), attached device deasserts RTS signal.

In these circumstances, the DECserver treats the deassertion of the RTS signal as a framing error.

The logic flow of the DNAS code in dealing with the framing error is altered because the port has Access set to Remote and a Status of connected, resulting in the DECserver incorrectly detecting characters having been received on the serial port.

Until the deassertion of the RTS signal abates, the DECserver sends LAT "Run" messages to the OpenVMS host at a rate of ~12 per second (for each port in this state).

The LAT "Run" message:

A) Contains one Data_b slot with a final Parameter code of %X06 (Status), a Parameter length of %X02, and Parameter Data of %X03xx (the high-order byte 03 indicates Framing Error, and the low-order byte xx represents the character that was "received" during the framing error).


B) May contain one Data_a slot with:
o A non-zero Slot Data Byte Count value.
o The Slot Data field containing Slot Data Byte Count number of bytes which are characters that the DECserver detected as having been "received".

Note that the low-order byte of the Parameter Data for the Parameter code %X06 in the Data_b slot does not necessarily match any of the bytes in the Data_a Slot Data field (such is the nature of bugs…)
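For anyone wanting to pick these slots out of a capture programmatically, a minimal Python sketch of decoding the Status parameter follows. Two assumptions, neither confirmed above: each parameter is encoded as a one-byte code, a one-byte length and then the data; and the two Parameter Data bytes form a little-endian word (the usual DEC convention), so the first byte on the wire would be the character and the second the status. Both should be checked against an actual Wireshark capture before relying on the output; the function names are hypothetical.

```python
STATUS_PARAM = 0x06   # Parameter code for Status, per the description above
FRAMING_ERROR = 0x03  # high-order byte value indicating a Framing Error


def iter_params(buf: bytes):
    """Yield (code, data) pairs, assuming 1-byte code + 1-byte length + data."""
    i = 0
    while i + 2 <= len(buf):
        code, length = buf[i], buf[i + 1]
        yield code, buf[i + 2:i + 2 + length]
        i += 2 + length


def decode_status(data: bytes) -> dict:
    """Split a 2-byte Status value into status (high byte) and character (low byte)."""
    if len(data) != 2:
        raise ValueError("Status parameter should carry exactly 2 bytes")
    word = int.from_bytes(data, "little")  # assumption: little-endian word
    status, char = word >> 8, word & 0xFF
    return {"framing_error": status == FRAMING_ERROR, "status": status, "char": char}


if __name__ == "__main__":
    # Hypothetical parameter bytes: code 06, length 02, data 41 03 ("A" + framing error).
    for code, data in iter_params(bytes([0x06, 0x02, 0x41, 0x03])):
        if code == STATUS_PARAM:
            print(decode_status(data))
```

Comparing the decoded "char" values against the Data_a Slot Data would make the mismatch noted above easy to demonstrate across a whole capture.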

The Framing Errors counter in the output of SHOW PORT n COUNTERS increases at a rate of >450 per second (when the port speed is set to 4800 baud) whilst the RTS signal is deasserted.
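As a rough sanity check on the >450 per second figure, assume a 10-bit character frame (1 start bit, 8 data bits, no parity, 1 stop bit; the port's actual character size and parity aren't stated here). The maximum character rate at 4800 baud then works out as:

```latex
r_{\text{char}} = \frac{4800\ \text{bit/s}}{10\ \text{bit/char}} = 480\ \text{char/s},
\qquad
\frac{450\ \text{errors/s}}{480\ \text{char/s}} \approx 0.94
```

Under that assumption, the DECserver is logging close to one Framing Error per character time for as long as RTS stays deasserted (an 11-bit frame with parity would give a ceiling of about 436 char/s instead).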

If the version of DNAS running on the DECserver is at a level where it records Input/Output Characters on a per-port basis, then:

The per-port count of Input Characters increases, but at no discernible rate (the attached device is not actually sending anything, so what the DECserver perceives as Input Characters is highly dependent on its handling of the Framing Errors).

The incrementing of the Input Characters count for the port is not necessarily relative to the number of characters sent in the Data_a slot (I observed SHOW PORT n COUNT reporting 42, but a Wireshark capture showed two Data_a slots containing 57 characters between them).

If the associated LAT terminal device has /TYPE_AHEAD or /ALTYPEAHD enabled, the accumulated characters across any Data_a slots sent by the DECserver to the OpenVMS host may cause the (alternate) typeahead buffer to become full. Each further character sent to the OpenVMS host in a Data_a slot then causes the OpenVMS host to send an ASCII <BEL> character telling the DECserver to stop sending it any input from that serial port, and the DECserver then attempts to output that character to the serial port.

The per-port count of Output Characters will probably initially increase by 32, with the DNAS code intending to output the 32x <BEL> characters received, but no characters are physically output to the port (because the port is in an Output XOFFed state of Yes).

[The DECserver intends to send a <BEL> character in response to each Framing Error it encounters, but because the port is in an Output XOFFed state of Yes, it buffers these <BEL> characters until its (very limited) buffer (of 32 characters) is full]

When the attached device sends an XON character to the port and reasserts the RTS signal, the DECserver will then output the buffered 32x <BEL> characters to the serial port, along with any further <BEL> characters it generates in response to more Framing Errors that it encounters between emptying its buffer and transitioning to an Output XOFFed state of No, as well as any <BEL> characters sent by the OpenVMS host in response to a(n alternate) typeahead buffer full condition.
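The buffering behaviour described in the last few paragraphs can be summarised as a toy Python model. The 32-character capacity comes from the observations above; what happens to characters beyond the 32nd while the port is still XOFFed is not stated here, so this sketch assumes they are dropped (as is described for the deasserted-RTS case in bug 6). Class and attribute names are hypothetical.

```python
from collections import deque

BEL = "\x07"


class XoffOutputBuffer:
    """Toy model of a 32-character per-port output buffer while Output XOFFed."""

    CAPACITY = 32

    def __init__(self):
        self.queue = deque()
        self.xoffed = True   # attached device has sent XOFF
        self.dropped = 0     # assumption: overflow characters are discarded
        self.wire = []       # characters physically sent to the serial port

    def enqueue(self, ch: str) -> None:
        if not self.xoffed:
            self.wire.append(ch)
        elif len(self.queue) < self.CAPACITY:
            self.queue.append(ch)
        else:
            self.dropped += 1

    def receive_xon(self) -> None:
        # On XON the buffered characters are flushed to the port in order.
        self.xoffed = False
        while self.queue:
            self.wire.append(self.queue.popleft())


if __name__ == "__main__":
    port = XoffOutputBuffer()
    for _ in range(40):          # e.g. one <BEL> per Framing Error
        port.enqueue(BEL)
    print(len(port.queue), port.dropped, len(port.wire))
    port.receive_xon()
    print(len(port.queue), port.dropped, len(port.wire))
```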

[Even at the lowest speed of 75 baud, it is likely that the Framing Error condition would last long enough for 32 Framing Errors to occur (resulting in 32x <BEL> characters needing to be sent in response)]
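Extending the same back-of-envelope reasoning to 75 baud gives the time needed to fill the 32-character buffer. This assumes 10-bit frames and roughly one Framing Error per character time, as the 4800 baud figures suggest; both are assumptions rather than measurements at 75 baud:

```latex
r_{\text{char}} = \frac{75\ \text{bit/s}}{10\ \text{bit/char}} = 7.5\ \text{char/s},
\qquad
t_{32} = \frac{32\ \text{char}}{7.5\ \text{char/s}} \approx 4.3\ \text{s}
```

So an RTS-deasserted period of more than about four seconds (e.g. a printer left offline) would be enough to fill the buffer even at the slowest port speed.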

The Frames Received, Frames Sent, Messages Received and Messages Sent counters output by the SHOW SERVER COUNT command will increase at a rate of ~12 per second for each DECserver port in this state (in addition to any normal LAT message traffic triggered by real input from or output to other serial ports, LAT service announcements, etc.).

The SHOW SERVER STATUS command will report a CPU usage of ~17% for each DECserver port in this state.
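Since these are the only visible symptoms on DNAS versions without per-port character counts, they can be turned into a rough detection check. A minimal Python sketch, assuming two SHOW SERVER COUNTERS snapshots of one of the frame or message counters taken a known interval apart, plus a baseline rate (or CPU figure) measured for that server when healthy. It also assumes the per-port figures (~12 messages/s, ~17% CPU) add up linearly across ports, which has not been tested. Function names are hypothetical.

```python
RUN_MSGS_PER_PORT = 12.0  # observed ~12 LAT Run messages/s per affected port
CPU_PCT_PER_PORT = 17.0   # observed ~17% CPU per affected port


def ports_from_counters(before: int, after: int, interval_s: float,
                        baseline_rate: float) -> float:
    """Estimate affected ports from the excess rate of a SHOW SERVER COUNTERS counter."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    rate = (after - before) / interval_s
    return max(0.0, rate - baseline_rate) / RUN_MSGS_PER_PORT


def ports_from_cpu(cpu_pct: float, baseline_pct: float) -> float:
    """Estimate affected ports from SHOW SERVER STATUS CPU usage."""
    return max(0.0, cpu_pct - baseline_pct) / CPU_PCT_PER_PORT


if __name__ == "__main__":
    # Hypothetical figures: 1740 frames in 60 s against a 5 frames/s baseline.
    print(ports_from_counters(1000, 2740, 60, 5.0))
    print(ports_from_cpu(36, 2))
```

Any result noticeably above zero is a cue to go looking at SHOW PORT n STATUS for ports that are Output XOFFed.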

 

4. Slow figure-of-8 / snake / rotating pattern #4
DECserver port is set to Access: Dynamic or Remote, and has Remote Modification enabled, and has Output XOFF: Yes (i.e. attached device has sent XOFF to the DECserver port), and has a Status of Connected (i.e. there is a LAT connection established to a remote host), and remote host is OpenVMS (other hosts may behave differently), attached device was previously deasserting RTS signal but is no longer deasserting it.

In these circumstances, the DECserver treats the deassertion of the RTS signal as a framing error.

The DECserver then outputs one ASCII <BEL> character to the serial port for each Framing Error it detects (although SHOW PORT n COUNTERS may not accurately reflect this because it might snapshot the counters part way through being updated, or the condition abates and causes the server to clear its queue of <BEL> characters to be sent).

The DECserver occasionally appears to get stuck in a loop (or a flag or bit is set within its internal data structure for that port) where - even after the RTS signal deasserted condition abates - it continues to "detect" Framing Errors, and continues to send <BEL> characters to the serial port.

The fact this occurs intermittently (in perhaps 20 tests, it occurred once) suggests it most likely is a result of some race condition or memory corruption within the DNAS code running on the DECserver.

The Framing Errors counter in the output of SHOW PORT n COUNTERS increases at a rate of >450 per second (when the port speed is set to 4800 baud).

If the version of DNAS running on the DECserver is at a level where it records Input/Output Characters on a per-port basis, then the per-port count of Output Characters increases at a rate of >= 450 per second (when the port speed is set to 4800 baud).

The SHOW SERVER STATUS command will report a CPU usage of ~17% for each DECserver port in this state.

The DECserver sends LAT "Run" messages to the OpenVMS host at a rate of ~12 per second (for each port in this state).

The LAT "Run" message:

A) Contains one Data_b slot with a final Parameter code of %X06 (Status), a Parameter length of %X02, and Parameter Data of %X03xx (the high-order byte 03 indicates Framing Error, and the low-order byte xx represents the character that was "received" during the framing error).

 

5. DECserver port "Input Characters" count does not match the number of characters delivered to the connected host.
DECserver port is set to Access: Dynamic or Remote, and has Remote Modification enabled, and has Output XOFF: Yes (i.e. attached device has sent XOFF to the DECserver port), and has a Status of Connected (i.e. there is a LAT connection established to a remote host), and remote host is OpenVMS (other hosts may behave differently), attached device deasserts RTS signal.

In these circumstances, the DECserver treats the deassertion of the RTS signal as a framing error.

The logic flow of the DNAS code in dealing with the framing error is altered because the port has Access set to Remote and a Status of connected, resulting in the DECserver incorrectly detecting characters having been received on the serial port.

Until the deassertion of the RTS signal abates, the DECserver sends LAT "Run" messages to the OpenVMS host at a rate of ~12 per second (for each port in this state).

The LAT "Run" message:

A) Contains one Data_b slot with a final Parameter code of %X06 (Status), a Parameter length of %X02, and Parameter Data of %X03xx (the high-order byte 03 indicates Framing Error, and the low-order byte xx represents the character that was "received" during the framing error).

B) May contain one Data_a slot with:
o A non-zero Slot Data Byte Count value.
o The Slot Data field containing Slot Data Byte Count number of bytes which are characters that the DECserver detected as having been "received".

The number of Input Characters reported by SHOW PORT n COUNTERS does not necessarily match the number of characters in any Data_a slots that the DECserver sends to the connected host (I observed SHOW PORT n COUNT reporting 42, but a Wireshark capture showed two Data_a slots containing 57 characters between them).

The Framing Errors counter in the SHOW PORT n COUNTERS command increases at a rate of >=450 per second whilst the RTS signal is deasserted (when the port speed is set to 4800 baud).

If the associated LAT terminal device has /TYPE_AHEAD or /ALTYPEAHD enabled, the accumulated characters across any Data_a slots sent by the DECserver to the OpenVMS host may cause the (alternate) typeahead buffer to become full. Each further character sent to the OpenVMS host in a Data_a slot then causes the OpenVMS host to send an ASCII <BEL> character telling the DECserver to stop sending it any input from that serial port, and the DECserver then attempts to output that character to the serial port.

The per-port count of Output Characters will probably initially increase by 32, with the DNAS code intending to output the 32x <BEL> characters received, but no characters are physically output to the port (because the port is in an Output XOFFed state of Yes).

[The DECserver intends to send a <BEL> character in response to each Framing Error it encounters, but because the port is in an Output XOFFed state of Yes, it buffers these <BEL> characters until its (very limited) buffer (of 32 characters) is full]

When the attached device sends an XON character to the port and reasserts the RTS signal, the DECserver will then output the buffered 32x <BEL> characters to the serial port, along with any further <BEL> characters it generates in response to more Framing Errors that it encounters between emptying its buffer and transitioning to an Output XOFFed state of No, as well as any <BEL> characters sent by the OpenVMS host in response to a(n alternate) typeahead buffer full condition.

[Even at the lowest speed of 75 baud, it is likely that the Framing Error condition would last long enough for 32 Framing Errors to occur (resulting in 32x <BEL> characters needing to be sent in response)]

The Frames Received, Frames Sent, Messages Received and Messages Sent counters output by the SHOW SERVER COUNT command will increase at a rate of ~12 per second for each DECserver port in this state (in addition to any normal LAT message traffic triggered by real input from or output to other serial ports, LAT service announcements, etc.).

The SHOW SERVER STATUS command will report a CPU usage of ~17% for each DECserver port in this state.

 

6. "Uncommanded" ASCII <BEL> or <NUL> characters delivered to the DECserver port.
DECserver port is set to Access: Dynamic or Remote, and has Output XOFF: Yes (i.e. attached device has sent XOFF to the DECserver port), and has a Status of Connected (i.e. there is a LAT connection established to a remote host), and remote host is OpenVMS (other hosts may behave differently), and has /TYPE_AHEAD enabled on the OpenVMS LAT terminal device and the typeahead buffer is full.

or

DECserver port is set to Access: Dynamic or Remote, and has Output XOFF: Yes (i.e. attached device has sent XOFF to the DECserver port), and has a Status of Connected (i.e. there is a LAT connection established to a remote host), and remote host is OpenVMS (other hosts may behave differently), and has /ALTYPEAHD enabled on the OpenVMS LAT terminal device and the alternate typeahead buffer is full (or possibly within TTY_ALTALARM bytes of becoming so).

In both circumstances, the OpenVMS host sends a LAT message to the DECserver with a Data_a slot containing one or more <BEL> characters*, for each character it receives in a Data_a slot from the DECserver whilst the (alternate) typeahead buffer is full.

*When the virtual circuit is established, OpenVMS hosts send a LAT message with a Data_b slot of type "Set" which sets the Bell-on-discard preference parameter to 1, requesting that:

o <BEL> characters be output to attached devices if the devices attempt to send data to the serial port when the connected host has already sent a <BEL> character to the DECserver for that port, to specify that no more input should be sent to the OpenVMS host.

o <NUL> characters be output to attached devices if the connected host, having previously sent a <BEL> character to the DECserver for that port (to specify that no more input should be sent to the OpenVMS host), then sends a <NUL> character to the DECserver for that port, to specify that the OpenVMS host can once again receive and process input.

This changes the Input XOFFed flow control state for the port from No to Yes.
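The Bell-on-discard behaviour described above can be summarised as a small state model. This is a simplified sketch of my reading of the description, not DNAS or LTDRIVER code; the class and method names are hypothetical, and it assumes that the host's <BEL> and <NUL> act purely as signals (setting and clearing Input XOFFed) rather than being passed through to the attached device themselves.

```python
from dataclasses import dataclass, field
from typing import List

BEL = 0x07  # ASCII bell
NUL = 0x00  # ASCII null


@dataclass
class BellOnDiscardPort:
    """Hypothetical model of a DECserver port with Bell-on-discard = 1.

    Assumption: a <BEL> from the host only sets Input XOFFed, and a later
    <NUL> from the host only clears it; neither is passed through as data.
    """
    bell_on_discard: bool = True
    input_xoffed: bool = False
    to_device: List[int] = field(default_factory=list)  # bytes output on the serial port
    to_host: List[int] = field(default_factory=list)    # bytes forwarded to the host in Data_a slots

    def from_host(self, ch: int) -> None:
        if ch == BEL:
            # Host cannot accept more input: Input XOFFed goes from No to Yes.
            self.input_xoffed = True
        elif ch == NUL and self.input_xoffed:
            # Host can accept input again: clear the state and tell the device.
            self.input_xoffed = False
            if self.bell_on_discard:
                self.to_device.append(NUL)
        else:
            # Ordinary output data for the attached device.
            self.to_device.append(ch)

    def from_device(self, ch: int) -> None:
        if not self.input_xoffed:
            self.to_host.append(ch)
        elif self.bell_on_discard:
            # Input is discarded; each discarded character is answered with a <BEL>.
            self.to_device.append(BEL)
```

Feeding this model device characters while `input_xoffed` is set produces one <BEL> per discarded character, which matches the "uncommanded" <BEL> traffic this point describes.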

 

OR

DECserver port is set to Access: Dynamic or Remote, and has Remote Modification enabled and has a Status of Connected (i.e. there is a LAT connection established to a remote host), and remote host is OpenVMS (other hosts may behave differently), and the attached device is deasserting the RTS signal.

In these circumstances, the DECserver treats the deassertion of the RTS signal as a framing error.

A) The DECserver itself will send a <BEL> character to the port for each Framing Error it encounters.

[However, the rate at which the Framing Errors occur means that the DECserver buffers the characters for output.

When the deasserted RTS signal condition abates, the DECserver discards <BEL> characters that it would otherwise have output (if the version of DNAS running on the DECserver is at a level where it records Output Characters on a per-port basis, the SHOW PORT n COUNTERS command reports an Output Characters count that is less than the Framing Errors count)]

B) The logic flow of the DNAS code in dealing with the Framing Error is altered because the port has Access set to Remote/Dynamic and a Status of connected, resulting in the DECserver (occasionally) incorrectly detecting characters having been received on the serial port.

Until the deassertion of the RTS signal abates, the DECserver sends LAT "Run" messages to the OpenVMS host at a rate of ~12 per second (for each port in this state).

The LAT "Run" message:

a) Contains one Data_b slot with a final Parameter code of %X06 (Status), a Parameter length of %X02, and Parameter Data of %X03xx (the high-order byte 03 indicates Framing Error, and the low-order byte xx represents the character that was "received" during the framing error).

b) May contain one Data_a slot with:
o A non-zero Slot Data Byte Count value.
o The Slot Data field containing Slot Data Byte Count number of bytes, which are characters that the DECserver detected as having been "received".

The characters in these Data_a slots may cause further <BEL> characters to be sent by the host system to the DECserver for output to this port (as per the two OR combinations at the start of this point).
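For anyone decoding these slots from a capture, the Status parameter described in (a) can be unpacked as below. This is a sketch based only on the layout given above; the function names are hypothetical, and the assumption that the 16-bit Parameter Data is transmitted low-order byte first (little-endian) should be checked against a real capture or the LAT specification.

```python
STATUS_PARAM_CODE = 0x06  # Data_b Parameter code for Status
FRAMING_ERROR = 0x03      # high-order byte value indicating a Framing Error


def parameter_value(data: bytes) -> int:
    """Convert 2 bytes of Parameter Data to a 16-bit value.

    Assumption: low-order byte first (little-endian) on the wire.
    """
    if len(data) != 2:
        raise ValueError("expected 2 bytes of Parameter Data")
    return int.from_bytes(data, "little")


def decode_status(code: int, length: int, data: bytes):
    """Return (is_framing_error, received_char) for a Data_b Status parameter."""
    if code != STATUS_PARAM_CODE or length != 2:
        raise ValueError("not a 2-byte Status parameter")
    value = parameter_value(data)
    status = (value >> 8) & 0xFF  # high-order byte: 03 = Framing Error
    char = value & 0xFF           # low-order byte: character "received" during the error
    return status == FRAMING_ERROR, char
```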

 

7. DECserver port remains in "Output XOFFed: Yes" state even when the attached device sends an XON to the DECserver port.
DECserver port is set to Access: Dynamic or Remote or Local, and has Output XOFF: Yes (i.e. attached device has sent XOFF to the DECserver port), and the attached device is deasserting the RTS signal, and attached device sends an XON to the DECserver port before it stops deasserting the RTS signal.

In these circumstances, the DECserver treats the deassertion of the RTS signal as a framing error, but either gets stuck in a tight CPU loop that prevents it from reading characters from the serial port, or it does not attempt to read characters from the port until the RTS signal has been reasserted.

The Framing Errors counter in the SHOW PORT n COUNTERS command increases at a rate of >=450 per second whilst the RTS signal is deasserted (when the port speed is set to 4800 baud).
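As a rough sanity check on that rate, the number of character times per second on an asynchronous line is the baud rate divided by the bits per character. The sketch below assumes 8 data bits, no parity and 1 stop bit (plus the start bit, so 10 bits per character); the port's actual character size and parity aren't stated above, so treat those defaults as an assumption.

```python
def char_times_per_second(baud: int, data_bits: int = 8,
                          parity_bits: int = 0, stop_bits: int = 1) -> float:
    """Character times per second for async serial: 1 start bit + data + parity + stop bits."""
    bits_per_char = 1 + data_bits + parity_bits + stop_bits
    return baud / bits_per_char


rate_4800 = char_times_per_second(4800)                # 480.0 with 8N1
fraction = 450 / rate_4800                             # observed framing errors / available character times
seconds_for_32_at_75 = 32 / char_times_per_second(75)  # 75 baud, 8N1
```

On those assumptions, >=450 Framing Errors per second is at least ~94% of the 480 character times available at 4800 baud, which suggests roughly one Framing Error being detected per character time while RTS is deasserted. At 75 baud there are only 7.5 character times per second, so 32 Framing Errors would take about 4.3 seconds, which is consistent with the earlier bracketed remark about 75 baud.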

The SHOW SERVER STATUS command will report a CPU usage of ~17% for each DECserver port in this state.

When a DECserver port gets into this state, issuing any DNAS command (e.g. SHOW SERVER STATUS, SHOW PORT n COUNT, SHOW PORT n STATUS - even if n refers to a port other than the affected one) will cause the DECserver to send an XOFF character to the serial port before executing the command, and an XON character to the port once execution of the command has completed.

In the case of the Printronix P8000, when the printer is returned from an Offline state (with the miswired cable that causes it to deassert the RTS signal in conjunction with the factory default setting for the HOST INTERFACE / Serial / Request To Send set to "On Line and BNF [Buffer Not Full]") it sends an XON to the DECserver port, and sends a second XON around 10 seconds later (seemingly in response to the XOFF/XON pair that the DECserver sent to it).

When a P8000 (with this configuration and miswired cable) is not involved, the only way to recover is either to LOGOUT the port or to disable and re-enable flow control.


Mark

[Formerly appearing as woeisme]
Dave Lennon
Advisor

Re: Significance of TTY$V_ST_CTRLS bit in UCB$L_TT_STATE1 longword in the UCB structure?

Hi,

   FYI DNAS 3.6 is the latest/last version. The release notes can be found at:

https://web.archive.org/web/20190402211634/http://vnetek.com/wp-content/uploads/2015/03/DNAS-V3.6-Release-Notes.pdf

Skimming that document, I don't see anything that specifically addresses your issue, but I know more was changed than was documented - we found a bug introduced in 3.6 that forced us to fall back to 3.5 for our DECserver 700s with dial-in modems attached.

I don't see where you could purchase that code, perhaps it was abandoned when vnetek went under. There is an older DNAS CD image on archive.org, I wonder if it would be kosher to put up the 3.6 copy that I have?

- Dave

Mark_Corcoran
Frequent Advisor

Re: Significance of TTY$V_ST_CTRLS bit in UCB$L_TT_STATE1 longword in the UCB structure?

Somewhat belatedly responding to your reply, Dave.

>I don't see where you could purchase that code, perhaps it was abandoned when vnetek went under.
I thought Cabletron took over Vnetek?

My recollection from a previous post to comp.os.vms (or maybe even here) is that whoever the keeper of DNAS was at the time (Digital Networks, Vnetek or Cabletron), they wouldn't offer anything other than whatever the latest version was.

Although 3.6 may be the latest version, I doubt the current keeper of DNAS even knows where soft copies of the most recent image (let alone historic ones) are, much less is prepared to offer them for sale.

 

>There is an older DNAS CD image on archive.org, I wonder if it would be kosher to put up the 3.6 copy that I have
Given the current "discussion" on comp.os.vms regarding Dan's post on OpenVMS sources, I suspect not.

Whilst I've found a copy of 2.4 online, downloaded it and burned it to CD, it's really only something I would consider for personal use, not for work...

I know that the latest DNAS CD used to be shipped with new purchases of DECservers, but I've never been a recipient of said CDs, and if their various recipients haven't thrown them out, they're hoarding them until the end of time - there's no sign of the CDs being available for sale on eBay or elsewhere.

It's possible to get a DECserver to downline-load DNAS via MOP, and to then tell it to replace the existing PCMCIA image with what has been downloaded.

Whilst there's no DECserver-commanded reverse of that operation, it got to the point that I was going to try to achieve it by other means...

That meant opening up our spare (working and non-working) DS700s to see if any of them had a PCMCIA card with DNAS software*, then purchasing an Omnidrive Linear Flash PCMCIA card reader, so that I might be able to copy historic versions of the images onto a PC and, from there, make whatever adjustments are necessary to convert them into the likes of WWENG1.SYS.

*Our rebuild procedure explicitly specified the load image of whatever the failed DECserver was, so even if the replacement DECserver had a newer DNAS on a PCMCIA card, the setting of an older/differently-named DNAS file would force it to downline-load that image from a load host.

I'm disinclined to purchase DNAS PCMCIA cards from Ebay for extracting older versions of DNAS (to build up an archive of DNAS versions), because I can't be certain that the code hasn't been modified.

 

On an unrelated note, following recent issues with LAT Virtual Circuits dropping to DECservers (across the site, not in the same cabinet), I've been doing some more testing this week, trying to capture the Ethernet frames to determine the logic flow of what was happening (at least, when I forcibly cause the circuits to be dropped;  I still don't know the underlying cause of recent disconnects, as there was no frame capture on the production network).

This was to see if I could determine what the trigger was (late responses? no responses?), whether or not the circuit being dropped by the DECserver exhibits a different behaviour to the LTDRIVER dropping it (which might help determine which side was causing the disconnect) and whether or not any of the few parameters that can be changed might have an effect.

It's been a bit of an uphill struggle persuading the DECserver not to see OpenVMS and vice versa...

The PC running Wireshark is getting the Ethernet frames SPANned from the DECserver port, so you can't remove the network cable or power it off - the switch no longer sees a device attached, so stops SPANning the frames to the PC with Wireshark (the fix here is to slide the NIC selector from RJ45 to AUI but leave the RJ45 cable connected;  the switches might be too clever for their own good, but they're not as clever as they think they are).

However, if you do this, then you can't see what the DECserver is doing when it drops the connection (as it would be trying to send frames out on the AUI port, which is neither connected nor SPANned), and you can't shut down LAT or OpenVMS either (because that would trigger LAT Stop messages) - the fix is to halt CharonVAX or stop the CharonVAX service instance.


What I eventually found was that the retransmit behaviour and the circuit disconnect behaviour were different on the two sides...

The DECserver would faithfully honour the Retransmit Limit defined by SET SERVER RETRANSMIT LIMIT (i.e. up to that many retransmits, in addition to the initial transmit), and would retransmit at one-second intervals.
The LTDRIVER, on the other hand, would retransmit at two-second intervals (apart from the first retransmit, which appeared to be sub-second), and made between 49 and 53 retransmits in addition to the initial transmit.

The DECserver would send a LAT Stop message after (SERVER RETRANSMIT LIMIT) + 1 seconds
The LTDRIVER would send a LAT Stop message after ~110 seconds (approximately 5.5x the LATCP Keepalive Timer)
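The two observed timelines can be compared with a small model. This is only a sketch of the behaviour seen in the captures, not of either implementation; the function names are hypothetical, and because the LTDRIVER first-retransmit delay is only known to be "sub-second", the 0.5 s default below is a placeholder.

```python
from typing import List, Tuple


def decserver_timeline(retransmit_limit: int, interval: float = 1.0) -> Tuple[List[float], float]:
    """Retransmits at 1 s intervals up to SERVER RETRANSMIT LIMIT, then a LAT Stop one interval later."""
    retransmits = [interval * i for i in range(1, retransmit_limit + 1)]
    stop_time = interval * (retransmit_limit + 1)
    return retransmits, stop_time


def ltdriver_retransmits(count: int, first_delay: float = 0.5, interval: float = 2.0) -> List[float]:
    """First retransmit after a sub-second delay (placeholder 0.5 s), the rest at 2 s intervals."""
    return [first_delay + interval * i for i in range(count)]


keepalive_implied = 110 / 5.5  # Keepalive Timer implied by a ~110 s Stop at ~5.5x
```

With 49-53 retransmits at 2 s spacing, the last LTDRIVER retransmit lands somewhere around 96-105 s for any sub-second first delay, so its ~110 s Stop does not follow one interval after the last retransmit the way the DECserver's (limit + 1) s Stop does.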


I was surprised about both the interval between retransmits and the number of retransmits conducted by LTDRIVER...

I had always assumed that LTDRIVER would use the LATCP SET NODE /RETRANSMIT_LIMIT value, but further reading of the HELP and documentation suggests the qualifier appears only to relate to connections for LAT services, and isn't used for IO$M_LT_CONNECT connections to non-queued application LAT ports.

An 11-JUL-1994 comp.os.vms post by Michael D. Raspuzzi (then of DEC) indicated that LAT had a 1-second retransmit timer.


However, section 4.1.3.7 (Defined Parameters And Recommended Or Required Default Values) of the LAT specification (AA-NL26A-TE, JUN-1989) says that "a number of values must be specified before an implementation is operational" (including LAT_MESSAGE_RETRANSMIT_LIMIT, HOST_RETRANSMIT_TIMER and HOST_RETRANSMIT_COUNTER).

It also says:
"If an implementation allows the values are settable within the ranges specified, the names used to refer to the parameters must be a reasonable facsimile of the name used below.

If the parameters are not settable, the recommended default values should be used. The architecture requires the values be within the ranges specified:"


Now, whilst LATCP permits SET NODE /RETRANSMIT_LIMIT, the qualifier appears only to relate to connections for LAT services, and isn't used for IO$M_LT_CONNECT connections to non-queued application LAT ports that target a DECserver port set up with Access: Remote

[The value on our systems is set to the OpenVMS default of 8, but in testing, captured Ethernet frames show the LTDRIVER retransmitting a variable number of times beyond the initial transmit (in my first test, it was an initial transmit and 50 retransmits; subsequent tests gave 49 to 53 retransmits)]


There is no qualifier like /RETRANSMIT_TIMER, but in the Wireshark capture, the retransmits were (apart from the first one) spaced at 2-second intervals, suggesting that LTDRIVER was at least implementing a (non-changeable) HOST_RETRANSMIT_TIMER value that was within the range 1 - 2 seconds (as the LAT specification implies; I think the implication is either 1 or 2 seconds, not some value in between like 1500ms).
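Deriving the count and spacing from a capture is easy to automate once the timestamps of the initial transmit and each retransmit of the same message have been picked out (for example from the Time column of Wireshark's File > Export Packet Dissections > As CSV). The sketch below assumes that selection has already been done; the function names are hypothetical, and the 1-2 second bounds are the range discussed above, not values quoted from the specification.

```python
from statistics import median
from typing import List, Sequence, Tuple


def retransmit_stats(send_times: Sequence[float]) -> Tuple[int, List[float], float]:
    """Given times (s) of the initial transmit followed by each retransmit of one
    LAT message, return (retransmit count, intervals, median interval)."""
    if len(send_times) < 2:
        raise ValueError("need the initial transmit and at least one retransmit")
    intervals = [later - earlier for earlier, later in zip(send_times, send_times[1:])]
    return len(send_times) - 1, intervals, median(intervals)


def within_timer_range(interval: float, low: float = 1.0, high: float = 2.0,
                       tolerance: float = 0.1) -> bool:
    """True if an interval lies inside [low, high], allowing some capture jitter."""
    return low - tolerance <= interval <= high + tolerance
```

Using the median rather than the mean stops the single sub-second first interval from skewing the result.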

It would be nice to confirm from sources what LTDRIVER has defined as these values, but I don't have and never have had access to sources (even at my first employer, which was a DEC VAR).

Given the current flame(ish) war in Dan's thread on comp.os.vms about sources, I'm disinclined to ask anyone who does have access to take a quick gander at (presumably) .MAR files for LTDRIVER.

Although if anyone happens to look... then whilst you're there... I've since determined that LATACP is mostly for managing the LAT service database, but it's not clear whether the LAT counters reported by LATCP are maintained by LTDRIVER or by LATACP; knowing for certain would always be useful.

 

 

Mark

[Formerly appearing as woeisme]
Volker Halle
Honored Contributor

Re: Significance of TTY$V_ST_CTRLS bit in UCB$L_TT_STATE1 longword in the UCB structure?

Mark,

the LAT source listings have been 'censored', i.e. they are NOT included in the more recent OpenVMS source listing kits.

But if you happen to have an old OpenVMS VAX V6.1 source listing kit, they can be found in facility [LAT], but excluding the LTDRIVER* sources.

Volker.

Mark_Corcoran
Frequent Advisor

Re: Significance of TTY$V_ST_CTRLS bit in UCB$L_TT_STATE1 longword in the UCB structure?

>the LAT source listings have been 'censored', i.e. they are NOT included in the more recent OpenVMS source listing kits.
>But if you happen to have an old OpenVMS VAX V6.1 source listing kit, they can be found in facility [LAT], but excluding the LTDRIVER* sources.

Sadly, I've never had any source listings on microfiche or anything newer-fangled.

If they only included sources for LATCP, they wouldn't be of any use to me in relation to this thread, though they would perhaps help on my other thread about how BYTLM is consumed by it, and why it seemingly isn't returned until an arbitrary number of msecs after image rundown.

I ended up using Wireshark again on Tuesday to get to the bottom of another issue that a colleague had when trying to test some code changes on one of our test systems, but which I never encountered during my own testing.

[Ordinarily during system startup, for particular LAT terminal devices, we have SET TERMINAL LTAnnnn: /PERMANENT /qualifier1 ... /qualifierN to set whatever attributes are required, i.e. all qualifiers are combined on a single command.

In my testing, I was changing two terminal settings, and had a separate SET TERMINAL command for each of the two qualifiers, each with explanatory text on why it was necessary.

On the second SET TERMINAL command, he was getting:
%SET-W-NOTSET, error modifying LTAnnnn:
-SYSTEM-F-HANGUP, data set hang-up

...whereas I never got that in my testing.

The problem was that our LAT ports are MC LATCP SET PORT LTAnnnn: /NOQUEUE, and he was copying & pasting the commands from the Word document;  the DECserver was still processing the (non-queued access) LAT Command message generated by the first SET TERMINAL command, and rejected (with a LAT Status message) the (also non-queued access) LAT Command message generated by the second SET TERMINAL command.

In my testing, I was (I think - this was some time last year) typing the commands in manually.

[The PCs are on a private VLAN without internet access, so Office 365 can't phone home to the "Redmond mothership" to ensure it is licensed;  it progressively disables features (including copy & paste), which I think forced me to type the commands in;  eventually we had to get "normal" Office installed, though much to my chagrin, even it is now whining about being unlicensed (although it so far hasn't disabled features)]

The delay introduced by typing commands manually allows the DECserver to process the first LAT Command (which only takes a few msecs) before it gets the second one;  but if you copy & paste, there's virtually no gap between the generation of the two LAT Command messages, and the second gets rejected.

I won't be changing the LAT ports to /QUEUED because therein lie other problems, and I want to keep the testing as two separate SET TERMINAL commands rather than combine all the qualifiers into a single command, so I'll just have to add a WAIT in between the two SET TERMINAL commands.

Interestingly, even for a port setting that can't be changed on the DECserver (irrespective of whether REMOTE MODIFICATION is enabled on the port), SET still triggers the LAT Command message (well, at least for SET TERM /(NO)ECHO)]
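The copy-and-paste failure above is a timing race, which a tiny model makes clear. This is a sketch under stated assumptions, not DECserver logic: it assumes the port accepts a non-queued LAT Command only when it is not still processing an earlier one, the function name is hypothetical, and the 5 ms processing time used below is a stand-in for "a few msecs".

```python
from typing import List, Sequence


def rejected_commands(arrival_ms: Sequence[float], processing_ms: float) -> List[int]:
    """Indices of commands rejected because they arrived while an earlier
    non-queued LAT Command was still being processed."""
    busy_until = float("-inf")
    rejected = []
    for index, arrival in enumerate(arrival_ms):
        if arrival < busy_until:
            rejected.append(index)                # rejected (answered with a LAT Status message)
        else:
            busy_until = arrival + processing_ms  # accepted; port busy until processing completes
    return rejected


pasted = rejected_commands([0, 1], processing_ms=5)     # commands almost back to back
typed = rejected_commands([0, 1000], processing_ms=5)   # a human-sized gap, or a WAIT
```

Pasted commands arrive almost back to back, so the second lands while the first is still being processed; typing manually, or putting a WAIT between the two SET TERMINAL commands, pushes the second arrival past the end of processing.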

 

Mark

[Formerly appearing as woeisme]