OpenVMS MAIL command hanging when sending to an SMTP%"user@domain" address

 
Mark_Corcoran
Frequent Advisor

I've been making some changes to the code for a detached process so that when it detects a particular error condition (a PLC stuck in a loop, sending the same message over & over again), it alerts us to the condition once a configurable threshold number of duplications has occurred (but does not alert again until the condition abates and then recurs).
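
A minimal sketch of that alert-once behaviour, written in Python purely for illustration (the real process is not Python, and the class name, method name and the exact meaning of "threshold" are assumptions). Here a "duplication" is counted each time a message equals the one before it, and the alert is latched until a different message arrives:

```python
class DuplicateAlarm:
    """Raise one alert when a message has been duplicated `threshold` times in a row."""

    def __init__(self, threshold):
        self.threshold = threshold   # configurable number of duplications
        self.last_message = None
        self.repeat_count = 0        # duplicates seen since the first copy
        self.alerted = False         # latched until the condition abates

    def observe(self, message):
        """Return True only for the message that should trigger an alert."""
        if message == self.last_message:
            self.repeat_count += 1
        else:
            # A different message means the loop has abated: reset and re-arm.
            self.last_message = message
            self.repeat_count = 0
            self.alerted = False
        if self.repeat_count >= self.threshold and not self.alerted:
            self.alerted = True
            return True
        return False


alarm = DuplicateAlarm(threshold=3)
stream = ["A", "A", "A", "A", "A", "B", "A", "A", "A", "A"]
results = [alarm.observe(m) for m in stream]
```

With a threshold of 3, the fourth consecutive "A" triggers the alert, the fifth does not, and the alert can only fire again after the "B" has broken the run.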

I'm generating OPCOM messages to suitably-enabled terminals with $BRKTHRU (the operator class is configurable via a logical name, naturally), and also sending an email to a distribution list (again, configurable with a logical name).

I wrote this code years ago, and I know it works, but after conducting a number of tests using local OpenVMS accounts on the test system (a private VLAN with no gateway and no SMTP mail relay), I thought I should show in the test document the use of internet-style email addresses. These are what we would typically use, though the mail handling code falls back to an "address of last resort" (defined with a logical name, or hard-coded defaults if the logical is not defined) specifying a CSV list of local user accounts for OpenVMS mail.

As there is no route gateway or SMTP gateway on the test network (but both are configured in UCX), I wanted to reproduce the behaviour of SMTP creating entries on an execution queue from a generic queue when it was unable to send an email, then getting the user to examine the contents of the two files for each entry (IIRC, one is the email body, the other is control/header information).

To quickly reproduce this, I used MAIL at the DCL prompt to access OpenVMS mail, then specified a destination address of SMTP%"someuser@somedomain", whereupon MAIL hung for what seemed a long time, before eventually prompting me for the subject header.

Subsequent testing reveals this "long time" to be 3 minutes, which seems to match the Send/Data Timeout value reported by UCX SHOW CONFIG SMTP (the web page converts multiple spaces to a single one, so the column spacing below is approximate, and I've already had my session time out once whilst typing this up):

Timeout     Initial   Mail   Receipt   Data   Terminate
Send:             5      5         5      3          10
Receive:          5

 

When I tried getting the detached process to use the OpenVMS callable mail routines to do the same, it remained in an LEF state until the underlying UCX SMTP routine (presumably in SYS$LIBRARY:UCX$SMTP_MAILSHR.EXE) returned.

When the UCX SMTP client is unable to send an email (for whatever reason), it puts it on a queue for subsequent retries, but I have to say that I was a little surprised that UCX SMTP appears to be trying to reach the route gateway and/or the SMTP gateway simply when the destination address is specified (before the subject header or email body is added).

Sending of an email is an adjunct to the OPCOM message, and I can't have the detached process stuck in an LEF state for 3 minutes if there is a problem with the SMTP gateway.

If SMTP is simply trying to reach the route gateway, then I would find it acceptable(ish) for it to hang for 3 minutes, because it means we have lost inter-VLAN access to the VLAN on which the servers run (so no FTP and no Telnet; depending on the nature of the network failure, we may also have lost access to DECservers).

I tried to determine whether or not SMTP was simply trying to access the route gateway by looking at Ethernet frames in Wireshark, and on hitting the RETURN key after entering the destination address of the email, I can start to see ARP requests asking "who has <address of route gateway>".

The systems are set up as a master/standby pair, and I tried changing the address of the route gateway to that of what is currently the standby node (it doesn't function as a route gateway, so would possibly reject/drop related messages, but it is the only other system on the private VLAN apart from the PC I use for connecting to the systems). The aim was just to see whether SMTP was only trying to determine if the route gateway was available/reachable (and would then attempt to relay mail through it after the subject header & email body were entered).

I didn't have much joy with this, but I'm not sure if at some point, UCX determines that an address is not reachable, and ignores further requests to send anything to it (UCX SHOW ARP didn't show the route gateway address).

I guess I'm asking if anyone knows what UCX or TCP/IP Services is doing when it processes the destination email address, and whether there is any way of confirming that all it is doing at that point, is trying to access the route gateway and not from there, the SMTP gateway.

i.e. can I "safely" allow the detached process to use callable mail routines to send the email, on the basis that it should only hang if the route gateway is unreachable (sending emails would be the least of our problems)?

[I did try booting up a recently-retired 4000-400 with a network loopback on the AUI interface, and found that it similarly hung when I tried to use MAIL to send to an internet-style address (but I don't know if similarly, it is trying to access just the route gateway, or the SMTP gateway)]

I think I might have to remove a lot of the code I back-ported in, and simply get the process to submit a batch job to send the email (in which case, the batch job can take its own sweet time whilst UCX waits for the processing of the email address to time out, rather than delay processing of PLC messages for 3 minutes (even if the PLC is likely to still be sending the same thing over & over again that caused the triggering of the email in the first place)). :-(
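
The idea behind the batch-job approach is that the slow send runs in a separate execution context, so the main loop only pays the cost of handing the work over. A rough sketch of that pattern in Python, with a worker thread standing in for the batch job (this is an analogy only, not OpenVMS code; `start_mail_worker` and `slow_send` are hypothetical names, and the 0.2 second sleep stands in for the 3-minute timeout):

```python
import queue
import threading
import time


def start_mail_worker(send_func):
    """Start a daemon thread that drains a queue of (address, subject, body) jobs."""
    jobs = queue.Queue()

    def worker():
        while True:
            job = jobs.get()
            try:
                send_func(*job)      # may block for minutes; only this thread waits
            except Exception as exc:
                print("mail send failed:", exc)
            finally:
                jobs.task_done()

    threading.Thread(target=worker, daemon=True).start()
    return jobs


sent = []


def slow_send(address, subject, body):
    # Hypothetical stand-in for a send that stalls on an unreachable name server.
    time.sleep(0.2)
    sent.append(address)


jobs = start_mail_worker(slow_send)
started = time.monotonic()
jobs.put(("user@example.com", "PLC stuck in a loop", "details"))
enqueue_seconds = time.monotonic() - started   # the caller is not held up
jobs.join()                                    # demo only: wait for the worker to finish
```

The caller returns from `jobs.put` almost immediately, while the worker absorbs the delay; a submitted batch job would play the same role for the detached process.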

 

 

Mark

[Formerly appearing as woeisme]
2 REPLIES
Steven Schweda
Honored Contributor

Re: OpenVMS MAIL command hanging when sending to an SMTP%"user@domain" address

> I guess I'm asking if anyone knows what UCX or TCP/IP Services is
> doing when it processes the destination email address, and whether there
> is any way of confirming that all it is doing at that point, is trying
> to access the route gateway and not from there, the SMTP gateway.

   Someone may, I don't, but I'd guess that, at the very least, it would
try a DNS look-up on whatever follows the "@".  (Perhaps first for an MX
record, then, if that fails, an A record.)  I would not bet that any
actual communication with the (proposed) destination is attempted by
MAIL.

   One might try something like Wireshark to watch the network activity.

Mark_Corcoran
Frequent Advisor

Re: OpenVMS MAIL command hanging when sending to an SMTP%"user@domain" address

[Reply edited 29-AUG-2018 to correct typos I noticed on re-reading it]

Thanks for your reply Steven - I spent a lot of time banging my head against a wall with this yesterday, and gave up when I couldn't get anywhere with Wireshark (yes, it sets the NIC in promiscuous mode, and I am seeing messages other than what I am sending, but I was forgetting that network technology has moved on from the days of thinwire Ethernet & DEMPRs - the PC and the two Charonised systems are connected to a network switch which won't let me see non-multicast/broadcast messages between the two Charonised systems unless the switch port for either Charonised system is mirrored/SPANned to the port that the PC is connected to).

I've reluctantly used tcpiptrace on a non-test system, and can see that when the SMTP%"user@domain" is specified at the "To:" prompt in VMS MAIL, it does indeed send out DNS requests (I've also been banging my head trying to manually decode these; the way it is documented in IANA docs (sending you jumping backwards & forwards through a document then having to reference another document & doing the same), leaves a lot to be desired, and I'm not sure how much more hair I have left to tear out).
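
For anyone else decoding such requests by hand, here is a small Python sketch of the DNS query layout as defined in RFC 1035: a 12-byte header followed by a question made of length-prefixed labels, a type and a class. It builds a sample MX query and decodes it again. The domain `example.com` and the query ID are placeholders, only the A and MX types are named, and name compression (which can appear in responses) is not handled:

```python
import struct

QTYPE_NAMES = {1: "A", 15: "MX"}   # only the two types discussed in this thread


def build_query(name, qtype, query_id=0x1234):
    """Build a minimal DNS query: 12-byte header plus one question."""
    # Flags 0x0100 = standard query with "recursion desired"; one question, no records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(bytes([len(p)]) + p.encode("ascii") for p in name.split("."))
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)   # class 1 = IN


def decode_query(packet):
    """Decode the header and the first question of an uncompressed DNS query."""
    query_id, flags, qdcount, _an, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    pos = 12
    labels = []
    while packet[pos] != 0:                 # a zero-length label ends the name
        length = packet[pos]
        labels.append(packet[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    pos += 1                                # skip the terminating zero byte
    qtype, qclass = struct.unpack("!HH", packet[pos:pos + 4])
    return {
        "id": query_id,
        "is_response": bool(flags & 0x8000),
        "questions": qdcount,
        "name": ".".join(labels),
        "qtype": QTYPE_NAMES.get(qtype, str(qtype)),
        "qclass": qclass,
    }


decoded = decode_query(build_query("example.com", 15))
```

Feeding the UDP payload of a captured request to `decode_query` should show whether the lookup is for an MX or an A record, which is the order Steven guessed at.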

It's all a bit of a muddle because of historic acquisitions, but...

This is UCX v4.2 ECO 4 on OpenVMS v6.2 (from other design decisions in UCX / TCP/IP Services, I suspect that the problem remains in later versions).

Domain in UCX is configured as name1.name2.com

Email addresses have a domain of name3.com for direct employees, and prefix.name3.com for contractors.

SMTP configuration has alternate & general gateways both defined to historic.name4.com

[The names are either the historic company name, or various contractions/backronyms of merged company names]

Both the general & alternate SMTP gateways are defined in the local host database as A.B.C.66, and the IP addresses of the systems trying to send the email are A.B.C.34 and A.B.C.33

I must have misunderstood the concept of MX records - I had thought that doing SET MX destination /GATEWAY=name_or_addr /PREFERENCE=value resulted in destination being used as a domain (as UCX HELP SET MX parameters suggested) - it was only when I tried to delete an MX record that I temporarily created, and got the following error that I eventually (mostly) understood how it worked (I don't ever recall having set up internet mail gateways on OpenVMS - it's always already been configured on systems at companies I've joined, and I've never had cause/time to read up on configuration):

%UCX-E-ROUTEERROR, Error processing ROUTE request
-UCX-W-NORECORD, Information not found
-RMS-E-RNF, record not found

I then realised that although you might add something that has the appearance of a domain (which, in the real world, may well be load-balanced across multiple servers), SET MX is treating it as a single host/node.

[I also got the wrong impression about the functionality of UCX SET CONFIGURATION SMTP /GATEWAY - even after several re-reads of the documentation, I had thought that (SMTP) email sent to an address in the same domain as the zone specified by UCX SET COMMUNICATION /ZONE=domain would use the host specified by UCX SET CONFIGURATION SMTP /GATEWAY=GENERAL=host and that all other domains would use the host specified by UCX SET CONFIGURATION SMTP /GATEWAY=ALTERNATE=host

However, the help text for UCX HELP SET CONFIGURATION SMTP /ZONE seems to be at odds with UCX HELP SET CONFIGURATION SMTP /GATEWAY - possibly a case of programmers writing documentation rather than tech authors...]

It wasn't using the MX record that I had added because the single host/node was not defined in the local hosts database, which forced SMTP to attempt to query the DNS name server (which is why MAIL was hanging).

With MX records for name3.com and prefix.name3.com and local hosts entries for the same, MAIL no longer hangs when the SMTP%"user@prefix.name3.com" or SMTP%"user@name3.com" email address is specified as the destination.

 

Of course, this leaves a bit of a dilemma...

I don't really want to pick some random IP addresses to add to the local hosts file simply to act as a placeholder for name3.com, lest the addresses subsequently be allocated in future.

Even if I did pick addresses that were "never" going to be allocated, and decided to tidy up the domain name configuration etc., a problem would remain. If the underlying mail library routines were added into other code, and that code was passed an email address (or a distribution list of them), then someone adding another address whose domain had no placeholder entry, at a time when the DNS servers were unreachable, would cause the callable mail routines to hang for 3 minutes until they timed out.

Hmm. I think I'll need to ponder on this - I don't think there have been that many issues with the DNS servers in the 5 years I've been here, so I'm not sure whether or not it would still be acceptable for the detached process to potentially stick in an LEF state for 3 minutes (albeit due to 2 error conditions - PLC stuck in a loop and the network is down or the DNS servers are unreachable or otherwise unresponsive).

In the end, I opted to use another logical name that could be defined to indicate that no email attempts should be made, so this can be defined if there is a known issue with (the reachability of) the DNS servers (the logical is always checked at the decision point where the subject header would be formulated, so the logical can be defined on-the-fly, rather than require a process restart to pick up new definitions).
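
The shape of that check, sketched in Python with an environment-style mapping standing in for the logical name table (the switch name `PLCMON_NO_EMAIL` and the function names are hypothetical, not the real ones). The point is that the switch is looked up at the decision point on every alert rather than cached at start-up:

```python
import os

DISABLE_SWITCH = "PLCMON_NO_EMAIL"   # hypothetical name for the "no email" switch


def email_allowed(environ=os.environ):
    """Look the switch up on every call, so a change takes effect without a restart."""
    return environ.get(DISABLE_SWITCH) is None


def raise_alert(text, send_opcom, send_email, environ=os.environ):
    send_opcom(text)                 # the operator message always goes out
    if email_allowed(environ):       # checked at the decision point, not cached
        send_email(text)


# Demo with a plain dict in place of the real name table.
opcom_log, email_log = [], []
table = {}
raise_alert("first alert", opcom_log.append, email_log.append, table)
table[DISABLE_SWITCH] = "1"          # defined on-the-fly while the process runs
raise_alert("second alert", opcom_log.append, email_log.append, table)
```

Both alerts reach the operator log, but only the first is emailed, because the switch was defined between the two calls.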

 

Mark

[Formerly appearing as woeisme]