
Re: OpenVms/Telnet problem with many concurrent interactive connections requests

 
SOLVED
Maurizio Rondina
Frequent Advisor

OpenVms/Telnet problem with many concurrent interactive connections requests

An old AlphaServer ES40 running OpenVMS 7.3 and TCPIP 5.1 had no problems with telnet connections for many years. Now, in the morning (only on some days, especially Mondays), when over 100 users try to log in to the system, it does not prompt for "Username": the connection waits without any response for a few minutes and is finally rejected. Only a few users at a time can connect, and it takes half an hour for all the users to log in.

After 30-40 minutes, the problem goes away. The normal connection time (less than one second) is restored for any user who wants to open a new session.

I have already verified the Telnet LIMIT parameter and the other parameters: all are correct.
Unfortunately, because of other problems, the customer cannot upgrade the OpenVMS and TCPIP versions.

On the same network there are other OpenVMS systems that respond normally to telnet connection requests from the same clients at the same time that the ES40 does not respond.

All users use the Ericom PowerTerm525 telnet emulator, installed on all Windows XP clients.
Note that the problem is the same from the Windows command prompt. To exclude name resolution problems (DNS name, NetBIOS name), I tried connecting directly by IP address, but the problem is the same.



25 REPLIES
Wim Van den Wyngaert
Honored Contributor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

$ ucx show nam
Is the "timeout" low, e.g. less than 4 sec?
What is the value of retry?
Do all the specified DNS servers respond (ping)? Try swapping the specified servers (e.g. S1,S2 becomes S2,S1).

Note that for IP addresses you have reverse address lookup.

Can you do a tcptrace /prot=ip/fu/pack=10000 between VMS and a PC and post it?

Wim
Steven Schweda
Honored Contributor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

> To exclude name resolution problems (DNS
> name, Netbios name), i try directly with
> ip address, [...]

If the Telnet server tries to do a reverse
(number-to-name) look-up on the client, then
it won't matter whether you specify the
server by name or by address.
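
To make the reverse look-up point concrete, here is a minimal Python sketch (not something run on the ES40) that times a number-to-name look-up of the kind a server may do for each incoming client. The function name `time_reverse_lookup` and its `resolver` parameter are assumptions made for illustration; only `socket.gethostbyaddr` is a real standard-library call.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, elapsed seconds) for a reverse lookup of ip.

    `resolver` is injectable (hypothetical parameter) so the timing logic
    can be exercised without touching a real DNS server.
    """
    start = time.monotonic()
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError
        hostname = None
    return hostname, time.monotonic() - start
```

Calling `time_reverse_lookup("192.168.104.178")` from a machine that uses the same DNS servers would show whether the look-up itself is slow, independent of how the client addressed the server.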

> [...] and finally reject the connection.

Actual error message?
Maurizio Rondina
Frequent Advisor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Wim, the result of the TCPIP SHOW NAM command is:

Retry: 4
Timeout: 4
Servers: A1, 192.168.1.9

Retry and timeout are the defaults.
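
As a rough way to reason about these values, the following Python sketch computes an upper bound on the time one look-up could spend on unreachable servers. It assumes a deliberately flat model (every dead server is tried `retry` times and each attempt waits the full `timeout`); the actual resolver in TCPIP 5.1 may order, interleave, or back off its attempts differently, and the function name is hypothetical.

```python
def worst_case_lookup_wait(retry, timeout, dead_servers):
    """Upper bound, in seconds, spent on unreachable servers for one lookup.

    Assumption: a flat model in which each dead server is tried `retry`
    times and every attempt waits the full `timeout`. Real resolvers may
    back off or interleave servers differently.
    """
    return retry * timeout * dead_servers


# Values from SHOW NAM above, with one server assumed not to respond.
print(worst_case_lookup_wait(retry=4, timeout=4, dead_servers=1))
```

Under that model a single dead server costs seconds per look-up, not minutes, so this alone would not account for waits of several minutes unless look-ups queue up behind each other.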

Server A1 responds to ping; it is this same OpenVMS system.

Server 192.168.1.9 does not respond to ping, so I removed it from the BIND resolver and shut down and restarted the BIND TCPIP service, but it remained in the output of the TCPIP SHOW NAM command. So I tried the TCPIP$BINDSETUP.COM procedure, and now the TCPIP SHOW NAM command gives:
Server: LOCALHOST

The customer tells me that the 192.168.1.9 server was replaced by the DNS server 192.168.1.11, a newer Windows 2003 Active Directory DNS server.

They say that the BIND/DNS server is no longer necessary on the ES40.

As for the tcptrace results, I cannot send them to you: the verb/symbol is not recognized on TCPIP Services V5.1; it may be a command released in newer versions.

Thanks.




Maurizio Rondina
Frequent Advisor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Steven, no error message is given.

The PowerTerm emulator's "connect box" is displayed again after a few minutes, without any message.

At the moment the telnet server is responding normally, so I cannot tell you what happens or what the error message is, but I think the response is "impossible to open connection to the host".

I will try when the customer calls me, while the problem is occurring.
Wim Van den Wyngaert
Honored Contributor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Ok, if DNS was not responding you should get your connection within about 10 seconds (as far as I remember from my DNS play time).

Start tcpip and do help. Is tcptrace in there?

Wim
Robert Gezelter
Honored Contributor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Maurizio,

I would recommend that the problem be approached by looking carefully at real data, not by changing parameters that have been quite acceptable in the past. While it is quite possible that changes may be needed, it is far safer to make those changes based upon facts.

I would obtain an appropriate Ethernet HUB (not a switch). If managed switches are in use, then there are likely settings allowing traffic to be replicated.

I would then put a network analyzer (e.g., WireShark; which is freeware and available for download) on the line between the OpenVMS system and the network infrastructure. When the problem occurs, it will then be possible to see precisely what is (and what is not) happening. Save the resulting trace file to disk for analysis.

Many things could be causing this problem. Not all of them involve the OpenVMS system.

Also if there is a shortage of internal expertise on looking at network traces, consider whether outside resources should be retained (disclosure: My firm provides such services, as do the firms affiliated with several other active contributors to ITRC).

- Bob Gezelter, http://www.rlgsc.com
labadie_1
Honored Contributor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Hello

Do you get some data displayed at sys$output if you do
DEFINE TCPIP$SOCKET_TRACE 1
telnet nodename
or telnet @ip

If you prefer to get the display in a file
DEFINE TCPIP$SOCKET_TRACE -
SYS$LOGIN:TCPIP$SOCKET_TRACE.LOG

and after your telnet, look at the file

Maurizio Rondina
Frequent Advisor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

To Wim: tcptrace isn't available on TCPIP 5.1; I have just verified this in the TCPIP help.

When the problem happens, the waiting time is more than 10 seconds, maybe more than 1 minute, so I think the problem is due to a more complex network problem, as Robert says.

Thanks to Robert for the recommendation, which I generally follow already. My customer has no internal expertise for analyzing network traces.

Unfortunately, I don't know the complete network infrastructure; the customer asked me to solve this as an "ES40" problem, because it is the only server that has the problem.

For now I'm working on the problem remotely. If necessary I will have to go on-site for a few days at 8:15 am, when about 150-200 logins are requested within a few minutes, and hope that the problem occurs at that moment.

For labadie: I tried your suggestion, but no log file is created. I think, as I said above to Wim, that TCPIP 5.1 doesn't support the trace feature.

About both Wim's and labadie's suggestions, I think the trace is needed on the client side, not on the ES40 side.

Thanks to all.










Robert Gezelter
Honored Contributor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Maurizio,

With all due respect, my suspicion is that a network trace (with a monitor, not TCPTRACE) at the server will illuminate what is happening.

Tracing at the client side will only show a single client's perspective. The aggregated trace near the server will show the conversations that are working, as well as those that are not.

I would also suggest ensuring that OpenVMS accounting is enabled, at least for the period of the surge.

And yes, I have seen many problems reported as "server" problems that were actually problems with underlying infrastructure that were experienced as "server" problems.

- Bob Gezelter, http://www.rlgsc.com
Wim Van den Wyngaert
Honored Contributor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

tcptrace should be available according to the documentation of 5.1. If not, something is strange/wrong on your site.

Wim
Wim Van den Wyngaert
Honored Contributor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

According to the doc you must execute SYS$STARTUP:TCPIP$DEFINE_COMMANDS.COM first to get the command. That's not the case in 5.3 but you can try.

There were also some DNS failover problems in some 5.1 versions. Maybe the timeout was also handled differently (thus explaining the 1 min).

Wim
Volker Halle
Honored Contributor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Maurizio,

consider reading the following article to see if it applies to your situation:

http://h18000.www1.hp.com/support/asktima/communications/CTI_SRC021219001467.html

Check for symptoms with

$ ANAL/SYS
SDA> tcpip sysconfig socket
...
SDA> EXIT

What are the values for sobacklog_drops and
sobacklog_hiwat ?

Volker.

Maurizio Rondina
Frequent Advisor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Wim, TCPIP$DEFINE_COMMANDS.COM is already in my login.com; however, I have solved the tcptrace problem, and you can now see the log in the attachment. I traced the connection from client 192.168.104.178 to the ES40 server 192.168.1.3 while it opened a telnet connection. At that moment there were no problems, only a wait of a few seconds.

Robert, I think an on-site visit is necessary to check from every point of view.
Rob Leadbeater
Honored Contributor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Hi,

I've only a little experience of VMS, but this statement stands out:

> The customer say me that the 192.168.1.9
> server was replaced with 192.168.1.11 dns
> server, a newer Windows 2003 Active
> Directory DNS Server.

> They says that now the BIND/DNS server is
> not more necessary on the ES40.

I would ask your customer on what grounds they can make that decision, especially as it sounds like the server was still looking at the old resolver...

It would also be interesting to find out whether these problems started happening when the DNS server was changed.

Cheers,

Rob
Maurizio Rondina
Frequent Advisor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Volker,

the HP article is very interesting (even though it concerns TCPIP 5.3 ECO1 and not TCPIP 5.1). I will follow its suggestions when the problem happens.

The strange thing is that, on the days this happens, there are no problems for the logins of the first users; then at 8:15 am over 100 simultaneous logins are requested within 5 minutes. They all wait for the Username prompt, but some get a reply in 2 minutes, others in 5 minutes, others only after 30 minutes. After that, all new connection requests are satisfied normally in one second. It seems there is a congested queue of connection requests.

For sobacklog_drops and sobacklog_hiwat, these are the current values (with no problem occurring):

sobacklog_drops 2058
sobacklog_hiwat 8

Thanks.
Maurizio Rondina
Frequent Advisor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Rob,

A few years ago the customer decided to move DNS to a Windows 2003 server and abandon the one on the OpenVMS system, because they believed that management was simpler on that platform, including for their internal technicians, and did not need an OpenVMS specialist (in this region of Italy I'm the only one, and there are only a few systems).

They carried out the project without considering the new role of the OpenVMS DNS, and left it as it was.

The 192.168.1.9 server, instead, was an older proxy server to which the OpenVMS system (which only stores the local names for the local zone) forwarded requests for resolving public names.

The 192.168.1.9 proxy server was also replaced by another one (now 192.168.4.9) without considering the OpenVMS system.

The problems, however, did not start with these changes: certainly not with the change of proxy server, though maybe with a recent DNS configuration change on the current Windows 2003 DNS server that I don't know about.
Volker Halle
Honored Contributor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Maurizio,

if the 'problem' exists for the TELNET connections and it is the backlog queue problem, other IP protocols (e.g. FTP) should be fine. But even a $ TELNET localhost will hang. You can use these tests to easily rule out other speculations...

The data you provided shows that you've hit this problem in the running system. Consider watching the sobacklog_drops counter: if it increases during the time the problem is evident, it will confirm my analysis.
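
The check described above amounts to comparing successive readings of the counter. A minimal Python sketch of that comparison follows; the function name and the sample readings other than 2058 are hypothetical, and the values would have to be noted by hand from the SDA output at each time.

```python
def find_drop_increases(samples):
    """Return (from_label, to_label, delta) for each rise in the counter.

    `samples` is a time-ordered list of (label, sobacklog_drops) pairs
    taken by hand from successive SDA displays.
    """
    increases = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v1 > v0:
            increases.append((t0, t1, v1 - v0))
    return increases


# 2058 is the value reported earlier in the thread; the others are made up.
readings = [("08:10", 2058), ("08:15", 2058), ("08:20", 2391)]
print(find_drop_increases(readings))
```

An empty result during a slow period would argue against the backlog explanation; a rising counter during it would support it.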

This is a very unusual scenario, but I've seen it before and HP service never found the problem, so I had to be called in.

Volker.
Wim Van den Wyngaert
Honored Contributor
Solution

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Your trace starts at 17:42:41.79 and has nothing abnormal. If you trace again, also trace the first name server specified in the name server list.

To add to Volker's solution: note that after a SYN drop the retransmit is done by the client after 5 seconds (surely some config item, but it could be hardcoded). And a timeout is 75 seconds.

Wim

Maurizio Rondina
Frequent Advisor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Wim and Volker,

now I think I must wait for a morning when the problem happens again, and at that moment apply your suggestions for more analysis.

Thanks.
Maurizio Rondina
Frequent Advisor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Today the customer told me that since 6-May-2008, when I deleted the 192.168.1.9 server from the DNS configuration and from the BIND resolver and rebuilt TCPIP$BIND.CONF with the TCPIP$BINDSETUP wizard, all connections have been very fast.
So this afternoon, as a test, I stopped and disabled the BIND service from TCPIP$CONFIG.COM, to see whether the problem really is a DNS problem, as it seems.
Within a few hours (in the afternoon only a few users log in) the customer called me to say that all connections were very slow again.
I connected to the system and launched tcptrace (see the attached file) before trying a telnet connection. My connection also waited over 20 seconds.
Note that in the tcptrace output there are no packets for 21 seconds after the client's first ACK.

...(continues)
Maurizio Rondina
Frequent Advisor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

.....then I enabled and started the BIND server from TCPIP$CONFIG, and within a few minutes the connections were very fast again: see in the attached tcptrace that the server replies to the client's ACK/SYN in only 1 second.

I think the problem is solved. I am only waiting for Monday (usually the critical morning), when over one hundred login requests will reach the system within a few minutes.

I will keep you informed...

Thanks
Maurizio Rondina
Frequent Advisor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

This morning 176 telnet connections were opened in 15 minutes without problems and without delay.

Now I'm sure that the problem was a "DNS telnet reverse lookup problem" caused by changes that external Microsoft consultants made to the network configuration without considering the OpenVMS system configuration (the DNS transfer server for "any" moved from 192.168.1.9 to 192.168.4.9; a new Windows 2003 DNS server was added with IP 192.168.1.11; a new local zone "customer.local." was created, with a CNAME pointer in the original "customer." zone, the only one that the OpenVMS system knows).
Now I have tuned the TCPIP$BIND.CONF DNS configuration file, correcting all the relevant errors, and all name resolution works well: local names, names in zone "customer.", names in zone "customer.local.", and external Internet names.
The connection time is now very low.

The only remaining doubt is why the problem occurred only some of the time (an occasional problem) and not every time (a persistent problem).

If you can suggest something...

I will leave the thread open for this.

Thanks to all.



Robert Gezelter
Honored Contributor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Maurizio,

A packet-by-packet trace will show precisely what is happening (of course, one does need the time and budget to go through the messages one at a time).

Two possibilities come to mind:

- are the problem surges preceded by periods of idleness? If so, perhaps the DNS cache is covering the problem.

- is there a network problem that is causing the DNS resolution requests to be dropped somewhere in the network?
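
The first possibility can be pictured with a toy time-to-live cache, sketched here in Python. It is not a model of the BIND server on the ES40: the class name, the TTL value, and the host name are all assumptions for illustration. It only shows how look-ups shortly after a successful one are answered locally, while look-ups after a long idle period go back to the (possibly slow) resolver.

```python
class TtlCache:
    """Toy cache: an entry is served locally until it is `ttl` seconds old."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}  # key -> (value, time stored)

    def lookup(self, key, now, resolve):
        """Return (value, served_from_cache)."""
        hit = self.entries.get(key)
        if hit and now - hit[1] < self.ttl:
            return hit[0], True
        # Expired or never seen: ask the real (possibly slow) resolver.
        value = resolve(key)
        self.entries[key] = (value, now)
        return value, False


cache = TtlCache(ttl=3600)  # hypothetical one-hour TTL
slow_resolver = lambda ip: "pc178.customer"  # stand-in for a slow lookup
for now in (0, 100, 4000):
    print(now, cache.lookup("192.168.104.178", now, slow_resolver))
```

If the real cache behaves anything like this, a quiet night or weekend followed by a login surge is exactly when every client would miss the cache at once.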

- Bob Gezelter, http://www.rlgsc.com
Neelmani Pandey
Frequent Advisor

Re: OpenVms/Telnet problem with many concurrent interactive connections requests

Yup, I too believe that the DNS cache was covering the issue, as the client being connected may have been resolved from the cache.