
UCX intermittent telnet connection problems

 
SOLVED
John A. Beard
Regular Advisor


Hi,

We have recently been getting reports of users having intermittent connection problems on a number of older Alpha servers running OpenVMS 7.1. The version of UCX is Digital TCP/IP Services for OpenVMS Alpha Version V4.1 - ECO 10.

These servers are coming to the end of their scheduled life, and there are no plans to upgrade to newer versions of the operating system or layered products. That said, we still need to find out what is causing this problem and try to fix it.

There are no error messages when the user's Telnet session is rejected, and I could not see any OPCOM messages indicating a network issue.
We had somebody at one site log on via the console and perform a shutdown/reboot, and this allowed all Telnet connections back into the server. On another server today, a reboot was not needed, because connections became available again after 10-15 minutes.

I'm attaching the output from as many commands as I could find, in the hope that it will show where the problem lies. From previous threads on this topic, I see that WAITS under Large Buffers could well be causing the problem. If this is correct, or if other values need to be adjusted, I would be grateful for instructions on what I should set and how.



A wise man takes advice.
14 replies
labadie_1
Honored Contributor
Solution

Re: UCX intermittent telnet connection problems

There has been a pool expansion of NPAGEDYN. How many pool expansion failures have there been?

Can you check with
$ mc agen$feedback
$ sea sys$system:agen$feedback.dat fail

It also seems that your UCX large buffers have hit their maximum value, so you should increase it.

But first, correct the NPAGEDYN allocation by adding a little to NPAGEDYN and rebooting.
You could use a value midway between the starting NPAGEDYN and the actual NPAGEDYN, so around 30000000.
Put min_npagedyn=30000000 in sys$system:modparams.dat, run AUTOGEN, and check the result.
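A quick check of the arithmetic, as a minimal Python sketch. It takes the starting NPAGEDYN (2,801,664 bytes, the SYSGEN value shown later in the thread) and the maximum observed pool size (3,162,112 bytes) and computes the value midway between them. The result is roughly 3,000,000 bytes, about a tenth of the 30,000,000 quoted, so the figure is worth double-checking before it goes into MODPARAMS.DAT. The constant and function names are illustrative only.

```python
# Pool sizes quoted in this thread, in bytes
STARTING_NPAGEDYN = 2_801_664   # NPAGEDYN "Current" value from SYSGEN
OBSERVED_POOL_SIZE = 3_162_112  # maximum observed nonpaged pool size

def midpoint(a: int, b: int) -> int:
    """Integer value midway between two pool sizes (bytes)."""
    return (a + b) // 2

suggested = midpoint(STARTING_NPAGEDYN, OBSERVED_POOL_SIZE)
print(f"midway value: {suggested:,} bytes")
print(f"30,000,000 is {30_000_000 / suggested:.1f} times that value")
```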

Read Dario Karlen's thread about crashes, as the same advice applies (for NPAGEDYN, NPAGEVIR, ...):
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1221038

Good luck
Hakan Zanderau (Anders)
Trusted Contributor

Re: UCX intermittent telnet connection problems

Just to be sure.....

Check the TELNET service to make sure that its session limit hasn't been reached:

$ UCX SHOW SERVICE TELNET /FULL

Compare LIMIT and PEAK.

regards,

Hakan Zanderau
HA-solutions
Don't make it worse by guessing.........
John A. Beard
Regular Advisor

Re: UCX intermittent telnet connection problems

Hakan... that information was contained in my attached document.
John A. Beard
Regular Advisor

Re: UCX intermittent telnet connection problems

Labadie,

Thanks for all your suggestions. I have only just got back from lunch, so here is the first part of what you asked for:

CS/FPAXP1> mc agen$feedback
CS/FPAXP1> sea sys$system:agen$feedback.dat fail
PAGEDYN_ALLOCFAIL = 0
PAGEDYN_ALLOCFAILPAGES = 0
NPAGEDYN_ALLOCFAIL = 0
NPAGEDYN_ALLOCFAILPAGES = 0
CDT_ALLOCFAIL = 0
GH_EXEC_CODE_FAIL = 0
GH_EXEC_DATA_FAIL = 0
GH_RES_CODE_FAIL = 0
GH_RES_DATA_FAIL = 0
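If you collect this output from several servers, a small script can flag any nonzero failure counter automatically. The sketch below is a hypothetical Python helper. It assumes only the `NAME = value` line format shown in the SEARCH output above, and it treats any counter whose name contains FAIL as a failure count.

```python
import re

# Matches lines of the form "NPAGEDYN_ALLOCFAIL = 0" as shown above
FEEDBACK_LINE = re.compile(r"^\s*([A-Z0-9_$]+)\s*=\s*(\d+)\s*$")

def nonzero_failures(text: str) -> dict[str, int]:
    """Return every counter whose name contains FAIL and whose value is not zero."""
    failures = {}
    for line in text.splitlines():
        match = FEEDBACK_LINE.match(line)
        if match and "FAIL" in match.group(1) and int(match.group(2)) != 0:
            failures[match.group(1)] = int(match.group(2))
    return failures

sample = """PAGEDYN_ALLOCFAIL = 0
NPAGEDYN_ALLOCFAIL = 0
GH_RES_DATA_FAIL = 0"""
print(nonzero_failures(sample) or "no allocation failures recorded")
```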
John A. Beard
Regular Advisor

Re: UCX intermittent telnet connection problems

Labadie,

Before I adjust the minimum value of NPAGEDYN, can you please confirm the value of 30,000,000? I'm asking because AUTOGEN came back with the output below, and (this shows my lack of knowledge of memory constraints) that figure is almost equal to the value on some of our newer AlphaServers running with 8 GB of memory.

CS/FPAXP1> sh mem

System Memory Resources on 16-APR-2008

Physical Memory Usage (pages):         Total      Free    In Use  Modified
  Main Memory (256.00Mb)               32768     12437     18304      2027

Virtual I/O Cache (Kbytes):            Total      Free    In Use
  Cache Memory                         20000       696     19304

Granularity Hint Regions (pages):      Total      Free    In Use  Released
  Execlet code region                    512         0       472        40
  Execlet data region                     96         4        92         0
  S0/S1 Executive data region            349         0       349         0
  S2 Executive data region               160         0       160         0
  Resident image code region            1024         0       804       220

Slot Usage (slots):                    Total      Free  Resident   Swapped
  Process Entry Slots                    250       193        57         0
  Balance Set Slots                      248       193        55         0

Dynamic Memory Usage (bytes):          Total      Free    In Use   Largest
  Nonpaged Dynamic Memory            3162112   1043584   2118528    110080
  Paged Dynamic Memory               2416640   1163168   1253472   1013440



NPAGEDYN parameter information:
Feedback information.
Old value was 2801664, New value is 30000000
Maximum observed non-paged pool size: 3162112 bytes.
Non-paged pool request rate: 0 requests per 10 sec.
- AUTOGEN parameter calculation has been overridden. The calculated value was 3072000. The value 30000000 will be used in accordance with the following requirements:
NPAGEDYN minimum value is 30000000.
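To put the two NPAGEDYN values in proportion, the sketch below converts the SHOW MEMORY page count into bytes and expresses each candidate pool size as a share of physical memory. Alpha uses 8 KB pages, and 32,768 × 8,192 bytes is the 256 MB shown. On this system, the 30,000,000-byte minimum would be roughly 11% of RAM, versus a little over 1% for the 3,072,000 bytes AUTOGEN calculated. This is only a rough comparison and ignores how the rest of memory is used.

```python
ALPHA_PAGE_SIZE = 8192        # bytes per page on Alpha
MAIN_MEMORY_PAGES = 32_768    # "Main Memory (256.00Mb)" total pages from SHOW MEMORY

physical_bytes = MAIN_MEMORY_PAGES * ALPHA_PAGE_SIZE

def share_of_memory(pool_bytes: int) -> float:
    """Percentage of physical memory that a nonpaged pool of this size would occupy."""
    return 100.0 * pool_bytes / physical_bytes

candidates = [("AUTOGEN calculated", 3_072_000), ("MIN_NPAGEDYN override", 30_000_000)]
for label, value in candidates:
    print(f"{label:>22}: {value:>11,} bytes = {share_of_memory(value):5.2f}% of RAM")
```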
Hakan Zanderau (Anders)
Trusted Contributor

Re: UCX intermittent telnet connection problems

John,

Sorry,.....didn't notice the attached info.

regards,

Hakan Zanderau
John A. Beard
Regular Advisor

Re: UCX intermittent telnet connection problems

I'm not sure if this comes into the equation, but there is also a hefty increase in the newly calculated NPAGEVIR:

NPAGEVIR parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 120000000. The value 120342000 will be used in accordance with the following requirements:
NPAGEVIR has been increased by 342000.
NPAGEVIR minimum value is 5000000.

CS/FPAXP1> mc sysgen
SYSGEN> SHOW NPAGE
Parameter Name           Current   Default      Min.      Max.  Unit   Dynamic
--------------           -------   -------   -------   -------  -----  -------
NPAGEDYN                 2801664   1048576    163840        -1  Bytes
NPAGEVIR                12132352   8388608    163840        -1  Bytes


CS/FPAXP1> search modparams.dat; npagevir
min_npagevir=5000000
add_npagevir=342000

CS/FPAXP1>
labadie_1
Honored Contributor

Re: UCX intermittent telnet connection problems

Depending on the VMS version, AUTOGEN sets NPAGEVIR = 4 * NPAGEDYN, or something similar (that is the case here).

NPAGEVIR is the absolute maximum size to which nonpaged pool (which starts at NPAGEDYN) is allowed to expand.

I see nothing weird with this value of Npagevir.
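The 4 × NPAGEDYN rule of thumb can be checked against the AUTOGEN report posted earlier in the thread. 4 × 30,000,000 gives the calculated 120,000,000, and adding ADD_NPAGEVIR (342,000) gives the 120,342,000 that AUTOGEN will use. The Python sketch below is a simplified model written for this check. The factor of 4 comes from this reply. The order in which the MIN_ and ADD_ values are applied is an assumption, not AUTOGEN's documented algorithm.

```python
def model_npagevir(npagedyn: int, add_npagevir: int = 0, min_npagevir: int = 0) -> int:
    """Simplified model: 4 * NPAGEDYN, plus ADD_, then raised to MIN_ if lower (assumed order)."""
    calculated = 4 * npagedyn              # rule of thumb; the factor is version dependent
    adjusted = calculated + add_npagevir   # ADD_NPAGEVIR from MODPARAMS.DAT
    return max(adjusted, min_npagevir)     # MIN_NPAGEVIR acts as a floor

# Values from the AUTOGEN report and the MODPARAMS.DAT search in this thread
new_value = model_npagevir(30_000_000, add_npagevir=342_000, min_npagevir=5_000_000)
print(f"NPAGEVIR: {new_value:,} bytes")
```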

To set your UCX large buffers to a higher value, if memory serves me, do

$ ucx
UCX> set comm/large=max:300

to change it in volatile memory (the running system), and

UCX> set conf comm/large=max:300

to keep the setting across the next reboot.

In any case,

$ ucx
UCX> help set comm

will show you the correct syntax.
Jim_McKinney
Honored Contributor

Re: UCX intermittent telnet connection problems

First off, I don't know what your telnet issue is, but I can say that it is not nonpaged pool exhaustion.

OpenVMS V7.1 on node FPAXP1  16-APR-2008 18:11:47.31  Uptime  135 08:40:23


Nonpaged Dynamic Memory (Lists + Variable)
    Current Size (bytes)  3162112    Current Size (pagelets)      6176
    Initial Size          2801664    Initial Size (pagelets)      5472
    Maximum Size         12132352    Maximum Size (pagelets)     23696
    Free Space (bytes)    1094592    Space in Use (bytes)      2067520


The system has been up for 135 days. At some point nonpaged pool grew to 3,162,112 bytes from the initial allocation of 2,801,664. The current NPAGEVIR would permit it to grow to 12,132,352 bytes if there were demand. Since the peak, usage has fallen back to where about a third of your pool is now free (1,094,592 bytes). Adding more to the pool now will do nothing. If you choose to add to the pool, that's fine, but it won't address your telnet issue.

If the growth of the pool from the initial allocation was a one-time event (perhaps a network broadcast storm flooded your system with packets that it had to sort through, arriving faster than they could be processed and maybe being discarded), then there's no point in increasing the initial allocation. If you typically see that you're using all of your pool (unlike in the snapshot provided), then increasing the initial allocation is a good thing, because every expansion has a cost. But NPAGEVIR (the maximum size in your snapshot) is there as a safety net to handle those occasional unexpected events when the initial size of the pool is inadequate.
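Jim's reading of the pool snapshot can be reproduced with a few lines of Python. The sketch assumes the usual OpenVMS pagelet size of 512 bytes, which is consistent with the byte and pagelet columns in the SDA output (3,162,112 / 512 = 6,176). From the snapshot it derives the growth since boot, the free fraction, and the remaining headroom up to NPAGEVIR.

```python
PAGELET_BYTES = 512  # one OpenVMS pagelet

# Nonpaged pool snapshot from the SDA output in this reply (bytes)
pool = {
    "current": 3_162_112,
    "initial": 2_801_664,
    "maximum": 12_132_352,  # NPAGEVIR
    "free": 1_094_592,
    "in_use": 2_067_520,
}

def to_pagelets(nbytes: int) -> int:
    """Convert a byte count to 512-byte pagelets."""
    return nbytes // PAGELET_BYTES

# Free plus in-use space should account for the whole current pool.
assert pool["free"] + pool["in_use"] == pool["current"]

growth = pool["current"] - pool["initial"]       # expansion since boot
free_fraction = pool["free"] / pool["current"]   # roughly one third
headroom = pool["maximum"] - pool["current"]     # room left before reaching NPAGEVIR

print(f"current size: {to_pagelets(pool['current'])} pagelets")
print(f"grew by {growth:,} bytes; {free_fraction:.0%} free; {headroom:,} bytes of headroom")
```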

Why do you suspect that the issue is with the VMS host rather than somewhere else in the network? Why did you choose to reboot? Could it be a coincidence that telnet became functional after the reboot, with no cause-and-effect relationship? Just asking...
John A. Beard
Regular Advisor

Re: UCX intermittent telnet connection problems

Nothing in life is ever simple...

Jim,

I'll have to continue this tomorrow, but here's some background on what happened at one of the sites reporting problems connecting to the server (which, by the way, is in Austria; I'm in Canada). I could not even ping the box, so I had to rely on a non-VMS technician at the plant to log in and reboot. There had been no Telnet sessions for approximately 8 hours, so the people there were anxious to get things up again as soon as possible. The local technician did confirm that the LED on the NIC was showing green and that the switch was not showing problems with other connections going through it. I ran ANALYZE/ERROR_LOG (ANAL/ERR) on the box after it came back, and I couldn't see any error messages in a 24-hour window around the time of the problem. Accounting also showed that batch jobs continued to run unaffected during this period.


I was not aware of any of the commands included in my attached document at the time, as I found them in other threads about similar problems after I started searching online. I should also mention that the attached output came from another server, in France, that denied Telnet sessions for periods of up to 10-15 minutes (I believe this happened more than once).

Jim_McKinney
Honored Contributor

Re: UCX intermittent telnet connection problems

> at the plant to log in and reboot

Here are some things to think about. This is not to suggest that the issue isn't solely telnet-related; I'm just trying to define the issue clearly.

Are the users local or remote to the system? Do they all follow the same network path for entry? Are any other network protocols used and were they affected? Were users that were already connected unaffected?
John A. Beard
Regular Advisor

Re: UCX intermittent telnet connection problems

Jim,

The majority of the users would have been local, although myself and some other European support people would have been trying to connect remotely.

I don't know which network path everyone would have been using.


UCX (TCP/IP) is the principal protocol, with most tasks being either Telnet or FTP. DECnet is used only for internal mailbox communication.

From what I was told, the technician at the plant maintained that no other interactive sessions showed up when he issued the $ SH US/INT (SHOW USERS/INTERACTIVE) command from the console.


Jim_McKinney
Honored Contributor

Re: UCX intermittent telnet connection problems

I suggest that if/when this occurs again, you attempt to connect via FTP. You noted that there were no OPCOM messages related to network issues. I presume that you also didn't see any messages saying that no more PCBs were available, which would indicate that you'd run out of process slots? (You also noted that someone was able to access the system from the console; if that session wasn't already logged in, it would have required a free PCB.) If possible, should this occur again and there is some telnet-capable system on the same LAN as this Alpha server, have the local folks attempt to telnet in. You might also have the local folks log in on the console and attempt to initiate an outbound session. Also, rudimentary things like SHOW MEMORY/FULL and SHOW SYSTEM might be informative while the event is occurring.

Right now, you might take a look at the NIC counters in LANCP ($ MCR LANCP SHOW DEVICE/COUNT). I don't know how to access the internal TCP counters using UCX (MultiNet is my stack of choice). During suspected network-driven denial-of-service incidents, the ARP and TCP connection tables often prove interesting (and again, I don't know how to tell you to view them using UCX).
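Following the suggestion to capture what happens while the event is occurring, a small probe run from another system on the same LAN can timestamp when the Telnet (23) and FTP (21) ports stop accepting TCP connections. This is a hypothetical Python sketch, not a UCX tool. A refused or timed-out connection only shows that the port did not answer from where the probe ran, so comparing a probe on the local LAN with one at a remote site helps separate a host problem from a network-path problem.

```python
import socket
import time
from datetime import datetime

def port_accepts(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False

def probe_loop(host: str, ports=(23, 21), interval: float = 60.0, rounds: int = 1) -> None:
    """Print one timestamped up/DOWN line per round covering each port."""
    for i in range(rounds):
        stamp = datetime.now().isoformat(timespec="seconds")
        results = " ".join(
            f"{port}:{'up' if port_accepts(host, port) else 'DOWN'}" for port in ports
        )
        print(stamp, host, results)
        if i < rounds - 1:
            time.sleep(interval)  # wait between rounds, not after the last one
```

For example, `probe_loop("fpaxp1.example.com", rounds=60)` would check once a minute for an hour. The host name `fpaxp1.example.com` is a placeholder.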
Volker Halle
Honored Contributor

Re: UCX intermittent telnet connection problems

John,


> I was not aware of any of the commands included in my attached document at the time


If the data in your first attachment came from another server, or even from a server on which the TELNET connection problem happened but was collected after the reboot, this information is mostly useless.

You need to obtain this kind of information while the problem exists, or at least afterwards but before a reboot.

If the problem exists and you can't access the server remotely (consider logging on to another OpenVMS server on the LAN and using SET HOST or SET HOST/LAT), consider forcing a crash while the problem exists instead of just rebooting the node. The dump might then allow the problem to be analyzed later on.

Volker.