Operating System - OpenVMS

TCPIP services do not always react

 
Willem Grooters
Honored Contributor

Customer problem...
Two VMS machines, not clustered. Both run VMS 7.1-2; machine A has TCPIP 5.0A ECO1, machine B has TCPIP 5.1.
Machine A services a number of external applications. Actually, it's the same command procedure and the same user for each, accessing the same data, but on behalf of different systems (Unix, Windows). A third instance services requests from an application on machine B. These three services are (of course) on different ports: 3011 (S1), 3012 (S2) and 3013 (S3), and have different process names; each has a limit of 15.
The program started behind each service keeps the session open and handles each subsequent request.
S1 has been activated a total of 12 times, S2 the full 15. So I have 12 processes named S1_ and 15 named S2_, all active for days.
However, S3 can only be invoked 2-3 times; the next invocation will not even produce a logfile. The process (on system B) that tries to access this service gets some error on return. What the error is, I cannot tell (at least for now, since the program needs to be altered for that). But given the error and the fact that the service on system A does not produce ANY logging, I conclude that the service isn't even started.
The question is why. Since it apparently DOES work (S1 and S2 DO run, and S3 does for some sessions at least), there must be something that limits the ability to open extra channels. But where?
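
Until the requesting program is altered to report its error, a small stand-alone probe could at least show how a connect attempt to the service port fails. This is only a sketch: it assumes a Python interpreter is available on the requesting machine (which may well not be the case on VMS 7.1-2; the same logic carries over to C sockets), and the function name `probe` is made up for illustration.

```python
import errno
import socket
import time


def probe(host, port, timeout=5.0):
    """Try one TCP connect and return (outcome, detail, seconds).

    outcome is one of "connected", "refused", "timeout" or "error".
    """
    start = time.monotonic()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        outcome, detail = "connected", ""
    except socket.timeout:
        # No answer at all within the timeout.
        outcome, detail = "timeout", "no answer within %.1f s" % timeout
    except ConnectionRefusedError as exc:
        # The remote host answered, but actively rejected the connection.
        outcome, detail = "refused", str(exc)
    except OSError as exc:
        # Anything else: unreachable host, name lookup failure, local limits...
        name = errno.errorcode.get(exc.errno, "?")
        outcome, detail = "error", "%s (%s)" % (name, exc)
    finally:
        sock.close()
    return outcome, detail, time.monotonic() - start
```

Calling `probe(host, 3013)` with the address of machine A and comparing the result with ports 3011 and 3012 would show whether the failing connect is refused, times out, or fails locally. Note that every successful connect activates the service once, so it counts against the limit of 15 until the connection is closed.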
Willem Grooters
OpenVMS Developer & System Manager
19 REPLIES
Lokesh_2
Esteemed Contributor

Re: TCPIP services do not always react

Hi ,

Check the MAXPROCESSCNT sysgen parameter. Maybe the system has reached that value.

Thanks & regards,
Lokesh
What would you do with your life if you knew you could not fail?
Lokesh_2
Esteemed Contributor

Re: TCPIP services do not always react

Or check the maximum number of UCX device sockets:

$ucx sho comm


hope this helps,
Lokesh
What would you do with your life if you knew you could not fail?
Antoniov.
Honored Contributor

Re: TCPIP services do not always react

Hi Willem,
Sorry I can't really help you; I can only encourage you and try to give you a clue: some services on VMS have a limited number of connections. For example, if you type TCPIP SHOW SERVICE /FULL you can see "Limit: nn", where nn is the maximum number of telnet connections to the server. At one of my customers this value was 3, and the 4th PC could not log in, without any error in any log.
Your trouble sounds like this limitation; look at your service characteristics to find the limit value.
Bye
Antoniov
Antonio Maria Vigliotti
Willem Grooters
Honored Contributor

Re: TCPIP services do not always react

Lokesh:
MAXPROCESSCNT is big enough. It _could_ be a limit, but when I asked, just after the problem had happened again, MAXPROCESSCNT was over 1200 and the number of processes at that moment was less than 600.
The number of sockets _could_ be a problem, just look at the attachment (most likely happening on NODE3, the requestor). But how to increase it? I dug into the documentation but didn't find a clue...
Antonio:
/LIMIT is not the point. It happens when we are way below that number...

Anyone - I've been told by a colleague it could very well be a matter of buffer exhaustion. But again: I cannot find a clue on how to increase this.

Attached: some info from each node involved. I don't know which node in the cluster causes the problem; I tend to suspect the sender...
Willem Grooters
OpenVMS Developer & System Manager
Lokesh_2
Esteemed Contributor

Re: TCPIP services do not always react

Hi Willem,

I have just posted a new thread about the difference in the output of the UCX SHO COMM command between older and newer versions of TCPIP. In the older version, the maximum, current and peak number of device sockets are displayed, whereas in the newer one they are not.

The number of active device sockets on a system can be found by counting the BG devices on that system. But the question is how to find the maximum number of device sockets in the newer version?

Best regards,
Lokesh
What would you do with your life if you knew you could not fail?
Willem Grooters
Honored Contributor

Re: TCPIP services do not always react

Some more info was given:

The service is only enabled on one node of the cluster (NodeB).

When the problem occurs, the sender (NodeC) hangs for some time. If the service on port 3013 is disabled and re-enabled on NodeB, the problem is over, for some time. But after a few requests have been sent, the problem turns up again.
I have requested some more info and updated the document, again attached (plain text).

Idea: could it be that the limit exists on NodeC? Since the service on NodeB is not started at all (not even a message!) it could be possible the request was never sent?
Willem Grooters
OpenVMS Developer & System Manager
Antoniov.
Honored Contributor

Re: TCPIP services do not always react

Hello Willem,
As I said before, I'm not sure about the cause of your trouble.
The service on port 3013 is limited to 15 connections; if I understand correctly, you need fewer than 15 concurrent connections. If some connections are not closed properly, it may happen that (after 15 connections) your system hangs because NodeB has exhausted its resources on earlier connections that are still active (even if unused). I realize I am simplifying the problem a lot, but you can check this quickly by setting the service limit to 50 (for example): if the problem then shows up later (because it will still happen), you can investigate why some connections stay alive.
Remember that if you change the service limit, you must stop the service and restart it.
At the moment I don't have any other ideas.
Antoniov
Antonio Maria Vigliotti
Willem Grooters
Honored Contributor

Re: TCPIP services do not always react

Antonio,
LIMIT is not the problem. See the attachment, where I tried to explain in more detail.
But I appreciate your new thread, it gives me another request for information ;-)
Willem Grooters
OpenVMS Developer & System Manager
Willem Grooters
Honored Contributor

Re: TCPIP services do not always react

New info leaking in..
We set up a test machine with the same application environment. This machine has no problem at all, whereas NodeC hangs time after time. Even while NodeC's request is waiting to be connected, the test machine's request is served! The problem can NOT be replicated there.
It seems to go wrong intermittently. One time the request arrives on NodeB; a request issued just a few moments later ends in failure, but when repeated it _may_ succeed. There's no guarantee it will. We didn't find a pattern. It seems the request never leaves NodeC, since we don't see anything happen on NodeB, where we DO see that the test machine IS served (the number of active services increases).

So we concluded so far:
* The problem is NOT on NodeB, otherwise there would be problems with other systems as well, and the test system has no problems.
* The problem is NOT the NIC on NodeB - for the same reason
* The problem is NOT the NIC on NodeC - for the same reason
Remains: some setting on NodeC.
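
Since no pattern has been found so far, one way to look for one would be to run the same series of connect attempts from NodeC and from the test machine at the same time and compare the results. The following is a rough sketch only, assuming Python is available on both machines (not a given on these VMS versions); `try_connect`, `run_series` and `summarize` are made-up names.

```python
import socket
import time


def try_connect(host, port, timeout=5.0):
    """Return (ok, seconds): did one TCP connect succeed, and how long did it take."""
    start = time.monotonic()
    try:
        # The connection is closed again as soon as the with-block ends.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        ok = False
    return ok, time.monotonic() - start


def run_series(host, port, attempts=20, delay=1.0, timeout=5.0):
    """Probe repeatedly; return a list of (wall_clock_time, ok, seconds)."""
    results = []
    for i in range(attempts):
        ok, secs = try_connect(host, port, timeout)
        results.append((time.time(), ok, secs))
        if i + 1 < attempts:
            time.sleep(delay)
    return results


def summarize(results):
    """Count failures and find the longest run of consecutive failures."""
    failures = sum(1 for _, ok, _ in results if not ok)
    longest = current = 0
    for _, ok, _ in results:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return {
        "attempts": len(results),
        "failures": failures,
        "longest_failure_run": longest,
    }
```

The timestamps in the result list can be matched against the service activations seen on NodeB. Each successful attempt starts the service once, so the number of attempts should stay well below the service limit.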

We're open to suggestions on WHAT to change...

I included ana/sys output from both nodes, and the current SYSGEN parameters on NodeC. BTW: The application uses an RDB database on that machine. For that reason, some parameters will have quite high values.
Willem Grooters
OpenVMS Developer & System Manager