
Re: SYSMAN questions

 
Willem Grooters
Honored Contributor

SYSMAN questions

(AXP 7.3-2, I64 8.3, Cluster)
I have a few questions concerning the SYSMAN utility, because of a problem we encountered.

A command procedure (started interactively) removes user environments and the related processes (mainly with STOP/ID). A global symbol, set up during login, prevents the user's own environment from being removed (and therefore the user's process will NOT be deleted).

Running exactly the same procedure using SYSMAN on a different node, however, seems to miss this symbol, causing the environment to be removed and the process to be killed.
SYSMAN does NOT return an error, but hangs. ^Y and STOP work, once. Trying SYSMAN again, just setting the environment to the other node, causes SYSMAN to hang completely. Even ^Y has no effect.
The process running SYSMAN is in LEF state, but there are no channels busy at all. This has been observed on Itanium.
The same issue exists on Alpha, where even a simple reboot of the node didn't help: the system needed to be powered off completely.

Login procedures for this user are the same on both systems. It could be that something in these procedures is bypassed when the system is accessed using SYSMAN (when f$mode <> INTERACTIVE, for instance). Still, that does not explain why SYSMAN hangs in LEF state without a channel open, or what special handling (if any) is done with login procedures when accessing the remote system.
Another thing I'm curious about is the interaction between SYSMAN on one system and the process on the other system: which resources are used and how SYSMAN handles them.
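
To illustrate the kind of bypass meant here, a minimal sketch of a login procedure that only sets the protecting symbol for interactive logins. The symbol name KEEP_ENVIRONMENT and the label are hypothetical; the real procedure is not shown in this thread.

```
$ ! Hypothetical excerpt from a LOGIN.COM (names are assumptions)
$ ! Skip the interactive-only setup for BATCH, NETWORK and OTHER modes
$ IF F$MODE() .NES. "INTERACTIVE" THEN GOTO skip_interactive
$ ! Global symbol that the cleanup procedure would test
$ keep_environment == "TRUE"
$skip_interactive:
$ EXIT
```

With a structure like this, any process that does not log in interactively would never see the symbol.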
Willem Grooters
OpenVMS Developer & System Manager
12 REPLIES
Jan van den Ende
Honored Contributor

Re: SYSMAN questions

Willem,

to begin with, only a partial answer.

SYSMAN processes do _NOT_ execute any LOGIN or SYLOGIN. They do _NOT_ get any logicals, except CLUSTER-, SYSTEM- and applicable GROUP-table ones. They do NOT inherit any symbols.

Any such process-environment setup can, however, easily be done by SYSMANINI.
Define that logical to point to the desired "SYSMAN LOGIN" file in the *SYSMAN* table (I cannot check the correct name right now, but SHOW LOG/TABLE=*SYSMAN* will tell you).

hth

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Dean McGorrill
Valued Contributor

Re: SYSMAN questions

Also, things are not 'sticky' between commands, e.g.:

SYSMAN> do defin xxx "foo"
%SYSMAN-I-OUTPUT, command execution on node ENGDS1
SYSMAN> do sho log xxx
%SYSMAN-I-OUTPUT, command execution on node ENGDS1
%SHOW-S-NOTRAN, no translation for logical name XXX
SYSMAN>
Jan van den Ende
Honored Contributor

Re: SYSMAN questions

Yes,

Dean just showed a good example.

Doing the DEFINE in SYSMANINI, you DO get the translation.
Especially for clusterwide unattended batch jobs, we use the SYSMANINI mechanism.
Just put ALL desired instructions in a temporary file, define that as SYSMANINI.
Start SYSMAN, and on the next dataline give EXIT (without a leading $ !!).
SYSMAN inits with doing all you want, and exits.
Oh, do not forget the SET ENVIRONMENT.
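
A minimal sketch of such a batch procedure, assuming the temporary file lives in SYS$SCRATCH under the hypothetical name SYSMAN_TMP.INI and that DO SHOW TIME stands in for the real work:

```
$ ! Write the SYSMAN commands to a temporary init file (hypothetical name)
$ CREATE SYS$SCRATCH:SYSMAN_TMP.INI
SET ENVIRONMENT/CLUSTER
DO SHOW TIME
$ ! Point SYSMANINI at the file for the next image only
$ DEFINE/USER_MODE SYSMANINI SYS$SCRATCH:SYSMAN_TMP.INI
$ RUN SYS$SYSTEM:SYSMAN
EXIT
$ ! Clean up the temporary file
$ DELETE SYS$SCRATCH:SYSMAN_TMP.INI;*
```

The DEFINE/USER_MODE comes after CREATE on purpose: a user-mode logical name disappears when the next image exits, so it has to be the last thing defined before SYSMAN runs.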

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Willem Grooters
Honored Contributor

Re: SYSMAN questions

I already found out that the process does not execute any login, and I can cope with that more easily for the case I need it (just another parameter to be passed). I also found out that on the remote system, a detached process is started for the duration of the DO command and that this is not a persistent process, so the context is lost when the command is DOne.

Still, there must be some context required and set up for creating the process, e.g. the login directory (SYS$LOGIN:). If SYSUAF states a logical location (eg. MGRDEV:) that is usually set up as a /SYSTEM logical, there is no problem so a command procedure can be run remotely. But if this procedure removes the logical and kills the process (that is what happens, I think), SYSMAN hangs and will not be able to recreate the process since there is no such logical anymore. What causes it to hang?
The next invocation of SYSMAN, with SET ENVIRONMENT to the next node, hangs SYSMAN beyond repair: but what is the cause of the hang?

(My original problem is now understood and solved, but this still bothers me)
Willem Grooters
OpenVMS Developer & System Manager
Willem Grooters
Honored Contributor

Re: SYSMAN questions

It's getting even more bizarre.

See the attachment. This is in a 4-node cluster (Itanium, VMS 8.3), sharing the system disk (including the UAF and the application used), with the same SYSGEN parameters.

SYSMAN can now be used where it hung before (on NodeB), but the node that was accessed when the problem arose (NodeC) cannot be reached using SYSMAN from ANY node in the cluster. A normal login on NodeC is, however, not blocked.

So there are more questions:
What causes NodeC to be unreachable by SYSMAN, from any node in the cluster?
How can NodeC be made reachable again (preferably without a reboot)?
Willem Grooters
OpenVMS Developer & System Manager
Karl Rohwedder
Honored Contributor

Re: SYSMAN questions

Willem,

the attachment is missing. By the way, is the SMISERVER process running on all nodes? Is there perhaps a process dump in SYS$SYSTEM?

regards kalle
Willem Grooters
Honored Contributor

Re: SYSMAN questions

Oops....


Willem Grooters
OpenVMS Developer & System Manager
Willem Grooters
Honored Contributor

Re: SYSMAN questions

We just tried (!) to reboot nodeC, and it _seems_ to shut down. The node is not accessible using DECnet ("node currently unreachable"), but the other nodes still show it as "MEMBER" with the previous transition time, as if the shutdown did not happen at all.

I have no access to the console so will have to wait for system management to create a crash dump....

WG
Willem Grooters
OpenVMS Developer & System Manager
Willem Grooters
Honored Contributor

Re: SYSMAN questions

additional:
We assumed the system was down since the normal sequence showed up, including disconnecting the terminal. However, we just found out that TCPIP was still running: we could reach the node using telnet. The shutdown was only partially done: the queue manager was stopped and DECnet seems to be down (no REMACP process), but TCPIP is still up and running. Accounting gave us no clue about the processes run by SYSMAN (but that could well be a configuration issue).

Weird....
Willem Grooters
OpenVMS Developer & System Manager
Volker Halle
Honored Contributor

Re: SYSMAN questions

Willem,

SYSMAN in a cluster communicates via the CLUSTER_SERVER process. The CLUSTER_SERVER process on the remote node sends the command to the SMISERVER process via mailbox communication. SMISERVER then spawns a subprocess to execute the remote DO ... commands.

If mailbox communication between SMISERVER and CLUSTER_SERVER is hung, your remote node will not be reachable with SYSMAN.

Check the state of those 2 processes, both should normally be in HIB. If they are not, there may be a problem...
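
A minimal sketch of one way to look at those states from DCL, to be run on the node in question and assuming an account with WORLD privilege. The symbol and label names are hypothetical.

```
$ ! List the scheduling state of SMISERVER and CLUSTER_SERVER on this node
$ ctx = ""
$ tmp = F$CONTEXT("PROCESS", ctx, "PRCNAM", "SMISERVER,CLUSTER_SERVER", "EQL")
$next_process:
$ pid = F$PID(ctx)
$ ! F$PID returns a null string once the selection is exhausted
$ IF pid .EQS. "" THEN GOTO finished
$ WRITE SYS$OUTPUT F$GETJPI(pid, "PRCNAM"), " ", pid, " ", F$GETJPI(pid, "STATE")
$ GOTO next_process
$finished:
$ EXIT
```

A process that does not show up at all, or one that stays in a state other than HIB, would be the one to look at first.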

Volker.
Volker Halle
Honored Contributor

Re: SYSMAN questions

Willem,

did you try restarting the remote SMISERVER process ?

$ STOP/ID=xxx
$ @SYS$SYSTEM:STARTUP SMISERVER

Volker.
Willem Grooters
Honored Contributor

Re: SYSMAN questions

We decided to disable the specific function, because the offered solutions are not possible (the SYSMAN command is issued in a spawned procedure, started from an executable).
Willem Grooters
OpenVMS Developer & System Manager