sysman stuck

mrityunjoy
Advisor

sysman stuck

We have a four-node cluster running OpenVMS 7.3-2. One day we noticed that SYSMAN was not working on NodeB, although the SMISERVER process was running.

 

 

$ mcr sysman
  set environment /cluster

    

Generates the following error:

 

-SMI-E-PROTOCOL, remote protocol error - data packet w/o INIT

 

 

Restarted SMISERVER process on NodeB

 

Command still failed

 

Attempted a reboot of NodeB. It hung after showing the messages below on the console.

 

%%%%%%%%%%%  OPCOM  13-JUL-2011 05:34:36.82  %%%%%%%%%%%

Message from user SYSTEM on NODEB

%JBC-W-SYSERROR, SYS$MANAGER:JBC$DST_COMMAND.COM daylight savings time process failed system service error at PC 00011EA0

 

%%%%%%%%%%%  OPCOM  13-JUL-2011 05:34:36.82  %%%%%%%%%%%

Message from user SYSTEM on NODEB

-JBC-W-NOTIMZONRUL, SYS$TIMEZONE_RULE logical not defined, Daylight Savings Time clock adjustments are not possible

 

Could not boot NODEB

 

Restarted SMISERVER process on NODEA

 

Booted NODEB successfully

 

The SYSMAN command then worked on NODEB. No problems were found on NODEC or NODED.

 

 

We need to know why this happened and how we can avoid it in the future.

 

Mrityunjoy Kundu -AST (TCS)
8 REPLIES
Volker Halle
Honored Contributor

Re: sysman stuck

Hi,

 

If you 'need to know' something about a problem you've seen with OpenVMS, you had better log a call with HP instead of relying on a public forum, where OpenVMS users try to help other OpenVMS users...

 

Volker.

mrityunjoy
Advisor

Re: sysman stuck

I am new to this forum. It would be a great help if someone could give us an idea of why this is happening.
Mrityunjoy Kundu -AST (TCS)
Volker Halle
Honored Contributor

Re: sysman stuck

You normally get these %JBC-W-SYSERROR, -JBC-W-NOTIMZONRUL messages when a node has been booted with STARTUP_P1="MIN". They indicate that the SYS$TIMEZONE_RULE logical was not created during boot. What could have caused this to happen on NODEB?

 

There was a discussion about this SMI-E-PROTOCOL message in comp.os.vms back in December 2004. At that time, it was determined that the node failing to run SYSMAN with this error had a problem with the disk on which the SYSUAF file resided (the disk was in MountVerifyTimeout). Could that have been the case on your NODEB?

 

Did the boot of NODEB just continue after you restarted SMISERVER on NODEA? Or did you halt and >>> boot NODEB again? If so, it would have been better to force a crash of NODEB, as that would allow after-the-fact analysis of the situation.

 

Did you carefully check OPERATOR.LOG and the console output of these nodes immediately preceding the error?
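
If OPERATOR.LOG can be copied to a machine with Python, a small script can help with that check. The following is only a sketch: the function names are hypothetical, and the pattern assumes the usual FACILITY-SEVERITY-IDENT message layout seen in the messages quoted in this thread (for example SMI-E-PROTOCOL). It lists the warning, error and fatal message codes in a log and returns the lines just before the first occurrence of a given code.

```python
import re

# Assumed message layout: a '%' or '-' prefix, then FACILITY-SEVERITY-IDENT.
# Only W(arning), E(rror) and F(atal) severities are matched.
MESSAGE = re.compile(r"[%-]([A-Z0-9$_]+)-([WEF])-([A-Z0-9$_]+)")


def problem_messages(lines):
    """Return the distinct W/E/F message codes in order of first appearance."""
    seen = []
    for line in lines:
        for match in MESSAGE.finditer(line):
            code = "-".join(match.groups())
            if code not in seen:
                seen.append(code)
    return seen


def lines_before_first(lines, ident, count=20):
    """Return up to `count` lines preceding the first line containing `ident`."""
    for index, line in enumerate(lines):
        if ident in line:
            return lines[max(0, index - count):index]
    return []  # `ident` never appears in the log
```

For example, after reading the log with `lines = open(path).read().splitlines()` (where `path` is wherever the copy was saved), `lines_before_first(lines, "SMI-E-PROTOCOL")` gives the entries leading up to the first protocol error.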

 

Volker.

mrityunjoy
Advisor

Re: sysman stuck

Hi Volker,

 

NodeB just continued booting when we restarted the SMISERVER process on NODEA. There were no errors in OPERATOR.LOG or on the console for this.

Mrityunjoy Kundu -AST (TCS)
Volker Halle
Honored Contributor

Re: sysman stuck

As a first step in determining where NODEB was hanging during startup, you need to closely examine the console output from the 'hung' boot and compare it to the console output of the last 'good' boot. Also consider looking at accounting data from the early startup phase to find out which process or image may have been hanging for an extended amount of time. You can get the exact time of the restart of SMISERVER on NODEA from the accounting data on NODEA.
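
If both console logs can be saved as text files and copied to a machine with Python, the comparison can be done with a short script. This is a minimal sketch, not a finished tool: the function names are hypothetical, and the timestamp pattern is an assumption based on the OPCOM timestamps quoted in this thread (for example 13-JUL-2011 05:34:36.82). Timestamps are replaced with a fixed token so that lines differing only in time do not show up as differences.

```python
import difflib
import re

# Assumed timestamp layout, e.g. "13-JUL-2011 05:34:36.82".
TIMESTAMP = re.compile(r"\d{1,2}-[A-Z]{3}-\d{4} \d{2}:\d{2}:\d{2}(?:\.\d{2})?")


def normalize(lines):
    """Drop blank lines and replace timestamps with a fixed token."""
    cleaned = []
    for line in lines:
        line = TIMESTAMP.sub("<time>", line.rstrip())
        if line.strip():
            cleaned.append(line)
    return cleaned


def boot_diff(good_lines, hung_lines):
    """Return a unified diff between the 'good' and the 'hung' console log."""
    return list(
        difflib.unified_diff(
            normalize(good_lines),
            normalize(hung_lines),
            fromfile="good_boot",
            tofile="hung_boot",
            lineterm="",
        )
    )
```

Lines starting with "-" appear only in the good boot and lines starting with "+" only in the hung boot, so the first such lines show where the two startups begin to diverge.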

 

All of this may appear to be a lot of work, but if you want to find out what happened, you have to invest your own time to do the analysis. Or wait to get lucky, if someone else sees this problem, analyses it, finds this thread and provides the results of the analysis here...

 

Volker.

 

Hoff
Honored Contributor

Re: sysman stuck

SYSMAN isn't a particularly reliable choice for performing distributed system activities during a cluster startup.  

 

Cluster and distributed startups are inherently asynchronous, and tossing distributed requests around during the bootstrap sequences can sometimes get stuck in odd states.  

 

And yes, you're going to have to debug your startup.  Figure out what the particular SYSMAN operation is doing, and work from there.  Figure out some other way to do it.

This SYSMAN processing is sometimes an operation involving (remote) disks, and creating a local copy and some local customizations of the SYS$EXAMPLE:MSCPMOUNT.COM example procedure can sometimes provide an alternative to SYSMAN-based disk processing.

 

--

 

More of these random forum errors: "Your post has been changed because invalid HTML was found in the message body. The invalid HTML has been removed. Please review the message and submit the message when you are satisfied."  

Dennis Handly
Acclaimed Contributor

Re: sysman stuck

>More of these random forum errors: "Your post has been changed because invalid HTML

 

I've seen it several times.  (Better to put these in the feedback forum.)

Hoff
Honored Contributor

Re: sysman stuck

Dennis, thanks for the reply, and please feel free to log that bug on my behalf. (If you'd like to chat offline about this bug and related topics, let me know. That's not relevant to sysman and startup, though.)