Operating System - OpenVMS

%JBC-F-SYSFAIL

Jim_McKinney
Honored Contributor

%JBC-F-SYSFAIL

Environment is a two-node cluster running OpenVMS V8.3-1H1 on I64 (a production cluster that cannot be rebooted to experiment). The last two times that one of the two members of this cluster has booted, a “%JBC-F-SYSFAIL, system failed during execution” failure has occurred on a SYNCHRONIZE command within one of the command procedures that executes during startup. The other cluster member has not experienced this issue though it uses the same startup procedure.

The startup consists of a collection of batch jobs that are submitted by SYSTARTUP_VMS and whose work often overlaps and is sometimes paused while some complementary procedure readies a necessary piece of the environment. In this instance, a batch job that starts the installed layered products is dependent upon another that mounts the cluster’s disks. During each of these two episodes, the disk mounting job failed to complete successfully due to some fiber channel issue. The dependent job that starts the layered products, which had been paused via a SYNCHRONIZE command waiting for the completion of the disk mounting job, then aborted with the above error message.

VMS HELP explains this failure as “The system crashed during execution of a batch or symbiont process.” Well, the system didn’t crash. The batch queue is set to retain on error and both of these jobs were retained with the error logged as the above SYSFAIL. The layered product startup job also contains evidence of the SYSFAIL in its log file, but the disk mount job does not.

Can anyone help me understand this? Is this error just signaling that the job named as the argument of the SYNCHRONIZE command failed? Or perhaps it’s an issue with the JOB_CONTROL process itself? Anyone seen it before?
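
For readers following along, the dependency described above might look roughly like the DCL sketch below. This is not the actual startup procedure: the job name MOUNT_DISKS and the use of SYS$BATCH are assumptions made up for illustration. As far as the DCL documentation describes it, SYNCHRONIZE exits with the completion status of the job it waited for, so capturing $STATUS shows what the awaited job (or the job controller) reported.

```dcl
$! Hypothetical excerpt from the layered-products startup job.
$! MOUNT_DISKS and SYS$BATCH are illustrative names, not taken from the post.
$ SET NOON                                  ! do not abort on error; inspect the status instead
$ SYNCHRONIZE MOUNT_DISKS /QUEUE=SYS$BATCH  ! wait for the disk-mount job to finish
$ sync_status = $STATUS                     ! status handed back by SYNCHRONIZE
$ IF .NOT. sync_status
$ THEN
$   WRITE SYS$OUTPUT "Mount job ended with: ", F$MESSAGE(sync_status)
$   EXIT sync_status                        ! pass the failure on to the queue
$ ENDIF
$! ... start the layered products here ...
```

Logging the status this way would at least record exactly which condition value came back from the wait, separately from whatever the mount job wrote to its own log.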
6 REPLIES
Volker Halle
Honored Contributor

Re: %JBC-F-SYSFAIL

Jim,

could the mount job have incurred a non-fatal bugcheck? Check ERRLOG.SYS.
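
One way to look for such entries, assuming the Error Log Viewer (ELV) is available on this version and the error log is in its default location, is sketched below. The time window is a placeholder and should be adjusted to bracket the boot in question.

```dcl
$! Sketch only: list recent error log entries, one line per entry.
$! /SINCE=YESTERDAY is a placeholder; use a date and time just before the boot.
$ ANALYZE/ERROR_LOG/ELV TRANSLATE/ONE_LINE/SINCE=YESTERDAY SYS$ERRORLOG:ERRLOG.SYS
```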

Also note that there seems to be a little problem with JBC$JOB_CONTROL.EXE at least in OpenVMS Alpha V8.3 patches VMS83A_JOBCTL-V0100 or higher (see a recent 7-MAY-2009 entry in c.o.v) with wrong status code reporting. This may also exist in OpenVMS I64...

Volker.
Hoff
Honored Contributor

Re: %JBC-F-SYSFAIL

Given the "During each of these two episodes, the disk mounting job failed to complete successfully due to some fiber channel issue", that's a very good candidate for triggering all sorts of errors here.

Get that "fiber channel issue" fixed.

That can cause pretty much anything in the startup or the job controller or elsewhere to go off the rails. Badly.

Beyond the FC SAN resolution, do confirm that the logical names listed in SYLOGICALS.TEMPLATE are all correct and consistent, particularly if any of these files are located off the system disk.

SYSUAF, RIGHTSLIST, the queue database directory logical name and everything, need to be set up, or there can be weird problems with Job Control.
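
A quick way to compare these settings between the two nodes, as a sketch, is to run the commands below on each member and compare the output. A "no translation" response simply means the logical name is not defined and the default location applies.

```dcl
$! Run on each cluster member and compare the results.
$ SHOW LOGICAL/SYSTEM SYSUAF
$ SHOW LOGICAL/SYSTEM RIGHTSLIST
$ SHOW LOGICAL/SYSTEM QMAN$MASTER     ! location of the queue database master file
$ SHOW QUEUE/MANAGERS/FULL            ! queue manager state and the node it runs on
```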

But a bad FC SAN can cause near-infinite badness.

Jim_McKinney
Honored Contributor

Re: %JBC-F-SYSFAIL

Thank you Volker and Hoff for your input - this cluster is at a remote location and I won't be able to access it until the sun rises in that part of the world. System files (SYSUAF, RIGHTSLIST, queue db, etc.) are all in the default locations. It's a homogeneous cluster with shared startup files and the other node has been without issue thus far. Once I get access I will confirm that what I just stated is (still) true. It is certainly suspicious that there was a hiccup in the FC at approximately the same time that the SYNCH command SYSFAILed. Folks are currently working on locating the source of the FC issue. I had also read in c.o.v of the issues with the Alpha JOB_CONTROL patches - not a match for these symptoms but it is certainly possible that the issues with the ECO kit go beyond what was immediately apparent - will keep that in mind. I will also check that there was no non-fatal bugcheck associated with the event once I get access to the system.
Hoff
Honored Contributor

Re: %JBC-F-SYSFAIL

Reading (much) into your "awaiting sunrise" reply, I'd look to get remote server access and remote console-level access implemented here, too.
Jim_McKinney
Honored Contributor

Re: %JBC-F-SYSFAIL

> I'd look to get remote server access and remote console-level access implemented here

Yes, it exists - but I'm not usually involved with support of this particular system and so don't have access enabled (though other remote folks do) - were this an emergency I could have access enabled but in this instance I'll just wait until local folks wake up. fwiw, I was asked for an opinion on the issue described... and find that it is something that I had never encountered.
Jim_McKinney
Honored Contributor

Re: %JBC-F-SYSFAIL

No non-fatal bug checks; nor do there appear to be any anomalies with the system logical names - will continue to pursue the FC issues and investigate possible issues with JBC ECOs.