
CMS problem: %CMS-F-BUG

 
John McL
Trusted Contributor

Re: CMS problem: %CMS-F-BUG

Brad, the answer is of course no, or there would be no problem.

I have a list of the differences that I'll have to work through, but that will take some time now that the person with this problem will be away for the rest of this week.

Can we work back from the other end and maybe get some clues as to what we should look for, especially given that SYS$SCRATCH and SYS$LOGIN look okay? What exactly does the error message mean and why is it returning a non-specific "BADPARAM" message? I'm surprised that CMS doesn't check parameters or doesn't catch the error and produce something more informative.
John Gillings
Honored Contributor

Re: CMS problem: %CMS-F-BUG

John,

> I'm surprised that CMS doesn't check
>parameters or doesn't catch the error and
>produce something more informative.

Can't check everything! Think about what's happening here. (note that I know very little about CMS, so I may make some invalid assumptions...)

CMS SHOW HISTORY is presumably pulling some data out of one or more files, formatting and displaying the output. I'd assume all the accesses to CMS files are read only, so that really just leaves the displaying output part as a suspect for WER and BADPARAM.

What are the devices SYS$OUTPUT, SYS$ERROR, SYS$COMMAND and SYS$INPUT for your apache process? Is CMS assuming the output is a terminal device perhaps? Maybe it's using some terminal driver function code which isn't working?
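One way to answer those questions from inside the failing context is a few lines of DCL at the top of the command procedure. This is only a sketch: it assumes the procedure can still write to SYS$OUTPUT with plain DCL WRITE, and the item codes used (DEVNAM, TRM, MBX, NET, DEVBUFSIZ) are standard F$GETDVI items rather than anything CMS-specific.

```dcl
$ ! Sketch: show where the standard process logicals point, then ask
$ ! $GETDVI what kind of device is really behind SYS$OUTPUT.
$ SHOW LOGICAL SYS$OUTPUT, SYS$ERROR, SYS$COMMAND, SYS$INPUT
$ WRITE SYS$OUTPUT "Device:      ", F$GETDVI("SYS$OUTPUT", "DEVNAM")
$ WRITE SYS$OUTPUT "Terminal?    ", F$GETDVI("SYS$OUTPUT", "TRM")
$ WRITE SYS$OUTPUT "Mailbox?     ", F$GETDVI("SYS$OUTPUT", "MBX")
$ WRITE SYS$OUTPUT "Network?     ", F$GETDVI("SYS$OUTPUT", "NET")
$ WRITE SYS$OUTPUT "Buffer size: ", F$GETDVI("SYS$OUTPUT", "DEVBUFSIZ")
```

Running the same lines in a working interactive session gives a baseline to compare against.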

If the CMS command is in a command procedure, a quick check might be something like:

$ CMS SHOW HISTORY/OUTPUT=tmpfile
$ TYPE tmpfile

or even:

$ PIPE CMS SHOW HISTORY | TYPE SYS$PIPE

Maybe there are qualifiers, or logical names which "dumb down" the CMS output (like DFU$NOSMG)?

You could also try SET WATCH FILE/CLASS=MAJOR for clues.
A crucible of informative mistakes
John Gillings
Honored Contributor

Re: CMS problem: %CMS-F-BUG

John,

>What exactly does the error message mean
>and why is it returning a non-specific
>"BADPARAM" message?

Just expanding on this...

$ HELP/MESSAGE BADPARAM
...

BADPARAM, bad parameter value

Facility: SYSTEM, System Services

Explanation: A value specified for a system function is not valid. Several conditions can cause this error:
...(bunch of possibilities, none of which look like good candidates for your case)...

$ HELP/MESS WER
...
WER, file write error

Facility: RMS, OpenVMS Record Management Services

Explanation: An error occurred during an RMS file system write operation.

User Action: The status value (STV) field of the RAB contains a system status code that provides more information about the condition. Take corrective action based on this status code.

So the sequence of events is...

A system service, probably $QIO, found something wrong and returned BADPARAM to RMS, which put that in the STV and returned WER to CMS. The CMS output layer built a signal array with the RMS and system service conditions, then added NOQIO and signalled it. Since no condition handler recognised the condition, the CMS last chance handler caught it, added CMS$_BUG and resignalled to VMS.

It's non-specific because it was detected inside $QIO. You're in inner mode, possibly at high IPL, when the condition is detected. There's no time or cycles to be more specific; it's just a case of "let's get out of here safely". Maybe $QIO and/or the lower level device drivers could be changed to give better, more specific messages, but realise that they're not signalling the condition (if they did, they'd crash the system), so they've only got the return status and the IOSB to communicate with (rather than a signal array with space for parameters).

That means you would need to define specific condition codes for each possible error condition. Remember there are lots of device drivers with different uses for different parameters. It's a combinatorial explosion of things that can go wrong!

Generic conditions like BADPARAM, NOPRIV, EXQUOTA and others are a pain in the proverbial, but the reality is, it's not always possible to do much better.
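The same limitation shows up at DCL level: after a command fails, all that is left is a single longword in $STATUS, not the whole chain of messages. The sketch below (an illustration, not something taken from the CMS documentation) captures that value and picks it apart. The symbol name cms_status is arbitrary, and F$MESSAGE may not be able to supply text for a CMS facility code unless the matching message file is available to the process.

```dcl
$ ! Sketch: capture the final status of the failing command and decode it.
$ SET NOON                          ! keep going after the error
$ CMS SHOW HISTORY
$ cms_status = $STATUS              ! save before another command overwrites it
$ SET ON
$ WRITE SYS$OUTPUT "Status (hex): ", F$FAO("!XL", cms_status)
$ WRITE SYS$OUTPUT "Severity:     ", cms_status .AND. 7    ! low 3 bits; 4 = fatal
$ WRITE SYS$OUTPUT "Text:         ", F$MESSAGE(cms_status)
```

Only the last condition survives this way, which is why the full message chain shown on failure (BUG, NOQIO, WER, BADPARAM) carries more information than $STATUS alone.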
John McL
Trusted Contributor

Re: CMS problem: %CMS-F-BUG

Rather than provide log files, which would be useful, the person with the problem emailed me an HTML file of differences.

From that I see the following logicals for the failing job:
"SYS$COMMAND" [super] = "_BG53295"
"SYS$COMMAND" [exec] = "_NLA0:"
"SYS$DISK" [super] = "apache$root:"
"SYS$DISK" [exec] = "apache$root:"
"SYS$ERROR" [super] = "_BG53300"
"SYS$ERROR" [exec] = "_BG53297:"
"SYS$INPUT" [exec] = "_NLA0:"
"SYS$OUTPUT" [super] = "_BG53297:"
"SYS$OUTPUT" [exec] = "_BG53297:"
"SYS$SCRATCH" = "APACHE$ROOT:[000000]"
"TT" = "_NL:"

Where null devices (NL:, NLA0:) appear here, the process used for comparison, which looks to be interactive, has legitimate devices (e.g. a terminal).

David B Sneddon
Honored Contributor

Re: CMS problem: %CMS-F-BUG

Have you recently upgraded CMS?
A while ago I upgraded DECset to the latest and
it broke the callback mechanism.
I reinstalled the previous version of CMS.
I seem to recall the error was also a BADPARAM error.
On investigating it, it seems some parameters were pushed on the stack in the wrong order.
Don't know why it would have changed but there you go.
I'll see if I can track down my notes on it.

Dave
John Gillings
Honored Contributor

Re: CMS problem: %CMS-F-BUG

John,

Note that your SYS$OUTPUT points directly to a network device. Perhaps CMS is writing to it as if it were a terminal? That might account for a BADPARAM. Why that might happen for one user and not others, I don't know.

See if dumping the output to a temp file and TYPEing the file makes a difference.

Is everything happy with 5-digit unit numbers?
John McL
Trusted Contributor

Re: CMS problem: %CMS-F-BUG

The person has returned to work and I've now done some further investigation.

As John G suggested, it looks like CMS doesn't like writing to a device with the characteristics listed below (as reported by SHOW DEV/FULL SYS$OUTPUT):

Device BG11125:, device type unknown,
is online, mounted,
record-oriented device,
carriage control,
network device, mailbox device.

and "Default buffer size 32767"

I wonder what the bad param is on the QIO call ... an unknown device type, buffer size??

The workaround is to direct the CMS output to a file and then just TYPE the file (to copy it to SYS$OUTPUT).
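In a command procedure the workaround might look something like the sketch below. The temporary file name is made up for illustration (the PID is included only to keep concurrent requests from colliding), and it assumes the process can create files in SYS$SCRATCH.

```dcl
$ ! Sketch of the workaround: let CMS write to a file, then copy it to SYS$OUTPUT.
$ ! Hypothetical temp file name; F$GETJPI("","PID") makes it unique per process.
$ tmpfile = "SYS$SCRATCH:CMS_HISTORY_" + F$GETJPI("", "PID") + ".TMP"
$ CMS SHOW HISTORY /OUTPUT='tmpfile'
$ TYPE 'tmpfile'
$ ! Clean up, but only if CMS actually created the file.
$ IF F$SEARCH(tmpfile) .NES. "" THEN DELETE 'tmpfile';*
```

TYPE copes with the BG device where CMS does not, so the output still ends up on SYS$OUTPUT.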
John McL
Trusted Contributor

Re: CMS problem: %CMS-F-BUG

See posting above