
WEBES CA.A.x - big amount of processes

 
Ronny Hartwich
Occasional Visitor

Re: WEBES CA.A.x - big amount of processes

This problem occurred on 5 of our 28 servers within the space of 24 hours between March 26 & 27.
Of significance is that we are running VMS 7.3-2, not 8.x.
In each instance the system reached max process count within about 30 minutes of creating a new errorlog.
I have shut down DESTA for the time being and will examine writing a procedure as suggested above.
Hoff
Honored Contributor

Re: WEBES CA.A.x - big amount of processes

So an HP management tool tips over and takes out a production server when you're performing documented, supported, and routine system maintenance.

From the manuals...

"20.3.2 Maintaining Error Log Files
Because the error log file, SYS$ERRORLOG:ERRLOG.SYS, is a shared file, ERRFMT can write new error log entries while the Error Log utility reads and reports on other entries in the same file.

ERRLOG.SYS increases in size and remains on the system disk until you explicitly rename or delete it. Therefore, devise a plan for regular maintenance of the error log file. One method is to rename ERRLOG.SYS on a daily basis. If you do this, the system creates a new error log file. You might, for example, rename the current copy of ERRLOG.SYS to ERRLOG.OLD every morning at 9:00. To free space on the system disk, you can then back up the renamed version of the error log file on a different volume and delete the file from the system disk."

You're correct that "That at the time when we are creating new errlog.sys the wccproxy is stuck. Solution is to stop webes before this action and start it after." reads as an explanation of the trigger for the bug in WEBES, and not of the bug itself. I'd likewise interpret that approach as a workaround for the bug, rather than a fix.

I'd expect that the resulting fix here would mean either that WEBES would deal with the temporarily missing file correctly, or that the OpenVMS documentation would be updated to indicate that the RENAME is no longer supported, or that you need to shut down WEBES around the RENAME.
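
Until such a fix exists, the stop-and-restart workaround quoted above could be wrapped in a short DCL procedure. The following is an untested sketch only. It assumes the DESTA STOP and DESTA START commands are defined by your WEBES installation and are available in the context the procedure runs in; the procedure name ROTATE_ERRLOG.COM is hypothetical. The RENAME itself is the one described in the manual excerpt.

```
$! ROTATE_ERRLOG.COM (hypothetical name): untested sketch of the workaround
$! Assumes DESTA is defined as a command by the WEBES installation.
$ SET NOON                  ! keep going so WEBES is restarted even if RENAME fails
$ DESTA STOP                ! stop WEBES before touching the error log
$ RENAME SYS$ERRORLOG:ERRLOG.SYS SYS$ERRORLOG:ERRLOG.OLD
$ DESTA START               ! restart WEBES once the rename has been done
$ EXIT
```

Whether WEBES needs to wait until ERRFMT has created the new ERRLOG.SYS before it is restarted is not something this thread establishes, so check the process count for a while after the first few runs.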

It'd be nice if there were a throttle that avoided run-away process creation within WEBES; that's seldom goodness on a production server.

(As a local workaround, I'd look to set a PRCLM value under the WEBES username(s). Though I've not tried this, it might somewhat limit the damage the next time the WEBES process creation logic goes berserk. This assumes WEBES is running under its own username; I don't have a box running WEBES to check that right now, as the boxes I deal with don't run WEBES.)
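
For reference, a PRCLM limit is set with the AUTHORIZE utility. The sketch below is illustrative only: WEBES_USER is a placeholder for whatever username WEBES actually runs under on your system, and the value 8 is an arbitrary example, not a recommendation.

```
$! Sketch only: WEBES_USER and the value 8 are placeholders.
$ SET DEFAULT SYS$SYSTEM
$ RUN AUTHORIZE
UAF> SHOW WEBES_USER
UAF> MODIFY WEBES_USER /PRCLM=8
UAF> EXIT
```

PRCLM caps the number of subprocesses a process under that account can create, so it would only help if the run-away processes are subprocesses; if they turn out to be detached processes, PRCLM alone would not cap them. The new value applies to processes created after the change.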