Fox,
>Once the processes attains 100% CPU
>usage,it goes into suspended mode....
>How does this happen?
As others have said, just because SHOW PROCESS/CONTINUOUS says a process is suspended, doesn't mean it is in SUSP state. It just means that, for whatever reason, SHOW PROCESS can't access process private data. Use:
$ WRITE SYS$OUTPUT F$GETJPI(pid,"STATE")
to determine the real state of the process.
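If you want a bit more context than the state alone, a short command procedure along these lines can be used. This is only a sketch: the PID value is a made-up placeholder, and you will generally need GROUP or WORLD privilege to query processes you don't own.
$ pid = "2040011C"   ! hypothetical PID, substitute the real one
$ WRITE SYS$OUTPUT "State:         ", F$GETJPI(pid,"STATE")
$ WRITE SYS$OUTPUT "Image:         ", F$GETJPI(pid,"IMAGNAME")
$ WRITE SYS$OUTPUT "Current prio:  ", F$GETJPI(pid,"PRI")
$ WRITE SYS$OUTPUT "Base prio:     ", F$GETJPI(pid,"PRIB")
A compute bound runaway will normally show COM or CUR here rather than SUSP.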
To reduce the impact on your system, perhaps you should really suspend the process:
$ SET PROCESS/SUSPEND/ID=pid
or lower its priority:
$ SET PROCESS/PRIORITY=0/ID=pid
Note that this may not reduce the process's consumption of CPU, but it will put it at the back of the queue of processes competing for CPU. Also realise that when the system says a process is using 100% CPU, that can't literally be the case: if it were, you couldn't do anything (because you need the CPU to do stuff). A runaway compute-bound process will use whatever CPU is left over. OpenVMS does a fairly good job of reducing the impact of a runaway process by giving other processes priority boosts.
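One rough way to see how much CPU the process is really getting is to sample its accumulated CPU time twice. Again just a sketch, with a placeholder PID and an arbitrary 10 second interval; CPUTIM is returned in 10 millisecond ticks.
$ pid = "2040011C"   ! hypothetical PID, substitute the real one
$ t1 = F$GETJPI(pid,"CPUTIM")
$ WAIT 00:00:10      ! sample interval, chosen arbitrarily
$ t2 = F$GETJPI(pid,"CPUTIM")
$ delta = t2 - t1    ! ticks of 10 ms consumed during the interval
$ WRITE SYS$OUTPUT "CPU ticks used in 10 seconds: ", F$STRING(delta)
On a single CPU, a delta near 1000 means the process had the processor almost to itself for that interval; a much smaller figure means other work is getting in ahead of it.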
You can use ANALYZE/SYSTEM to see what the process is doing. Start with:
$ ANALYZE/SYSTEM
SDA> SET PROCESS/INDEX=pid
SDA> SHOW PROCESS/CHANNEL
This should tell you what program or procedure it's running. If the process is suspended, you can use SHOW CALL and SHOW CALL/NEXT to examine the call stack. If you have a link map and source code, with a bit of perseverance it's possible to identify exactly where in the program it's looping.
Another trick is SET PROCESS/DUMP=NOW. This should write a process dump, which you can then examine using DEBUG and/or ANALYZE/CRASH.
Final point - the danger of having BALSETCNT much lower than MAXPROCESSCNT is that once you have more than BALSETCNT processes, the excess MUST be outswapped. In itself, that's not a problem, but if you have a process that scans the system (say an Idle Process Killer) and it's not written very carefully, it will "chase" the outswapped processes down the PID array, resulting in excessive swapping activity. If the scan interval is short enough, you can put your system into a thrashing state. FWIW, from your symptoms I don't believe this is happening, but with the resources you have, I can't think of a valid reason for limiting BALSETCNT so severely.
A crucible of informative mistakes