
Re: SYSGEN

 
FOX MULDER_2
Frequent Advisor

SYSGEN

Hi,
Can you please let me know how to find processes that are swapped out of memory?

User sessions are hanging often and CPU usage is ~100%, so I took a look at BALSETCNT...

The value of MAXPROCESSCNT is 521 and BALSETCNT is 207, which I don't think is correct.
Total memory is ~ 1GB

But when I run
$ Show mem/slots

swapped is zero.
So I guess no swapping is occurring.
Is the value of BALSETCNT right, or should it be changed to 519? Please advise.

Thanks,





21 REPLIES
Wim Van den Wyngaert
Honored Contributor

Re: SYSGEN

Bal set should be maxproc - 2. So 519.

Help show sys/state will help you.
The state you need is RWSWP but it could be one of the others too.

Wim
Wim
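
The rule of thumb above can be checked without entering SYSGEN. This is only a minimal sketch, assuming that the lexical function F$GETSYI returns the active value of a system parameter when given its name. It reads the values and changes nothing:

```dcl
$! Read-only check: compare the active BALSETCNT with MAXPROCESSCNT - 2.
$ maxproc = F$GETSYI("MAXPROCESSCNT")
$ balset  = F$GETSYI("BALSETCNT")
$ WRITE SYS$OUTPUT "MAXPROCESSCNT     = ", F$STRING(maxproc)
$ WRITE SYS$OUTPUT "BALSETCNT         = ", F$STRING(balset)
$ WRITE SYS$OUTPUT "MAXPROCESSCNT - 2 = ", F$STRING(maxproc - 2)
```

With the values quoted in the question (521 and 207), the last line would show 519. Any change to BALSETCNT itself is a separate decision.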
Uwe Zessin
Honored Contributor

Re: SYSGEN

It's been a long time since I played with this, and I am a bit short on time. Others can be more helpful, but allow me a quick warning:

Please do NOT blindly raise BALSETCNT !
You risk that your system does not boot.
This can be fixed via conversational boot, but you do need access to the console.
.
Steven Schweda
Honored Contributor

Re: SYSGEN

I've heard that there's this AUTOGEN thing
which can help when changing SYSGEN
parameters.
Uwe Zessin
Honored Contributor

Re: SYSGEN

I heard too ;-)

But maybe it was AUTOGEN which lowered BALSETCNT because there is a large VA configured and the system might otherwise run out of S0 space. I have seen this on OpenVMS VAX V7.1 many years ago and I don't recall what platform FOX is working on.
.
Hoff
Honored Contributor

Re: SYSGEN

I would encourage a system-wide evaluation of resources, current load, system capabilities, and local requirements.

Based on the available information, the system requires some tuning, additional hardware, reduced load, replacement, or a combination.

Looking at the slots and processes is a factor, but there's a whole lot more information required. It's not clear (to me) what thought sequence led from hung processes and max CPU load to the balance set. (You're going to have to explain that.)

Start with either the T4 tools or a MONITOR recording pass (or both), and start gathering baseline data on the current system load. And look at what the application(s) are doing, and at the particular hangs and aberrant behavior.

Given the discussion of the balance set slots and the reported physical memory, I have to assume that this is an OpenVMS VAX box, and specifically a VAX 6000 Model 600, VAX 7000, or VAX 10000 series box. Old and slow. Big and power-hungry, too. By coincidence, I posted some details of VAX XPA and XVA just a week ago, over at: http://64.223.189.234/node/131

Do you have the source code for your applications, or a way to move off these boxes? You're working very near or even at the architectural limits of VAX here, which means that if tuning doesn't work, you are left to either off-load or re-schedule or migrate the applications to OpenVMS Alpha or OpenVMS I64. An rx2660 would likely greatly outperform this VAX box, for instance.

Stephen Hoffman
HoffmanLabs
labadie_1
Honored Contributor

Re: SYSGEN

A swapped process can be in a state like HIBO or LEFO (even COMO).

But first some tuning is needed, and maybe some memory should be added.
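
As a rough illustration of the point about states, the command procedure below walks the process list and prints every process whose scheduling state ends in "O" (HIBO, LEFO, COMO and so on). It is a sketch, not a tested tool: it assumes the account has WORLD privilege (otherwise F$PID only returns your own processes), and the label names are arbitrary.

```dcl
$! List processes whose state name ends in "O" (outswapped).
$ SET NOON                ! a process may exit between F$PID and F$GETJPI
$ ctx = ""
$ count = 0
$next:
$ pid = F$PID(ctx)
$ IF pid .EQS. "" THEN GOTO done
$ state = F$EDIT(F$GETJPI(pid,"STATE"),"TRIM")
$ IF .NOT. $STATUS THEN GOTO next
$ IF F$EXTRACT(F$LENGTH(state)-1,1,state) .NES. "O" THEN GOTO next
$ WRITE SYS$OUTPUT pid, "  ", state, "  ", F$GETJPI(pid,"PRCNAM")
$ count = count + 1
$ GOTO next
$done:
$ WRITE SYS$OUTPUT F$STRING(count), " outswapped process(es) found"
$ SET ON
```

If this reports zero, that agrees with SHOW MEMORY/SLOTS showing nothing swapped.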
FOX MULDER_2
Frequent Advisor

Re: SYSGEN

Thanks to all for prompt replies.

The problem is that a suspended process is taking ~100% CPU.

There is a parent process that creates a subprocess. After a while the subprocess becomes non-existent, but when monitored using mon proc/topcpu that process is consuming 100% CPU.

I guess that is the reason other users are getting hung sessions.

The VMS box is a DS20E running 7.3-2.
It was working fine; these problems started only after patching with update 11.

So while going through the SYSGEN parameters I noticed BALSETCNT...

Any suggestion would be very helpful.

Thanks
Jan van den Ende
Honored Contributor

Re: SYSGEN

Fox,

so it is Alpha - that is good news.
The balsetcount vs. VA problem does not apply.

My guess: either someone changed SYSGEN params without using AUTOGEN, or, in MODPARAMS MAXPROCESSCNT as well as BALSETCNT are hard-coded values, and the AUTOGEN warnings were ignored.
In the latter case, simply remove BALSETCNT from modparams.dat (and files invoked by it, if applicable) and let AUTOGEN recalculate it.

Probably it is best to first do a
$ @SYS$UPDATE:AUTOGEN SAVPARAMS TESTFILES FEEDBACK
which only reports and does not yet change anything, and then review the report.
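
Put together, the check described here might look like the following. This is a sketch that assumes the default file locations (MODPARAMS.DAT and the AUTOGEN report in SYS$SYSTEM); adjust if your site keeps them elsewhere or includes other files from MODPARAMS.DAT.

```dcl
$! See whether either parameter is hard-coded in MODPARAMS.DAT.
$ SEARCH SYS$SYSTEM:MODPARAMS.DAT BALSETCNT,MAXPROCESSCNT
$! Run AUTOGEN only as far as the TESTFILES phase, using feedback data.
$! Nothing is written to the active or current parameters at this stage.
$ @SYS$UPDATE:AUTOGEN SAVPARAMS TESTFILES FEEDBACK
$! Review what AUTOGEN calculated and any warnings it issued.
$ TYPE/PAGE SYS$SYSTEM:AGEN$PARAMS.REPORT
```

Stopping at TESTFILES keeps this a dry run; the later phases are the ones that set new values and reboot.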

hth

Proost.

Have one on me.

jpe

Don't rust yours pelled jacker to fine doll missed aches.
Hoff
Honored Contributor

Re: SYSGEN

Stay away from SYSMAN PARAMETER, AUTOGEN and MODPARAMS.DAT for the present. (Please don't take that the wrong way -- while there might well be a parameter issue here -- it appears far too early to be tweaking parameters.)

A "suspended" process accumulating CPU would initially appear to be a lost I/O, or lost quota, or other kernel-level error. A much more detailed look at the process and at the loop is going to be required here. This may well prove to be a bug in the UPDATE kit, for instance.

If these systems have a support contract in place, you will want to avail yourself of its benefits and contact the HP customer support center.
Wim Van den Wyngaert
Honored Contributor

Re: SYSGEN

"sub process becomes non existent but when monitored using mon proc/topcpu that process is consuming 100% cpu usage."

Is the process gone or isn't it ? Use sh proc/id=xxx to verify. Is mon proc/topcpu giving the same pid for the process ? What do you find in process accounting ? Operator log ? Audit trail ?

Did you reboot after installing the patches ?

Are all slots taken when you do show mem/slot ?

Wim
Wim
Rob Young_4
Frequent Advisor

Re: SYSGEN


> Thanks to all for prompt replies.
>
> The problem is a suspended process is taking ~100% CPU usage.

See this thread for similar spinning process
discussion:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1105743
Andy Bustamante
Honored Contributor

Re: SYSGEN


If it worked before the ECO and doesn't work after . . .

Did you install the patch with /save_recovery? Type:

$ PRODUCT SHOW RECOVERY_DATA

You can use PRODUCT UNDO PATCH to restore your system to its previous state. This can be a quick way to eliminate the ECO as the cause, or to confirm it.

I've used UNDO and it really annoys the Windows types.

Andy

If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
Hoff
Honored Contributor

Re: SYSGEN

>>>
I've used UNDO and it really annoys the Windows types.
<<<

Microsoft Windows XP restore points have provided this capability for some time now.


Jess Goodman
Esteemed Contributor

Re: SYSGEN

A truly suspended process should not be using the CPU. Do SHOW SYSTEM and see what state the process is really in. Also go into ANALYZE/SYSTEM and SHOW that PROCESS there.

DCL's SHOW PROCESS command gives the highly-misleading message "process is suspended" if you use it on a process stuck in any of the MWAIT states (MUTEX, RW*) or for any process that has the delete-pending bit set.
I have one, but it's personal.
Uwe Zessin
Honored Contributor

Re: SYSGEN

> Microsoft Windows XP restore points have provided this capability for some time now.

For the whole system, yes. But even on Windows 2000 I was able to roll back (uninstall) a single patch.
.
FOX MULDER_2
Frequent Advisor

Re: SYSGEN

Hi,
When I issue the command
$ sh proc/id=XXXXXXX
the message says the process is suspended, and after a while it becomes non-existent. But when
$ sh sys
is issued, that process is listed as current.
Each time the process becomes non-existent, a new subprocess is created. This keeps repeating.

Memory slots are not all taken; there are free slots available.
I did install the patches using the /sav qualifier.

Once the process reaches 100% CPU usage, it goes into suspended mode... How does this happen?
I think this is why other users are affected.

Thanks.
Volker Halle
Honored Contributor

Re: SYSGEN

Hi,

SHOW PROC/ID=xxx may issue a 'suspended' message if it can't get data from the remote process. Only SHOW SYS will tell the true status of that process.

Do you have a system with more than 1 CPU? Only in that case could you see another process in the CUR state; otherwise, it will always be your own process, the one from which you issued the SHOW SYS command.

Are you saying that there is some process on your system which seems to be creating subprocesses, which then consume 100% of CPU time and then disappear?

You can look at accounting to find out why those subprocesses disappear (exit status), how long they were active, and how much CPU they consumed:

$ ACC/FULL/SINCE=time/TYPE=PROCESS/PROCESS=SUBPROCESS

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: SYSGEN

What is "after a while" ?
I've seen process termination taking 5 seconds (due to cleaning up ?).

Did you reboot ?

Wim
Wim
John Gillings
Honored Contributor

Re: SYSGEN

Fox,

>Once the processes attains 100% CPU
>usage,it goes into suspended mode....
>How does this happen?

As others have said, just because SHOW PROCESS/CONTINUOUS says a process is suspended, doesn't mean it is in SUSP state. It just means that, for whatever reason, SHOW PROCESS can't access process private data. Use:

$ WRITE SYS$OUTPUT F$GETJPI(pid,"STATE")

to determine the real state of the process.
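
To watch the state over time rather than once, the same lexical can be polled in a loop. The sketch below also samples the CPUTIM item (accumulated CPU time in 10 ms ticks), so a looping process shows up as a large delta per interval. The procedure name WATCHPID.COM and the PID in the comment are made up for illustration.

```dcl
$! Hypothetical helper: pass the PID as P1, e.g.  @WATCHPID 2040011C
$ pid = P1
$ SET NOON
$ last = F$GETJPI(pid,"CPUTIM")        ! CPU time so far, in 10 ms ticks
$loop:
$ WAIT 00:00:05
$ now = F$GETJPI(pid,"CPUTIM")
$ IF .NOT. $STATUS THEN GOTO gone      ! process no longer exists
$ delta = now - last
$ WRITE SYS$OUTPUT F$TIME(), "  state=", F$GETJPI(pid,"STATE"), "  ticks in 5 s: ", F$STRING(delta)
$ last = now
$ GOTO loop
$gone:
$ WRITE SYS$OUTPUT "Process ", pid, " is gone"
```

A delta near 500 per 5-second interval would mean the process is consuming roughly one full CPU. Stop the loop with Ctrl/Y.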

To reduce the impact on your system, perhaps you should really suspend the process:

$ SET PROCESS/SUSPEND/ID=pid

or lower its priority

$ SET PROCESS/PRIORITY=0/ID=pid

Note that this may not reduce the process's consumption of CPU, but it will put it at the back of the queue of processes competing for CPU. Also realise that even when the system says a process is using 100% CPU, that cannot literally be the case, or you couldn't do anything (because you use the CPU to do stuff). A runaway compute-bound process will use whatever CPU is left over. OpenVMS does a fairly good job of reducing the impact of a runaway process by giving other processes priority boosts.

You can use ANALYZE/SYSTEM to see what the process is doing. Start with:

$ ANALYZE/SYSTEM
SDA> SET PROCESS/INDEX=pid
SDA> SHOW PROCESS/CHANNEL

This should tell you what program or procedure it's running. If the process is suspended, you can use SHOW CALL and SHOW CALL/NEXT to examine the call stack. If you have a link map and source code, with a bit of perseverance it's possible to identify exactly where in the program it's looping.

Another trick is SET PROCESS/DUMP=NOW. This should write a process dump, which you can then examine using DEBUG and/or ANALYZE/CRASH.

Final point - the danger of having BALSETCNT much lower than MAXPROCESSCNT is that once you have more than BALSETCNT processes, the excess MUST be outswapped. In itself, not a problem, but if you have processes that scan the system (say an Idle Process Killer), if it's not written very carefully, it will "chase" the outswapped processes down the PID array, resulting in excessive swapping activity. If the scan interval is short enough, you can put your system into a thrashing state. FWIW, from your symptoms I don't believe this is happening, but with the resources you have, I can't think of a valid reason for limiting BALSETCNT so severely.
A crucible of informative mistakes
FOX MULDER_2
Frequent Advisor

Re: SYSGEN

Hi,
Thanks to all for your prompt replies.
The problem got solved today. It was due to corruption in an application file, which caused the process to loop.

However, I ran AUTOGEN and it suggested a few changes, which I think should have been made long ago.

Thanks.