Operating System - OpenVMS

Re: poor performance after memory upgrade

 
Ian Miller.
Honored Contributor

Re: poor performance after memory upgrade

INSTALL REPLACE imagefilename /OPEN/HEADER_RESIDENT/SHARED
would be the usual. Note this consumes GBLPAGES and GBLSECTIONS. Use
INSTALL LIST/GLOBAL/SUMMARY to see how many gblpages are free.
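For example, checking the global page pool first (the image path here is hypothetical; use ADD instead of REPLACE if the image is not yet known to INSTALL):

$ INSTALL LIST/GLOBAL/SUMMARY              ! how many GBLPAGES/GBLSECTIONS are free?
$ INSTALL REPLACE DISK$APPS:[EXE]QUIZA810C2.EXE /OPEN/HEADER_RESIDENT/SHARED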

____________________
Purely Personal Opinion
Antoniov.
Honored Contributor

Re: poor performance after memory upgrade

Peter,
reading your accounting attachment I see you have about 75 LOGINOUTs every quarter of an hour!
That means about 300 logins every hour against an average of 650/700 connected users; every two hours your whole user population turns over!
You have a big hard page fault problem due to excessive LOGINOUTs, as a few members have already posted.
Also, your users create and delete many files, so your disk becomes very fragmented and you have heavy I/O.
I think you need a good defragmenter, and you should identify the shared libraries in use and monitor any user running the QUIZA810C2 and QTPA810C2 applications.

Have a great day!
Antonio Maria Vigliotti
Peter Clarke
Regular Advisor

Re: poor performance after memory upgrade

Antoniov,

I have a defragmenter and plan to run it this coming weekend.
I guess this will improve I/O and performance.
Regarding the LOGINOUT image, it looks to me as if this image is installed already.
How could I improve this??

Peter
Ian Miller.
Honored Contributor

Re: poor performance after memory upgrade

The problem is not the LOGINOUT image itself but what it indicates: lots of logins, hence process creation, and process creation is an expensive operation on VMS.
However, don't focus only on technical numbers; find a business metric visible to the users, e.g. the response time of a specific operation that users perform a lot. This means you have to talk to your users, and that can be like communicating with sheep sometimes :-) but you have to try to work out what they do and what is slow for them. Then work out which resource bottleneck is involved and fix it. This may all be obvious, but it is worth repeating, because it is too easy to get focused on technical numbers like the hard page fault rate and forget what is visible to the users.
____________________
Purely Personal Opinion
Jan van den Ende
Honored Contributor

Re: poor performance after memory upgrade

Peter,

Yes, LOGINOUT will definitely already have been INSTALLed, and unless you REALLY know the details of WHAT you are doing, AND you REALLY need it (and I cannot think of any good reason yet), DON'T mess with LOGINOUT!!!
It is SO easy to blow your system security away and/or make your system inaccessible!!

If you review your image accounting data, the images that could benefit are those that are _NOT_ in SYS$SYSTEM or SYS$LIBRARY.
If you do the INSTALL /SHARE, the code will already be in memory, and image activation then requires only DZRO (demand-zero) page faults, which are soft faults.
The two images that immediately spring into view are QUIZA810C2 and QTPA810C2.

Disk defragmenting may or may not give a real gain. If most of the disk access is to create, and shortly thereafter delete, temporary files, then defragging will help little. The reported window turns will probably be gone after defragging, though.
There is a long-standing (as far as I know still unresolved) debate about the cost-effectiveness of defragging versus using cathedral windows. Defragging costs (a lot of) CPU and I/O, but can be done in off-hours; cathedral windows cost some extra memory.
Another (maybe) useful suggestion:
if you know more or less the average size of your temporary files, check the default extent size of the volume.
Use SET VOLUME/EXTENSION=n on every node in the cluster to make reasonably sure you very seldom need more than one extent!
BTW, maybe a stupid question, but you DO have highwater marking turned OFF, I assume? If not, do so! (Unless you are a top-secret military site; but then you could not have posted what you already have.) See the sketch below.
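A minimal sketch, assuming a hypothetical device and a 100-block extent (the volume must be mounted; repeat on each node):

$ SET VOLUME/EXTENSION=100 DKA100:          ! default file extent, in blocks
$ SET VOLUME/NOHIGHWATER_MARKING DKA100:    ! turn file highwater marking off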

Hth,


Jan
Don't rust yours pelled jacker to fine doll missed aches.
Ian Miller.
Honored Contributor

Re: poor performance after memory upgrade

"Vols in Full XFC mode 0 Vols in VIOC Compatible mode 6"

You won't see any volumes in Full XFC mode just yet as every node runs in VIOC compatible mode in current versions of VMS.

If this is an Alpha and you determine that activating certain shareable images is causing lots of hard page faults, and these faults make a noticeable difference to your users, then you may wish to look at resident shareable images (see INSTALL ADD/RESIDENT). Images installed resident are loaded into memory once, are mapped in system space, and cause zero page faults when accessed.
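A sketch with a hypothetical shareable image name (the image should be linked /SECTION_BINDING to get the full benefit of /RESIDENT):

$ INSTALL ADD SYS$SHARE:MYAPPSHR.EXE /OPEN/HEADER_RESIDENT/SHARED/RESIDENT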
____________________
Purely Personal Opinion
Brian Reiter
Valued Contributor

Re: poor performance after memory upgrade

Hi Peter,

I performed a similar exercise on one of our systems. Although the system was DS10 based, it ran around 110 cooperating tasks. During startup the hard page fault rate was in the thousands (about 12,000 as I recall). During normal operation hard paging peaked around 200 per second.

The initial reduction in hard paging was gained by migrating the object libraries that all the tasks linked against into shareable, installable images. These shareable images were then installed as resident during system initialisation; the size of each task image decreased by at least 50%. Note this requires the tasks to be relinked (/SECTION_BINDING=(CODE,DATA)). Images which were run multiple times were then installed as header resident.
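A sketch of that restructuring, with hypothetical file, module and options-file names (not Brian's actual build):

$! Build the shareable image; MYLIB.OPT supplies the SYMBOL_VECTOR entries,
$! and /INCLUDE names the library modules to pull in
$ LINK/SHAREABLE=SYS$SHARE:MYLIBSHR.EXE MYLIB.OLB/INCLUDE=(MOD1,MOD2), MYLIB.OPT/OPTIONS
$! Relink each task with section binding; MYTASK.OPT contains a line such as
$!   SYS$SHARE:MYLIBSHR.EXE/SHAREABLE
$ LINK/SECTION_BINDING=(CODE,DATA) MYTASK.OBJ, MYTASK.OPT/OPTIONS
$! Install the shareable image resident at system startup
$ INSTALL ADD SYS$SHARE:MYLIBSHR.EXE /OPEN/HEADER_RESIDENT/SHARED/RESIDENT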

Further gains were achieved by tweaking the working set defaults and quotas to match the average actually required. This may not be feasible in your case.
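If you go that route, the AUTHORIZE side is a one-liner per account (hypothetical username and pagelet counts; new values take effect at next login):

$ MCR AUTHORIZE
UAF> MODIFY SOMEUSER /WSDEFAULT=4096 /WSQUOTA=16384 /WSEXTENT=65536
UAF> EXIT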


Hope this helps

regards

Brian
John Eerenberg
Valued Contributor

Re: poor performance after memory upgrade

Peter,

Could you still post a
$ MONITOR PAGE
?
There are some stats on this display that don't show up in the Excel spreadsheet you posted last Friday.
Maybe run MONITOR PAGE for a period of time equal to that 11:30AM - 12:00 noon collection you have, along with an updated spreadsheet for the same time frame. If you run it for 30 minutes, all I need is the last screenshot. I know this is a bit of work, but I think it will be useful for double-checking demand-zero page faults and the like, as well as system page faults. So far, system page faults have not been mentioned, but they can be an even worse performance killer than image activation.
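One way to capture such a window is a timed collection with a summary file (a sketch; interval in seconds, delta end time of 30 minutes):

$ MONITOR PAGE /INTERVAL=60 /ENDING="+00:30:00" /SUMMARY=MONPAGE.SUM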

The main reason is to sanity-check the performance of some SYSGEN parameters and help ensure they are not artificially low.

Jan,
> Sorry John,
> there is NO problem there!
I have to disagree with you a little, since the number of processes is not close to the number indicated in the spreadsheet.
The way I look at it is this: there are roughly 300 processes at 9 in the morning, and 2 to 3 times that many in the data collected in the spreadsheet. Therefore the page file could be in use at 11:30AM but not at 9AM (though it does look odd to me that the pagefile shows absolutely no usage). My comment about the modified page list is based on experience with other systems and a gut feeling that this might be the case; without knowing the MPW parameters, it is speculation on my part, and I apologize if I seemed to have drawn a definitive conclusion. At this point it is just something to explore, and the page write I/O rate will help clarify.

Brian,
> During startup the hard page rate was in the thousands. (about 12,000 as I recall).
Are you sure these are hard faults? If so, that means your disk I/O is handling 12,000 I/Os per second with less than 0.1 ms response time. Are your hard faults getting cached somewhere?
In the controllers, maybe?

john
It is better to STQ then LDQ
Brian Reiter
Valued Contributor

Re: poor performance after memory upgrade

John,

It was a while ago (6 months), during a period of redevelopment on a legacy soft-realtime system. I may have got the figure wrong; however, the page fault rate (soft and hard) was so excessive that you could watch a single event cause hard faults in every task that was part of the processing.

The overall problem was that the system had never been tuned since conversion from the VAX system; it worked OK for small-capacity installations. For the large installation we're planning, moving common code (tasks and libraries) into installed sections and correcting the sizes of the working sets and page file quotas gave an immediate improvement: paging went down to less than 1 hard fault a second, soft paging to 20 or so, and overall system performance and throughput increased dramatically.

cheers

Brian

Peter Clarke
Regular Advisor

Re: poor performance after memory upgrade

Can anyone tell me why, when I look at the UAF records, the WSEXTENT has a value of 16384, yet when I look at the process in Availability Manager the WSEXTENT for that user's process has a value of 151552? Why are these values different??

Reg

Peter
Ian Miller.
Honored Contributor

Re: poor performance after memory upgrade

The PQL_MWSDEFAULT, PQL_MWSQUOTA and PQL_MWSEXTENT system parameters impose a minimum. For example, the actual WSDEFAULT for an interactive process is the larger of the value in the UAF record and PQL_MWSDEFAULT. In your case PQL_MWSEXTENT is larger than the UAF value, so that is what you see.
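You can compare the UAF values against the system-imposed minimums like this:

$ MCR SYSGEN
SYSGEN> SHOW PQL_MWSEXTENT
SYSGEN> SHOW/PQL          ! all the PQL_ minimums at once
SYSGEN> EXIT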
____________________
Purely Personal Opinion
Peter Clarke
Regular Advisor

Re: poor performance after memory upgrade

Hi John,

Attached are the requested files.

As you will see, I have done a few of the things suggested here. I have increased the size of the SGA by about 300MB, turned file highwater marking off on all volumes, and run AUTOGEN again.
A few people have mentioned the frequent image activations...
What would cause this??
Can it be improved??
Is it the programming??

Also, as you will see, I am still getting high hard page faults; any more ideas??
I have checked the working sets and all seem to be OK apart from the subprocesses. Where do these get their WS values from, the UAF? The user's working set is OK, but that user's subprocess is reporting "working set quota too small" in Availability Manager.
Peter Clarke
Regular Advisor

Re: poor performance after memory upgrade

mon page report attached....
Ian Miller.
Honored Contributor

Re: poor performance after memory upgrade

Image activations result in hard page faults because that is how the image gets loaded, so a high image activation rate leads to a high hard page fault rate. The best idea is to run the images less often. Failing that, install the images /OPEN/HEADER_RESIDENT/SHARED, or perhaps /RESIDENT, to reduce the overhead of image activation.
____________________
Purely Personal Opinion
Richard Helmke_2
Occasional Advisor

Re: poor performance after memory upgrade

I may have missed this item in the previous replies, but did you reserve memory for the Oracle SGA using SYSMAN?

In chapter one of the VMS Installation Guide for Oracle, they describe the process:

$ MCR SYSMAN
SYSMAN> RESERVED_MEMORY ADD ORA_TEST_SGA -
/SIZE=n /ALLOCATE /ZERO /PAGE_TABLES
SYSMAN> EXIT

where n is the number of megabytes to reserve. Then run AUTOGEN and reboot.
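After the reboot you can confirm the reservation took effect with:

$ SHOW MEMORY/RESERVED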

In my experience as a VMS/Oracle DBA this is a big win.

Wim Van den Wyngaert
Honored Contributor

Re: poor performance after memory upgrade

Based upon the accounting info:

Check what the SBOLTON process is doing.
A process is created, some reporting is done using QTP and QUIZ, and the process dies. It restarts immediately.

Keeping the process alive instead of restarting it will save you some process creations (one every 15 seconds).

Install the images, but maybe it is better to replace the job with one program that does the work every 15 seconds (not using QTP and QUIZ).

Is the WSDEFAULT of that SBOLTON process sufficient? If the program needs memory, it has to page fault BEFORE it gets more memory; by that time, the task is finished.
(I had this problem with a SHOW QUEUE command that was very slow.)
Wim
Jan van den Ende
Honored Contributor

Re: poor performance after memory upgrade

Peter,

you implemented several suggestions, you wrote.
So how IS performance now?

Wim's last response brought another detail to mind.
Your processes ARE running several images in sequence, right?
Then the difference between WSDEFAULT and WSQUOTA can come into play.
WSQUOTA is what you get allocated (at least) while an image runs; WSDEFAULT is what you are guaranteed during DCL processing.
Ending an image, you drop back to DCL, your WS may get decreased, and upon the next image activation it has to grow again. All extra page faults...
Setting the SYSGEN parameter PFRATL to 0 (zero) turns this off. It is a dynamic parameter, so with SYSGEN WRITE ACTIVE you can change it without a reboot; see the sketch below.
(I just saw that nowadays the default IS 0, so maybe none of this is applicable, but many systems carry long-inherited parameters, so it is worth checking anyway.)
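A sketch of the dynamic change (also put the value in MODPARAMS.DAT so the next AUTOGEN keeps it):

$ MCR SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SET PFRATL 0
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT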

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Wim Van den Wyngaert
Honored Contributor

Re: poor performance after memory upgrade

Jan,

Isn't it PFRATH that must be set to 0?
If the number of page faults per 100 ms is higher than this value (8), then you get WSINC (2400) pages extra. But that means growth starts only after 0.1 second!!! And if you need e.g. 30,000 pages, you will need a whole second.

Wim
Ian Miller.
Honored Contributor

Re: poor performance after memory upgrade

Your working set limit will be reduced to WSDEFAULT on image rundown. The working set limit is the maximum number of working set list entries that can be used, i.e. the maximum number of process and global pages that the process can address. The actual number of pages in your working set is never more than this.
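For what it's worth, you can see all three limits for your own process straight from DCL:

$ SHOW WORKING_SET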
____________________
Purely Personal Opinion
John Eerenberg
Valued Contributor

Re: poor performance after memory upgrade

Peter,

Thanks for posting your stats. I think I have a pretty good idea of your system bottlenecks. From what I see so far, DKA1 is the problem child (I don't know about CPU -- I presume it is okay).

> A few people have mentioned about the frequent image activations.....
When one sees a high hard page fault rate (as in your case) on a machine with enough memory then, in general, image activation becomes suspect.

> What would cause this ??
The way the system is used. Anything from your SYLOGIN.COM to the .COM files associated with the application, as well as process creation/deletion and so on, will cause this. It has to; that is by design. The question is: is performance acceptable? In your case it is not.

> Can it be improved??
Yes. It just depends on how (in)efficiently your application runs. In situations such as yours I find that, on most systems, it is not the application code itself but rather the way in which the .COM procedures run tasks. Several people have mentioned this before.

> Is it the programming??
Of the .COM files? Could well be, in your case. Of the application code itself? It could be, but probably not; I don't know for sure.

> Also as you will see am still getting high hard page faults any more ideas??
Looking at the disks in your "Av Disk I/O Queue Length" report (a measure I call Qd), your goal should be an average of 0.50 or less. The exception would be a disk whose I/O you understand well. For example, one of my database disks has a huge Qd. Normally I would work like a banshee to keep Qd below 0.50, but since I have a lot of knowledge of the application, I let it run with a high Qd and don't even worry about it until it hits double digits. Again, I understand very well how that disk is used.

The question for you is: what is on DKA1? You'll have to get familiar with how DKA1 is being used. Is DKA1 the system disk? If so, a number of us have techniques for solving that problem; image activation may not be the only problem for a system disk, since many things live there. If DKA1 is not the system disk, then your course of action will be different.
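A simple way to watch that queue depth as it happens (interval in seconds):

$ MONITOR DISK /ITEM=QUEUE_LENGTH /INTERVAL=5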

Look at Hein's May 22, 2004 17:28:30 GMT post. This is where you want to focus some/most of your activity.

So, what's the deal with DKA1?

john
It is better to STQ then LDQ
Jan van den Ende
Honored Contributor

Re: poor performance after memory upgrade

Wim,

the trick is not to speed up WS growth, but to switch off WS shrink.

Ian,
are you sure image rundown always shrinks the WS back to WSDEFAULT? I had the impression that the shrinking is done by AWSA (Automatic Working Set Adjustment), under control of the various WSx and PFRATx parameters.
IF you are right, then in an environment like Peter's it would be advisable to raise WSDEFAULT to WSQUOTA for the relevant accounts. (Does somebody have the clarifying answer to this?)

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Martin P.J. Zinser
Honored Contributor

Re: poor performance after memory upgrade

Hi,

if you really want to dig into what XFC is doing with your memory, have a look at the XFC SDA extension:

$ ANALYZE/SYSTEM
SDA> XFC
XFC> HELP

Still, installing frequently activated images
should make a major difference in the page fault rate.

Greetings, Martin

Ian Miller.
Honored Contributor

Re: poor performance after memory upgrade

Re: working set list size reset on image exit. According to The Book of Ruth (the Internals and Data Structures Manual), the $RUNDWN system service resets the working set list size back to the default, and the contents of P0 space are deleted. Remember, WSDEFAULT is the number of entries in the working set list, not the number of pages in the working set. Lots of image activation and subsequent rundown is best avoided. Installing images is a way of reducing the image activation overhead; it does not prevent the reduction of the working set list, etc.
____________________
Purely Personal Opinion
Lawrence Czlapinski
Trusted Contributor

Re: poor performance after memory upgrade

Peter,
As others have stated, you want to be able to measure user performance. It would be nice to hear an update on the progress you've made.
1. Make a change and observe for a while whether it helps or not.
2. Since I had a lot of demand-zero faults on my most heavily used system, I tried increasing the SYSGEN parameter ZERO_LIST_HI, which is dynamic. It "improves the performance of allocating such pages", and it helped on my system. I would suspect page zeroing is a low-priority activity, done when the CPU is not otherwise being used.
3. I would run MON PROC/TOPFAULT at a display interval that lets you note which processes are faulting a lot (see the sketch after this list). You can also use @WORKSET.COM to look for candidates for increasing WSDEF and WSQUOTA through SYSUAF. The idea is to give the processes the memory they need more quickly. Since you have lots of memory available, you may even test raising PQL_MWSQUOTA by 10% at a time in SYSGEN; it is a dynamic parameter. I don't like raising PQL_MWSDEF because it requires a reboot to change (it would be nice if it didn't). You have a high global page fault rate, which is a soft fault, so increasing WSQUOTA won't necessarily mean a process consumes that much more memory, since some of the pages are already in another process's memory; the process will just be able to get the pages faster and more easily without needing to exceed PFRATH. Lowering PFRATH will grow working sets by WSINC sooner, but some processes (such as DECW$TE processes) won't necessarily increase their working set page count as quickly as WSINC allows.
4. When you get time, attach a more recent MON PAGE. Remember to let it run for x minutes (a higher number of minutes usually gives a better indication) before your changes, during a peak time period. Then restart after your changes and run for about x minutes or more.
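A sketch of the first step in item 3 (interval in seconds; the display ranks the top faulting processes at each refresh):

$ MONITOR PROCESSES/TOPFAULT /INTERVAL=10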
Lawrence