Re: High load

 
Ivan Ferreira
Honored Contributor

Re: High load

Please post the output of:

collect -scpm -om -S -n 10


Collect this information for some time, compress and attach the file.
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Vladimir Fabecic
Honored Contributor

Re: High load

[kernel idle] is a catch-all in the Tru64 UNIX kernel. It gets all of the internal overhead kernel threads, including things like sync-ing the disks, environmental monitoring, some aspects of disk I/O, some memory management overhead, and so on. Basically, it's a "catch all" for the things the kernel is doing on behalf of the system as a whole that can't be blamed on any specific user "job" or process.
So many things can cause your problem.
I/O would be my first guess.
So please send the output Ivan asked for.
Did you reboot the other machine?
In vino veritas, in VMS cluster
Srikanth Arunachalam
Trusted Contributor

Re: High load

Hi Consty,

Look at the "load profile" in the statspack,

(1) You have a large number of hard parses (7.85 per second).

(2) The number of executes (187 per second) and transactions are also very large.

There is heavy load on the system. I would think about increasing the shared pool size to give Oracle a chance to keep more DML execution plans in memory. If your shared pool is small, Oracle has to devise execution plans for your transactions again and again, and hence more time is spent parsing.

Look at the "Instance Efficiency Percentages (Target 100%)"

(1) I am not pleased with the Library Hit ratio of "95.96" (I would expect it to be higher).

A low Library Hit ratio could be indicative of a shared pool that is too small or, just as likely, of an application that does not make correct use of bind variables.

(2) The Soft Parse % is also very low (93.96); it is expected to be nearly 100.

The Soft Parse % value is one of the most important (if not the only important) ratios in the database. For a typical OLTP system, it should be as near to 100% as possible.
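To make the arithmetic behind that ratio concrete, here is a minimal sketch. It assumes the usual Statspack definition, Soft Parse % = 100 * (1 - hard parses / total parses), and that the two figures quoted above cover the same snapshot interval. The function names are made up for illustration only.

```python
def soft_parse_pct(hard_parses: float, total_parses: float) -> float:
    """Soft Parse % = 100 * (1 - hard parses / total parses)."""
    if total_parses <= 0:
        raise ValueError("total_parses must be positive")
    return 100.0 * (1.0 - hard_parses / total_parses)


def total_parses_from_pct(hard_parses: float, soft_pct: float) -> float:
    """Invert the formula: total parses implied by a hard parse rate and a Soft Parse %."""
    if not 0.0 <= soft_pct < 100.0:
        raise ValueError("soft_pct must be in [0, 100)")
    return hard_parses / (1.0 - soft_pct / 100.0)


# Figures quoted from the report in this thread.
hard_per_sec = 7.85
reported_soft_pct = 93.96

# Total parse rate implied by those two figures (assumes the same snapshot window).
implied_total_per_sec = total_parses_from_pct(hard_per_sec, reported_soft_pct)
print(f"implied total parses/s: {implied_total_per_sec:.1f}")

# Hard parse rate that would be needed for 99% soft parses at the same total rate.
target_hard_per_sec = implied_total_per_sec * (1.0 - 0.99)
print(f"hard parses/s for 99% soft parse: {target_hard_per_sec:.2f}")
```

With the quoted figures this works out to roughly 130 parses per second in total, so getting to 99% would mean cutting hard parses to a little over 1 per second. The "Parses" line of the load profile should give the total directly; the inversion above is only needed because that line is not quoted here.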

So, take a look at your application, make good use of bind variables and increase the shared pool size to larger value.
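As a rough illustration of the bind-variable point (a sketch only, not Consty's application): the example below uses Python's built-in sqlite3 module as a stand-in for Oracle, because it accepts the same `:name` placeholder style. The `customers` table and the helper names are hypothetical. The idea is that literal values produce a new SQL text for every call, and each new text has to be hard parsed, while a bind variable keeps one shared text that can be soft parsed.

```python
import sqlite3


def literal_statements(ids):
    """Anti-pattern: the value is pasted into the SQL text, so every statement is different."""
    return [f"SELECT name FROM customers WHERE id = {int(i)}" for i in ids]


def bound_statements(ids):
    """Bind-variable style: one SQL text, the value travels separately as a parameter."""
    sql = "SELECT name FROM customers WHERE id = :id"
    return [(sql, {"id": i}) for i in ids]


# Hypothetical table, in-memory database (stand-in for the real Oracle schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (:id, :name)",
    [{"id": i, "name": f"cust{i}"} for i in range(1, 101)],
)

ids = range(1, 101)
literal = literal_statements(ids)
bound = bound_statements(ids)

# Count the distinct SQL texts the database would have to parse.
distinct_literal = len(set(literal))
distinct_bound = len({sql for sql, _ in bound})
print("distinct texts with literals:", distinct_literal)
print("distinct texts with binds:   ", distinct_bound)

# The bound form returns the same rows as the literal form would.
rows = [conn.execute(sql, params).fetchone()[0] for sql, params in bound]
conn.close()
```

In Oracle each distinct text also takes its own slot in the library cache, which is why missing bind variables hurts both the Soft Parse % and the Library Hit ratio at the same time.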

Let me know how much physical memory you have, and send another statspack report taken during heavy load and one during light load.

Thanks,
Srikanth
Srikanth Arunachalam
Trusted Contributor

Re: High load

Hi Consty,

More findings on your statspack. Refer to the "Top 5 Timed Events" section.

(1) The CPU Time is very large (1368/s)

CPU time is not really a wait event (hence, the new name), but rather the sum of the CPU used by this session, or the amount of CPU time used during the snapshot window. In a heavily loaded system, if the CPU time event is the biggest event, that could point to some CPU-intensive processing (for example, forcing the use of an index when a full scan should have been used), which could be the cause of the bottleneck.

(2) The "Db file sequential read" time is also very large (2,334/s), and the number of waits on it (138,824) is high.

Db file sequential read - This wait event is generated by single-block reads, typically index lookups and table access by rowid. Waits on TEMP space (sorts, direct loads, Parallel DML (PDML) such as parallel updates) generally show up under the direct path read/write events instead; tuning the PGA_AGGREGATE_TARGET parameter helps with those.

(3) "Db file scattered read" -> waits of 138,824 and time of 2,334/s.

This generally happens during a full scan of a table. You can use the Statspack report to help identify the query in question and fix it.

Thanks,
Srikanth
Consty
Frequent Advisor

Re: High load

Thanks so much to all of you,

Ivan, attached you'll find the output you asked for, for server A. I am going to send the one for server B in the next message.

Regards

Consty
Consty
Frequent Advisor

Re: High load

Follow-up message:
Correction: the previous file was for server B (the active node); here is the one for server A.
Thanks
Regards

Consty
Consty
Frequent Advisor

Re: High load

Ivan,
This is the text version of the "collect" output; I think it's simpler that way.
Thanks and Regards
Consty
Consty
Frequent Advisor

Re: High load

Hi Vladimir,
Yes, the second machine was rebooted many times.
Regards
Consty
Ivan Ferreira
Honored Contributor

Re: High load

Obviously, on nodeA, ecallprog running under ngominf is taking all the CPU. Is this normal? What is this program doing?

And nodeB has too much CPU used in system time. This is not normal; I have seen this behaviour when too much traffic goes between the nodes via the interconnect, or when the system is paging/swapping. In your case it seems that the system is not paging.

What is the output of drdmgr for all your data disks? Do both nodes have direct I/O to the disks?

Is your application trying to access "cross" database information frequently?
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Consty
Frequent Advisor

Re: High load

Hi Ivan,

I'll give you more information when I go back to the site. In the meantime: NodeB is the active node while NodeA is passive. Both servers access the same MSA disk bay directly. You have spotted the very problem, i.e. NodeB has too much CPU used in system time.
That is what I wanted to describe in my original message: "I am facing a problem of very slow response from system Tru64, 4 CPU. The load is always high around 100% CPU"
I do not know what the problem is; I suspect the installation the DBA did one day before the problem occurred.

Thanks
Consty