Memory & swap space

 
Hein van den Heuvel
Honored Contributor

Re: Memory & swap space

>>> The problem is memory pressure.

Great find Chris!
Please explain how you concluded that, such that we can all know next time.
What were the specific clues I missed in the data presented?

The main memory is measured in gigabytes.
The irpcd is complaining about a few kilobytes.

Somehow I think that this system was suddenly short a few kilobytes.
'Normal' memory pressure would surely cause a somewhat gradual deterioration of overall service first, and would be visible in many more spots than just this IRPCD.

Best regards,
Hein
KINGSLEY_1
Regular Advisor

Re: Memory & swap space

Dear Sirs,

Thank you very much for the wonderful help. At this point I think I have to lodge a complaint with HP.

Best Regards

Kingsley
chris huys_4
Honored Contributor

Re: Memory & swap space

Hi,

> Great find Chris!
> Please explain how you concluded that,
> such that we can all know next time.
> What were the specific clues I missed in
> the data presented?

First hint:
The customer describes the application as "hanging".

K> Our banking application hangs during peak
K> hours
In most customer-speak, "hanging" means "slow application response time".

Second hint: at the time of the hang, glance shows memory utilisation at 99%.

K> And the memory utilisation rises to 99% when monitored with glance.

Together, the hints are strong enough to check whether some sort of memory bottleneck is going on.

So check whether disk space was being used as RAM.

> TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
> dev 16384 2972 13412 18% 0 - 1 /dev/vg00/lvol2

The USED device swap on lvol2 is non-zero, so at some point there was certainly not enough RAM available.

The definitive indication that the system is paging, however, is the vmstat output.

> # vmstat
> procs memory page faults cpu
> r b w avm free re at pi po fr de sr in sy cs us sy id
> 1 2 0 1174466 47810 124 2 10 10 1 0 84 5840 45456 1522 7 5 88
pi and po are both 10 pages, and free memory is 47810 pages, or about 186 Mbyte (at 4 KB per page).
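
The reading of that vmstat line can be sketched in a few lines of Python. This is only an illustration: the column names come from the header quoted above (the second "sy" is renamed cpu_sy to keep the names unique), and the 4 KB page size is an assumption that happens to match the 186 Mbyte figure.

```python
# Column order taken from the vmstat header quoted in the post; the second
# "sy" (CPU system time) is renamed cpu_sy so the names stay unique.
COLUMNS = ["r", "b", "w", "avm", "free", "re", "at", "pi", "po", "fr",
           "de", "sr", "in", "sy", "cs", "us", "cpu_sy", "id"]

PAGE_SIZE_KB = 4  # assumption: 4 KB base page size


def parse_vmstat(line):
    """Map one line of vmstat values onto the column names."""
    values = [int(v) for v in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d" % (len(COLUMNS), len(values)))
    return dict(zip(COLUMNS, values))


def free_mb(sample):
    """Convert the free page count to megabytes."""
    return sample["free"] * PAGE_SIZE_KB / 1024


sample = parse_vmstat("1 2 0 1174466 47810 124 2 10 10 1 0 84 5840 45456 1522 7 5 88")
print("free: %.1f MB" % free_mb(sample))                     # free: 186.8 MB
print("paging:", sample["pi"] > 0 or sample["po"] > 0)       # paging: True
```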

With regard to how the scheduler behaves in low-memory situations, at least on HP-UX 11.11 (I'm assuming it remains the same on HP-UX 11.23 and only changed with HP-UX 11.31), there are four variables that are important.

lotsfree

gpgslim <-- vhand starts paging process pages out to disk (device swap space)

desfree

minfree <-- processes get deactivated

lotsfree is the only "constant" value and is set at 1/64th of physical RAM, i.e. in this case 16 GB * 1024 / 64 = 256 Mbyte.

desfree is at least 4 times smaller than lotsfree, so at most 256 Mbyte / 4 = 64 Mbyte.
minfree is a lot lower still than desfree.

gpgslim varies between lotsfree and desfree. As pi/po was non-zero when vmstat was measured, gpgslim was probably set a bit higher than the 186 Mbyte mark.
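
As a rough check of those numbers, here is a minimal Python sketch of the threshold arithmetic. It only encodes the two rules stated above (lotsfree = RAM/64, desfree at most lotsfree/4); the function name is made up for illustration, and the 4 KB page size is an assumption.

```python
def paging_thresholds_mb(physical_ram_gb):
    """Return (lotsfree, upper bound for desfree) in MB, per the rules in the post."""
    ram_mb = physical_ram_gb * 1024
    lotsfree = ram_mb / 64        # 1/64th of physical RAM
    desfree_max = lotsfree / 4    # desfree is at most a quarter of lotsfree
    return lotsfree, desfree_max


lotsfree, desfree_max = paging_thresholds_mb(16)
free_mb = 47810 * 4 / 1024        # free pages from vmstat, assuming 4 KB pages

print(lotsfree, desfree_max)                 # 256.0 64.0
print(desfree_max < free_mb < lotsfree)      # True
```

Free memory at about 186 Mbyte sits between the two bounds, which is the range in which gpgslim floats, so paging at that level is consistent with the explanation.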

So was the system paging process pages out at the time of the "hang"? Certainly. Will that cause a sudden drop in application response time if the "wrong" pages of the "wrong" process get paged out and must then be serviced from disk instead of RAM? Of course.

What would temporarily keep the problem from occurring at the same point as now is setting dbc_max_pct so that the buffer cache stays below 1 GB, i.e. dbc_max_pct = 6% (at the time of the "hang", swapinfo seems to indicate that the buffer cache was still around 1.5 Gbyte in size).
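
The 6% figure can be checked with a small Python sketch. It assumes dbc_max_pct is a percentage of physical RAM and that RAM is 16384 MB; the function names are made up for illustration.

```python
import math


def cache_cap_mb(ram_mb, dbc_max_pct):
    """Upper limit of the buffer cache in MB for a given dbc_max_pct."""
    return ram_mb * dbc_max_pct / 100


def largest_pct_within(ram_mb, limit_mb):
    """Largest whole percentage whose cache cap does not exceed limit_mb."""
    return math.floor(limit_mb * 100 / ram_mb)


RAM_MB = 16 * 1024  # assumption: 16 GB of physical RAM

print(largest_pct_within(RAM_MB, 1024))   # 6
print(cache_cap_mb(RAM_MB, 6))            # about 983 MB
print(cache_cap_mb(RAM_MB, 50))           # 8192.0
```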

What will find the root cause of the drop in memory is probably executing kmeminfo, kmeminfo -user and vmstat 1 60 every minute, to see where the memory "goes" just before the "hang" occurs, i.e. whether it goes to the kernel or to buffer cache/user processes. If the latter, it needs to be checked whether it is "normal increased operation" that caused free memory to drop beneath gpgslim; in that case additional memory probably needs to be added to cope with it.
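
One way to collect those snapshots is sketched below in Python. It is only a sketch: the three commands are the ones named above, kmeminfo is an HP support tool that may not be installed, and the helper names are hypothetical.

```python
import subprocess
import time

# Commands named in the post; kmeminfo may not be present on every system.
COMMANDS = [["kmeminfo"], ["kmeminfo", "-user"], ["vmstat", "1", "60"]]


def run_command(argv):
    """Run one command and return its output, or the error text if it cannot start."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True)
        return done.stdout + done.stderr
    except OSError as exc:
        return "could not run %s: %s\n" % (argv[0], exc)


def collect_once(log, run=run_command, now=time.strftime):
    """Append one timestamped snapshot of every command to an open log file."""
    stamp = now("%Y-%m-%d %H:%M:%S")
    for argv in COMMANDS:
        log.write("=== %s %s ===\n" % (stamp, " ".join(argv)))
        log.write(run(argv))
        log.write("\n")
    log.flush()
```

Calling collect_once(log) in a loop on a file opened in append mode gives roughly one snapshot per minute, since vmstat 1 60 itself runs for about a minute. The snapshots just before the "hang" are the interesting ones.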

Greetz,
Chris
PS. It's a call, not a complaint. ;)
Hein van den Heuvel
Honored Contributor

Re: Memory & swap space

Kingsley>> Thank you very much for the wonderful help.


You are welcome. I hope our questions helped you define the problem better.

Can you do us a favor and update this topic with a conclusion or resolution at some point?
Much appreciated!

Chris >> In most customer-speak, "hanging" means "slow application response time".

Right, for some. For others it means no response for any user. That's why I asked for clarification. Similar for the opening line: " so that my database does not crashed." What does that really mean? I think it means it has crashed, because typically one cannot tell that a database will 'almost crash'. So that statement is indicative of more information being available, but not yet shared.

We'll see... hopefully. Hein

TTr
Honored Contributor

Re: Memory & swap space

Not enough memory is definitely an issue but is the physical memory enough for this environment? I would like to see what happens to lvol2 when the memory is at 99%.

True, there is 2972 MB of used disk swap, but at that same time the memory is at 27%. So at least at that time, whatever is on the swap disk appears to be idle. The question is what happens to lvol2 when the memory is at 99%. If memory is observed at 99%, most likely it will go to 100%, and heavy lvol2 I/O would indicate that.

As pointed out before, what is using the memory when it is at 27%? Are all processes needed or are there processes and services that can be stopped or removed?

The accept errors may be an indication of high user activity so increasing the tcp_conn_request_max would allow more user connections and more user processes which almost certainly would demand more memory.

The first thing to do is look at the process mix, identify the processes using the most memory, and clean up. If no cleanup is possible, then adding more memory is unavoidable.
KINGSLEY_1
Regular Advisor

Re: Memory & swap space

Dear Sirs,

I am at a loss now. I really don't know what to do. Do I reduce 'dbc_max_pct', which stands at 50, to a smaller value? If yes, what value, to be precise? What about 'dbc_min_pct', which stands at 5?

Thank you.

Kingsley
Hein van den Heuvel
Honored Contributor

Re: Memory & swap space


>>> I am at a loss now. I really don't know what to do. Do I reduce 'dbc_max_pct', which stands at 50, to a smaller value? If yes, what value, to be precise? What about 'dbc_min_pct', which stands at 5?

Yes, the consensus so far is to reduce that significantly. You don't have to get it exactly right, just smaller. All along you should be evaluating the effect. Specifically, reducing the max buffer cache _could_ result in higher physical IO rates and slower response. But if your main application is truly Oracle, then the buffer cache is not too important.

As a first step I would suggest dbc_max_pct = 10 and dbc_min_pct = 3.
It is a dynamic param.
So just kctune it and observe (glance)!
No need to bounce the system or Oracle to witness effects (or lack thereof).

As optional next step you can tweak up (20/5?) or down (5/2?) based on observed behavior.


This feels like a good knob to turn, but really I have not seen convincing evidence either way.

- need further description of the deteriorating/hang behavior.
- need more vmstat: not just one long-term average line, but a few minutes' worth from when the system is not behaving.

But really, watching the lack of progress in this thread, you are encouraged to escalate the issue, internally, or externally (HP, Consultants) as appropriate.
Myself, I would start with taking a close look at how Oracle is behaving, what it is trying to tell you, but others prefer to start at the HPUX angle.




btw... when I mentioned "tcp_conn_request_max" it was just one of the params to check out. By no means did I imply that to be the actual issue.
TTr
Honored Contributor

Re: Memory & swap space

I would like to see a "ps -ef" listing to see what is running on this server. Preferably during the times that the memory is at 99%.
I concur on reducing dbc_max_pct. With 16GB or more server memory, I would set it to 4% and dbc_min_pct to 2%.
chris huys_4
Honored Contributor

Re: Memory & swap space

Hi,

> # swapinfo -tam
>              Mb      Mb      Mb   PCT  START/      Mb
> TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
> dev       16384    2972   13412   18%       0       -    1  /dev/vg00/lvol2
> reserve       -    6322   -6322
> memory    16363    4488   11875   27%
> total     32747   13782   18965   42%       -       0    -

Just to get everyone on the same page.
The "memory" line in swapinfo has nothing to do with physical RAM usage; it only describes the amount of pseudoswap that the system has.
Available pseudoswap is 16363 MB, used pseudoswap is 4488 MB, and pseudoswap still free to allocate is 11875 MB; in other words, only 27% of the total available pseudoswap is used.

The 27% of used pseudoswap does not in any way tell how much physical RAM was in use at the moment the swapinfo -tam was issued.

The reason so much pseudoswap is still left at the moment this swapinfo -tam was taken is that all user processes together have only about 9.3 Gbyte of swap space reserved or allocated: the 6322 Mbyte of USED reserve plus the 2972 Mbyte of USED dev make 9294 Mbyte. That is still a lot less than the total device swap the system has defined, i.e. the 16 Gbyte of vg00/lvol2. As device swap normally gets taken up before pseudoswap, the system still has not only roughly 7 Gbyte of device swap left, but also all of the available pseudoswap.
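
The bookkeeping can be sketched in Python, taking the values from the swapinfo -tam table quoted at the top of this post (dev USED is 2972 there). The sketch follows the reasoning above, in which reservations count against device swap first; the variable names are made up for illustration.

```python
# Values in MB, copied from the swapinfo -tam output quoted in the post.
swap = {
    "dev":     {"avail": 16384, "used": 2972},
    "reserve": {"avail": 0,     "used": 6322},
    "memory":  {"avail": 16363, "used": 4488},  # pseudoswap
}
TOTAL_USED = 13782  # USED column of the "total" line

# Swap reserved or allocated on behalf of user processes.
reserved = swap["dev"]["used"] + swap["reserve"]["used"]
# Device swap still unclaimed if reservations are taken from device swap first.
dev_left = swap["dev"]["avail"] - reserved

print(reserved)   # 9294
print(dev_left)   # 7090
print(reserved + swap["memory"]["used"] == TOTAL_USED)   # True
```

The last line is a consistency check: the three USED values add up to the "total" line.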

There is, by the way, still more to derive from the above output, especially from the "memory" (aka pseudoswap) line.
Because this system has ample device swap defined, the used pseudoswap equals the sum of the memory taken by the kernel and the memory taken by the part of the buffer cache defined by dbc_min_pct. As dbc_min_pct was apparently 5%, and 5% of 16 Gbyte of physical RAM is around 800 Mbyte, the memory taken by the kernel is about 4488 Mbyte - 800 Mbyte = 3688 Mbyte.
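
The same estimate as a small Python sketch. It assumes physical RAM is exactly 16384 MB and uses the exact 5% rather than the rounded 800 Mbyte, so the result is slightly lower than the figure above; the function name is made up for illustration.

```python
def kernel_estimate_mb(used_pseudoswap_mb, ram_mb, dbc_min_pct):
    """Used pseudoswap minus the minimum buffer cache, per the reasoning in the post."""
    min_cache_mb = ram_mb * dbc_min_pct / 100
    return used_pseudoswap_mb - min_cache_mb


# 4488 MB used pseudoswap, 16384 MB RAM (assumed), dbc_min_pct = 5
print(round(kernel_estimate_mb(4488, 16384, 5)))   # 3669
```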

More can still be derived from the "memory" line, but I'm not too sure about that part. Anyway, I'm guessing that at the moment the swapinfo -tam was taken, the system had close to 0% physical RAM left.

Greetz,
Chris
PS. For Kingsley: could it be that the system was only "hung" for 10 or 20 minutes and then resumed normal operation, or did anything need to be done on the application side to get the "hang" resolved?

PPS. kctune dbc_min_pct=1; kctune dbc_max_pct=5 should be OK. And try to get kmeminfo from HP support.

PPPS. Changing dbc_min_pct/dbc_max_pct can be done online on HP-UX 11.23:
ux2 # kctune dbc_min_pct=1
WARNING: The automatic 'backup' configuration currently contains the
configuration that was in use before the last reboot of this
system.
==> Do you wish to update it to contain the current configuration
before making the requested change? y
* The automatic 'backup' configuration has been updated.
* The requested changes have been applied to the currently
running system.
Tunable                 Value  Expression  Changes
dbc_min_pct  (before)       5  Default     Immed
             (now)          1  1
KINGSLEY_1
Regular Advisor

Re: Memory & swap space

Dear Sirs,

I have run kctune with 1 and 5 (dbc_min_pct=1, dbc_max_pct=5) on my test environment, and the memory usage has dropped from 93% to 74%.

I am planning to apply it on the live environment later today and will get back to you with the findings.

Meanwhile I thank you very much for your dedication, help, contribution etc.

Best Regards.

Kingsley