Re: Memory & swap space

KINGSLEY_1 · ‎09-28-2010

Hi Sirs,

We are running the T24 banking application. At peak periods the memory utilization is 99% and swap is 42%. What should I do to reduce the memory utilization so that my database does not crash?

Thank you very much.

Hakki Aydin Ucar · ‎09-28-2010

First, check which OS release you have:

# uname -a

If you have 11i v3, check this info:

# machinfo

Generally speaking, you need to add memory (RAM), reduce the number of non-critical applications, or schedule them for non-busy hours if possible.

KINGSLEY_1 · ‎09-28-2010

Sirs,

The OS is 11.23 and physical memory is 16GB.

Thanks.

Kingsley

James R. Ferguson · ‎09-28-2010

Hi:

You don't offer any details. Do you get ENOMEM errors? What do 'dbc_max_pct' and 'dbc_min_pct' look like?

# kmtune -q dbc_max_pct
# kmtune -q dbc_min_pct

What do the swap configuration and utilization look like?

# swapinfo -tam

Do you see page-out activity (indicating swap pressure)?

# vmstat ...

Do you have a database backend with its own buffer cache (e.g. Oracle)?

Regards!

...JRF...

sarfaraj ahmad · ‎09-28-2010

What is the buffer cache status on the system now, and how much memory is it taking?

Also check the dbc_max_pct and dbc_min_pct kernel parameters on the system.

Monitor closely with glance and check whether any unnecessary processes are running at that time.

Also let us know whether the number of processes or the load on the server has increased recently.

KINGSLEY_1 · ‎09-28-2010

Dear Sirs,

The following are the readings.

Thank you.

Kingsley

dbc_max_pct 50 Default Immed

dbc_min_pct 5 Default Immed

# swapinfo -tam
              Mb      Mb      Mb     PCT  START/      Mb
TYPE       AVAIL    USED    FREE    USED   LIMIT RESERVE     PRI  NAME
dev        16384    2972   13412     18%       0       -       1  /dev/vg00/lvol2
reserve        -    6322   -6322
memory     16363    4488   11875     27%
total      32747   13782   18965     42%       -       0       -

# vmstat
       procs           memory                                      page               faults         cpu
   r   b   w      avm    free    re    at    pi    po    fr    de    sr     in     sy     cs  us  sy  id
   1   2   0  1174466   47810   124     2    10    10     1     0    84   5840  45456   1522   7   5  88

Database: Oracle 10g

Patrick Wallek · ‎09-28-2010

>>dev 16384 2972 13412 18% 0 - 1 /dev/vg00/lvol2

Since you are actually using some device swap space, I would add more RAM to your system. You are using almost 3 GB of device swap, so you need at least that much more RAM. For sufficient growth, if your system can handle it I would add 8 GB or 16 GB more RAM for a total of 24 or 32 GB.
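
A minimal sketch of pulling that device swap figure out of swapinfo -tam output, so it can be logged and compared over time. The function name dev_swap_used_mb is made up for illustration, and the sample rows are simply the ones posted earlier in this thread; on the live system the output of swapinfo -tam would be piped in instead.

```shell
#!/bin/sh
# Sum the USED column (third field, in Mb) of every "dev" row
# read from swapinfo -tam style output on stdin.
dev_swap_used_mb() {
    awk '$1 == "dev" { used += $3 } END { print used + 0 }'
}

# Sample rows copied from the swapinfo output posted in this thread.
sample='dev 16384 2972 13412 18% 0 - 1 /dev/vg00/lvol2
reserve - 6322 -6322
memory 16363 4488 11875 27%
total 32747 13782 18965 42% - 0 -'

printf '%s\n' "$sample" | dev_swap_used_mb   # prints 2972
```

A value that keeps growing between samples would support the "add RAM" advice; a value that stays flat suggests the swapped pages are idle.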

Hein van den Heuvel · ‎09-28-2010

>> so that my database does not crash.

Did it? What errors? ORA-600 ?

>> At peak periods the memory utilization is 99%

It is easy to say, 'just add more memory', and 'turn down the unified buffer cache'.
And maybe that is the solution.
But please before you go there take a step back, test and verify.
What is using the memory?
- process memory?
- shared memory? ( check: ipcs -m )
- ...

That said, here would be my WAG towards what _might_ be a component of the issue.

You mention Oracle database.
That is likely significant, please tell us more.
How many instances? How much memory is each allowed to take? (sga_max_size/sga_target)
Do they really need all of that? (Check the SGA Target Advisory in the AWR report.)
Do they all need to be there at the same time?

Maybe, just maybe, it is simply overcommitted memory: 16 GB, 3 Oracle instances of 5 GB don't fit.
Maybe reducing those footprints to 3 GB each and all is well.

Or maybe you had PROD, DEV and QA all set the same and DEV and QA can be reduced to 1/2 the size of PROD.

Or maybe shutting down one DB during a critical window is the way to reduce the need.

One needs to understand the way the system is used to understand the options.

If the DBA is not forthcoming with help, then you can start to understand this by studying the ipcs -m output.

Hope this helps some,
Good luck!
Hein van den Heuvel
HvdH Performance Consulting

KINGSLEY_1 · ‎09-28-2010

Dear sirs,

These are the details you asked for:

Physical memory: 16GB

instance: 1

instance size: 126GB

sga Maxsize: 6GB

SGA Target: 4GB

#ipcs -m
IPC status from /dev/kmem as of Tue Sep 28 15:32:16 2010
T ID KEY MODE OWNER GROUP
Shared Memory:
m 0 0x411c11ec --rw-rw-rw- root root
m 1 0x4e0c0002 --rw-rw-rw- root root
m 2 0x412016a9 --rw-rw-rw- root root
m 3 0x06347849 --rw-rw-rw- root root
m 262148 0x0c6629c9 --rw-r- root root
m 32773 0x4914685c -rw-r--r-- root root
m 4718598 0x7d0a116c -rw-rw oracle oinstall

Dennis Handly · ‎09-28-2010

>#swapinfo -tam
memory 16363 4488 11875 27%
total 32747 13782 18965 42%

I don't see that 99% memory usage? Was that during the peak?

KINGSLEY_1 · ‎09-28-2010

dear Sirs,

I see the 99% when I use glance.

Thanks

Hein van den Heuvel · ‎09-28-2010

> Physical memory: 16GB
> instance: 1
> instance size: 126GB
> sga Maxsize: 6GB
> SGA Target: 4GB

Ok, a 6GB SGA on a 16 GB system feels alright. So that's not directly the issue.

We'll have to figure out where the rest of the memory went. With dbc_max_pct at 50, it could be sitting in the buffer cache to the tune of 8GB, ready to help or to be given back under pressure. With 11.31 the giving back works well. With 11.23 it is my understanding that it is less efficient, and thus you may want to reduce dbc_max_pct to, say, 10%.
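
As a rough sketch of the arithmetic behind those numbers: the buffer cache ceiling is just dbc_max_pct percent of physical memory. The helper name dbc_ceiling_mb is hypothetical, and 16384 MB is assumed for the 16GB box in this thread.

```shell
#!/bin/sh
# Upper bound of the dynamic buffer cache, in MB, for a given dbc_max_pct.
# Arguments: physical RAM in MB, dbc_max_pct (integer percent).
dbc_ceiling_mb() {
    echo $(( $1 * $2 / 100 ))
}

dbc_ceiling_mb 16384 50   # prints 8192 (the current setting, about 8GB)
dbc_ceiling_mb 16384 10   # prints 1638 (about 1.6GB)
dbc_ceiling_mb 16384 6    # prints 983  (just under 1GB)
```

This only gives the maximum the cache may grow to; how much it actually holds at any moment has to be measured on the system.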

#ipcs -m

Oops, I should have asked for ipcs -ma or -mb.
That will give a size column.

Anyway, I'm not yet convinced there even is a problem. So the memory is in high use, with low free. Great! The system is using the memory that was bought for it. Excellent.

What is the problem?
Performance issues? How do you tell? Tell us.
Instability (why is that due to low memory)?
Is there a high PI or PO rate during regular work?

Regards,
Hein

KINGSLEY_1 · ‎09-29-2010

Dear Sirs,

Our banking application hangs during peak hours, and the memory utilisation rises to 99% when monitored with glance.

If I may ask, can I reduce "dbc_max_pct 50 Default Immed" to, let us say, 15 or 20? And what will be the implication for the banking application and also the Oracle database?

Thank you.

Kingsley

James R. Ferguson · ‎09-29-2010

Hi (again):

> If I may ask, can I reduce "dbc_max_pct 50 Default Immed" to, let us say, 15 or 20? And what will be the implication for the banking application and also the Oracle database?

I think that would be a good move to make. Since Oracle has its own buffer cache, caching more of the same in the UNIX buffer cache is wasteful. You could monitor read-hit ratios with 'sar -b' before and after adjusting the 'dbc_max_pct' if you like. Empirical measurements will tell you the implication of your change in *your* environment.

Regards!

...JRF...

Hein van den Heuvel · ‎09-29-2010

>> Our banking application hangs during peak hours.

Define 'hang'! We all have our own understanding/expectation for that.

I suspect that here it means that all end user applications became unresponsive.

But what is the system doing at that time besides using memory?

Why do you think high memory use is relevant at all to the hang? Please be specific!

- Is the CPU busy? top?
- Apparently you can use GLANCE.
- What else can you use? df? ls -l for Oracle files? ...
- What can you NOT use... (errors, or getting stuck).
- Are all networks operational? (ping, telnet, ftp, ssh, tnsping, remote non-application-table Oracle access)
- What does Oracle (Enterprise Manager) report when/while it hangs?
- Is Oracle responsive to local SQL*Plus queries at all? (select * from v$instance)
- Can local Oracle execute some banking query?

Just the fact that you only mentioned 'hang' without specifics other than a potentially irrelevant high memory usage suggests to me that you may want to escalate this issue and get more external and internal help.

You may want to prepare some answers to the questions above to get the best possible help. For now it is not clear whether you should engage an Oracle, Network, Storage or HP-UX expert. Personally I would engage them in the order I list them. Yes, that implies that I think this is least likely to be an HP-UX problem, although a good HP-UX resource may help you pinpoint the problem.

Good luck
Hein

KINGSLEY_1 · ‎09-29-2010

Dear Sirs,

One more hint. During peak hours, when the banking application slows, syslog.log reads:

Sep 28 16:12:25 t24dbpro inetd[1057]: chargen/tcp: accept: No buffer space available
Sep 28 16:12:25 t24dbpro inetd[1057]: discard/tcp: accept: No buffer space available
Sep 28 16:12:25 t24dbpro inetd[1057]: daytime/tcp: accept: No buffer space available
Sep 28 16:12:25 t24dbpro inetd[1057]: telnet/tcp: accept: No buffer space available
Sep 28 16:12:25 t24dbpro sshd[931]: error: accept: No buffer space available
Sep 28 16:12:25 t24dbpro inetd[1057]: ftp/tcp: accept: No buffer space available
Sep 28 16:12:26 t24dbpro inetd[1057]: swat/tcp: accept: No buffer space available
Sep 28 16:12:26 t24dbpro inetd[1057]: dtspc/tcp: accept: No buffer space available
Sep 28 16:12:26 t24dbpro inetd[1057]: printer/tcp: accept: No buffer space available
Sep 28 16:12:26 t24dbpro inetd[1057]: auth/tcp: accept: No buffer space available
Sep 28 16:12:26 t24dbpro inetd[1057]: exec/tcp: accept: No buffer space available
Sep 28 16:12:26 t24dbpro inetd[1057]: shell/tcp: accept: No buffer space available
Sep 28 16:12:26 t24dbpro inetd[1057]: login/tcp: accept: No buffer space available

Thank you

kingsley
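
A small sketch for tallying these accept failures per second, which may help show whether they arrive in short bursts or are spread across the whole peak period. The function name count_accept_failures is invented for illustration, and the sample is only a subset of the lines posted above; on the server the syslog file itself would be fed in on stdin.

```shell
#!/bin/sh
# Count "accept: No buffer space available" messages per timestamp.
# Fields 1-3 of a syslog line are month, day and HH:MM:SS.
count_accept_failures() {
    awk '/accept: No buffer space available/ { n[$1 " " $2 " " $3]++ }
         END { for (t in n) print t, n[t] }' | sort
}

# Subset of the syslog lines posted in this thread.
sample_log='Sep 28 16:12:25 t24dbpro inetd[1057]: chargen/tcp: accept: No buffer space available
Sep 28 16:12:25 t24dbpro inetd[1057]: telnet/tcp: accept: No buffer space available
Sep 28 16:12:25 t24dbpro sshd[931]: error: accept: No buffer space available
Sep 28 16:12:26 t24dbpro inetd[1057]: swat/tcp: accept: No buffer space available'

printf '%s\n' "$sample_log" | count_accept_failures
# prints:
# Sep 28 16:12:25 3
# Sep 28 16:12:26 1
```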

Hein van den Heuvel · ‎09-29-2010

Hmm,

Does that perhaps also suggest that _some_ (new) end users hang, but others can keep working?

This may be a TCP configuration issue, such as tcp_conn_request_max.

See for example:
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=237561

In there Rick Jones (albeit in 2003) wrote:
"Indeed, 99 times out of ten, this message has nothing to do with availability of memory. It means that by the time the server application got around to calling accept() on the listen socket, the remote client had given-up (for whatever reason) and aborted the connection. The exact reason why an ENOBUF is returned in this case in HP-UX 11.X is starting to get lost in the mists of time"

also....

https://forums.sdn.sap.com/thread.jspa?threadID=1477203

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=969702

good luck,
Hein

KINGSLEY_1 · ‎09-29-2010

Dear Sir,

tcp_conn_request_max=4096

Thank you

Kingsley

chris huys_4 · ‎09-29-2010

Hi,

The problem is memory pressure.

Log a case with HP support, get the kmeminfo utility, issue kmeminfo;kmeminfo -user;swapinfo -tam, during a "hang", and this info should give enough indication as to what causes the problem and how to resolve it.

Greetz,
Chris

Hein van den Heuvel · ‎09-29-2010

>>> The problem is memory pressure.

Great find Chris!
Please explain how you concluded that, such that we can all know next time.
What were the specific clues I missed in the data presented?

The main memory is measured in gigabytes.
inetd is complaining about a few kilobytes.

Somehow I doubt that this system was suddenly short a few kilobytes.
'Normal' memory pressure would surely cause a somewhat gradual deterioration of overall service first, and would be visible in many more spots than just inetd.

Best regards,
Hein

KINGSLEY_1 · ‎09-30-2010

Dear Sirs,

Thank you very much for the wonderful help. At this point I think I have to lodge a complaint with HP.

Best Regards

Kingsley

chris huys_4 · ‎09-30-2010

Hi,

> Great find Chris!
> Please explain how you concluded that,
> such that we can all know next time.
> What were the specific clues I missed in
> the data presented?

First hint:
The customer describes the application as "hanging".

K> Our banking application hangs during peak
K> hours
"hanging" equals, in most customer "speak",
"slow application response time".

Second hint: memory utilisation at the time of the hang, as shown by glance, is 99%.

K> And the memory utilisation rises to 99% when monitored with glance.

Together the hints are strong enough to check whether there is some sort of memory bottleneck going on.

So check whether disk space was used as RAM.

> TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
> dev 16384 2972 13412 18% 0 - 1 /dev/vg00/lvol2

The USED device space on lvol2 is not 0, so at some point there certainly was not enough RAM available.

However, the definitive indication that the system is paging is the vmstat output.

> # vmstat
>        procs           memory                                      page               faults         cpu
>    r   b   w      avm    free    re    at    pi    po    fr    de    sr     in     sy     cs  us  sy  id
>    1   2   0  1174466   47810   124     2    10    10     1     0    84   5840  45456   1522   7   5  88
pi and po both equal 10 pages, and free memory equals 47810 pages, or about 186 MB.
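
The page-to-megabyte conversion used here, as a sketch. It assumes a 4 KB page size, which is what the 186 MB figure implies (47810 pages * 4 KB); pages_to_mb is a made-up helper name.

```shell
#!/bin/sh
# Convert a page count (as reported in the vmstat "free" column) to whole MB.
# Arguments: number of pages, optional page size in KB (assumed 4 if omitted).
pages_to_mb() {
    echo $(( $1 * ${2:-4} / 1024 ))
}

pages_to_mb 47810   # prints 186
```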

With regard to how the scheduler behaves in low-memory situations, at least on HP-UX 11.11 (and I'm assuming it remains the same on HP-UX 11.23 and only changed with HP-UX 11.31), there are 4 variables that are important.

lotsfree

gpgslim <-- vhand starts paging "processpages" out to disk (device swapspace)

desfree

minfree <-- processes get deactivated

lotsfree is the only "constant" value and is set at 1/64th of physical RAM, i.e. in this case 16 GB * 1024 / 64 = 256 MB.

The desfree value is at least 4 times smaller than lotsfree, so at most 256 MB / 4 = 64 MB,
and minfree is a lot lower still than the desfree value.

gpgslim varies between lotsfree and desfree, and since pi/po was non-zero when vmstat was taken, gpgslim was probably set a bit higher than the 186 MB mark.
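
The threshold arithmetic above, as a sketch. The 1/64 and 1/4 ratios are taken from this post and have not been checked against the kernel itself; paging_thresholds_mb is a hypothetical helper name.

```shell
#!/bin/sh
# Derive lotsfree and the upper bound of desfree (both in MB) from
# physical RAM in MB, using the ratios described in this post.
paging_thresholds_mb() {
    lotsfree=$(( $1 / 64 ))            # 1/64th of physical RAM
    desfree_max=$(( $lotsfree / 4 ))   # at most a quarter of lotsfree
    echo "lotsfree=${lotsfree} desfree_max=${desfree_max}"
}

paging_thresholds_mb 16384   # prints lotsfree=256 desfree_max=64
```

With about 186 MB free, the system sits between those two values, which is the band gpgslim moves in.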

So was the system paging out process pages at the time of the "hang"? Certainly. Will that cause a sudden drop in application responsiveness if the "wrong" pages of the "wrong" process get paged out and must now be serviced from disk instead of RAM? Of course.

What would temporarily keep the problem from occurring at the same point in time as now is setting dbc_max_pct to less than 1 GB, i.e. dbc_max_pct = 6% (as at the time of the "hang", swapinfo looks to indicate that the buffer cache was still around 1.5 GB in size).

What will find the root cause of the drop in free memory is probably running kmeminfo, kmeminfo -user and vmstat 1 60 every minute, to see where the memory "goes" just before the "hang" occurs, i.e. whether it goes to the kernel or to buffer cache/user processes. If the latter, it then needs to be checked whether it was "normal increased operation" that caused free memory to drop beneath gpgslim, in which case additional memory probably needs to be added to cope with it.
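
One way that periodic collection could be scripted, as a sketch only. The snapshot function and the log file locations are assumptions for illustration; the commands to run are the ones named above, and kmeminfo has to be obtained from HP support first.

```shell
#!/bin/sh
# Run each command passed as an argument and append its output,
# under a timestamped header line, to the given log file.
# Arguments: logfile, then one or more command strings.
snapshot() {
    log=$1
    shift
    for cmd in "$@"; do
        printf '=== %s  %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$cmd" >> "$log"
        sh -c "$cmd" >> "$log" 2>&1
    done
}

# On the HP-UX server this might be driven by a loop such as:
#   while :; do
#       snapshot /var/tmp/mem.log "kmeminfo" "kmeminfo -user" "vmstat 1 60"
#   done
# (vmstat 1 60 itself runs for about a minute, so each pass takes roughly that long.)

# Harmless demonstration that runs anywhere:
demo_log="${TMPDIR:-/tmp}/snapshot_demo.$$"
snapshot "$demo_log" "echo hello"
cat "$demo_log"
rm -f "$demo_log"
```

The timestamps make it possible to line the samples up against the time of the next "hang".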

Greetz,
Chris
PS. It's a call, not a complaint. ;)

Hein van den Heuvel · ‎10-01-2010

Kingsley>> Thank you very much for the wonderful help.

You are welcome. I hope our questions helped you define the problem better.

Can you do us a favor and update this topic with a conclusion or resolution at some point?
Much appreciated!

Chris >> "hanging" equals, in most customer "speak",
"slow application response time".

Right, for some. For others it means no response for any user. That's why I asked for clarification. Similarly for the opening line: "so that my database does not crash." What does that really mean? I think it means it has crashed, because typically one cannot tell that a database will 'almost crash'. So that statement is indicative of more information being available, but not yet shared.

We'll see... hopefully. Hein

TTr · ‎10-01-2010

Not enough memory is definitely an issue, but is the physical memory enough for this environment? I would like to see what happens to lvol2 when the memory is at 99%.

True, there is 2972 MB of used disk swap, but at that same time the memory is at 27%. So at least at that time, whatever is on the swap disk appears to be idle. The question is what happens to lvol2 when the memory is at 99%. If memory is observed at 99%, most likely it will go to 100%, and heavy lvol2 I/O would indicate that.

As pointed out before, what is using the memory when it is at 27%? Are all processes needed or are there processes and services that can be stopped or removed?

The accept errors may be an indication of high user activity so increasing the tcp_conn_request_max would allow more user connections and more user processes which almost certainly would demand more memory.

Looking at the process mix and identifying the large memory using processes and cleaning up would be the first thing to do. If no cleanup is possible then adding more memory is unavoidable.
