Shelendra Agarwal
Occasional Contributor

RHEL OS cache memory behavior

Hi All,

We are facing a strange issue on one of our customer's production systems. The OS is RHEL 5.5 and the application is written in C++. When the application starts, it connects to an Oracle DB and caches some data (a few thousand rows) in process memory. This caching usually takes a few minutes. The application doesn't start processing data until the caching is complete.

Two or three times in the last few months, the customer reported that application startup took a very long time: close to a few hours. Investigation with GDB tracing and the application debug log showed that the caching step was slow and that this was what delayed startup.

As an immediate workaround, a server restart was recommended. After the restart, the application started normally (caching completed within a few minutes and data processing began).

Note that the application does not request memory in bulk. It caches one DB row at a time, and when it moves to the next row (using a cursor) it requests only the memory needed to cache that one row.

There is a theory that might explain this behavior, and I would like an expert opinion on whether it sounds plausible.

The server has 128 GB of RAM, and the application usually consumes around 30-40 GB. Initially we see a large amount of free memory and very little OS cached memory (in top). Over time the OS cached memory grows, and after some weeks most of the memory is in the OS cache.

At that point, when we try to restart the application, startup takes a very long time, giving the impression that the application is hanging.

The theory is that when the application requests memory to cache the DB data in process memory, that memory has to come out of the OS cache, and reclaiming it from the OS cache and allocating it to the application is slow, which causes the delay. Does this sound like a possible cause?
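
One way to put numbers on this would be to sample /proc/meminfo before and during a slow startup. Below is a minimal C++ sketch, assuming the usual "Name: value kB" layout of that file. The function names parse_meminfo and read_meminfo are made up for illustration and are not part of our application.

```cpp
#include <cstdint>
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parse text in /proc/meminfo format ("Name:   12345 kB") into name -> value.
// Values are in kB for the lines that carry a unit.
std::map<std::string, std::uint64_t> parse_meminfo(std::istream& in) {
    std::map<std::string, std::uint64_t> fields;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream row(line);
        std::string name;
        std::uint64_t value = 0;
        // Keep only lines shaped like "<name>: <number> ...".
        if (row >> name >> value && !name.empty() && name.back() == ':') {
            name.pop_back();  // drop the trailing ':'
            fields[name] = value;
        }
    }
    return fields;
}

// Convenience wrapper for a live Linux system (returns an empty map if the
// file cannot be opened).
std::map<std::string, std::uint64_t> read_meminfo() {
    std::ifstream in("/proc/meminfo");
    return parse_meminfo(in);
}
```

Logging the MemFree and Cached entries every few seconds while the application loads its rows should show whether free memory is close to zero and the cache is shrinking during the slow phase.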

Thanks & Regards,

Shelendra Agarwal
Business Solutions/CMS
RTBSS-CentralView
HP-India
+91-9945056319
shelendra.agarwal@hp.com
Dennis Handly
Acclaimed Contributor

Re: RHEL OS cache memory behavior

>When application starts it connects to Oracle DB and caches some data in to process memory.

 

Caches how?  Using C++ new or malloc?  Or using Oracle tricks?

If the former, why not read the DB data and just toss it?  Then see how long that takes.

If that takes a long time, then it's an Oracle DB issue.
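
A rough sketch of that experiment in C++. FetchRow is a hypothetical callback standing in for whatever your real cursor fetch looks like (nothing here is an Oracle API), and the function names are made up. The idea is just to time the same loop twice: once discarding each row, once copying it into malloc'd memory.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the real cursor: fills `row`, returns false at end of data.
using FetchRow = std::function<bool(std::string&)>;

struct LoadTiming {
    std::size_t rows = 0;
    double seconds = 0.0;
};

static double seconds_since(std::chrono::steady_clock::time_point start) {
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
}

// Fetch every row and throw it away: measures the database side only.
LoadTiming time_fetch_only(const FetchRow& fetch) {
    assert(fetch && "fetch callback must be set");
    LoadTiming t;
    std::string row;
    const auto start = std::chrono::steady_clock::now();
    while (fetch(row)) ++t.rows;
    t.seconds = seconds_since(start);
    return t;
}

// Fetch every row and copy it into its own malloc'd buffer.
// The caller owns the buffers in `cache` and must free() them.
LoadTiming time_fetch_and_cache(const FetchRow& fetch, std::vector<char*>& cache) {
    assert(fetch && "fetch callback must be set");
    LoadTiming t;
    std::string row;
    const auto start = std::chrono::steady_clock::now();
    while (fetch(row)) {
        char* copy = static_cast<char*>(std::malloc(row.size() + 1));
        if (!copy) break;  // out of memory: stop early
        std::memcpy(copy, row.c_str(), row.size() + 1);
        cache.push_back(copy);
        ++t.rows;
    }
    t.seconds = seconds_since(start);
    return t;
}
```

If the two timings are close, the time is going into the fetch. If only the second one is slow, look at the memory side.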

Shelendra Agarwal
Occasional Contributor

Re: RHEL OS cache memory behavior

Hi Dennis,

We are using malloc, not any Oracle feature.

 

By the way, I have attached the sysctl output to this thread. Maybe it can give some hints.

Thanks & Regards,

Shelendra Agarwal
Jimmy Vance
HPE Pro

Re: RHEL OS cache memory behavior

Dennis Handly
Acclaimed Contributor

Re: RHEL OS cache memory behavior

>We are using malloc

 

Have you tried the experiment of fetching from your DB and then throwing the data away, to see how long that takes?

 

But it looks like your theory about the OS cache is correct.

TwoProc
Honored Contributor

Re: RHEL OS cache memory behavior

In general, malloc() can and will leave your memory badly cut and sliced up. When a program makes many thousands of calls to malloc(), consider using or writing a memory management library that coalesces these disparate allocations into fewer, larger segments. That means a library that pre-allocates larger chunks (say, roughly the size of a thousand single-line mallocs) and then hands out pointers from that pool for your individual lines. This isn't that big or hard a concept, and it can change what you're dealing with dramatically. I've had to implement these myself in bygone years.

 

The problem centers on the fact that as you cut up, add, and release memory, it becomes harder and harder for your OS to find a contiguous segment large enough to hold your data, so it must search for a spot to land your malloc(). When you restart your app or bounce the server, you've freed main memory into larger, more contiguous pieces, and it's easier for the OS to find a spot for your malloc().

 

Lastly, consider NOT pulling data from Oracle a single line at a time via a cursor. This is very slow compared to pulling all rows, subsets of rows, etc. It takes some tuning, but you could easily see a thousandfold speedup over line-at-a-time for program loads, depending on lots of parameters, of course.
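
To make the round-trip argument concrete, here is a toy model in C++. FakeCursor is entirely hypothetical (it is not an Oracle interface) and simply counts each fetch() call as one client/server round trip, so it shows only how the trip count scales with batch size, not real timings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cursor over `total` rows; each fetch() call is one round trip.
class FakeCursor {
public:
    explicit FakeCursor(std::size_t total) : total_(total) {}

    // Append up to `max_rows` rows to `out`; returns how many were delivered.
    std::size_t fetch(std::size_t max_rows, std::vector<std::string>& out) {
        assert(max_rows > 0);
        ++round_trips_;
        const std::size_t n = std::min(max_rows, total_ - next_);
        for (std::size_t i = 0; i < n; ++i) {
            out.push_back("row " + std::to_string(next_++));
        }
        return n;
    }

    std::size_t round_trips() const { return round_trips_; }

private:
    std::size_t total_;
    std::size_t next_ = 0;
    std::size_t round_trips_ = 0;
};

// Drain the cursor `batch` rows at a time; a short (or empty) batch marks the end.
// Returns the number of round trips it took.
std::size_t load_all(FakeCursor& cursor, std::size_t batch, std::vector<std::string>& rows) {
    while (cursor.fetch(batch, rows) == batch) {
    }
    return cursor.round_trips();
}
```

In this model, 5000 rows cost 5001 trips at one line per fetch and 11 trips at 500 lines per fetch. The real gain depends on network latency, row size, and the driver, so treat it as an illustration only.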

 

TL;DR: Just use larger chunk sizes for your malloc() operations, and hand out space within these pre-allocated chunks to your pointer list yourself via your own calls (e.g. write a library function called "mymalloc()" that you use to get memory for your lines of returned data).
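
For what it's worth, a bare-bones version of such a pool might look like the sketch below. It is a simple bump-pointer design written for this post, not the library I used back then. The name RowPool and the 1 MB default chunk size are arbitrary choices, and it assumes rows are only released all at once, when the pool is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <vector>

// Bump-pointer pool: takes large chunks from malloc() and hands out small pieces.
// Pieces cannot be freed individually; all memory is released by the destructor.
class RowPool {
public:
    explicit RowPool(std::size_t chunk_bytes = 1 << 20) : chunk_bytes_(chunk_bytes) {
        assert(chunk_bytes > 0);
    }
    RowPool(const RowPool&) = delete;
    RowPool& operator=(const RowPool&) = delete;
    ~RowPool() {
        for (void* chunk : chunks_) std::free(chunk);
    }

    // Return `bytes` of storage, aligned for any fundamental type.
    void* alloc(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        if (bytes == 0) bytes = 1;
        bytes = (bytes + align - 1) / align * align;  // round up to the alignment
        if (bytes > remaining_) grow(bytes);
        void* piece = cursor_;
        cursor_ += bytes;
        remaining_ -= bytes;
        return piece;
    }

    // Copy a NUL-terminated string into the pool.
    char* store(const char* text) {
        const std::size_t n = std::strlen(text) + 1;
        return static_cast<char*>(std::memcpy(alloc(n), text, n));
    }

    std::size_t chunk_count() const { return chunks_.size(); }

private:
    // Start a new chunk; any unused tail of the previous chunk is abandoned.
    void grow(std::size_t at_least) {
        const std::size_t size = at_least > chunk_bytes_ ? at_least : chunk_bytes_;
        chunks_.push_back(nullptr);  // reserve the slot first so the chunk cannot leak
        void* chunk = std::malloc(size);
        if (!chunk) {
            chunks_.pop_back();
            throw std::bad_alloc();
        }
        chunks_.back() = chunk;
        cursor_ = static_cast<char*>(chunk);
        remaining_ = size;
    }

    std::size_t chunk_bytes_;
    std::vector<void*> chunks_;
    char* cursor_ = nullptr;
    std::size_t remaining_ = 0;
};
```

With something like this, a few thousand rows become a handful of malloc() calls instead of one call per row, at the cost of not being able to free a single row on its own.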

 

 

We are the people our parents warned us about --Jimmy Buffett
Dennis Handly
Acclaimed Contributor

Re: RHEL OS cache memory behavior

>malloc can/will leave you with your memory very cut and sliced up.

 

Yes, heap fragmentation.  Which unfortunately quacks like a memory leak.  :-(

 

>it becomes harder and harder for your OS to find a contiguous segment large enough to hold your data.

 

This is typically done by libc, not the OS, unless you are using mmap(2) for each request.

 

 

TwoProc
Honored Contributor

Re: RHEL OS cache memory behavior

I find it more likely that the programmer is accepting the data into data structures that he/she created, one row at a time.  Getting data from a cursor generally does not work the way you state. In fact, each fetch grabs a batch of rows, with the batch size given by the array_size parameter. And I know of practically no one in the world who sets it to one.

 

Anyway, there is a feature called the "result cache" that would more than likely resolve your problems. Have a look at the following paper, then work with the programmer to tune the code to use the cache area more efficiently, especially for repeated queries:

 

https://docs.oracle.com/database/121/TGDBA/tune_result_cache.htm#TGDBA626

 

 
