Operating System - Linux

Re: Mem Huge Pages (vm.nr_hugepages ) and Oracle

 
Alzhy
Honored Contributor

Mem Huge Pages (vm.nr_hugepages ) and Oracle

I know the recommended best practice is to use huge pages, since Oracle will then be dealing with far fewer memory pages, which means less work and more efficient memory address translation (TLBs), especially in environments with LARGE memory and a large SGA.


What is the implication if it is not enabled? Are there measurable differences considering that current X86 servers have very fast memory subsystems and even shorter paths due to on-chip integration of memory controllers?

More importantly, will it cause a HANG on the system if not enabled?

One of the areas being looked at with our problems upgrading to 11G is enabling huge pages. We've been on 10G for close to a year with no huge pages enabled, and everything was humming along just fine. Our woes only started with 11G: the system hangs for this one big DB with 45GB of SGA on a 24-core, 128GB RAM system.

I agree with implementing this and already have my vm.nr_hugepages numbers and limits.conf settings, but I just want to find out more about huge pages.
Hakuna Matata.
dirk dierickx
Honored Contributor
Solution

Re: Mem Huge Pages (vm.nr_hugepages ) and Oracle

it will not hang the DB when not in use, however using huge pages will improve the speed of the database.

even though the memory architecture/bus might be fast, when using the traditional way of managing pages, it just takes a lot of resources from the kernel to keep track of all these pages. with hugepage, it is possible to only have one page instead of ten thousand, for example - thus reducing overhead.
Reiner  Rottmann
Frequent Advisor

Re: Mem Huge Pages (vm.nr_hugepages ) and Oracle

Simply do the math:

One page-table entry (8 bytes on x86_64) maps one regular 4K page.
To address 45GB of memory with 4K pages, each process needs roughly 90MB of page tables.

By using hugepages (2MB each, the default for x86_64), the same mapping needs only about 23,000 entries, or roughly 180KB per process.

Without hugepages you may be faced with seemingly strange swapping behaviour with more and more users connecting to your system.

You will see kswapd occupying the CPU, trying to free up space by swapping out memory only to swap it back in seconds later.

Soon you will have difficulties connecting to the box and eventually the box may crash.

This is because the processes that communicate with your database will **each** need lots of memory to virtually address the SGA and PGA areas where they want to retrieve their data from.

In the worst case: number of processes connected to the database times the memory needed for the page tables of the whole database areas.
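
The arithmetic behind this worst case can be sketched roughly as follows. This is only an estimate under stated assumptions: 8-byte page-table entries (as on x86_64), only the last level of the page-table tree counted, and a hypothetical 500 connected processes that each map the whole 45 GiB area with no sharing of page tables. The function names are made up for illustration.

```python
PTE_BYTES = 8  # assumed size of one x86_64 page-table entry


def page_table_bytes(mapped_bytes, page_bytes, pte_bytes=PTE_BYTES):
    """Bytes of last-level page-table entries needed to map mapped_bytes."""
    entries = -(-mapped_bytes // page_bytes)  # ceiling division
    return entries * pte_bytes


def worst_case_bytes(mapped_bytes, page_bytes, processes):
    """Worst case: every process maps the whole area with its own tables."""
    return processes * page_table_bytes(mapped_bytes, page_bytes)


GIB = 1024 ** 3
sga = 45 * GIB

small = page_table_bytes(sga, 4 * 1024)        # regular 4 KiB pages
huge = page_table_bytes(sga, 2 * 1024 * 1024)  # 2 MiB huge pages

print(small // 2**20, "MiB per process with 4 KiB pages")
print(huge // 2**10, "KiB per process with 2 MiB pages")
# 500 processes is a hypothetical connection count, not a number from this thread.
print(worst_case_bytes(sga, 4 * 1024, 500) / GIB, "GiB for 500 processes")
```

Under these assumptions the per-process cost works out to 90 MiB with 4 KiB pages versus 180 KiB with 2 MiB pages, which is why the total grows so quickly as more sessions connect.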

Moreover the size of the SGA and PGA is often sized without calculating the needed space for addressing the db areas. So things are even worse on boxes without huge pages!
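
One way to see how much memory the page tables actually take is the PageTables field in /proc/meminfo, next to the HugePages_* counters. Below is a minimal sketch that parses such text. The sample values are invented for illustration and are not measurements from any system in this thread.

```python
def parse_meminfo(text):
    """Return {field: integer value} from /proc/meminfo-style text.

    Values that carry a kB unit are left in kB.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue
        fields[name.strip()] = int(rest.split()[0])
    return fields


def page_table_share(fields):
    """Fraction of total RAM currently spent on page tables."""
    return fields["PageTables"] / fields["MemTotal"]


# Invented sample; on a real box use: parse_meminfo(open("/proc/meminfo").read())
SAMPLE = """\
MemTotal:       131072000 kB
PageTables:       9216000 kB
HugePages_Total:        0
HugePages_Free:         0
Hugepagesize:        2048 kB
"""

info = parse_meminfo(SAMPLE)
print(f"page tables use {page_table_share(info):.1%} of RAM")
print("huge pages configured:", info["HugePages_Total"])
```

A PageTables value in the gigabytes together with HugePages_Total at 0 would be consistent with the situation described above.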
Alzhy
Honored Contributor

Re: Mem Huge Pages (vm.nr_hugepages ) and Oracle

Looks like huge pages are a MUST, and enabling them seems to have vastly improved things over at my shop. We also tweaked swappiness to 10 and adjusted the vm.dirty params as follows:

vm.nr_hugepages=21570 (for a 45GB SGA!)

vm.swappiness=10
vm.dirty_background_ratio=3
vm.dirty_ratio=15
vm.dirty_expire_centisecs=500
vm.dirty_writeback_centisecs=100

After the tweak, the GlancePlus tool now reports Oracle process memory stats "properly": no bloat in RSS memory.
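
For anyone sizing vm.nr_hugepages, here is a rough sketch of the arithmetic, assuming 2 MiB huge pages and that the only thing to cover is the SGA itself. A real setting should cover all shared memory segments of the instance, so treat this as a lower bound. The helper names are hypothetical.

```python
HUGEPAGE_BYTES = 2 * 1024 * 1024  # assumed 2 MiB huge page size (x86_64 default)


def hugepages_needed(sga_bytes, hugepage_bytes=HUGEPAGE_BYTES):
    """Smallest number of huge pages that fully covers sga_bytes."""
    return -(-sga_bytes // hugepage_bytes)  # ceiling division


def hugepages_to_gib(pages, hugepage_bytes=HUGEPAGE_BYTES):
    """Memory pinned by the given number of huge pages, in GiB."""
    return pages * hugepage_bytes / 1024 ** 3


print(hugepages_needed(45 * 10**9))  # SGA of 45 GB (decimal)
print(hugepages_needed(45 * 2**30))  # SGA of 45 GiB (binary)
print(round(hugepages_to_gib(21570), 2), "GiB reserved by 21570 pages")
```

Note that huge pages are reserved up front and are not available to ordinary allocations, so oversizing the value takes memory away from everything else on the box.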

Hakuna Matata.
Reiner  Rottmann
Frequent Advisor

Re: Mem Huge Pages (vm.nr_hugepages ) and Oracle

As hugepages are not swappable, we did not tune the swapping parameters very much.

We focused on other pitfalls like a NUMA bug of kswapd that struck us. (Our CPUs have NUMA architecture and by default, the mem is partitioned for dedicated use with specific cpu cores. If one mem zone is under pressure, kswapd ridiculously tries to free up mem in zones that have plenty. As a workaround, we have deactivated it with numa=off kernel parameter)
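
To spot this kind of imbalance between memory zones, the per-node meminfo files under /sys/devices/system/node/ can be compared. A minimal sketch follows, assuming the usual "Node N Field: value kB" line layout. The sample numbers are invented, not taken from our systems.

```python
import re

NODE_LINE = re.compile(r"Node\s+(\d+)\s+(\w+):\s+(\d+)")


def node_free_fraction(text):
    """Map node id -> MemFree / MemTotal from per-node meminfo text."""
    totals, free = {}, {}
    for m in NODE_LINE.finditer(text):
        node, key, value = int(m.group(1)), m.group(2), int(m.group(3))
        if key == "MemTotal":
            totals[node] = value
        elif key == "MemFree":
            free[node] = value
    return {n: free[n] / totals[n] for n in totals if n in free}


# Invented sample: node 0 is nearly full while node 1 has plenty free.
SAMPLE = """\
Node 0 MemTotal:       67108864 kB
Node 0 MemFree:          524288 kB
Node 1 MemTotal:       67108864 kB
Node 1 MemFree:        50331648 kB
"""

for node, frac in sorted(node_free_fraction(SAMPLE).items()):
    print(f"node {node}: {frac:.1%} free")
```

On a live system the input would be the concatenated contents of the node*/meminfo files rather than the sample string.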
TwoProc
Honored Contributor

Re: Mem Huge Pages (vm.nr_hugepages ) and Oracle

Good post Rottmann,

however, numa=off is recommended for 10g, while NUMA support is supposedly a great enhancement for 11g.

So, with that in mind, did you find it was best to turn off the numa setting even for Oracle 11g as well?
We are the people our parents warned us about --Jimmy Buffett
Reiner  Rottmann
Frequent Advisor

Re: Mem Huge Pages (vm.nr_hugepages ) and Oracle

Turning NUMA off will reduce performance. But as we are using quite powerful boxes, the loss was of no significance to us.

We did not measure it, though.

(Our estimate is a loss of around 10 percent.)

However we sleep a lot better now with this workaround in place...
V. Nyga
Honored Contributor

Re: Mem Huge Pages (vm.nr_hugepages ) and Oracle

Sorry for being off-topic.

TwoProc - may I invite you to this thread:
http://h30499.www3.hp.com/t5/General/Congrats-to-Pharaoh-TwoProc/m-p/5264266#M178233


I'd like to close it ;-)

V.

*** Say 'Thanks' with Kudos ***
Alzhy
Honored Contributor

Re: Mem Huge Pages (vm.nr_hugepages ) and Oracle

We turned NUMA off on HP-UX Superdomes as it was resulting in only a few CPUs being used.

We're running Oracle VLDBs on 4 socket Dunningtons (not Nehalems) -- am not sure if a Dunnington based system is considered a NUMA system -- is it? I thought I saw a post for an IBM x3950 M2 with 4 nodes ( 16 sockets) but I am not sure if such a system can be called a NUMA system.
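
One way to check what the Linux kernel itself reports is to count the nodeN directories under /sys/devices/system/node. A small sketch, with hypothetical helper names; it shows only what the kernel exposes, which also depends on boot options such as numa=off.

```python
import os
import re


def numa_node_ids(entries):
    """Return the sorted node ids found in a list of sysfs entry names."""
    ids = []
    for name in entries:
        m = re.fullmatch(r"node(\d+)", name)
        if m:
            ids.append(int(m.group(1)))
    return sorted(ids)


def kernel_numa_nodes(sysfs_dir="/sys/devices/system/node"):
    """Node ids the running kernel exposes (Linux only)."""
    return numa_node_ids(os.listdir(sysfs_dir))


# Example with a made-up directory listing:
print(numa_node_ids(["node0", "node1", "online", "possible", "has_cpu"]))
```

If kernel_numa_nodes() returns more than one id, the kernel is treating the box as a NUMA system; a single node0 means it sees one uniform memory pool.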
Hakuna Matata.
TwoProc
Honored Contributor

Re: Mem Huge Pages (vm.nr_hugepages ) and Oracle

Most of the IBM systems are CUMA, not NUMA, though I'm not sure about the exact model you mentioned. In NUMA systems, data exchanges on the same board run as fast as they can, while distant data exchanges to other boards must run slower (bus interconnect limitations). On CUMA systems, everything runs at the same speed: local data exchanges, interconnects, distant data exchanges.

Both have benefits and drawbacks, and both use lots of buffering to dampen the issues.

Personally, I prefer systems that don't use multiple cell boards, but those hardly exist any more, except in smaller systems. But hey, that's just my uninformed opinion.

Supposedly, if you turn off numa, then your data will be striped across all boards and memory echelons evenly. Meaning that going for data locally will be handled (by Oracle) the same as going for data on a remote board. So in general, if you've got 4 cell boards, then 25% of the time, you'll have really fast access to memory. If you leave NUMA on, then Oracle will try to use memory local to a CPU when creating/reusing segments for access.

Here's where it falls apart for me, and I'll give a simplified example: a server with, say, 100 GB of RAM on 4 cell boards, so 25 GB of RAM per board. If I bring up an Oracle DB of 75 GB, it's already spread across 3 boards, and much of what I'm doing can't benefit from Oracle's NUMA assignment optimization rules anyway.
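
The numbers in this example can be put into a toy model. It assumes memory is striped evenly across boards and that accesses are uniformly spread, which is a simplification and not how any particular Oracle or kernel version is known to behave.

```python
import math


def boards_spanned(db_gb, board_gb):
    """Minimum number of boards an instance of db_gb must occupy."""
    return math.ceil(db_gb / board_gb)


def local_fraction_interleaved(boards):
    """Share of accesses hitting local memory when data is striped evenly."""
    return 1 / boards


print(boards_spanned(75, 25))         # the 75 GB instance on 25 GB boards
print(local_fraction_interleaved(4))  # the "25% of the time" case with 4 boards
```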

That leaves me thinking that ~possibly~ the only way to benefit here would be to have each Oracle DB instance smaller than a board, and then let Oracle's NUMA optimization keep the RAM and CPU tasks on the same cell board. The problem is, massive Oracle databases don't tend to fit on a single board... so I don't see the gain, and I'm left paying the price of NUMA across the whole Oracle area. UNLESS!!! using smaller shared memory segment sizes somehow convinces Oracle to better align individual segments with CPUs, in which case we might get something out of it.

Sadly Alzhy, this all means that for 11G and Oracle numa settings, all I currently have, until I can test, is wonderment of how to optimally configure ... :-(
We are the people our parents warned us about --Jimmy Buffett