Operating System - HP-UX

Re: High System or Kernel Memory Usage

SOLVED
Don Morris_1
Honored Contributor

Re: High System or Kernel Memory Usage

You've got 4 issues in the system space.

1) The default overhead of around 8.6% of physical memory, plus filecache_min. This is a known cost in v3 unless you set base_pagesize higher (a setting of 16 would get around 6% of memory back right off the bat). This requires a reboot and should be done in a test environment first, to make sure you don't have applications that assumed the page size was always 4096 instead of using sysconf/getconf or other interfaces as they should.
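As an aside, the safe pattern for applications is to query the page size at runtime rather than hard-coding 4096. A minimal sketch in Python (the C equivalent would be sysconf(_SC_PAGESIZE) or getpagesize()); the 16384 figure in the comment is just what you'd expect to see with base_pagesize set to 16:

```python
import os

# Query the system page size at runtime instead of assuming 4096 bytes.
# On an 11i v3 box with base_pagesize raised to 16, this would report 16384.
page_size = os.sysconf("SC_PAGE_SIZE")

# Any valid page size is a positive power of two.
assert page_size > 0 and page_size & (page_size - 1) == 0
print(page_size)
```

Applications that cache this value once at startup keep working unchanged when the tunable is raised; ones that hard-code 4096 are the ones to find in testing.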

2) Your file related caching layers are big users (spinlock, region, vx global, vx inodes). This implies that vx_ninode may be too high (man 5 vx_ninode). As mentioned in the man page, by default this is rather aggressive and can be reconfigured.

3) Probably related to (2), your Super Page Pool cache layer is significant. This signifies that large page kernel translations are built and then large parts are freed, but not all -- and new allocations may not be able to use what is freed for one reason or another. This is often aggravated by the (2) issue due to the caching in the file system / file system interaction [VM] layers. Fix (2) and (3) may go down (likely not away, but down is good).

4) You've got a lot of memory tied up in Async disk driver structures. This may be required for performance -- but it may be excessive (I can't really say since I don't know how the async disk driver client is configured). Consult your Oracle documentation regarding async disk driver configuration.
Michael Steele_2
Honored Contributor

Re: High System or Kernel Memory Usage

Hi Don:

Could I see the references and citations for all this?
Support Fatherhood - Stop Family Law
Don Morris_1
Honored Contributor

Re: High System or Kernel Memory Usage

Math/kwdb and looking at the data provided.

sizeof(pfd_t) on v3 - 200 bytes. Base pagesize reported 4096 bytes. (200 * 100) / 4096 = 4.8828125. Hence pfdat mandatory cost [being a per-page structure] is 4.88%. In practice, you pick up a little more fluff for the other levels of the table, so 5% is a good rule of thumb.

The VHPT tries to be about 1% of memory [sizeof(pte_t) / 4096 = 0.78%, but there is a little extra plus rounding].

The Overflow PTEs are another 1% or so [sizeof(ovfl_pte_t) / 4096] and this tries to be one per page plus extra for Alias translations and Memory Mapped I/O.

The system critical pool area works out to about 0.2% (I don't think that heuristic is documented, nor should it be).

The PFN2V area is around 0.4% (sizeof(pfn2v_entry_t) / 4096 = .39%).

So that's 5% + 1% + 1% + .4% + .2% or 7.6%.

There's a metadata cache to help the filecache that's about 1%, so that's 8.6%.

Of all that -- the PFDAT, PFN2V, VHPT and Overflow PTE sizes are all per-page, so raise the base pagesize, you have fewer pages -- the cost is less.
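To make that scaling concrete, here is a small sketch of the arithmetic above. The constants are the rule-of-thumb figures quoted in this post (sizeof(pfd_t) = 200, ~1% VHPT, ~1% Overflow PTE, ~0.4% PFN2V, ~0.2% critical pool, ~1% metadata cache), so treat the output as an estimate, not a measurement:

```python
PFD_T_BYTES = 200  # sizeof(pfd_t) on 11i v3, as quoted above

def kernel_overhead_pct(base_pagesize):
    """Estimated fixed kernel overhead as a percent of physical memory."""
    # pfdat: one pfd_t per physical page.
    pfdat = PFD_T_BYTES * 100.0 / base_pagesize
    # VHPT (~1%), Overflow PTEs (~1%) and PFN2V (~0.4%) are also per-page,
    # so they shrink proportionally as the base page gets bigger.
    per_page_misc = (1.0 + 1.0 + 0.4) * 4096.0 / base_pagesize
    syscrit = 0.2    # system critical pool heuristic (flat)
    metadata = 1.0   # filecache metadata cache (~1%, flat)
    return pfdat + per_page_misc + syscrit + metadata

print(round(kernel_overhead_pct(4096), 1))    # ~8.5, i.e. the ~8.6% above
print(round(kernel_overhead_pct(16384), 1))   # ~3.0 with 16 KB base pages
```

The roughly 5.5 points recovered at a 16 KB base page lines up with the "around 6% of memory back" estimate in the first reply (the small gap is the table-level fluff the rules of thumb round over).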

And filecache_min is a flat reservation, as stated in `man 5 filecache_min`: "The amount of physical memory that is specified by the filecache_min tunable is reserved and guaranteed to be available for file caching." Hence kmeminfo will report it as used (since from the System perspective, that amount is -- even if the particular physical pages aren't chosen yet).

(3) is a matter of the dump provided showing almost 1 GB in the Super Page Pool. Barring this being a ccNUMA-based system with 17 localities and the submitter leaving that little fact out, that's a lot of memory hanging around on a 16 GB system (kmeminfo gives the total SPP layer memory, but there may be multiple distinct caches in play depending on the ccNUMA layout). Memory only hangs around in the SPP layer when some of it is in use, and Free memory is close to 0% (hence Garbage Collection should be in play), which strongly implies the arenas using memory are also preventing SPP coalescing [GC is pushing what it can to the SPP layer, but it isn't going further]. And since the VM white paper both predates the whole Super Page Pool implementation and, I believe, doesn't go into any real detail on Kernel Dynamic Memory anyway, there's nothing external to cite here.

(2) is granted a matter of experience and knowing that all the top arenas other than Async are File System/File Caching related and are all cached above the Arena layer when inodes are cached. Hence if you reduce inode caching, you reduce the caching of these other objects. Reduce the caching of the other objects, the memory gets freed to Arena -- the GC can find it... and hence (3) can clear up as well. The reference to it being aggressive by default is in the man page as I mentioned.

Alternately, you could start from http://docs.hp.com/en/7779/commonMisconfig.pdf and then realize that in v3 there are several additional structures in VM (VAS, pregion, region plus UFC-specific stuff) -- making the section on the VxFS inode tuning more important.

You can ask me how I know, I suppose -- but since the answer is "I read the UFC design and implementation and have had to triage v3 for years now" I don't see how it does much good or is anything beyond an argument from authority at heart.


(4) is purely a matter of that's the arena for the async driver and hence checking the configuration of said driver would be the only way that memory load could be reduced. That load may be appropriate and required for Oracle performance, of course -- hence the recommendation to check said documentation on what Oracle seeks to do with this driver.
Michael Steele_2
Honored Contributor

Re: High System or Kernel Memory Usage

Don:

If this isn't certified by the manufacturer, then you are putting the box into an unknown state, uncertified by the manufacturer.

Why should anybody want to put their company's box, a box often relied upon by thousands of users, a box often responsible for a million-dollar-a-day payroll, INTO AN UNKNOWN STATE????
Don Morris_1
Honored Contributor

Re: High System or Kernel Memory Usage

I'm sorry -- I'm rather missing your point here.

You asked me to cite my reasoning for the statements I made that "This is likely why your kernel is using memory in this way, this is what you'd want to investigate doing".

I *specifically* said that base_pagesize (which is a documented tunable from HP, mind you) should be validated in a non-production environment as application issues may arise.

I also cited official HP documentation that vx_ninode (the man page, the white paper) is aggressive. Both give HP's recommendations for the tunable -- if a customer wants to hold to Oracle's recommendations instead, that's their business. I'm simply stating that inode caching can cause this sort of kernel memory caching. I said nothing about what to set it to beyond the documents in question.

Same with async -- I don't see how you can construe "consult your documentation regarding this configuration" as "twiddle this knob and pray for the best".
Michael Steele_2
Honored Contributor

Re: High System or Kernel Memory Usage

Don:

You do see that there's an Oracle database on this box. So besides putting the O/S into an unknown state, you also want to use kernel parameters that are uncertified by Oracle.

Are you even aware that Oracle provides kernel parameter settings for the HP-UX O/S, as this is what they have tested, certified, and recommend for working best with their products?

Do you even have an HP server that you administer? Or is there some virtual box out there that exists on paper and got built based upon assorted university textbooks and internet whitepapers?

Let's take another example, SAP. Ever worked with an SAP server before, or with a team of basis administrators?

kenj_2
Advisor

Re: High System or Kernel Memory Usage

Following up on Don's comment about the asyncdsk arena looking high in the kernel. In fact, there is a known problem in this area with Oracle 11g. The problem is documented in Oracle bug 8965438. A fix is planned by Oracle but is not available at the present time.

On Oracle 10g, when Oracle configures each asyncdsk port for a process, it sets the max_concurrent value for most of the asyncdsk ports to 128. This max_concurrent value limits the number of parallel I/Os to a given asyncdsk port to 128. The asyncdsk driver then allocates a buffer header for each of the potential 128 I/Os. Each buffer header is 896 bytes, resulting in approximately 128*896 bytes, or about 112 KB per asyncdsk port. Typically, each Oracle process (shadow processes, dbwriters, logwriter, etc.) will have one asyncdsk port open. So if there are 1000 processes, the memory used by the asyncdsk driver is ~110 MB.

On Oracle 11g, Oracle uses a max_concurrent value of 4096, which results in 4096*896 bytes, or 3.5 MB per asyncdsk port. So if there are 1000 Oracle processes, the asyncdsk driver can consume ~3.5 GB of memory. Also, due to the large, odd-sized kernel memory allocations, the kernel's Super Page Pool becomes fragmented and also consumes large amounts of memory.
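Those numbers check out with a quick back-of-envelope calculation. The 896-byte buffer header and the per-port max_concurrent values are taken straight from the post above; the 1000-process count is illustrative:

```python
BUF_HDR_BYTES = 896  # size of one asyncdsk buffer header, as quoted

def asyncdsk_mem_bytes(max_concurrent, n_procs):
    # One asyncdsk port per Oracle process; the driver pre-allocates one
    # buffer header per potential concurrent I/O on each port.
    return max_concurrent * BUF_HDR_BYTES * n_procs

MB = 2 ** 20
GB = 2 ** 30
print(round(asyncdsk_mem_bytes(128, 1000) / MB))     # ~109 MB on 10g
print(round(asyncdsk_mem_bytes(4096, 1000) / GB, 1)) # ~3.4 GB on 11g
```

The 32x jump in max_concurrent (128 to 4096) is exactly the 32x jump in driver memory, which is why the arena stands out so sharply on 11g systems.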

Ken Johnson
HP
Michael Steele_2
Honored Contributor

Re: High System or Kernel Memory Usage

And given what HP support is today, citations and references and web links please, else, ....
RahulS
Occasional Advisor

Re: High System or Kernel Memory Usage

For the time being, I have added an extra 16 GB of RAM to each of the DB nodes to accommodate the high memory usage.

Thanks Michael, Dennis, Emil, Wayne, Don for your valuable suggestions.
Horia Chirculescu
Honored Contributor

Re: High System or Kernel Memory Usage

And your problems vanished only by adding extra memory?

Horia.
Best regards from Romania,
Horia.