Perf T'shooting - Large Linux DB Farm

Alzhy · ‎04-21-2011

Large Oracle DB Farm (Support/DSS Environments). 256GB of RAM, 48-way X86_64, RHEL 5.6, hosting about 25 DB instances (various SGA sizes), ASMLib/ASM storage layout on high End Array, 10 FC Channels (effective separation of I/O channels assumed). Whenever our clients start their test activities with onlyhalf odf the DBs engaged, the system crawls - not much I/O issue perceived, ample memory, ample swap -- but system practically crawls. mu

Meminfo:

# cat /proc/meminfo
MemTotal: 263637940 kB
MemFree: 588616 kB
Buffers: 3544 kB
Cached: 136354500 kB
SwapCached: 244740 kB
Active: 151189624 kB
Inactive: 148124 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 263637940 kB
LowFree: 588616 kB
SwapTotal: 176125268 kB
SwapFree: 22138520 kB
Dirty: 24 kB
Writeback: 0 kB
AnonPages: 14876964 kB
Mapped: 136125920 kB
Slab: 2229836 kB
PageTables: 23221264 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 265448236 kB
Committed_AS: 407476480 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 552160 kB
VmallocChunk: 34359185507 kB
HugePages_Total: 41500
HugePages_Free: 6840
HugePages_Rsvd: 6758
Hugepagesize: 2048 kB

VMSTAT:

# vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
111 9 154000768 577728 4520 136483232 9 8 8063 201 1 0 10 12 65 13 0
134 8 153996016 575964 4740 136496080 5480 600 10630 1730 3646 47206 8 92 0 0 0
140 9 153996224 578332 4408 136497104 1826 388 4972 1207 1240 14632 7 93 0 0 0
159 11 153998688 583756 4200 136492864 2198 370 8512 1293 1180 16805 6 94 0 0 0
129 7 153998752 580988 4344 136496192 2880 173 8716 1186 1243 15501 5 95 0 0 0
135 15 153999168 583804 4260 136492928 1406 107 4415 1728 1096 15180 5 95 0 0 0
177 6 154002592 583172 4424 136493712 3333 229 16154 2335 2827 28490 5 94 0 0 0
209 11 154005152 578976 4380 136494016 1894 94 7613 6044 1569 16472 7 93 0 0 0
154 6 154008976 577332 4376 136491328 2873 327 14338 8224 1741 18215 6 94 0 0 0
212 7 154010400 579120 4240 136493280 2266 215 4201 960 1208 16502 6 94 0 0 0

Our single Instance mega servers are doing just fine. Are there any recipes for scaling large linux servers to better handle these kind of workloads?

Or with these vast number of DBs - it is not just possible.

I am thinking of partitioning this server into perhaps 4 or 5 KVM or ESXi virtual machines to address this issue as it is likely RHEL is not meant to handle so large of interrupts and multipathed sessions (there are over 1400 multipath'ed devices on this server).

Hakuna Matata.

Alzhy · ‎04-21-2011

Some more observations:

swap around 180GB allocated is 80% used.
kswapd0/01 are both cpu active

Total SGA SHM segments allocated for the 25 Oracle DB instances ~ 413GB

Hugemem is enabled per the above.
Hugmem Alloc is 41500x2MB ~ 82GB

COuld we be we have massive memory trashing?

How does the SGA fit in memory if HugePage Alloc is only at 80GB? Is this the reason swap is active? Should I increase my HugePages again to perhaps 90% of my 256GB of RAM?

Hakuna Matata.

dirk dierickx · ‎04-22-2011

attached is a document about oracle tuning, check it out.

TwoProc · ‎04-22-2011

1) when you look at "ipcs -am" how many segments is Oracle using per instance? Keep in mind that you want this number below 7. 1 is better.

2) What is statspack telling you that you are spending time on during this period?

3) in the above output (vmstat) - notice you're spending most of your cpu time on executing kernel code (92-94%). This tells you're pretty busy. Swapping maybe. Memory accessing maybe, or both.

4) look at the vmstat system segment: your number of context switches is pretty high, in one case it was 47k. High context switch values means you're just thrashing your cpu, and not getting much real work done. My guess if you watch this when it climbs high, it tells you when your system is about to slow.

5) Did you notice that when your max number of context switches were up, you also had a high number of blocks being read in? Oracle I/O requests.

Suggestion: Pay attention to that swappiness value as suggested by another posted answer, regarding all of the touchpoints on tuning Oracle on Redhat pdf document. It's a good start.

Suggestion: Set shmmax and shmmegs, etc such that you haven't cut up your Oracle space across too many segments, that is, less than 7.

Suggestion: precache these data areas as much as possible before the bulk of users (maybe virtual users) get on. If the test isn't relatable to real world, then the easiest thing to do, is just run a small number of test scenarios before running the actual test, this will load up your cache areas. Also, consider putting cache hints on the tables being pulled in and disposed of often. Ditto your code that's being dumped in and out, you should "pin" the oracle code that is experiencing being purged from the sga and then reloaded often.

Suggestion: look at the size of your redo_buffer_cache. This being small can very much affect your ability to be concurrent. Test by doubling its size and see if it helps.

Suggestion: check your scsi queue depth average (sar -d 1 1). If the avg scsi queue depth is high, you'll need to do increase this parameter in the kernel (max_scsi_queue depth or similar name).

Suggestion: Tables being called: are these writes? If so, you need to review the value of the "inittrans" value. If you've got a lot of writes going to these tables at the same time - you need to increase the value to the max value of the number of concurrent writes needed to the table at the same time. If this is the beginning of the test scenario, it easily be just the logging of the sign-in logging table in your application. Don't! forget to include in your analysis indexes used by these key concurrent queries! If all of these writes are by user_id and a primary key happen at the same time, then the index that supports this function needs a high initrans value as well. The default for initrans in your database is usually 2... lack of this being higher can cause huge delays in your system.

We are the people our parents warned us about --Jimmy Buffett

Alzhy · ‎04-22-2011

Thanks.

I've actually gotten hold of the Oracle 10G guide for RedHat Linux but the Summit presentation is concise and direct.

The server is 256GB RAM... HugePage size is 80GB. Total SGA for all 25 DBs is~ 280GB. And our DBAs are wondering why viaHP GlancePlus - we have ~ 130 GB free memory (wasted in file cache)

So I recommended to bump up HugePages to 200 GB and adviced the DBAs to trim down SGA. Server was restarted and:

From /prc/meminfo:
HugePages_Total: 102400
HugePages_Free: 66187
HugePages_Rsvd: 34271
Hugepagesize: 2048 kB

Total Shared Mem Usage (SGAs, etc) is from ipcs -m: ~162GB

But meminfo does not seem to jive. HugePage_Free should be at ~38GB free or ~19000 pages free!

And I still feel some strangeness with the system. Sluggish it seems. The DBs have not been hit by our apps folks since the changes/reboot but I am pretty sure they will still complain of severe degradtion in performance.

Reading through the docs again, it seems we missed the following:

/etc/security/limits.conf:
oracle soft memlock
oracle hard memlock

/etc/sysctl.conf:
vm.hugetlb_shm_group=`id -g oracle`

Would the above mater if missed with HugeMem Pages enabled?

Hakuna Matata.

Alzhy · ‎04-22-2011

TP, Danke.

Yeah CPU context switches were high and they're still high.. Averaging in the 50K.

Thee were 31 Oracle SHM Segments out of 25 DB instances. SHMMAX was set at 27GB from an old config but we doubled it ~54GB (I know we should really be at 75 to 80 % of RAM).

I'm still puzzled how SHM is gobbled up by SGA. Are you saying even with multiple DB instances - The ideal is to have all SGAs using just 1 SHM Segment? I thought it's 1 segment per DB instance? Need enlightenment here sir.

Also just posted something on what we have vis a vis the Redhat Oracle Doc. What do you think?

Hakuna Matata.

Michael Ehrman · ‎04-23-2011

We had very similar problems on a 32-way x86_64 RHEL 5.5 server with 128 GB of RAM. The database was an Oracle 11g database but we found these things to work with 10g also.

This is what we did to turn performance from poor to better than ever.

Set SGA_Target and SGA_Max to the same value for each database

We started with the total of the SGA at 50% of total memory on the server but found that it could be increased past that
- with 256 GB of RAM, you can probably go up to 75% but you might want to start with 50%

Set HugePages to 50% of total memory on the server to start

Monitor HugePages Free
- if it goes to zero, increase it
- if there is always a large number of HugePages Free, you can reduce it

Do anything you can to prevent swapping. That totally destroyed our database performance.

I also suggest opening a ticket with Oracle Support. Oracle has learned a lot about running databases on Linux servers with a lot of memory over the last couple years.

Alzhy · ‎04-23-2011

If I did not have the following set, but all of the above are, will I have issues?

/etc/security/limits.conf:
oracle soft memlock
oracle hard memlock

/etc/sysctl.conf:
vm.hugetlb_shm_group=`id -g oracle`

We seem to hae missed setting these values as recommended from the Redhat Oracle Perf Tuning Guide.

Thanks!

Hakuna Matata.

Alzhy · ‎04-25-2011

So anyone on my previous post?

This morning, we've had another episode on one of our large Prod Servers wherein it just went down hill, slowed down and hung that we just resorted to the 2 finger salute.

load was very high -- reached 400 (system is a 128GB RAM, 24-way RHEL 5.6, 180GB swap)
Kjournald and kswapd went berserk it seems that that system CPU time was >90%

HugeMem is set as well on this server except for the above limit.conf and sysctl.conf settings.

And WEIRD of all, when the server was stil responsive.. I managed to sneak in cheking HugeMem Usage and got:

# grep -i huge /proc/meminfo HugePages_Total: 21570
HugePages_Free: 21570
HugePages_Rsvd: 0
Hugepagesize: 2048 kB

DB instance (lone) has arounf 43GB of SGA allocated.

Wondering why all of a sudden Oracle or the system decided the instance thrown out at using HugeMem! ipcs showed the 43GB of SHM...

Pretty strange.

Anyone care to comment? We still have RHEL support analysing these information as well as HangWatch stats and a partial vmcore.

TIA!

Hakuna Matata.

TwoProc · ‎04-28-2011

Greetings Alzhy,

1) re: shared mem. 1 per database instance is optimal, no need to fool with it any further, it's as good as good gets.

2) I'll bet when your system goes down, your context switches are absolutely off the chart. If so, you're running too much at once, and have become cpu bound. Ways to fix 2a) more cpu available, 2b) less cpu consumption.
along the lines of b) - your dbas really, really need to run statspack and determine what are the large, large items queries killing your system either in number of times executed, or total cpu consumed.
Really, tuning in this is generally how one must proceed, and a good DBA will find out what is consuming his systems, and will generate plans to tune. If not, hire another consultant DBA... if you're stuck I can give you a name, but you'll have put an email address out there somewhere I can exchange information with regarding the company as we are not allowed to do that in these forums. I've used him when stuck, and he is amazing.

Re: Shared memory consumption by the SGA. The big buffer_cache, code areas, etc, are all stored in shared memory segments so that all of the running pieces of the Oracle database can get at the structure simultaneously. It's how my query, your query, db processes, web queries, etc can happen all at once, they all have access to the SGA components shared in the shared memory area along with the systems processes that need to handle access/locking/maintenance, etc. So, shared memory is for sharing the database...

If you watch this system when performance goes down, do you see a) network traffic rates drop, b) disk i/O drop while context switching increases and goes off the charts?
If so, you're cpu bound by the total number of processes you're running, and the cpus are spending more time switching processess in and out of execution, than they are actually executing code. This is common problem when the upper echelons of performance and concurrency become an issue.

Remember... way back when... when you and I discussed the advantages/disadvantages of BigIron cpu design vs Intel general power house cpu systems ??? I told you about how, in general, the Itanium, IBM Power, PA-Risc, etc systems tend to handle loads with less decay as concurrency loads increase. At the same time, Intel processors handle loads much, much faster but experience a much larger and significant decay in throughput as concurrent loads increase. Well, this is probably part of what you're seeing. Intel systems start off 3 or 4 times faster than Itanium, but experience HUGE dropoffs in performance as the number of concurrent processes increase into the number of processess you're currently experiencing, while Itanium, Power, PA-Risc etc, would not experience huge throughput degradations from concurrency, but never as fast to begin with. From the drawings in the past that I've seen, the cross over point was at about 500 concurrent processes or so, at this point, Itanium chips show their stuff by handling the larger loads with much, much less decay. Kind of like the difference between trying to pull a vehicle stuck in the mud with a tractor vs your personal passenger truck. Truck is faster, tractor can pull bigger loads. Useful metaphor anyways.

You need more cpu, or you need to consume less cpu...

Good Luck!

We are the people our parents warned us about --Jimmy Buffett

Categories

Company

Local Language

Forums

Discussions

Forums

Discussions

Discussions

Forums

Discussions

Forums

Discussions

Forums

Forums

Discussions

Forums

Discussions

Forums

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Community

Resources

Other HPE Sites

Discussions

Forums

Blogs

Perf T'shooting - Large Linux DB Farm

Perf T'shooting - Large Linux DB Farm

Re: Perf T'shooting - Large Linux DB Farm

Re: Perf T'shooting - Large Linux DB Farm

Re: Perf T'shooting - Large Linux DB Farm

Re: Perf T'shooting - Large Linux DB Farm

Re: Perf T'shooting - Large Linux DB Farm

Re: Perf T'shooting - Large Linux DB Farm

Re: Perf T'shooting - Large Linux DB Farm

Re: Perf T'shooting - Large Linux DB Farm

Re: Perf T'shooting - Large Linux DB Farm