Re: Another memory issue:

Dale Edmunds · ‎10-18-2004

Hi,
I have an L-class 9000/800 HP-UX server running 11.11. The system has 2 CPUs (540MHz, I believe), 2Gb of physical memory and a 1Gb network card. I also have 2 Oracle databases (8i flavour) running on the server: the amount of memory they consume between them is 550Mb, so well beneath the golden threshold of 50% of available system memory.

Over the last 4 weeks, server performance has dropped through the floor. I have studied the Live database (statspack), and am satisfied the database itself isn't the cause of the problem: typical wait events are short duration and occasional.

I'm now looking at the O/S. I've been using command-line tools like vmstat, iostat, swapinfo etc., but I've installed Glance to make things a bit easier to decipher.

First up - the amount of Free Mem is low. More often than not, we're talking only 2 or 3Mb. What I see, having studied it over time, is that as the amount of Free Mem drops, the Buffer Cache increases. Sys Mem and User Mem remain pretty much constant.

Now - each user connection to the system consists of a Telnet session, and from that a Bequeath database connection, so I'm pretty sure I can't use MTS in the database config (Bequeath uses Unix pipes and not direct connections to the database listener). The lion's share of memory is going on user connections (I'm using "UNIX95= ps -e -o vsz=Kbytes -o ruser -o pid,args=Command-Line | sort -rkn1" to see where memory is going).
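To turn the per-process listing from that ps command into a per-user picture, something like the following Python sketch could total the vsz column by user. It assumes the output has the four columns requested above (Kbytes, user, PID, command line); the sample text is made up purely for illustration and is not taken from this server.

```python
from collections import defaultdict

def vsz_by_user(ps_output: str) -> dict:
    """Sum the first column (vsz in Kbytes) per user (second column)."""
    totals = defaultdict(int)
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        # Skip the header line, blank lines, and anything malformed.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        totals[fields[1]] += int(fields[0])
    return dict(totals)

# Hypothetical sample, NOT real output from the server discussed here.
SAMPLE = """\
Kbytes RUSER PID Command-Line
20480 oracle 1201 oracleLIVE
10240 oracle 1202 oracleLIVE
2048 user1 1300 telnetd
"""

if __name__ == "__main__":
    for user, kb in sorted(vsz_by_user(SAMPLE).items(), key=lambda x: -x[1]):
        print(f"{user:10s} {kb / 1024:8.1f} Mb")
```

Bear in mind that vsz is virtual size, so shared memory such as the SGA may be counted once per process attached to it; treat the totals as a rough guide only.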

What is now happening is that users trying to connect to the application are timing out. I think this is because the O/S is unable to allocate enough memory to the user's processes (in particular the database connection).

I need to prove 100% that the issue is memory related. My manager is right to ask why - if we have changed nothing at the database, O/S or application for several months - memory is now an issue. I also want to be certain that asking him to spend a couple grand on 2Gb of additional RAM is going to solve the problem!

I know I also have a couple of hot disks, but I can deal with these by moving the hot datafiles.....

So - help in proving my theory is greatly appreciated. In particular, how can I trace the relationship between Real Memory and the Buffer Cache? It seems that as the former decreases, the latter increases.

Thanks in advance

Dale

P.S. I'm no HP or any other kind of O/S expert, so be gentle!

Dale Edmunds · ‎10-18-2004

Hi,

vmstat and swapinfo output attached, Glance screen shot attached to first post.

Thanks

Dale

Simon Hargrave · ‎10-18-2004

Given that your buffer cache is changing size, I suspect that the default (and very wasteful) dynamic buffer cache, with its maximum of 50% of memory, is configured.

It is much better to have a fixed, small buffer cache, especially with Oracle, which uses the SGA to cache disk writes anyway - the buffer cache is just adding an extra unneeded layer.

To fix the size of the buffer cache, change the bufpages kernel parameter from the default 0 (which means dynamic) to something like 102400 (the value is in 4Kb pages, so 400Mb in this case).
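The page arithmetic is easy to get wrong, so here is a minimal Python sketch of the conversion. It assumes the 4Kb page size stated above; the function names are just illustrative.

```python
PAGE_SIZE_KB = 4  # page size as stated in the post above

def bufpages_to_mb(bufpages: int) -> float:
    """Size in Mb of a fixed buffer cache of `bufpages` pages."""
    return bufpages * PAGE_SIZE_KB / 1024

def mb_to_bufpages(mb: int) -> int:
    """bufpages value needed for a buffer cache of `mb` Mb."""
    return mb * 1024 // PAGE_SIZE_KB

if __name__ == "__main__":
    print(bufpages_to_mb(102400))  # 400.0
    print(mb_to_bufpages(400))     # 102400
```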

Obviously try this out on a test server first to see how it affects your setup, as the result will depend on whether other apps etc. take a hit from this.

Dale Edmunds · ‎10-18-2004

Hi Simon,

Thanks for the reply.

I can change the bufpages setting, but this will require a kernel rebuild and system reboot, so I will need to schedule downtime.

By changing the parameter, is the kernel rebuild done automatically?

regards

Dale

Simon Hargrave · ‎10-18-2004

If you change the kernel parameter through SAM, it will automatically rebuild the kernel and reboot the machine. Obviously you should shut down Oracle et al. before doing this.

Bill Hassell · ‎10-19-2004

Reducing the buffer cache will help with system overhead, both in resizing the cache as well as searching it. What is strange is the o/f listing for cumulative values. Was this snapshot taken after running the same copy of Glance for several days? Free memory is fairly unimportant, as is the idea that there is a golden limit on available memory. HP-UX is a virtual memory system and will use swap space as needed, so processes will run as long as there is enough swap space available.

However, once competing processes start paging, then swap thrashing will occur, thus reducing performance by as much as 100:1. This is seen in the page-out rate in Glance, but you'll need meaningful numbers. It looks like your copy of Glance is way too old for the patches you have in the kernel or Glance needs to be restarted. But you can always use vmstat and look at the po (page-out) column. Single digits are OK, 2 digits are marginal, and 3 digits or more means massive swapping, which means way too little memory.
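That rule of thumb for the po column can be written down as a small Python sketch. It takes po values you have already read off vmstat (it does not parse vmstat output itself), and the labels are just shorthand for the three bands described above.

```python
def classify_page_out(po: int) -> str:
    """Apply the rule of thumb above to a single vmstat po value."""
    if po < 10:
        return "ok"                # single digits
    if po < 100:
        return "marginal"          # two digits
    return "massive swapping"      # three digits or more

def worst_case(po_samples: list) -> str:
    """Classify a run of samples by its highest page-out rate."""
    return classify_page_out(max(po_samples))

if __name__ == "__main__":
    # Illustrative values only, not readings from the server in this thread.
    print(worst_case([0, 3, 7]))
    print(worst_case([2, 45, 8]))
    print(worst_case([12, 230, 9]))
```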

Generally speaking, Oracle performance is indirectly related to the amount of SGA available. If SGA = 500 megs, then increasing SGA (shared memory for Oracle) to 900 megs should help immensely, BUT you may not get this amount of memory in a single chunk due to fragmentation (if your Oracle 8i is only 32-bit). Competing programs will scatter across the address space and prevent Oracle from starting when asking for a larger chunk of SGA.

Adding more RAM won't help fragmentation directly. Rebooting will help. Starting and stopping the database and related middleware can cause fragmentation, especially if a novice sysadmin uses kill -9 (never use kill -9 for database components). You can look at the shared memory area with ipcs -bmop but for fragmentation, get a copy of shminfo from ftp://contrib:9unsupp8@hprc.external.hp.com/sysadmin/programs/shminfo/

2Gb is very much on the low side for Oracle. Get the extra 2Gb and then start on 2 fronts: convert to 64bit Oracle if possible (middleware may cripple this requirement) so shared memory mapping is no longer an issue, or set up memory windows so Oracle can have a private map to use for SGA (no fragmentation). In both cases, 4Gb of RAM will help a lot. And keep the buffer cache to less than 500 megs. In 11.11, the buffer cache rapidly adjusts to allow processes to have all the RAM they need so it isn't the big issue it used to be, but system overhead (kernel cycles) is increased when managing a large (1Gb or more) buffer cache. Yours was sitting at under 300 megs at the time of the snapshot, a reasonable value.

Bill Hassell, sysadmin

Dale Edmunds · ‎10-19-2004

Hi Bill,

Phew - OK - thanks for replying!

Firstly - the snapshot was taken using the same copy of Glance. I installed the software (60 day trial license), and then used SAM to look at the stats. I have literally compared 2 different points in time, on different days. The server is way out on its patchset revision (I inherited the server when the old administrator retired!)

I have rarely seen the PO value on VMSTAT go into the 100's, but when things start to grind we certainly see double figures for PO rates.

The buffer hit ratio for the database is good - 95% - and I may even reduce the size of the shared pool as too big can have a negative effect (more blocks to scan!), but will need to test and monitor this. We're running 64-bit O/S and 64-bit Oracle.

Certainly rebooting did help (we did so on Friday) - for a while - and I brought the databases down cleanly (shutdown immediate). The only other database shutdown is done weekly for a full offline backup (and again the abort option isn't used).

Monitoring the system frequently, I see the Buffer Cache doesn't stray over 300Mb. Even though Free Mem is in single figures, I get the feeling this may be an efficient use of resources - and not necessarily an issue!

Many Thanks

Dale

Bill Hassell · ‎10-19-2004

With 64bit Oracle, 2Gb is definitely limiting performance. Your buffer cache is probably not growing to the max size (typically 50%) because so many processes need RAM and the buffer cache will give up its space for processes. 95% is a good hit ratio. You'll need to change the dbc_max_pct to about 20% for your current RAM size, 10% if you double RAM. Meanwhile, the 64bit version of Oracle has no practical limitations on the SGA size, so once you get the extra RAM, have your DBAs look at increasing it.
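For a feel of what those percentages mean in megabytes, here is a minimal Python sketch. It assumes dbc_max_pct is taken as a percentage of physical RAM and treats 2Gb and 4Gb as 2048Mb and 4096Mb; the function name is only illustrative.

```python
def dbc_max_mb(ram_mb: int, dbc_max_pct: int) -> float:
    """Upper limit of the dynamic buffer cache, in Mb, for a given RAM size."""
    return ram_mb * dbc_max_pct / 100

if __name__ == "__main__":
    print(dbc_max_mb(2048, 50))  # default ceiling with the current 2Gb
    print(dbc_max_mb(2048, 20))  # suggested setting with the current 2Gb
    print(dbc_max_mb(4096, 10))  # suggested setting after doubling RAM
```

Both suggested settings come out at roughly 410Mb, which stays under the 500 meg ceiling recommended earlier in the thread, whereas the 50% default would allow about 1Gb on the current machine.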

Bill Hassell, sysadmin

Dale Edmunds · ‎10-19-2004

Hi,

Thanks Bill - as I'm the DBA (who's now having to masquerade as HP-UX systems administrator) it's all falling in my lap to sort out. I know this system has many problems - memory is one of them, but we have some hot disks (ineffective striping of datafiles across spindles), the application doesn't support table partitioning (which would really help), still using the rule based optimiser as testing with CHOOSE hurt performance - and I traced this to poor application code. I've calculated stats and rebuilt deep indexes, but it helps little.

As I can do little with the database or application at this stage, I can at least address the other issues at the O/S and hardware level!

Many thanks

Dale
