Strange buffer cache question.

 
Aharon Chernin
Advisor

System: V2500, HP-UX 11.00.

I reduced our buffer cache from 1.56 gigs to 756 megs to see if I could gain any performance improvements.

The system did respond fairly well to this change.

After a few hours, though, one of our database files became corrupted (UniVerse Pick db). What are the chances that lowering the buffer cache could corrupt a database? The system was under low to moderate load at the time, and database corruption happens occasionally, every month or so.

But, since I had made the buffer cache change the night before, all fingers are pointed at me and the buffer cache. Our development test systems performed flawlessly with 300 megs buffer cache, before I did the change to the live machines.

Could you guys please respond with:

*) Chances a buffer cache drop to 756 from 1.56 gigs would corrupt a database.
*) HP's recommended buffer cache settings, and why they are recommended.

This is worth some points for you guys :) And the more responses I can get, the better my chances of keeping the buffer cache at the recommended setting.

I personally think the database corruption just happened to be a fluke, since it has happened in the past.
Unix is user friendly, it's just picky about its friends.
John Poff
Honored Contributor

Re: Strange buffer cache question.

Hi,

I would suspect that dropping the buffer cache has nothing to do with the database corruption problem, especially if it happens on a regular basis. With less buffer cache, the data will have less memory to live in and should be flushed to disk more frequently. If you aren't logging any kind of disk errors from the operating system, I would suspect the database programs, and then possibly the disk array. Even at that, most modern disk arrays are so good that disk corruption really isn't an issue unless you have some bad hardware (bad disk, bad buffer cache in the array, etc.), so I'm inclined to think it is a database problem. Also, with a lower buffer cache, there should be more memory for the database application, assuming that it can take advantage of the additional memory. What type of disk storage are you using on that system?

Also, using Glance and/or MeasureWare, you should be able to track your buffer cache hit percentage, and I'm willing to bet that it is in the high 90 percent range with the lower buffer cache size.
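For anyone who wants to sanity-check the hit percentage by hand, here is a minimal sketch of the arithmetic, assuming you already have a count of logical reads (requests satisfied from the cache or disk) and physical reads (requests that actually went to disk) for the same interval. The function name `read_cache_hit_pct` is made up for illustration; as far as I recall, `sar -b` reports the same ratio as `%rcache`, but check the man page on your release.

```shell
#!/bin/sh
# read_cache_hit_pct LOGICAL_READS PHYSICAL_READS
# Hypothetical helper: prints the read hit percentage,
#   (logical - physical) / logical * 100
# Both arguments are whole-number counts for the same interval.
read_cache_hit_pct() {
    lread=$1
    bread=$2
    # Avoid dividing by zero on an idle interval.
    if [ "$lread" -le 0 ]; then
        echo "n/a"
        return 0
    fi
    awk -v l="$lread" -v b="$bread" \
        'BEGIN { printf "%.1f\n", (l - b) * 100 / l }'
}
```

For example, 1000 logical reads with 25 of them going to disk works out to `read_cache_hit_pct 1000 25`, i.e. 97.5 percent. If the figure stays in the high 90s after shrinking the cache, the smaller cache is not costing you reads.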

JP
John Poff
Honored Contributor

Re: Strange buffer cache question.

Can you post some details about the database corruption? Is it just one file getting corrupted, or several files? Is it just some garbage in a few records, or throughout a whole file? What do you do to recover from the corruption?

JP
Aharon Chernin
Advisor

Re: Strange buffer cache question.

The disk array is an EMC Symmetrix, and it's not logging any errors.

The Filesystem (VxFS 3.1) is not logging any errors at all.

Using Glance, CPU, disk IO, and RAM utilization are all extremely low.

Thanks for the replies so far!

Unix is user friendly, it's just picky about its friends.
Aharon Chernin
Advisor

Re: Strange buffer cache question.

The corruption was in the header information of a 2.9 gig database file. Just the header of 1 file, no other corruption anywhere else.
Unix is user friendly, it's just picky about its friends.
John Poff
Honored Contributor

Re: Strange buffer cache question.

All the parts of your environment sound good. There are lots of us out here running large databases on V-class systems, under 11.00, on EMC Symmetrix arrays, with lots of different kinds of databases (Oracle, Sybase, etc.) and I haven't heard of any corruption problems like that. I would really suspect the database or a program that updates the database.

I don't know of any way to trace or log the writes to a particular file, but maybe one of the other real wizards on the forum will have an idea. If you could track the programs that write to that file, and tie it to when the header record gets corrupted, you might be able to narrow down the corruption to a certain program.
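One rough way to narrow down the timing, sketched below, is to checksum just the start of the file on a schedule and log whenever it changes, along with whatever processes have the file open at that moment. This is only a sketch under assumptions: the 1024-byte header size is a placeholder (I don't know the real header length of a UniVerse file), the function names are made up, and a legitimate header update by the database will also show up as a change, so the log gives you timestamps to correlate rather than proof.

```shell
#!/bin/sh
# header_cksum FILE [BYTES]
# Hypothetical helper: prints the cksum of the first BYTES bytes of FILE.
# BYTES defaults to 1024, which is an assumed header size, not a known one.
header_cksum() {
    file=$1
    bytes=${2:-1024}
    dd if="$file" bs="$bytes" count=1 2>/dev/null | cksum | awk '{ print $1 }'
}

# watch_header FILE INTERVAL_SECONDS LOGFILE
# Hypothetical helper: polls the header checksum and appends a timestamped
# line to LOGFILE each time it changes, followed by the output of fuser
# (the processes that currently have FILE open). Runs until interrupted.
watch_header() {
    file=$1
    interval=$2
    log=$3
    last=""
    while :; do
        now=$(header_cksum "$file")
        if [ "$now" != "$last" ]; then
            echo "$(date) header checksum $now" >> "$log"
            fuser "$file" >> "$log" 2>&1
            last=$now
        fi
        sleep "$interval"
    done
}
```

It could be started in the background with something like `watch_header /path/to/datafile 60 /var/tmp/header.log &`, where the path and log location are placeholders. Polling only catches a process that still has the file open when the change is noticed, so a shorter interval gives a better chance of tying the change to a program.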

JP
Bill Hassell
Honored Contributor

Re: Strange buffer cache question.

The buffer cache size is not something magic. Making it very large or very small (more than 900 megs or less than 200 megs) will hurt read/write performance for files. As for the corruption of a single header, I would look at all the patches (are you up to date, at least with a 2002 bundle?). And as you've stated, such corruption has happened in the past. I would strongly suspect the database code.

As for a recommendation, the range is 200-800 megs for most modern (1999 and newer) machines with at least 4 GB of memory. For Universe, when there is a lot of report writing, you can bump it up but as always, it all depends. Most important, the cache is simply improving filesystem performance...there is no magic value that might cause corruption.
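If the cache is sized dynamically, the kernel takes it as a percentage of physical memory rather than a size in megabytes (my understanding is that the tunables on 11.00 are dbc_min_pct and dbc_max_pct, with bufpages/nbuf used for a fixed cache instead). The sketch below just does the conversion both ways; the function names and the 4096 MB memory figure in the example are illustrative assumptions, not values from this thread.

```shell
#!/bin/sh
# cache_pct TARGET_MB PHYS_MB
# Hypothetical helper: percentage of physical memory that gives roughly
# TARGET_MB of buffer cache, rounded down to a whole percent.
cache_pct() {
    echo $(( $1 * 100 / $2 ))
}

# cache_mb PCT PHYS_MB
# Hypothetical helper: buffer cache size in MB for a given whole percentage.
cache_mb() {
    echo $(( $1 * $2 / 100 ))
}
```

On an assumed 4096 MB box, `cache_pct 756 4096` gives 18, and `cache_mb 18 4096` gives 737, so a whole-percent setting lands a little under the 756 MB target. The 200-800 MB range quoted above maps to roughly 4 to 19 percent on the same assumed machine.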


Bill Hassell, sysadmin
A. Clay Stephenson
Acclaimed Contributor

Re: Strange buffer cache question.

You can completely rule out decreasing the size of the buffer cache as the cause of the corruption. I would tend to think the database is your real problem BUT it might be possible that you have a very poorly patched 11.0 box. You should check to see that fairly recent JFS and LVM patches have been installed. I would search on 'data corruption' and look for any symptoms similar to yours.

The recommended buffer cache settings are really "it depends". On most 11.0 boxes going above about 400MB is pointless; on 11.11 buffer caches of 800-1000MB perform well in MOST cases.

If it ain't broke, I can fix that.
Aharon Chernin
Advisor

Re: Strange buffer cache question.

Thanks for the replies so far guys :) This really helps my endeavours in performance tuning.

As far as patching is concerned, I am patched up to June 2002. I always stay up to date with the latest patch releases, latest LVM patches, latest VxFS patches, etc.

Are you guys saying the buffer cache performance has improved with 11.11? Is it possible to load 11.11 on a K460? I may load a test copy on one of my dev boxes.
Unix is user friendly, it's just picky about its friends.
Michael Tully
Honored Contributor

Re: Strange buffer cache question.

Hi,

Yes it is possible to run 11.11 on a K460, and the 64 bit version as well. Check the table in this link.
http://devresource.hp.com/STK/serversupport.html

I haven't loaded the June 2002 release on any of my production servers, but have on a test system. I see no big problems with it.

I've not measured a difference between 11.0 and 11i as far as buffer cache is concerned, but the 11i version is very stable and we use it on most of our production servers. My opinion is that it depends entirely on the combination of hardware, software, applications and databases.

Michael
Anyone for a Mutiny ?
Chris Richings
New Member

Re: Strange buffer cache question.

Hi,

I have seen corruptions when a site is using EMC physical devices and has not set the correct values for PV timeout and bad block relocation.

------

If you are using EMC physical devices/disks, you may want to ensure you have the physical volume timeout set to the required value.

Also, Bad Block Relocation should be set to 'None' on all logical volumes that are on the EMC physical devices.

-------

The PV timeout should be set to 180 for all EMC devices.

Check the value with the 'pvdisplay' command:

pvdisplay /dev/dsk/cXtXdX

Look for the line 'IO Timeout'. If it's set to default, this is 30 seconds (if I remember correctly).

Change it with the 'pvchange' command:

pvchange -t 180 /dev/dsk/cXtXdX

---

Logical volumes should have 'bad block relocation' set to 'None'.

Check the value with:
lvdisplay /dev/vgXX/lvolX

Look for the line "Bad block".
This setting will be one of 'on', 'off', or 'None'.

If it is not set to 'None', set it with the command below.

lvchange -r N /dev/vgXX/lvolX
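With many EMC devices it gets tedious to eyeball each pvdisplay listing, so here is a small sketch of a filter that reads pvdisplay output on standard input and flags a timeout that is not the wanted value. The function name `check_pv_timeout` is made up, and the filter assumes only what is described above: that the output contains a line with 'IO Timeout' and that the value is the last word on that line. Verify that against the real output on your system before relying on it.

```shell
#!/bin/sh
# check_pv_timeout WANTED_SECONDS
# Hypothetical helper: reads pvdisplay output on stdin and reports whether
# the 'IO Timeout' line ends with WANTED_SECONDS. Assumes the value is the
# last field on that line. Returns non-zero if the value differs or the
# line is missing.
check_pv_timeout() {
    wanted=$1
    awk -v want="$wanted" '
        /IO Timeout/ {
            found = 1
            value = $NF
            if (value == want) {
                print "ok: IO Timeout is " value
            } else {
                print "check: IO Timeout is " value ", expected " want
                bad = 1
            }
        }
        END {
            if (!found) {
                print "check: no IO Timeout line found"
                bad = 1
            }
            exit (bad ? 1 : 0)
        }
    '
}
```

Usage would be along the lines of `pvdisplay /dev/dsk/cXtXdX | check_pv_timeout 180`, repeated for each EMC device path. The same pattern could be adapted to the "Bad block" line from lvdisplay.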


Hope this helps,

regards

Chris


Aharon Chernin
Advisor

Re: Strange buffer cache question.

Bad block relocation is off, and IO timeouts would have shown up in dmesg, in syslog, or through PowerPath, all of which would have logged an error if a timeout had occurred.
Unix is user friendly, it's just picky about its friends.
Nick Wickens
Respected Contributor

Re: Strange buffer cache question.

You don't mention which database engine you are using. We had a recommendation from Informix some time ago to use raw disk access (i.e. via rlvol rather than lvol), as they believe this is safer for their particular engine than having a system crash with some data possibly still in the buffer.

As a result of this we reduced our buffer cache right down to 10% without a problem.

Hats ? We don't need no stinkin' hats !!
Aharon Chernin
Advisor

Re: Strange buffer cache question.

Thanks Nick. We are running a database called UniVerse. It's a type of Pick database. I don't even think it supports raw filesystems for data files.
Unix is user friendly, it's just picky about its friends.