Operating System - OpenVMS
1752590 Members
4138 Online
108788 Solutions
New Discussion юеВ

Re: RDB poor Recovery or Rollback performance.

 
Thomas Ritter
Respected Contributor

RDB poor Recovery or Rollback performance.

We have a 4 node Disaster Tolerant Cluster running VMS 7.3-2 and Oracle Rdb V7.1-401. Each ES45 is configured with 24GB of RAM at 50% utilisation, and has 4 CPU installed. Peaks VMS locks are about 3,000,000 million. We run RDB with Global Buffers enabled. A typical user allocated about 900 locks. There are about 600 users attached to the main database.

When a number of users disconnect from the Database, typically because of LAN or Server related issues, RDB has to generate Rollbacks using Recovery Unit Journals (RUJ) files. Typically we may have 100 rollbacks in progress. Sometimes one of these takes up to 15 minutes, which in affect makes the database unusable. We are certain that I/O throughput has little to do with the problem. Accounting records indicates little I/O. The recovery work seems to be struggle with what we believe maybe "queuing" related or related to RDB obtaining "freeze lock" .

This is not a new problem and has troubled our systems for years. The customer is very unhappy given that they have invested millions in Wide Are Cluster, yet downtime is still being experience because of RDB and VMS workings.

Would anyone be able to suggest a course of study which will enable us to understand the factors involved in improving RDB Recovery performance ?

30 REPLIES 30
Volker Halle
Honored Contributor

Re: RDB poor Recovery or Rollback performance.

Thomas,

are you running some performance measuremet tools (eg. ECP or even better T4) ? As these tools collect performance relevant data continously, they can be quite useful for 'after-the-fact' analysis.

T4 would allow you to analyse the system load from an OpenVMS perspective, so there are chances that you could identify a bottleneck, if it should be inside OpenVMS.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: RDB poor Recovery or Rollback performance.

Jeroen Hartgers_3
Frequent Advisor

Re: RDB poor Recovery or Rollback performance.

I think you need a consultand. Because rdb performance is difficult stuff, but there is a lot of performance profit to make.

maybe the question should be why are there so many rollbacks.

may be you should contact www.vxcompany.com they know everything about vms, rdb and oracle
Wim Van den Wyngaert
Honored Contributor

Re: RDB poor Recovery or Rollback performance.

May be it is simply the rollback that takes a long time. E.g. if doing the transaction takes 10s to complete, rollback will take by average 5s (depending on what you exactly do).

And since all rollbacks are serialized, you get a block of about 500 s.

But you talk about 15 minutes. So may be there are transactions of 15 minutes to complete. Try to make them smaller/commit faster.

Wim
Wim
Jean-Fran├зois Pi├йronne
Trusted Contributor

Re: RDB poor Recovery or Rollback performance.

How many recovery buffers?

If the value is small and you plenty of free buffers increase this can give a significant boost.

you can use:
$ pipe rmu/dump your_database | sea tt: "recovery buffer

And increase the parameter using
sql> alter database filename your_database
number of recovery buffers is xxx;

where xxx is the number of recovery buffer.


Jean-Fran├Г┬зois
Thomas Ritter
Respected Contributor

Re: RDB poor Recovery or Rollback performance.

Jean-Fran├Г ├В┬зois,

Our recovery buffer values are

Default recovery buffer count is 500

I need to understand what 500 means and what the valid range of values is.
Maybe this is the default value.

Thomas
John Donovan_4
Frequent Advisor

Re: RDB poor Recovery or Rollback performance.

Thomas - we have a small 3GB database and our recovery buffer setting is 1700. If you have a large DB & many users, 500 is way too small. See section 10.2 in 7.0 "Guide to Database Maintenance" guide. I got fantastic help from Oracle when I set this up many moons ago.

jd
"Difficult to see, always in motion is the future..."
John Donovan_4
Frequent Advisor

Re: RDB poor Recovery or Rollback performance.

Follow up from Oracle page:
Subject: RDBPROD: Impact of NUMBER OF RECOVERY BUFFERS on Row Cache DBR Performance
Doc ID: Note:111080.1 Type: WHITE PAPER
Last Revision Date: 18-JUL-2005 Status: PUBLISHED
Title: Impact of NUMBER OF RECOVERY BUFFERS on Row Cache DBR Performance
Product: Oracle Rdb
Op/Sys: OpenVMS

The database "NUMBER OF RECOVERY BUFFERS" parameter specifies the number database buffers that recovery (DBR) process will use while performing recovery operations. As with user processes, the number of buffers for a DBR process can have a dramatic effect on performance. Generally, increasing the number of buffers for the DBR process will have a positive effect on performance when the DBR process is performing recovery for large transactions (either for rollback of aborted transactions or for recovery (REDO) of many or large transaction when the database FAST COMMIT feature is enabled).

The performance of the database recovery (DBR) process can be especially significant when the Row Cache feature is enabled. After a database or node failure (system crash, for example) for a database with Row Cache enabled, when the database is re-opened, the monitor will create a DBR process to recover the database. This DBR process performs the following steps:

1. All row caches are recovered from the backing store (.RDC) files

2. The oldest checkpoint (of either the "fast commit" or the Record Cache Server (RCS) checkpoint) for any database user is determined

3. All transactions starting at the oldest checkpoint found are (re)applied to the database

4. Each active transaction is rolled back

Because of the potential database I/O required to perform steps 3 and 4, a larger number of buffers can help reduce the duration of the database recovery process.

Testing demonstrates how significant the number of DBR buffers can be on the performance of re-opening a database after node failure. A test case involving 25 users performing 1,000 transactions each (a total of 25,000 transactions) was interrupted by a simulated system crash. After the system was restarted, the database was opened. A specially instrumented database recovery (DBR) process was used to measure the amount of CPU time consumed along with the number of I/Os performed for various portions of the recovery.

With the default of 20 recovery buffers for the DBR process, a total of 53,542 disk I/Os were performed during the REDO portion of recovery. At a rate of 100 I/Os per second, this step would require about 9 minutes. Increasing the buffer count to 1,000 reduced the number of disk I/Os for this step to 4,342. At the same rate of 100 I/Os per second, the REDO step would now take less than 1 minute. However, for this particular test, increasing the number of buffers for the DBR process to 2,000 only reduced the I/O count to 4,262 and further increases of the buffer count resulted in no additional I/O decrease.

Generally, if global buffers are not enabled and there is sufficient memory on the system, a large number of recovery buffers for the DBR process is beneficial. It is important, however, to avoid specifying so many buffers that the DBR process page faults excessively.

When global buffers are enabled, the number of buffers for the database recovery process is limited to the global buffer USER LIMIT quota. In order to increase the number of buffers for the DBR process, both the "USER LIMIT" and "NUMBER OF RECOVERY BUFFERS" parameters may need to be increased.

For node failure recovery when the ROW CACHE feature is enabled, a value of 5000 or 10000 for buffers for the DBR process may be reasonable. The optimum number of buffers will vary greatly depending on the number and complexity of transactions and processes to be recovered and is thus quite application and site specific.
Copyright ├В┬й 2000 by Oracle Corporation. All Rights Reserved.
"Difficult to see, always in motion is the future..."
Thomas Ritter
Respected Contributor

Re: RDB poor Recovery or Rollback performance.

Thanks very much for the good advice. However, we are sure we do not have an I/O related problem. The problem seems to be with RDB. Mostly likely something internal. Configuraton maybe.