Re: RDB poor Recovery or Rollback performance.

Chris Barratt · ‎02-14-2006

Thomas,

One thing you might want to look at is the maximum number of processes that can be created on your node. When you get a mass disconnect from your database, a recovery process is created for each attach.

If you have say 300 people attached, and only 200 free process slots in VMS, things can get a bit tangled.

To get around this, you can increase the VMS sysgen limit - maxprocesscnt I think, or alternatively there is an rdb logical which controls how many recovery processes get created at once per database - it is rdm$bind_max_dbr_count.

In our setting, where processes may be connected to up to 10 dbs at once, I have set this to 10, and we have about 350 processes on a system with a maxprocesscnt of around 1500.

This may be your problem, particularly if dbs are open on multiple nodes.

Cheers,
Chris

P.S. In rmu/show stat, the "DIJ" keystroke combination takes you to a DBR activity screen which may help you see what is going on.

Thomas Ritter · ‎02-14-2006

Chris, we have first hand experience what happens when balance set slots are exhausted. The Database will close down with a useful error message why.
We have balance set slots set to about 2048 and average 800 processes per node. We also have Global Buffers enabled on most of our databases, which means that the values of the DBR do not apply to those databases.

From

7.1.2 RDM$BIND_MAX_DBR_COUNT Documentation Clarification Bugs 1495227 and 3916606 The Rdb7 Guide to Database Performance and Tuning Manual, Volume 2, page A-18, incorrectly describes the use of the RDM$BIND_MAX_DBR_COUNT logical. Following is an updated description.

The RDM$BIND_MAX_DBR_COUNT logical name and the RDB_ BIND_MAX_DBR_COUNT configuration parameter define the maximum number of database recovery (DBR) processes to be simultaneously invoked by the database monitor for each database during a "node failure" recovery. This logical name and configuration parameter apply only to databases that do not have global buffers enabled. Databases that utilize global buffers have only one recovery process started at a time during a "node failure" recovery. In a node failure recovery situation with the Row Cache feature enabled (regardless of the global buffer state), the database monitor will start a single database recovery (DBR) process to recover the Row Cache Server (RCS) process and all user processes from the oldest active checkpoint in the database.

Thomas

Jean-François Piéronne · ‎02-14-2006

It would be interesting to look use "RMU/show statistics " and look at
"Recovery Statistics Screen"
(Journaling Information -> Recovery Statistics)
and when recovery are in progress the "DBR Activity Screen"

JF

Jean-François Piéronne · ‎02-14-2006

Another question:
Is fast commit enable on your database?

If yes which parameters are used (checkpoint interval and others)

JF

John Donovan_4 · ‎02-15-2006

"the database monitor will start a single database recovery (DBR) process to recover the Row Cache Server (RCS) process and all user processes from the oldest active checkpoint in the database."

And when this happens I believe the monitor will use the DB recovery buffers. (someone correct me if I'm wrong) 500 is too small. Have you tried incresing this parameter in the DB? Do you have a test system where you can reproduce this or is it all on production?

"Difficult to see, always in motion is the future..."

John Donovan_4 · ‎02-15-2006

Oh I almost forgot the performance issues we had on 7.1 which Oracle helped us with:
1.) Disable FIB (fast incremantal backup)
"alter database filename db_name no incremental backup scan optimization;"

2.) RMU/COLLECT OPTIMIZER/STATISTICS=(CARD,STORAGE) dbroot
or
rmu/collect optimizer db_name/system_relations

I know these helped our performance overall but not sure it will help you w/ DBR. IS ALS running?

"Difficult to see, always in motion is the future..."

Jean-François Piéronne · ‎02-15-2006

John,

Rdb will start only one DBR process at the startup of the database when the database was shutdown and when fast commit is enabled.

When many process failed due to a network problem, for example, Rdb start as many DBR as there are failed process (except if you have defined RDM$BIND_MAX_DBR_COUNT), but the DBR run serially.

Disabled FIB reduce contention on SPAM when many user update (insert, delete) row in the same range of pages

RMU/COLLECT help the optimizer but has no effect on DBR.
DBR only use dbkeys to access rows

JF

John Donovan_4 · ‎02-15-2006

Well I thought it was worth a shot... :))

"Difficult to see, always in motion is the future..."

Thomas Ritter · ‎02-16-2006

We came of this logical name, which we intend to explore.

RDM$BIND_DBR_LOG_FILE.
This logical generates a recovery log file for each recovery process which can give additional useful information for troubleshooting.
The recovery log file includes:
System information.
Active processes.
Specific process information for the recovery process.
Defined and undefined Rdb-related logicals.
DBR event tracking with associated timestamps.

$ DEFINE/SYSTEM device:[dir]DBR_PID.LOG

Thomas

Richard J Maher · ‎02-18-2006

Hi Thomas

I'm sure you already have but, I suggest getting on touch with Oracle Rdb Support on this. 15 mins is an awfully long time! The good news is that you have enough time to find out what the process currently holding the FREEZE lock is actually doing. (I couldn't find if you said that you had Fast Commit enabled?)

Do you see a lot of these messages in the Rdb Monitor log file -

- Dead process transaction 0:0 was not active

Anyway FWIW, I've attached an old discussion from the JCC Listserver (also worth a shot for assistance)

When you get a chance, please post the results as to whether it was a matter of trying to tune/speed up the DBR process, or if there was some other blocking issue.

Cheers Richard

----- Original Message -----
From: "Richard Maher"
To:
Sent: Saturday, March 16, 2002 9:57 PM
Subject: Re: Freeze!

> Hi Federico,
>
> Did you contact Oracle support and if so what was the answer?
>
> Why does the DBR hang on to the freeze lock for so long?
>
> Does it have to hang on to it until recovery is complete?
>
> I used to think it was only held long enough to kick-off and initialize the
> recovery process. (But up until last week, when I looked at some abnornal
> terminations in more detail, I also thought it was the monitor that took out
> the freeze lock, so what do I know)
>
> I took John Creed's advice and turned on the DBR log file and could see
> evidence of pregnant pauses in the timestamps but nothing to say why. From
> RMU/SHOW LOCKS the recovery process clearly has the freeze lock in PR mode
> and nothing, like a long verb, is holding it up but why does it hang on to
> it for so long?
>
> The flip side of this problem is why do we have abnormal terminations? In my
> case it is users clicking the |X| while in a DCL server option and not
> exiting gracefully and they should be persuaded not to, but what can you do?
> Having said that, if I was defending them in a court of law I'd have to ask
> "Why is Rdb starting a recovery process and freezing the database when
> there's clearly nothing to recover?" Do you get alot of these?
>
> - Dead process transaction 0:0 was not active
>
> I tried with pre-started transaction off and although there was no
> transaction to rollback, an RUJ file still seemed to be created (which is
> probably a good idea for performance).
>
> $rmu/dump user
> Active user with process ID 00006848
> Stream ID is 1
> Monitor ID is 1
> Transaction ID is 339
> Recovery journal filename is "MF_PERSONNEL$00019FEAE8D3.RUJ;1"
> No transaction in progress
> Last Process quiet-point was AIJ sequence 2
>
> *The monitor log
> 6-MAR-2002 15:38:32.84 - received recovery status from 00004B78:1
> - process name RDM_RB_1, user SYSTEM
> - for database "MF_PERSONNEL.RDB;1"
> - Dead process transaction 0:0 was not active
>
> So the story so far is:-
>
> . It is a given that abnormal terminations should be eliminated wherever
> possible and users should be reined in. (Especially in the light of
> undo/redo and Fast Commit)
> . Rdb's behaviour (and prestarted transactions in particular) is all geared
> up for performance and this section of the code isn't gonna be changed for a
> couple of dodgy users.
>
> But is there a third option with executive mode rundown procedures? The
> attached code doesn't work and even of it did I don't think the DSRI Rdb$
> routines are expecting to be called from EXEC mode, but couldn't Rdb itself
> at process rundown, see that there was no transaction (or only a pre-started
> transaction) active and do enough to tidy up and tell the monitor this was a
> normal termination? If there was a real txn active then just fall over as
> usual.
>
> I don't know when or where in the death of a process these rundown routines
> are called and maybe the status of Rdb at such times may be too undefined,
> but is it worth a look?
>
> Did you know when using SQLPRE with rdms$keep_prep_files on you only get an
> .MOB object file and not the usual .MAR file that you get with RDBPRE?
>
> Should I have tried to map sql$transaction_ptr instead for SQL?
>
> Depending on when you kill (ctrl-y,stop) any Rdb program that has been
> linked to the following, you can get RDB-E-OPEN_TRANS or COSI-F-ACCVIO and
> so on. If you have a recent copy of the DSRI handbook or you know what the
> MACRO for a SQL>DISCONNECT ALL; would look like then maybe you could have a
> go.
>
> Regards Richard Maher
>
> PS. %AMAC-I-RUNTIMSTK Is the best, the smartest and just the most
> superlative message I have ever seen!
>
>
> $ on warning then exit
> $ if .not. f$privilege("cmkrnl,sysprv") then goto no_priv
> $ if f$getsyi("arch_name") .nes. "Alpha" then goto no_vax
> $!
> $ create rdb_shut.mar
>
> .title Rdb_shut
> .ident "V1.0"
>
> .library "sys$library:lib.mlb"
>
> $plvdef
> $ssdef
>
> msg_vec:
> .long 1
> msg_sts:
> .long ss$_abort
>
> my_rdb_vector:
> .long 0
> rdb$lu_status:
> .long 0
> .blkl 18
>
> .psect
> rdb$transaction_handle,nopic,ovr,rel,gbl,noshr,noexe,rd,wrt,novec,quad
> my_txn_handle:
> .long 0
> .blkb 4 ; Round up
>
> .psect pers,nopic,ovr,rel,gbl,noshr,noexe,rd,wrt,novec,quad
> my_db_handle:
> .long 0
> .long 12
> .ascii /MF_PERSONNEL/
>
> .psect _maher$code,pic,con,rel,lcl,shr,exe,rd,nowrt,quad
>
>
> exec_rundown: .jsb_entry ; Entry point for rundown
>
> pushl #0 ; Caller PC ????
> pushal my_txn_handle
> tstl @(sp) ; Transaction active?
> beql 1$
> pushal my_rdb_vector
>
> calls #03,g^rdb$rollback_transaction ; Why is the standard
> commit?
> blbs r0,2$
> $putmsg_s -
> msgvec=my_rdb_vector
> brb 2$
> 1$:
> $putmsg_s -
> msgvec=msg_vec
>
> addl2 #8, sp
> 2$:
> pushl #0 ; Caller PC
> pushal my_db_handle ; DBhandle
> tstl @(sp) ; Is it already detached
> beql 3$ ; Skip this one if so
> pushal my_rdb_vector
> calls #03,g^rdb$detach_database
> blbs r0,4$
> $putmsg_s -
> msgvec=my_rdb_vector
> brb 4$
> 3$:
> movzwl #ss$_normal,msg_sts
> $putmsg_s -
> msgvec=msg_vec
> addl2 #8, sp
>
> 4$: rsb
>
>
> .PAGE
> .SBTTL Privileged Library Vector
>
> .psect dickie$services,page,vec,pic,nowrt,exe
>
> .long plv$c_typ_cmod ; Set type of vector to change
> ; mode dispatcher
> .long 0 ; Reserved
> .long 0 ; # of Kernel mode routines
> .long 0 ; # of Executive mode routines
> .long 0 ; Kernel routine list
> .long 0 ; Exec routine list
> .long 0 ; Kernel rundown handler
> .address exec_rundown ; Exec rundown handler
> .long 0 ; RMS Dispatcher
> .long 0 ; Kernel routine flags
> .long 0 ; Exec routine flags
>
> .end
>
> $!
> $ macro/list/enable=quad rdb_shut.mar
> $!
> $ link /share=rdb_shut.exe -
> /sysexe -
> /map -
> /cross -
> /full -
> /notrace -
> rdb_shut.obj, -
> sys$input:/options
>
> gsmatch=lequal,2,0
>
> symbol_vector = ( -
> pers=psect, -
> rdb$transaction_handle=psect -
> )
>
> protect=yes
> collect=safe,_maher$data
> protect=no
> collect=user_rw,pers,rdb$transaction_handle
>
> $!
> $copy/log rdb_shut.exe sys$common:[syslib]
> $!
> $if f$file_attributes("sys$share:rdb_shut.exe","KNOWN")
> $then
> $ installx replace sys$share:rdb_shut.exe
> $else
> $ installx add sys$share:rdb_shut.exe /open/header/share/protect
> $endif
> $!
> $purge sys$share:rdb_shut.exe
> $exit
> $!
> $no_priv:
> $ write sys$output "Insufficient privilege. You need (CMKRNL,SYSPRV)"
> $ exit 44
> $no_vax:
> $ write sys$output "This code only works on alpha"
> $ exit 44
>
>
> ----- Original Message -----
> From: Paul Mead
> To:
> Sent: Monday, March 04, 2002 11:36 PM
> Subject: Re: Freeze!
>
>
> > I have to agree that disabling Fast Commit is a sure way to reduce the
> > ability of your application to get work done quickly while possibly not
> > having any affect on the problem you are trying to solve. Rather than
> using
> > the "shotgun" approach to try and resolve this problem why don't you just
> > call Oracle support and get someone who understands, or can find someone
> who
> > understands, DBRs to help you with the problem?
> >
> > ----- Original Message -----
> > From: "Federico Monteverde"
> > To:
> > Sent: Monday, March 04, 2002 3:50 PM
> > Subject: RE: Freeze!
> >
> >
> > > Ok, but our problem is we are already having long recovery times. Very
> > > frequently our database is being freezed for about 20 seconds and more.
> > >
> > > Regards,
> > > Federico
> > >
> > > -----Mensaje original-----
> > > De: Ian Smith [mailto:Ian.E.Smith@oracle.com]
> > > Enviado el: lunes 4 de marzo de 2002 19:06
> > > Para: oraclerdb@jcc.com
> > > Asunto: Re: Freeze!
> > >
> > >
> > > Ok, so ROLLBACK may take longer, is that really an issue? Since most
> > > applications do vastly more COMMIT actions than ROLLBACK then this
> should
> > be
> > > a
> > > reason to *use* FAST COMMIT.
> > >
> > > I believe that savings using FAST COMMIT make any occassional ROLLBACK
> > > overhead
> > > insignificant. It also enables a lot of very important Rdb
> functionality.
> > >
> > > Ian
> > >
> > > Federico Monteverde wrote:
> > >
> > > > Fast commit is disabled. According to documentation it could increase
> > > > rollback times.
> > > >
> > >
> > > --
> > > Ian Smith Read the Technical Corner column
> > > Oracle Rdb Engineering Group in the Rdb Web Journal
> > > email: Ian.E.Smith@Oracle.com http://www.oracle.com/rdb
> > > (Standard disclaimer: The statements and opinions expressed here are my
> > own
> > > and do not necessarily represent those of Oracle Corporation)
> > >
>
>

Categories

Company

Local Language

Forums

Discussions

Forums

Discussions

Discussions

Forums

Discussions

Forums

Discussions

Forums

Forums

Discussions

Forums

Discussions

Forums

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Community

Resources

Other HPE Sites

Discussions

Forums

Blogs

Re: RDB poor Recovery or Rollback performance.