
Oracle redo->archive rollover "failure"

Peter Heinemann
Frequent Advisor

We're running HP-UX 11i and Oracle 8.1.7;
disks are EMC SAN through Tachyon FC HBAs.

The problem:
The DBA reports that redo -> archive rollovers generate an error, then restart. He has noticed that this error is accompanied by 100% I/O on the file system. Here's an example of the errors reported in his log:
Wed Sep 25 11:09:38 2002
ARC1: Beginning to archive log# 4 seq# 261
ARC1: Failed to archive log# 4 seq# 261
ARC1: Beginning to archive log# 6 seq# 262
Wed Sep 25 11:11:26 2002
ARC0: Completed archiving log# 4 seq# 261
ARC0: Beginning to archive log# 6 seq# 262
ARC0: Failed to archive log# 6 seq# 262
ARC0: Beginning to archive log# 6 seq# 262
ARC0: Failed to archive log# 6 seq# 262
Wed Sep 25 11:13:30 2002
ARC1: Completed archiving log# 6 seq# 262

He searched Oracle Metalink; an article there said to ignore the error. He also stated that no enhanced logging is available. However, the fact that the error occurs makes him nervous (heck, he is a DBA after all -- he's supposed to be nervous) and it does create a useless and space-wasting archive log file each time it occurs.

So, what's an admin to do?
- the 100% I/O rate: aside from queue depth, what else is there to look at? At this point we have a correlation, but not necessarily cause and effect. There are no EMS or syslog entries related to the I/O rate or the failures.
- are there any kernel parameters related to disk I/O, especially for SAN/fibre-connected disks, that could be adjusted to improve performance?
- I did read the thread about modifying the mount_vxfs options for delaylog/nodatainlog and will try them; does anyone think these are related?

Thanks in advance...

...Peter
4 REPLIES
Volker Borowski
Honored Contributor

Re: Oracle redo->archive rollover "failure"

Hi Peter,

It would be good to know how big your redo logs are, but I guess even if they are 100M they should be archivable in two minutes. If they are small, enlarge them (e.g. SAP defaults to 20MB).

If you get log switches less than 2 minutes apart, you should enlarge the online logs. If you have brought the log switches to 15 minutes or more and still do not get the archiving done in time, add additional log groups.
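
To put rough numbers on this, here is a small Python sketch. It assumes the timestamp line that precedes each alert-log message is a fair approximation of when that message was written, and it uses the 100 MB figure only as the guess made above, since the real redo log size has not been posted. The function names are made up for illustration.

```python
from datetime import datetime

STAMP = "%a %b %d %H:%M:%S %Y"  # format of the timestamp lines in the alert log


def seconds_between(start, end):
    """Elapsed seconds between two alert-log timestamp lines."""
    return (datetime.strptime(end, STAMP) - datetime.strptime(start, STAMP)).total_seconds()


def required_rate_mb_s(size_mb, seconds):
    """Sustained write rate needed to archive size_mb within the given time."""
    return size_mb / seconds


# Timestamps taken from the log excerpt in the original post.
seq_261 = seconds_between("Wed Sep 25 11:09:38 2002", "Wed Sep 25 11:11:26 2002")
seq_262 = seconds_between("Wed Sep 25 11:09:38 2002", "Wed Sep 25 11:13:30 2002")

print(seq_261, seq_262)              # 108.0 and 232.0 seconds
print(required_rate_mb_s(100, 120))  # MB/s needed for an assumed 100 MB log in 2 minutes
```

Archiving an assumed 100 MB log in two minutes needs under 1 MB/s, so if the logs are that size or smaller, the 108 s and 232 s seen here point at the I/O path rather than at the archiver itself.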

What about your log group #5?
Is this a real gap, or is group #5 damaged or out of order?

Put the parameter "log_checkpoints_to_alert=true" or similar in your init.ora file. If you then see additional "checkpoint not complete" messages in between, that is a further sign that the online logs should be enlarged.
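
A quick way to check for those messages is to count them in the alert log. The Python sketch below is only an illustration: the function name and the log path in the usage comment are hypothetical, and the match is a case-insensitive search for the phrase quoted above.

```python
def count_checkpoint_waits(lines, needle="checkpoint not complete"):
    """Count alert-log lines that contain the given phrase (case-insensitive)."""
    needle = needle.lower()
    return sum(1 for line in lines if needle in line.lower())


# Usage (the path is hypothetical; use your own alert log location):
# with open("/path/to/alert_SID.log") as fh:
#     print(count_checkpoint_waits(fh))
```

A count that keeps growing between log switches would support enlarging the online logs.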

BTW: Never consider these archivelogfiles as "useless". In fact they are a very important part of an oracle-database :-)

Good hunting
Volker




Brian Crabtree
Honored Contributor

Re: Oracle redo->archive rollover "failure"

Also, you will want to check the I/O on the disks that hold the archive directory. This error will happen if the archiver is unable to keep up with the database. Increasing the number of archiver processes might be a good idea, and increasing the redo log size will reduce how often the logs switch, which can help with this issue as well.

Brian
Christian Gebhardt
Honored Contributor

Re: Oracle redo->archive rollover "failure"

Hi Peter

These errors in the log file are normal behaviour in 8.1.7.
In your example the situation is as follows:

- ARC0 archives log#4
- ARC1 is ready to work, sees that log#4 is ready to archive and begins. Then ARC1 notices that ARC0 is still doing the job, writes the error message and archives log#6 instead
- ARC0 finishes archiving log#4, is ready to work, sees that log#6 is ready to archive, ...

Strange but it works as designed
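
To confirm that every "Failed" message is just this kind of collision, you can check that each failed sequence is later completed by some archiver. Below is a minimal Python sketch that does this for the excerpt from the original post; the function names are made up for illustration, and the regular expression assumes the message layout shown in that excerpt.

```python
import re

# Matches lines such as "ARC1: Failed to archive log# 4 seq# 261".
EVENT = re.compile(
    r"^(ARC\d+): (Beginning to archive|Failed to archive|Completed archiving)"
    r" log# (\d+) seq# (\d+)$"
)


def summarize(lines):
    """Per sequence number: how many failures, and which archiver completed it."""
    summary = {}
    for line in lines:
        match = EVENT.match(line.strip())
        if not match:
            continue  # timestamp lines and unrelated messages
        proc, event, _log, seq = match.groups()
        entry = summary.setdefault(int(seq), {"failed": 0, "completed_by": None})
        if event.startswith("Failed"):
            entry["failed"] += 1
        elif event.startswith("Completed"):
            entry["completed_by"] = proc
    return summary


def unresolved(summary):
    """Sequences that failed and were never reported as completed."""
    return sorted(s for s, e in summary.items() if e["failed"] and e["completed_by"] is None)


ALERT_EXCERPT = """\
Wed Sep 25 11:09:38 2002
ARC1: Beginning to archive log# 4 seq# 261
ARC1: Failed to archive log# 4 seq# 261
ARC1: Beginning to archive log# 6 seq# 262
Wed Sep 25 11:11:26 2002
ARC0: Completed archiving log# 4 seq# 261
ARC0: Beginning to archive log# 6 seq# 262
ARC0: Failed to archive log# 6 seq# 262
ARC0: Beginning to archive log# 6 seq# 262
ARC0: Failed to archive log# 6 seq# 262
Wed Sep 25 11:13:30 2002
ARC1: Completed archiving log# 6 seq# 262
""".splitlines()

result = summarize(ALERT_EXCERPT)
print(result)
print(unresolved(result))  # [] means every failed sequence was archived by the other process
```

For the excerpt above, sequence 261 fails once and is completed by ARC0, and sequence 262 fails twice and is completed by ARC1, so nothing is left unarchived.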

Christian
R. Allan Hicks
Trusted Contributor

Re: Oracle redo->archive rollover "failure"

I may be missing something, but... you have two archiver processes. I can see that with only 2 redo logs you could get into trouble, because you could fill one up, start the archiver, and fill the next one up before the first could finish. Yes, you should be able to archive 100M in two minutes. Some things that you might want to check:

1. What all is going to the archive drive? Could you have a contention problem? You have at least two archiver processes fighting over it. For example, if the archives are on the same drive as the redo logs or a hot tablespace, you could have some problems.

2. What happens if you add redo logs? If you have two logs and two archivers, Oracle is supposed to politely wait until a redo log becomes available. If you have two redo logs and both archivers are archiving them, you have no available redos and Oracle should hang until a redo log becomes available.
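
The waiting behaviour in point 2 can be explored with a toy model. The Python sketch below is a simplification under stated assumptions: logs fill at a fixed interval, every archive takes the same time, a group can be reused only once it has been archived, and checkpoints are ignored. The function name and all numbers are illustrative, not measurements from this system.

```python
import heapq


def total_stall(groups, switch_interval, archive_time, archivers, switches):
    """Seconds the log writer waits for a redo group to become reusable."""
    free_at = [0.0] * archivers   # time at which each archiver is next idle
    heapq.heapify(free_at)
    archived_at = []              # completion time of each log's archive, in order
    now = 0.0
    stall = 0.0
    for k in range(switches):
        if k >= groups:
            # Reusing a group: its previous contents must be archived first.
            ready = archived_at[k - groups]
            if ready > now:
                stall += ready - now
                now = ready
        now += switch_interval    # this log fills up and a switch happens
        start = max(now, heapq.heappop(free_at))
        done = start + archive_time
        heapq.heappush(free_at, done)
        archived_at.append(done)
    return stall


# Two groups, a switch every minute, archives taking 232 s: the writer stalls.
print(total_stall(groups=2, switch_interval=60, archive_time=232, archivers=2, switches=10))
# Three groups, a switch every 15 minutes, same archive time: no stall.
print(total_stall(groups=3, switch_interval=900, archive_time=232, archivers=2, switches=10))
```

Varying `groups` and `switch_interval` in this model shows why both larger logs and more groups give the archivers room to catch up.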

I think the first order of business is to find out why your I/O is at 100%. If the disc channels are hosed, you have a real problem archiving anything.

-Good Luck
Allan
"Only he who attempts the absurd is capable of achieving the impossible"