Operating System - HP-UX
1753524 Members
5098 Online
108795 Solutions
New Discussion юеВ

cannot correlate error in Oracle 10g alert log "logical corruption"

 
SOLVED
Go to solution
TwoProc
Honored Contributor

cannot correlate error in Oracle 10g alert log "logical corruption"

Hey all, I've got a problem that's new to me.

I've got this error in the alert log:
Error backup file 489, block 178631: logical corruption

OK - this looks really bad. I quickly found out what file 489 is, and run a dbverify (dbv),
I run it across the whole file, and I try it once more using the user name and password.

No errors. Drat (but a good drat).
I look in v$backup_corruption - nothing there,
look in v$database_block_corruption - nothing ere either.

So, I decide I can look from RMAN.
I run:
backup validate check logical datafile ';

No errors.

Look in the two v$ corrupt views (as above), both empty.

Ok, from dba_segments I get the list of every object in that whole tablespace that includes that datafile.
And then for each, I ran:
analyze table validate structure cascade;

And... NO ERRORS!

I look again in the two v$ corrupt views and nothing there. For giggles, I even checked the v$copy_corruption - nothing there either.

OK, only one segment left in the whole database, and it's a "LOBSEGMENT" type, and I don't know how to run an analyze on that type, so I skip (unwillingly) that segment.

OK, so - maybe I missed something, I'll just check with a "backup validate check logical" each and EVERY database datafile in the whole database.

Nothing. Bleah.

I check my two v$ corrupt views (as above) - nothing.

So, I decide to do a select(*) of every item in the aforementioned tablespaces' tables - just to see if I could make the error repeat. Can't do it.

Lastly, I look in the alert logs, and the problem has occured before, in the same exact data file, at the same exact (nearly) time, when I kicked off an RMAN full hotbackup for the last 3 days (this week).

Note: I was REALLY DISSAPOINTED to see that HP's DataProtector 6.0 didn't detect, nor show this error in its log; and is happily flagging my RMAN hotbackups as succesful with no errrors or warnings. Nice, huh?

I've put all backups and archivelogs on permanent retention in the tape library, including any ones in the system until this is resolved somehow. I just hope the space holds out.

So, here's where I am, I've got this database datafile that is SCARING me, and I can't find anything wrong in the WHOLE database, much less - the one datafile, or any structures within the datafile, or it's tablespace.

If anyone can offer insight, especially from prior experience, please, please advise.

Many Thanks,

John
We are the people our parents warned us about --Jimmy Buffett
9 REPLIES 9
Eric Antunes
Honored Contributor

Re: cannot correlate error in Oracle 10g alert log "logical corruption"

Hi John,

There is a similar issue described in Metalink Note 342443.1...

Best Regards,

Eric
Each and every day is a good day to learn.
TwoProc
Honored Contributor

Re: cannot correlate error in Oracle 10g alert log "logical corruption"

Eric - your advice always appreciated. Thanks for the doc #, I'll check it out shortly.

Anyone else seen this one? Yogeeraj? Steven?

Anyone? Anyone?
-- Ben Stein, Ferris Bueller's Day Off
We are the people our parents warned us about --Jimmy Buffett
Volker Borowski
Honored Contributor

Re: cannot correlate error in Oracle 10g alert log "logical corruption"

Not sure, if exactly the same,
but we had this after a datapump reorg of
a table with a LONG column (fix available meanwhile).
RMAN was complaining, DBV not, but "validate
structure cascade" was unhappy as well.

Was fixable with dbms_repair, it was just some length information mismatch.

Can you do a simple export? If yes, I'd try
to export, rename original table/indexes, and import back. Check if error is still there.

Do not know if this helps
Volker
Jean-Luc Oudart
Honored Contributor
Solution

Re: cannot correlate error in Oracle 10g alert log "logical corruption"

John

kind of strange one. When you run the RMAN command "backup validate check logical database;" and nothing in the v$database_block_corruption view !

RMAN does not check for logical block corruption by default, this may explain why you did not get any DP error.
The above RMAN command to check the logical database should populate the view with such corruption. Just wondering if the block is not in use anymnore and therefore the command has nothing to report.

Did you log this with Oracle support ?

Regards
Jean-Luc
fiat lux
TwoProc
Honored Contributor

Re: cannot correlate error in Oracle 10g alert log "logical corruption"

Volker and Jean Luc - interesting points. In fact I could select from the table fine, with no error! So, I just alter table xxx moved it to another tablespace, and then moved it back.
When I ran the query against the dba_extents, it was now "gone" and not in use by any objects. When I discussed it with one of the "corruption" assigned gurus at Oracle, and we dragged over it and over it, it seemed like that block was probably allocated to my data, but probably lived in the "free" space kept by table to stretch/grow over to avoid chained rows.
So, right now, it's a logical error, that doesn't belong to any object, as soon as something, anything writes over that block, and new checksums get laid down, it will finally fix the problem. As it is not a physical corruption, there is nothing to worry about because the block will never be used by anything, except for two cases, a) level zero object hot backups, which will note that there is an error (as before, but only in the alert log?), or when it finally gets overwritten, which will solve the whole problem.

For now, the block isn't owned by any object's space allocation, so it causes no harm. If I'm lucky, something soon will decide to use the block by overwriting it (which is the most likely thing the system can do with an unclaimed block, the next is to claim it and not use it for free space, as before).

Thanks for you two guy's last input, very nice of you to take the time to give your input.
We are the people our parents warned us about --Jimmy Buffett
TwoProc
Honored Contributor

Re: cannot correlate error in Oracle 10g alert log "logical corruption"

See posting above. Thanks Eric, Volker, and Jean Luc.
We are the people our parents warned us about --Jimmy Buffett
Volker Borowski
Honored Contributor

Re: cannot correlate error in Oracle 10g alert log "logical corruption"

Hi,

if you like to get rid of this as well:

create table dummy like dba_extents;
alter table dummy allocate extent in datafile '.....';

repeat until freespace in datafile = 0

insert into dummy select * from dba_extents;
commit;

repeat until first extent in other datafile allocated.

drop table dummy;

Volker
TwoProc
Honored Contributor

Re: cannot correlate error in Oracle 10g alert log "logical corruption"

Thanks for the tip Volker, I'm going to give that a try on a test db.

Great advice, thank you!
We are the people our parents warned us about --Jimmy Buffett
TwoProc
Honored Contributor

Re: cannot correlate error in Oracle 10g alert log "logical corruption"

OK, well this error was never reported back to the Data Protector parent process, because it was a checksum failure in an unused block. Reusing the block with just test data (thanks for the tip Volker) blew away the problem. Since the error was merely a warning of a sort, it caused no issues. Oracle advised my that since my database was "upgraded" from Oracle 9i to 10g, this occurs in the index areas as a consequence of new/different/better chksumm-ing tools for 10g blocks vs 9i data blocks. I was informed by Oracle that this can happen again in conversion to database version 11g in unused blocks, and that the answer is to just re-use the space somehow, to get new data and checksums done against the space. The "free radical" if you will, here is that when you use tables that have free space requirements, the block that needs redoing may fall into the free space area, and still not be re-used. In which case you'd have to create a table in which there is no free reserve space in the create table definition, then populate it with junk data until the block is allocated to a segment, then you can drop the junk table.
We are the people our parents warned us about --Jimmy Buffett