Re: ORA-01578: ORACLE data block corrupted

Jean-Luc Oudart · ‎10-19-2004

Ralph,

to check if the database is in archive mode, check the v$database view :
select * from v$database;
check LOG_MODE

metalink is the Oracle support site to search for problem, report problems and also
forums.

Regards
Jean-Luc

fiat lux

Brian Crabtree · ‎10-19-2004

Ralph,

Ok, first, what version of Oracle are we running?

Second, have the DBA's used the Oracle data verification tool ($ORACLE_HOME/bin/dbv) on this datafile, and all the other datafiles.

Corruption in the SYSTEM tablespace is very bad, as it will almost guarentee that backups are invalid and will not be usable.

The best/only option at this point is to have the DBA's do a full export of the database, destroy everything, recreate it, and reload the database.

Now, to answer your thoughts. This does not necessarily mean that it is a hardware failure. Depending on the version of Oracle, and what is happening, it is possible that it is logical corruption of the database, rather than physical corruption. Checking all of the files on the database for corruption is the easiest way to test for this, as physical corruption tends to spread (short of an intermittent RAID failure, which is a whole other way we can go).

Unless this is test/development, I would work on getting the database put into a stable place first, and move from there.

Let me know if you have any questions.

Thanks, Brian

Jean-Luc Oudart · ‎10-19-2004

Ralph,
to obtain oracle product version :
select * from v$version

Regards
Jean-Luc

fiat lux

Ralph Grothe · ‎10-19-2004

Hello again,

there are seven Oracle instances running per cluster node (two-node active/active MC/SG cluster).
The servers are, though furnished with 6 gigs of RAM, a bit under memory pressure right now.
vmstat as well as glance report intensive paging activity @ 100% mem usage and less than 100mb free memory.
However, in glance yet I cannot detect actual swap outs (aka deactivations).
I don't know how big the so called SGA (is it shared mem segs + mmapped pages?) per instance is.
But there also must exist a data dict. view that yields this entity.
I think if I use the ipcs command I only get segments that were claimed by procs through the shmget() syscall, but it won't display those pages allocated through mmap() syscalls, right?

# ipcs -mob|awk '$5=="ora920"{s+=$NF};END{printf"%10.2f MB\n",s/2**20}'
1514.28 MB

Unfortunately the manpage of ipcs doesn't reveal if the SEGSZ value is given in bytes.
I guess one would have to refer to some system header file to make sure.

The instance seem to be running in archive logging mode.

$ sqlplus

SQL*Plus: Release 9.2.0.4.0 - Production on Mi Okt 20 11:44:19 2004

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Enter user-name: /as sysdba

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
JServer Release 9.2.0.4.0 - Production

SQL> select log_mode from v$database;

LOG_MODE
------------
ARCHIVELOG

The broken data block problem I've been told was fixed meanwhile by the DBAs.

So there indeed doesn't seem to be a latent lurking HW defect.

Thus I gather that all performance and latency issues reported now from clients relate to shortage of memory rather than anything else.
I advised them to either get rid of a couple of processes (something they never would aggree to), or buy more RAM.

The highwater mark of device swap used since its initialization reads some 1.4 gigs, which comes close to the shm segs I summed up above.
So I guess many pages of it not being used were paged out to it once in a while.

# swapinfo -tam
Mb Mb Mb PCT START/ Mb
TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
dev 4096 1481 2615 36% 0 - 1 /dev/vg00/lvol2
dev 6664 0 6664 0% 0 - 2 /dev/vg00/lvol11
reserve - 6581 -6581
memory 4757 1007 3750 21%
total 15517 9069 6448 58% - 0 -

Do you have better suggestions what could be done to "increase performance" with the HW given (e.g. what not vital processes could be stopped)

Madness, thy name is system administration

Ralph Grothe · ‎10-19-2004

Oops, forgot to supply the Oracle banner (for the version)

$ echo "select banner from v\$version;"|ORACLE_SID=inwb01 sqlplus /as\ sysdba

SQL*Plus: Release 9.2.0.4.0 - Production on Mi Okt 20 12:21:36 2004

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
JServer Release 9.2.0.4.0 - Production

SQL>
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
PL/SQL Release 9.2.0.4.0 - Production
CORE 9.2.0.3.0 Production
TNS for HPUX: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production

SQL> Disconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
JServer Release 9.2.0.4.0 - Production

Madness, thy name is system administration

Yogeeraj_1 · ‎10-19-2004

hi ralph,

You will need to take your last backup of system01.dbf, restore it and roll it forward. Hopefully you are in archivelog mode and have been doing backups.

We can easily recover from this -- assumming you have backups. In general, in order to perform a block level recovery YOU MUST have used rman and be using rman, otherwise this feature is not available.

normally, if you cannot correct it via standard recovery techniques (from backups) please contact support for assistance.

let us know more about how it goes on.

regards
Yogeeraj

No person was ever honoured for what he received. Honour has been the reward for what he gave (clavin coolidge)

Volker Borowski · ‎10-20-2004

Hi Ralph,

well Metalink is THE information/software/support - source for Oracle. You surely should ask your DBAs to get an account. You can check out notes and guides for OS-specific implementation issues liek kernel-parameters, patches and else.

OK, the block-corruption is fixed !
Did they tell you how ?
Did they do export/import ?
Did the re-create indexes/temp-tables to do the recovery of "nologging"-actions ?

Both actions could have fixed the block-corruption but also did delete the statistics on all/some indexes involved.
May be the lack of performance now comes from the fact the the newly rebuild indexes have no statistics now, which would be a good reason, why the fix of the (assumed) index-corruption goes with decrease of performance.

Ask the DBAs if they recalculated the statistcs.

In addition, ask them, if in your database there is DB-activity with NOLOGGING involved ! In this case, even a clusterswitch may lead to those 1578s, because if the instance crashes or goes down with a shutdown abort. In this case, a 1578 on one of these objects is extremly likely.
You may need to adjust your cluster environment in this case.

Option one: Stop using NOLOGGING -> this is costy in terms of performance

Option two: Check your startup-procedures in case of a cluster-switch. It might not be sufficiant just to startup the database, but may be you need to run additional scripts to clear the objects created with nologging, before your application can continue to be productive.

Hope this helps
Volker

Brian Crabtree · ‎10-20-2004

Ralph,

Time to get new DBA's. :)

Ok, ways to get information on the SGA.

select * from v$sga;
select name,value from v$parameter where name in ('db_cache_size','pga_aggregate_target','sort_area_size','shared_pool_size','sga_max_size','processes','db_block_size','db_block_buffers');

Ok, this will give you a quick run-down of what is going on with the system. If they are using PGA_AGGREGATE_TARGET (non-zero number above) then you can add that to the SGA_MAX_SIZE and get a rough estimate of how much memory is being used on each database. If they are not using PGA_AGGREGATE_TARGET, then a really rough estimate of memory can be gotten by multiplying SORT_AREA_SIZE by PROCESSES.

You should also check the dbc_max_pct and dbc_min_pct UX kernel parameters. These should be set around 5-7 or so for a straight DB server, as oracle does its own buffering.

Again (can't stress this enough), the database with the corrupted SYSTEM datafile needs to be fixed first, before trying to tune performance. I would be suprised if the database was working with that, let alone performance problems with it.

Let me know if you have any questions.

Thanks, Brian

Mel_12 · ‎10-20-2004

It appears that Ralph is playing DBA here. Unless this instance is not production by nature, I strongly suggest the company get hold of a DBA. You don't know what Metalink stands for, you sound like you have not heard about SGA and perhaps only mundane knowledge of shared pool and RMAN.

Threads like this are better served if the person in need is encouraged to get their DBA to respond to questions. Specialized knowledge areas should be respected and not toyed with. There are unemployed DBAs out there.....

Regards,
Mel

Ralph Grothe · ‎10-20-2004

Hi all,

many thanks for your invaluable hints how to cope with Oracle inconsitancies.
Because, as Mel rightly stressed, I lack knowledge of even Oracle basics I asked our DBAs to have a look at this thread.
I cannot tell how they fixed things, and if everything state of the art that should be done in such circumstances has been excercised.
I however feel an urgent need to undergo Oracle trainig myself because ever too often DB related problems are burdened upon us Unix sysadmins from our clients.
Be it that our customers' DBAs lack detailed knowledge, be it they aren't available when disaster strikes, or be it they are simply reluctant to get their hands dirty or to take up responsibility.
The involved systems are productive and no playground at all.
And if something is getting screwed up I bet they will point at us in search for the culprit, although their sites indeed employ DBAs.
I find this all very unsatisfactory but we have to face reality, and educate ourselves.
That's why I'm so grateful for the existence of such a forum that experts like you fuel.
There is however a big drawback when you voluntarily take over tasks of "deviating trades" that it sticks with you in the end.
But only the curses (viz. expectations, demands, responsibility), and not the blessings (viz. acknowledgement, payment).

Madness, thy name is system administration

Categories

Company

Local Language

Forums

Discussions

Forums

Discussions

Discussions

Forums

Discussions

Forums

Discussions

Forums

Forums

Discussions

Forums

Discussions

Forums

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Community

Resources

Other HPE Sites

Discussions

Forums

Blogs

Re: ORA-01578: ORACLE data block corrupted