OS vs Oracle on failing drive
Operating System - HP-UX
10-25-2004 01:22 AM
Hi,
I just want to share this story and get your comments.
A couple days ago, a production Oracle DB halted twice with the following errors in alert.log:
ARC0: Beginning to archive log# 5 seq# 47
ARC0: Failed to archive log# 5 seq# 47
Thu Oct 21 12:50:24 2004
Log corruption near block 224 change time
All Archive destinations made inactive
ARC1: Failed to archive log# 5 seq# 47
ARCH: Archival stopped, error occurred. Will continue retrying
Thu Oct 21 12:50:24 2004
ORACLE Instance PLIN - Archival Error
ARCH: Connecting to console port...
Thu Oct 21 12:50:24 2004
ORA-16038: log 5 sequence# 47 cannot be archived
ORA-00354: corrupt redo log block header
ORA-00312: online log 5 thread 1: '/u03/data/oradata/PLIN/log5.log'
Basically, the redo logs became corrupt. This kind of error pointed to hardware, i.e. a bad disk drive, so the DBA moved all redo logs off the local drives and onto SAN storage.
The local drives were in vg01: 8 drives, 4 mirrored over the other 4. We ran diskinfo and dd on all drives and got no errors at all. We did not see any errors in syslog.log either.
Over the weekend, when rebooting this server (a K460 running HP-UX 11.00), I found that one drive had indeed failed, and it was replaced.
Now I feel kind of uneasy... It looks like Oracle detected a bad drive before the OS started to report errors. Moreover, even though the drive was mirrored, that really did not help at all! For some reason, one failing drive in a mirrored pair caused a production problem. Is there anything I can do about this other than moving data off the local drives?
Your opinions are appreciated.
Elena.
10-25-2004 01:32 AM
Re: OS vs Oracle on failing drive
I've had this happen to me in the past.
1) Shut down the database and get a gold backup.
2) Use cstm, mstm or xstm (X Window) and run the exercise tool on every disk in the system.
3) dmesg or vi /var/adm/syslog/syslog.log
If you find a bad disk, arrange a replacement.
These systems generally have larger numbers of small disks. It's unlikely, though entirely possible, that Oracle and the boot disk are on the same disk.
I hope you have make_tape_recovery tapes handy.
It's always a good idea to keep vg00 separate from your Oracle data.
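Step 3 above can be partly automated. The following is only a minimal sketch in Python, not something from the original reply: the keyword list is an assumption about what disk-related messages might contain, and the function names are made up for illustration. Adjust the patterns to whatever your own syslog.log actually records.

```python
import re

# Hypothetical keyword list (an assumption, not an official message catalog).
DISK_PATTERNS = re.compile(r"SCSI|LVM|I/O error|POWERFAILED", re.IGNORECASE)

def suspicious_lines(lines):
    """Return the log lines that match any of the disk-related patterns."""
    return [line.rstrip("\n") for line in lines if DISK_PATTERNS.search(line)]

def scan_log(path="/var/adm/syslog/syslog.log"):
    """Scan a syslog file and return the matching lines."""
    with open(path, errors="replace") as handle:
        return suspicious_lines(handle)
```

Calling `scan_log()` on the server returns the matching lines for review. An empty result does not prove the disks are healthy, as this thread shows.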
SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
10-25-2004 01:37 AM
Re: OS vs Oracle on failing drive
Hi,
In my opinion, the OS should have the latest patches, and the latest diagnostics and monitoring tools should be installed.
Besides this, redo logs should have multiple copies on the server; I believe there should be at least 3 copies of the latest logs. If possible, keep the same on a contingency copy as well. You should also run a script that copies only the logs from some time back.
Hope this helps..
Prashant
"Intellect distinguishes between the possible and the impossible; reason distinguishes between the sensible and the senseless. Even the possible can be senseless."
10-25-2004 02:02 AM
Solution
>> even though the drive was mirrored, it really did not help at all! For some reason one failing drive in a mirrored pair caused a production problem
Well, Oracle does do basic sanity checking on the data. Thus it can report data problems even when there are no I/O errors.
Mirroring may actually make a problem harder to find. Just imagine the HBA or cable injects bad bits into the write to one of the members, or one of the members does not faithfully write through. Now it is pot luck as to whether you see good data or bad data. You may be reading from a good disk most of the time, but under heavier load you may get data from the other member, over a problem path.
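To illustrate the point, here is a small Python simulation. It is a sketch under stated assumptions, not a model of LVM or Oracle internals: the block layout, the CRC32 checksum and the names `write_block`, `is_sane` and `mirrored_read` are all hypothetical. It shows a read that succeeds at the I/O level on both mirror members while only an application-level check notices that one copy is bad.

```python
import zlib

BLOCK = b"redo record payload"

def write_block(payload):
    """Store a payload together with a CRC32 checksum, as an application might."""
    return {"data": payload, "crc": zlib.crc32(payload)}

def is_sane(block):
    """Application-level sanity check: does the checksum still match the data?"""
    return zlib.crc32(block["data"]) == block["crc"]

def mirrored_read(members, pick):
    """Return the copy held by member `pick`. No error is raised for a bad copy:
    from the I/O layer's point of view the read simply succeeds."""
    return members[pick]

good = write_block(BLOCK)
# Hypothetical fault: one bit flipped on the path to the second member.
bad = dict(good, data=bytes([BLOCK[0] ^ 0x01]) + BLOCK[1:])
members = [good, bad]

# Which member serves the read decides whether the corruption is seen.
results = [is_sane(mirrored_read(members, i)) for i in (0, 1)]
print(results)  # [True, False]
```

Neither read raises an error, which fits tools like dd reporting nothing while the application still complains.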
>> Besides this, redo logs shall have multiple copies on the server, 3-copies I believe at least shall be there for latest logs.. And if possible keep the same on contingency copy also... You shall be running some script to copy logs of some time back only...
So you have multiple redo groups in Oracle, with multiple members within each group, each of those members being LVM mirrored (for 4+ data copies). Here I would have expected Oracle to do the right thing when one member delivers doubtful data.
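The "right thing" expected here can be sketched as follows. This is an illustration in Python of the general idea only (try each member of a group and use the first copy that passes a sanity check); it makes no claim about how Oracle actually implements this, and `checksum_ok` and `read_from_group` are hypothetical names.

```python
import zlib

def checksum_ok(copy):
    """Sanity check for one member's copy (CRC32 is an assumption for this sketch)."""
    return zlib.crc32(copy["data"]) == copy["crc"]

def read_from_group(members):
    """Return (index, data) for the first member whose copy verifies, else raise."""
    for index, copy in enumerate(members):
        if checksum_ok(copy):
            return index, copy["data"]
    raise ValueError("all members of the group failed the sanity check")

payload = b"log 5 seq 47"
good = {"data": payload, "crc": zlib.crc32(payload)}
bad = {"data": b"lXg 5 seq 47", "crc": zlib.crc32(payload)}  # damaged copy

print(read_from_group([bad, good]))  # (1, b'log 5 seq 47')
```

With only one member per group there is nothing to fall back to, so a single doubtful copy stops the reader.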
fwiw,
Hein.