05-16-2004 04:14 PM
Database corruption with failed drive and lvsync
I had a disaster two weeks ago that resulted in database corruption and data loss. We were trying to attach another disk tray to an FC RAID controller online (no power down, with the database running on RAID on other disks on the controller). The trouble was that the disk tray has an ID that determines the IDs of its disks as seen by the RAID controller, and the ID of the new tray was the same as one of the existing ones (we didn't know to check this).
OK, that was the first stupid mistake. So the database was running on mirrored logical volumes, and when the disk tray was brought online on the one RAID controller, it freaked out and that volume went offline. So the database (Oracle) continued to run without issue, reading and writing to the mirror.
When this happened, I immediately powered the RAID controller down, took out the new disk tray, called the mfg. and found out about the ID conflict. So I disconnected the new disk tray and powered up the controller and disks again, as they were originally. One of the disks in the RAID was marked "bad" and was automatically replaced by the online spare, and the volume was rebuilding from parity.
I don't recall doing it, but I must have run lvsync while the RAID was rebuilding. Apparently, something in this combination caused data corruption. The database was still running throughout this process. Strangely, I see some messages in the syslog file about SCSI write errors and it looks like the FC connection was going up and down while the lvsync was running.
Anyway, I ended up with corruption in some Oracle data files and some archive logs were corrupt, so I couldn't restore back to the current time from backup.
Anybody ever seen this before? What did I do wrong? (aside from get up that morning?)
See attached section of the syslog file during the time this occurred.
2 REPLIES
05-17-2004 06:20 PM
Re: Database corruption with failed drive and lvsync
Hi,
It is preferable to schedule downtime when adding a JBOD, to avoid any problems. Then check from ODE that the new addition is visible.
"Remember - Precaution is always better than cure"
Fine, the ID conflict was there. Even so, after powering down the new JBOD, you should not execute any commands during the rebuild operation. After the rebuild completes, you need to take the other steps...
Anyway, we learn from mistakes...
Cheers ..
NH
05-18-2004 04:37 AM
Re: Database corruption with failed drive and lvsync
Yes, definitely a lesson learned here. If the IDs hadn't been in conflict, everything would have been smooth though. I turned a situation where I was looking for NO downtime into a considerable amount of downtime.
I was trying to piece together how this data corruption occurred across the mirror. I'm thinking that when the drive tray was plugged in on the RAID on one side of the MirrorDisk/UX mirror, it corrupted the data on the RAID volume there. About 30 seconds elapsed before the RAID controller alarm went off and the RAID was disabled there. I think that Oracle and/or the OS could have read many data blocks from that volume during that window and then written them back to both sides of the mirror, corrupting data on both mirror copies. Then there is the fact that the RAID controller was rebooted and started rebuilding while the mirror was resyncing, possibly with corrupt data on the stale drive.
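The read-then-write-back idea above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of how HP-UX LVM or MirrorDisk/UX actually schedules I/O: the `Mirror` class, the `read_from` argument, and the block contents are all hypothetical names invented for illustration. It shows that if a block is read from a copy that has silently gone bad, modified, and written back, the mirror replicates the bad data to both copies.

```python
# Toy model of a two-way mirror; every name here is hypothetical.
class Mirror:
    def __init__(self, blocks):
        # Two independent copies of the same logical volume.
        self.copies = [list(blocks), list(blocks)]

    def read(self, block_no, read_from=0):
        # A real mirror chooses a copy by its own policy; here the caller picks.
        return self.copies[read_from][block_no]

    def write(self, block_no, data):
        # Writes go to every copy, whether the data is good or bad.
        for copy in self.copies:
            copy[block_no] = data


def update_block(mirror, block_no, suffix, read_from):
    # Read-modify-write, as a database might do when updating a block.
    data = mirror.read(block_no, read_from)
    mirror.write(block_no, data + suffix)


mirror = Mirror(["row-A", "row-B"])
# Assumption: copy 0 is silently corrupted before the controller fails it.
mirror.copies[0][1] = "GARBAGE"
# Block 1 is read from the bad copy, updated, and written to both copies.
update_block(mirror, 1, "+update", read_from=0)
corrupted_on_both = all(c[1].startswith("GARBAGE") for c in mirror.copies)
print(mirror.copies, corrupted_on_both)
```

In this model only blocks that pass through a read-modify-write during the bad window end up corrupt on both sides; block 0, which was never touched, stays intact on the good copy.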
The opinions expressed above are the personal opinions of the authors, not of Hewlett Packard Enterprise. By using this site, you accept the Terms of Use and Rules of Participation.
© Copyright 2025 Hewlett Packard Enterprise Development LP