Operating System - HP-UX


 
Dave Chamberlin
Trusted Contributor

source of file corruptions?

I recently moved our production database from a K460 to an N. The K460 had 11 mirrors, mostly 9G, in HASS arrays. I did get 5 new mirrors (18G 10k RPM disks) with the new box, and the plan was to move 4 mirrors from the old server to the new server to balance the load. The mirrors could not be configured until we went live last week. On going live, 1 mirror had just redo logs, another just archive logs, and the other two had datafiles.

A few hours later, when our first nasty batch job started, I had trouble. The problem was a corruption in a redo log, which Oracle says was caused by the archiver not keeping up. Both the redo and archive logs were on disks in the HASS array. Two days later, there was a corrupt block in an Oracle datafile (Murphy made sure it was our largest inv table...). This datafile was on one of the HASS disks. Two days after that, I had a bad block in another Oracle datafile (an index this time), also in the HASS array. I also had another archiver error, which in this case recovered, apparently after waiting for the archiver to finish writing logs.

This is the first corruption I have seen on my systems. Oracle says it is a hardware/OS issue. I believe this. HP cannot see any problems. There were two EMS warnings in syslog, but no details on the warnings. There are firmware differences (hp02, hp03, hp14) on the older disks, but HP says they are all "current" and could not have caused the problems. The new box is well patched, and there are separate controllers for each side of the HASS.

Being paranoid at this point, I have moved all data off of those disks, but now system performance is suffering. I have to buy an all-new array or make the old array safe. I can't use the "try this and see if you get corruption" approach. I would feel secure if I could cause failures with the setup as it is, make a change, and then be unable to cause a failure. I don't believe in voodoo and don't want to start. Can anyone offer any wisdom here?
Has anyone had similar problems? Sorry I am so long winded - it's been a fun week.
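
A repeatable "cause a failure, change something, fail to cause it again" test could be approximated with a write-and-read-back check on a scratch file in the suspect filesystem. The following is only a minimal sketch in Python, not something from the thread: the function names, the 8192-byte block size, and the SHA-256 pattern are all assumptions chosen for illustration, and it does not imitate Oracle's I/O pattern. A clean pass proves little, because the read-back may be served from the buffer cache rather than from disk.

```python
import hashlib
import os


def make_block(index, block_size, seed):
    """Build one deterministic block from (seed, index); illustrative pattern only."""
    unit = hashlib.sha256(f"{seed}:{index}".encode()).digest()
    reps = block_size // len(unit) + 1
    return (unit * reps)[:block_size]


def write_pattern_file(path, blocks, block_size=8192, seed=0):
    """Write `blocks` deterministic blocks to `path`, fsync, and return their digests."""
    digests = []
    with open(path, "wb") as f:
        for i in range(blocks):
            block = make_block(i, block_size, seed)
            f.write(block)
            digests.append(hashlib.sha256(block).hexdigest())
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    return digests


def verify_pattern_file(path, digests, block_size=8192):
    """Re-read the file and return the indexes of blocks whose digest changed."""
    bad = []
    with open(path, "rb") as f:
        for i, expected in enumerate(digests):
            block = f.read(block_size)  # a short read also shows up as a mismatch
            if hashlib.sha256(block).hexdigest() != expected:
                bad.append(i)
    return bad
```

To have a better chance of reading from the disks themselves, the file would need to be much larger than memory, or the filesystem unmounted and remounted between the write and the verify. Running it on a scratch filesystem while a heavy job is loading the same disks is closer to the conditions under which the corruption appeared.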
Volker Borowski
Honored Contributor

Re: source of file corruptions?

Hi,

sounds like missing patches for async IO.
If this is an SAP environment, did you check the corresponding OSS notes on required operating system patches?

First approach would be to turn off async IO and do a dbverify of the entire database.

Recreate the online redo logs.

If you do not use async IO, I have no idea other than hardware issues.

It would also help if you could tell us your release of Oracle and the OS, whether the database is 64-bit or 32-bit, and whether you changed this when going from the K to the N.

Do not know if this helps
Volker
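
To check the entire database with DBVERIFY, every datafile has to be passed to the utility in turn. A small helper that only builds the command lines (it does not run them) might look like the sketch below. It assumes the executable is called `dbv` on this platform and uses its `file=`, `blocksize=`, and `logfile=` parameters; the datafile path and log directory in the usage note are hypothetical, and the block size must match the database's `db_block_size`.

```python
import os
import shlex


def build_dbv_commands(datafiles, block_size, log_dir):
    """Return one DBVERIFY command line per datafile (built only, never executed)."""
    commands = []
    for path in datafiles:
        # One log per datafile, named after the datafile (a naming choice, not a rule).
        log_path = os.path.join(log_dir, os.path.basename(path) + ".log")
        commands.append(
            "dbv file={} blocksize={} logfile={}".format(
                shlex.quote(path), block_size, shlex.quote(log_path)
            )
        )
    return commands
```

For example, `build_dbv_commands(["/u09/example01.dbf"], 2048, "/tmp/dbv")` yields a single command for that hypothetical file. The list of real datafiles would come from the database itself, for instance from `v$datafile`.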
harry d brown jr
Honored Contributor

Re: source of file corruptions?

>>>I recently moved our production database from a K460 to an N. The 460 had 11 mirrors, mostly 9G, in HASS arrays. I did get 5 new mirrors (18G 10k RPM disks) with the new box, and the plan was to move 4 mirrors from the old server to the new server to balance the load. The mirrors could not be configured until we went live last week.<<<

Can you explain how the "mirrors" relate to the logical volumes and to the HASS racks?

Are you calling a disk drive a mirror or a logical volume or a volume group?

Are you splitting the hass rack and mirroring between the two sides, or are you mirroring within the same side of the hass rack?

Are you mirroring to like disks?

I know that's a lot of questions, but we have hundreds of HASS racks, and we haven't encountered any kind of failure like that.


Live free or die
harry
Dave Chamberlin
Trusted Contributor

Re: source of file corruptions?

More info:
Oracle is 7.3.4, 32 bit
K460 was 10.2, N is 11.0
I do not know if I am using async IO - how do I check?
The HASS has 8 bays, set up as 4 VGs with 1 LV each, top to bottom. The disks are mirrored left/right, with separate controllers for the left and right sides.
harry d brown jr
Honored Contributor

Re: source of file corruptions?

Are you using raw or filesystems for oracle?

live free or die
harry
Dave Chamberlin
Trusted Contributor

Re: source of file corruptions?

We are using filesystems. One difference though: on the old server, all the Oracle data is on HFS filesystems; on the new server, all of them are VxFS.
The fstab entries on the old system were:
/dev/vg09/u09 /u09 hfs rw,suid 0 2
The entries on the new are:
/dev/vg09/u09 /u09 vxfs delaylog,largefiles 0 2
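
With more than one filesystem to compare, the differences between the old and new fstab entries can be listed mechanically. This is a minimal Python sketch, assuming the usual six whitespace-separated fstab fields (device, mount point, type, options, backup frequency, pass number); the function names are made up for illustration.

```python
def parse_fstab_line(line):
    """Split one fstab entry into its six fields; options become a set."""
    device, mount_point, fs_type, options, backup_freq, pass_no = line.split()
    return {
        "device": device,
        "mount_point": mount_point,
        "type": fs_type,
        "options": set(options.split(",")),
        "backup_freq": backup_freq,
        "pass_no": pass_no,
    }


def diff_entries(old_line, new_line):
    """Report the filesystem type change and the options unique to each entry."""
    old = parse_fstab_line(old_line)
    new = parse_fstab_line(new_line)
    return {
        "type": (old["type"], new["type"]),
        "only_old": sorted(old["options"] - new["options"]),
        "only_new": sorted(new["options"] - old["options"]),
    }
```

Applied to the two entries above, it reports the type change from `hfs` to `vxfs`, with `rw` and `suid` appearing only on the old system and `delaylog` and `largefiles` only on the new one.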