
Re: Issues with Oracle and VMS Host Based Volume Shadowing?

 
Dave Gudewicz
Valued Contributor

Issues with Oracle and VMS Host Based Volume Shadowing?

An Oracle DBA has said that Oracle and VMS Host Based Volume Shadowing have problems/issues together.

I've not heard of this before, but thought I'd ask this group in case I missed something along the way.

Let's assume current versions for starters.

Dave...
34 REPLIES
Ian Miller.
Honored Contributor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

Can this person be more specific? Sounds like a misplaced rumor to me. HBVS works just fine for me.
____________________
Purely Personal Opinion
Dave Gudewicz
Valued Contributor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

>Can this person be more specific?

I asked the question as it was given to me, so it's as specific as I can make it.

Dave...
Kris Clippeleyr
Honored Contributor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

Dave,

Never had problems with HBVS combined with Oracle Rdb, or Oracle Oracle.
So, is this one of those rumors to blacken VMS?
Greetz,
Kris (aka Qkcl)
I'm gonna hit the highway like a battering ram on a silver-black phantom bike...
Dave Gudewicz
Valued Contributor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

Kris,

I don't think this is a blacken VMS issue.

Not sure about the source of the question: an article somewhere? Someone's experience? A rumor? I don't know.

I'll ask the DBA.

Dave...
Dave Gudewicz
Valued Contributor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

More info on this question. Excerpts from an email message sent to the DBA.

*******************

Will Oracle support work to resolve data corruption issues with non-supported configurations?

Our policy is we will help them with anything that happens on a supported platform or can be simulated on a supported platform.

So how does that tie into volume shadowing?

In case a data block is corrupted and its due to volume shadowing we will try to simulate that on a setup without volume shadowing if it works will not support it if it doesn't then we will.

******************

The last sentence above seems to be misworded. I take it to mean, if it (VS) works, then we'll support it, if not then we won't.

Now my comments:

Is a VMS system with HBVS considered a non-supported configuration by Oracle?

Is VMS HBVS **officially** supported by Oracle? If no, please cite the reference.


Dave...
Dale A. Marcy
Trusted Contributor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

Dave posted:

"In case a data block is corrupted and its due to volume shadowing we will try to simulate that on a setup without volume shadowing if it works will not support it if it doesn't then we will.

******************

The last sentence above seems to be misworded. I take it to mean, if it (VS) works, then we'll support it, if not then we won't."

I read this as saying they will try to replicate the problem on a system that does not use volume shadowing. If it works on a non-volume shadowing system, then they will not support it. If it doesn't work on the non-volume shadowing system, then they will support it.

We have used several versions of Oracle here with several versions of VMS and have not had any problems with it and volume shadowing.
Robert Brooks_1
Honored Contributor
Solution

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

It was written . . .

in case a data block is corrupted and its due to volume shadowing we will try to simulate that on a setup without volume shadowing if it works will not support it if it doesn't then we will.

---

My colleagues and I would be quite interested in hearing of *any* documented case where HBVS corrupted the data. While I won't state definitively that it has never happened, it is ***very*** unlikely that any detected corruption was caused by HBVS.

I'd tend to look at potential underlying hardware problems (disk, controller), rather than attempting to blame HBVS.

Assuming that corruption is detected, what are the criteria used to state that HBVS is at fault?

-- Rob
Jim Geier_1
Regular Advisor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

I am aware of a large OpenVMS/Oracle/Cerner site in Orange, California where they are struggling with a similar problem. They are running OpenVMS 7.3-2 on a cluster of a GS1280 in two hard partitions, two GS80, and one ES47, most current patch kits, disks shadowed across two HSG80-based EMA 12000 cabinets. HSG80 controllers all at V8.8-2, disks all configured as JBOD at the controller level.

It seems that on more than one occasion, a disk failed but did not drop out of the shadow set. Here "failed" means that hundreds or thousands of uncorrectable errors were generated. Typically I would expect such a disk to fail out of the shadow set. But the disk remained in the shadow set, and it seems that corrupt data got copied to the non-failed disk, and thus the database became corrupted.

The challenge is how to prove or disprove the viability of HBVS in such a scenario. How does one simulate such an error or series of errors? There has been more than one instance of the problem in recent weeks, leading to a severe mistrust of HBVS.
VMS Support
Frequent Advisor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

That does sound nasty. We run Oracle with HBVS and are looking at going to 7.3-2 in the near future (we are currently at 7.3 for third-party supplier reasons). Should HP send out an alert about this? I would not be a happy bunny if we upgraded and ended up with corrupt data due to a suspected issue with 7.3-2 and HBVS. This would be a bad thing for OpenVMS and for the trust we have in the O/S and HBVS, in the same way that we trust DFO not to corrupt data. I don't want to have to worry about things in the O/S that should be stable at this point. How long has HBVS been around?
Rob Young_4
Frequent Advisor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

It seems that on more than one occasion, a disk failed but did not drop out of the shadow set. Here "failed" means that hundreds or thousands of uncorrectable errors were generated. Typically I would expect such a disk to fail out of the shadow set. But the disk remained in the shadow set, and it seems that corrupt data got copied to the non-failed disk, and thus the database became corrupted.

---

Did the disk fail? No. It just rang up errors.
So at what point does HBVS kick out a disk?
1 error? No, that probably just revectored a bad
block. 10 errors? 20 errors? Or does
HBVS kick out the disk only if the shadow member
is unresponsive?

You could see how errors on one disk are
migrated to the other, though, via a read/update/write cycle.

So is HSG80 firmware more tolerant and keeping around a disk that is having "issues?"

Now I realize most of what I write above could be misinterpreted. I'm asking, trying
to drill down - no smileys or sarcasm intended.

Rob
Bill Hall
Honored Contributor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

Jim, VMSSupport,

The problem is not with HBVS, but with VMS and more specifically the HSG80. I saw the file corruption on a two-ES40 development/test cluster running VMS 7.3-1 and NO HBVS, using a single HSG80 pair with 8.7 firmware: eight six-member raidsets, all 18GB drives. I'm sure Rob can verify that this has been easy to reproduce, and that I/O throttling has been tweaked and re-tweaked several times in 7.3-2 fibre-scsi ECOs in an attempt to resolve this.

It surprises me that the Cerner site you mentioned has so much CPU capacity and so very, very little storage I/O capacity. I would have expected multiple EVAs or even an XP. I thought a 1GB SAN and the HSG80 was nice compared to two CIs and multiple HSG50s, but VMS and ES40s can make good use of 2GB pipes, HBR, HBVS and an XP1024 ;-).

Bill
Bill Hall
Benjamin Levy
Frequent Advisor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

2 suggestions on how to increase the chances that VMS will expel a disk from a shadow set if it is logging many errors:

1) Adjust the SYSGEN parameter SHADOW_SYS_DISK. We have it set to 53249 decimal, which equals D001 hex. I think the bits in that letter D contribute to expulsion of failing shadow set members, not just on the system disk but on application disks as well.

2) Upgrade to HSG80 ACS V8.8F-2, and set HOST_REDUNDANT parameter.
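
For anyone checking the arithmetic on that value, here is a small Python sketch. It only confirms that 53249 decimal is D001 hex and lists which bit positions are set. It assumes nothing about what each bit of SHADOW_SYS_DISK actually means (check the SYSGEN documentation for that), and the helper name set_bits is made up for illustration.

```python
# Decode a bit-mask value such as SHADOW_SYS_DISK = 53249.
# Only the arithmetic is shown; the meaning of each bit is NOT assumed here.

def set_bits(value: int) -> list[int]:
    """Return the positions (0 = least significant) of the bits set in value."""
    return [pos for pos in range(value.bit_length()) if (value >> pos) & 1]

value = 53249
print(f"{value} decimal = {value:X} hex")          # 53249 decimal = D001 hex
print("bits set:", set_bits(value))                 # bits set: [0, 12, 14, 15]
print("top hex digit D in binary:", f"{value >> 12:04b}")  # 1101
```

So the "D" covers bits 12, 14 and 15, and the trailing "1" is bit 0.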
Robert Brooks_1
Honored Contributor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

Jim wrote . . .

It seems that on more than one occasion, a disk failed but did not drop out of the shadow set. Here "failed" means that hundreds or thousands of uncorrectable errors were generated. Typically I would expect such a disk to fail out of the shadow set. But the disk remained in the shadow set, and it seems that corrupt data got copied to the non-failed disk, and thus the database became corrupted.

The challenge is how to prove or disprove the viability of HBVS in such a scenario. How does one simulate such an error or series of errors? There has been more than one instance of the problem in recent weeks, leading to a severe mistrust of HBVS.

-------------

Did the shadow set in question undergo a merge after the member disk began failing?

Has a call to the CSC been logged?
If it has, I do not think that it has made its way to VMS Engineering yet.

Jim Geier_1
Regular Advisor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

The large Cerner site has purchased an XP 12000 storage system, and will be migrating to that over the next several months. And, since we all know that Oracle loves CPU power, the plan is to replace the two GS80s with another 32-processor GS1280.

I have learned that the "failed" disk went into mount-verify and then that timed out, so there was no merge.

They are running ACS V8.8-2. I do not know about HOST_REDUNDANT, but I'll look at that.

All of the relevant HP hardware and software support groups are involved.

The outstanding question is how does one simulate a failure to verify that HBVS will drop a disk out of a shadow set?
Robert Brooks_1
Honored Contributor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

Jim wrote . . .

I have learned that the "failed" disk went into mount-verify and then that timed out, so there was no merge.

--
If there was no merge, then it is hard to see how HBVS could have "replicated" the bad data from one member to another.

Are you saying the device went into mount verification, mount verification timed out, and the member was not thrown out of the virtual unit?

Was connectivity to the remaining member(s) intact, or did all members lose connectivity?

Please note that with shadow sets, it is only the virtual unit that will undergo mount verification; the individual members will *not* undergo mount verification.

Jim also wrote . . .

The outstanding question is how does one simulate a failure to verify that HBVS will drop a disk out of a shadow set?

---
Easy. You can disable the LUN at the controller level, or you can disable the relevant paths at the fibre channel switch.


We are aware of some issues that prevent members from being tossed out in a timely manner, but these problems manifest themselves as the virtual unit being in mount verification longer than it should -- the failing member should be removed in SHADOW_MBR_TMO seconds, which is usually set way below MVTIMEOUT. The virtual unit cannot exit mount verification until one of two things happens -- either the member gets removed or the connectivity to the member is restored. Note that for a single member shadow set, the member is never removed.
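
To make the rules in the paragraph above concrete, here is a toy Python model. It is a sketch of the behaviour as described in this post, not the actual HBVS logic: SHADOW_MBR_TMO is a real SYSGEN parameter name, but the function, its arguments and the sample numbers are assumptions for illustration only.

```python
# Toy model of the member-timeout behaviour described above.
# NOT the real HBVS implementation; names and values are illustrative.

def member_outcome(members: int, outage_seconds: float,
                   shadow_mbr_tmo: float) -> str:
    """Describe what happens when one member loses connectivity."""
    if outage_seconds < shadow_mbr_tmo:
        # Connectivity came back before the member timeout expired.
        return "connectivity restored; virtual unit exits mount verification"
    if members == 1:
        # A single-member shadow set never has its member removed.
        return "member kept; virtual unit waits for connectivity"
    # Multi-member set: the failing member is removed after the timeout.
    return "member removed; virtual unit exits mount verification"

# Hypothetical timeout of 120 seconds:
print(member_outcome(members=2, outage_seconds=30, shadow_mbr_tmo=120))
print(member_outcome(members=2, outage_seconds=300, shadow_mbr_tmo=120))
print(member_outcome(members=1, outage_seconds=300, shadow_mbr_tmo=120))
```

Disabling the LUN or the paths, as suggested above, corresponds to the second case: a long outage on one member of a multi-member set.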
Marc Van den Broeck
Trusted Contributor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

Hi Dave,

When we upgraded to Oracle8i, we moved all our Oracle database files to disks that do not use VMS host-based shadowing, because I read an article from Oracle stating that there could be a performance issue.

Rgds
Marc
VMS Support
Frequent Advisor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

"Jim, VMSSupport,

The problem is not with HBVS, but with VMS and more specifically the HSG80. I saw the file corruption on a two-ES40 development/test cluster running VMS 7.3-1 and NO HBVS, using a single HSG80 pair with 8.7 firmware: eight six-member raidsets, all 18GB drives. I'm sure Rob can verify that this has been easy to reproduce, and that I/O throttling has been tweaked and re-tweaked several times in 7.3-2 fibre-scsi ECOs in an attempt to resolve this."

So what we are saying is that when we upgrade to 7.3-2 and use the HSG80 (as we do), an attempt has been made to tweak OpenVMS 7.3-2 so that we will not get the odd file going corrupt. Oh dear.
Just hope it's not an important file that stops our production day :-)


John Donovan_4
Frequent Advisor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

We have been successfully running 7.0-65 & 7.1-41 of Oracle Rdb for a year with the following configuration:

OpenVMS 7.3-2
VS (image ident X-23)
DS25
SAN & VS sets
HSG80's
36GB ultra SCSI-3

I have also been running Oracle Rdb and VS on our Canadian server for 10 years now without any issues. I am the DBA & system admin.
"Difficult to see, always in motion is the future..."
Bill Hall
Honored Contributor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

VMSSupport,

All I'm saying is that I don't believe the issues at the cited Cerner site lie with HBVS. I believe their problem is with their HSG80s, VMS I/O tuning, and the ability of VMS to overrun the HSG80's I/O request queue.

I received a copy of the following from our FE earlier this year. You should also read the release notes for HSG80 ACS V8.8-2 and the VMS 8.2 documentation cited below, which also applies to VMS 7.3-n.

Greetings,

Recent testing with both EVA and XP arrays indicates OpenVMS must be
tuned to optimize both the host and array performance. In particular,
OpenVMS hosts should reduce the number of asynchronous, concurrent I/Os
per process when accessing SAN storage arrays with an intelligent cache.
This applies to both EVA storage arrays and XP disk arrays.

This recommendation opposes recommendations OpenVMS engineering
previously provided. OpenVMS engineering suggested setting DIOLM to
4,096 or higher for the backup account. For OpenVMS VAX systems, this
helped reduce processing overhead.

With the advent of AlphaServer and Itanium based systems, the CPU
architecture minimizes this overhead. In fact, these high DIOLM values
adversely impact performance of SAN storage arrays with cache
subsystems, which predict I/O activity. This is especially true during
large sequential read streams, where the array attempts to prefetch data
to satisfy the large read stream.

Even with CI based storage arrays, larger AlphaServer systems with large
DIOLM values cause problems with normal SCS communications. Dropping
DIOLM reduces errors in CI environments as well as with SAN storage.

Here are the symptoms of high DIOLM values:

* on HSJs high DIOLM values cause virtual circuit timeouts to the
controllers
* on HSGs high DIOLM values cause the HSG80 to hang / crash
* on EVAs high DIOLM values cause very high Read Hit rates to
VDisks
* on XPs high DIOLM values cause very poor I/O throughput

On both the EVA and XP arrays, a high DIOLM value causes the cache
management firmware to 'thrash'. That is, the firmware can not
effectively utilize cache for what should be a few large sequential
reads. Decreasing DIOLM allows these arrays to more effectively
prefetch data for these large sequential reads. Dropping DIOLM
dramatically improves array performance.

To address this issue, OpenVMS engineering changed the documented
recommendations for DIOLM and Working Set Sizes in the V8.2
documentation set. The documentation now recommends setting DIOLM to
100. When accessing XP disk arrays, experience shows DIOLM at 4 to 8
provides ideal performance. Some testing may be required to determine
the best value. However, with XP disk arrays use 100 as the top value
for DIOLM. Also note that WSQUOTA and WSEXTENT contribute heavily to
the performance of OpenVMS BACKUP.

PQL_MDIOLM must also be decreased, since it sets a system wide minimum
value for DIOLM. Systems that use DECwindows and/or Motif may need to
dynamically change PQL_MDIOLM during backup operations.
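
The point about PQL_MDIOLM can be sketched in a few lines of Python. This is an illustration of the "system wide minimum" rule stated above, not OpenVMS code: the function name and the sample numbers are assumptions chosen for the example.

```python
# Sketch of how a system-wide minimum interacts with a per-account quota.
# PQL_MDIOLM "sets a system wide minimum value for DIOLM"; the function
# name and the sample values below are illustrative assumptions.

def effective_diolm(account_diolm: int, pql_mdiolm: int) -> int:
    """A process never gets less than the system-wide minimum."""
    return max(account_diolm, pql_mdiolm)

# Lowering only the account value has no effect while the minimum is higher:
print(effective_diolm(account_diolm=8, pql_mdiolm=150))  # 150
# Once the system-wide minimum is lowered too, the account value applies:
print(effective_diolm(account_diolm=8, pql_mdiolm=4))    # 8
```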

New tuning values for OpenVMS BACKUP can now be found at:


http://h71000.www7.hp.com/doc/82FINAL/aa-pv5mj-tk/00/01/117-con.html

This URL points to section 11.7, titled "Setting Software Parameters for
Efficient Backups". This is in the "HP OpenVMS System Manager's
Manual, Volume 1: Essentials" manual.

While this email focuses on OpenVMS BACKUP, recent experiences
demonstrate this may be an issue with other products, such as the HP
Disk File Optimizer for OpenVMS.


Regards,
=jbf=

John B. Fisher
OpenVMS/Storage Complex Problem Manager
Global Complex Solutions Management, GSE
Hewlett-Packard Company


Bill Hall
Dave Gudewicz
Valued Contributor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

I was out for a couple of days and I see that this thread has had a few interesting additions since my last visit.

I also noticed the "magical answer" icon, but I'm not sure this thread is ready for the bunny just yet.

Dave...
Ian Miller.
Honored Contributor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

Replies given 8 or more points result in the bunny symbol.
____________________
Purely Personal Opinion
John Donovan_4
Frequent Advisor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

Oh sure Dave, give Robert a 10 and me a 3!
I guess I should have used a lot more words and gotten a higher score. I suppose using HBVS on an OpenVMS server w/ Rdb for 10 years only rates a 3!?! (just kidding)

John
"Difficult to see, always in motion is the future..."
Dave Gudewicz
Valued Contributor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

"Oh sure Dave, give Robert a 10 and me a 3!
I guess I should have used a lot more words and gotten a higher score. I suppose using HBVS on an OpenVMS server w/ Rdb for 10 years only rates a 3!?! (just kidding)"

John,

Reason for the 3 was not because of lack of words. It was because my question was related to Oracle classic and not Rdb, but I'll give you a "10" for your last reply!

Dave...
John Donovan_4
Frequent Advisor

Re: Issues with Oracle and VMS Host Based Volume Shadowing?

Oh, I'm sorry, I thought you were talking about a REAL database! Thanks for the 10! :^))
"Difficult to see, always in motion is the future..."