Operating System - OpenVMS

Shadowing problem

 
SOLVED
dschwarz
Frequent Advisor

Shadowing problem

On one node of our OpenVMS Cluster we observe a
growing error count on a single-member shadow
set, while the member disk itself shows no errors.

The output of 'SHOW DEVICE D' looks like this:

Device                 Device             Error
Name                   Status             Count
DSA100:                Mounted               24
...
$1$DGA100: (BUCL01)    ShadowSetMember        0  (member of DSA100:)

The cluster consists of two DS20E nodes running OpenVMS V7.3-2.
The disk is an MSA1000 unit, and the shadow set is mounted on both cluster members.

DIAGNOSE shows:
...
IOSB x000000000000005C
DATACHECK - write check error
...

Any suggestions?
12 replies
Jon Pinkley
Honored Contributor

Re: Shadowing problem

dschwarz,

Please provide the output (as a plain-text attachment) of the following:

$ mcr sysman set env/cluster
SYSMAN> do show device/ful dsa100

(Cut and paste the output into Notepad, save it with a .txt extension, and attach it to your reply.)

Jon
it depends
dschwarz
Frequent Advisor

Re: Shadowing problem

Hi Jon,

Attached you will find the SYSMAN output
(at least I hope so).
Jon Pinkley
Honored Contributor

Re: Shadowing problem

I don't see any obvious problems, and I can't explain the errors on the DSA virtual unit, at least assuming that DSA100 has been a single-member shadow set for the duration of the boot.

Are other members being added and removed, for instance for "backup snapshots"?

What device did the DATACHECK write check error appear under?

Jon

P.S. I had no trouble reading the attachment.
it depends
dschwarz
Frequent Advisor

Re: Shadowing problem

At the time in question, we didn't add or remove members to or from the shadow set.

All errors appear under DSA100 as you can see in the attached DIAGNOSE extract.

This system has another seven(!) single-member shadow sets mounted. None of them has ever shown errors on the DSAxxx device.

Hoff
Honored Contributor

Re: Shadowing problem

Have you tried ANALYZE /DISK /SHADOW on the shadow master?

IIRC, this command arrived via a shadowing ECO kit for V7.3-2. And on the topic of ECOs: if you're not current on your ECOs and your firmware, get there.

If you're current on your patches and firmware, it might also pay off to engage your hardware support organization, as there's clearly something a little wonky going on.

Volker Halle
Honored Contributor
Solution

Re: Shadowing problem

You've set your DSA100: shadow set virtual unit to DATA CHECK on all WRITE operations (SET VOLUME/DATA_CHECK=WRITE DSA100:), which is quite unusual. Did you change that recently? Why?

Do those errors disappear if you turn it off? Does shadowing correctly handle this setting?

Volker.
dschwarz
Frequent Advisor

Re: Shadowing problem

Hoff,

ANA/DISK/SHADOW compares the disk with itself, since the set has only one member. As expected, there were no errors.

We are not completely up to date with firmware and patches. I shall install the latest firmware and patches as soon as possible.

Volker,

DATA_CHECK on write operations was enabled in 2002 because there were mysterious problems with Oracle Rdb. It didn't help solve those problems, but nobody turned it off. We will try turning it off now, because this device is in fact the only one with DATA_CHECK turned on.

But what's wrong with having DATA_CHECK enabled, apart from slowing down I/O?
Wim Van den Wyngaert
Honored Contributor

Re: Shadowing problem

This looks like a bug in shadowing to me; see
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=459922&prodTypeId=18964&prodSeriesId=459922&objectID=c01462103
So turn the write check off.

Also try an anal/dis/read of your disk (it will read all data on the disk!) to see if there are parity errors (bad blocks). It shouldn't report errors, because there are no hardware errors, but it could give more information about what is wrong.

Wim
Jur van der Burg
Respected Contributor

Re: Shadowing problem

Ah, yes, I remember. An application may cause this by issuing an asynchronous data-check I/O and clobbering the buffers before the data check completes.

Fwiw,

Jur.
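A minimal Python sketch of the race Jur describes, assuming a simplified model in which a data-check write is "write the buffer, then read the block back and compare it with the caller's buffer". SimulatedDisk and write_with_datacheck are hypothetical names for illustration only, not OpenVMS system services, and the real driver-level sequence may differ. The point it shows: if the application touches the buffer while the asynchronous I/O is still in flight, the compare fails even though the disk holds exactly the data that was written, which would fit errors counted on DSA100: while the member disk shows none.

```python
# Hypothetical model of a "write with data check" I/O.
# SimulatedDisk and write_with_datacheck are illustrative names, not OpenVMS APIs.


class SimulatedDisk:
    """In-memory stand-in for a disk: logical block number -> stored bytes."""

    def __init__(self):
        self.blocks = {}

    def write(self, lbn, buf):
        # The stored data is a copy of the buffer contents at the moment of the write.
        self.blocks[lbn] = bytes(buf)

    def read(self, lbn):
        return self.blocks[lbn]


def write_with_datacheck(disk, lbn, buf, while_in_flight=None):
    """Write buf to lbn, then re-read the block and compare it with buf.

    while_in_flight models the window between the write and the verify read,
    during which an application that issued the I/O asynchronously may still
    be running and modifying the buffer. Returns True if the check passes.
    """
    disk.write(lbn, buf)  # phase 1: the data goes to the disk
    if while_in_flight is not None:
        while_in_flight()  # the I/O has not completed yet
    # phase 2: compare against the buffer as it is *now*, not as it was written
    return disk.read(lbn) == bytes(buf)


if __name__ == "__main__":
    disk = SimulatedDisk()
    buf = bytearray(b"A" * 512)

    # Well-behaved caller: the buffer is left alone until the I/O completes.
    print(write_with_datacheck(disk, 100, buf))  # True

    # Misbehaving caller: reuses the buffer before the data check finishes.
    def reuse_buffer():
        buf[0] = ord("B")

    print(write_with_datacheck(disk, 101, buf, reuse_buffer))  # False
    # The data on "disk" is exactly what was written; only the check failed.
    print(disk.read(101) == b"A" * 512)  # True
```

In this model, turning the write check off (as suggested above) removes the compare step, so the spurious failure cannot occur; it does not change what reaches the disk.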
debu_17
Occasional Visitor

Re: Shadowing problem

I feel that the errors due to "Data Check on write" may have something to do with:
1. the write cache for this disk, as the cache may be shared with the main disk for which DSA100 is the shadow volume;
2. as already suggested, the asynchronous operations, which may modify the data before the write check completes.

dschwarz
Frequent Advisor

Re: Shadowing problem

Your answers helped me understand what is happening on the system. I now have an idea why this error occurs and why it doesn't happen very often.

Thank you very much.

I turned DATA_CHECK off.

I hope this will solve the problem, but as the error is not seen very often, it will take some months, maybe years, before we can be reasonably sure.
John Hockett
Occasional Visitor

Re: Shadowing problem

Just curious: prior to seeing this problem, were any kits installed? You can send the PRODUCT SHOW HISTORY output to john.hockett@hp.com.

Thanks