
Shadowing problem

 
dschwarz
Frequent Advisor

Shadowing problem

On one node of our OpenVMS Cluster we observe a growing error count on a
single-member shadow set, even though the member disk itself shows no errors.

Output of 'SHOW DEVICE D' looks like this:

Device                   Device            Error
Name                     Status            Count
DSA100:                  Mounted              24
...
$1$DGA100:  (BUCL01)     ShadowSetMember       0  (member of DSA100:)

The cluster consists of two DS20E systems running OpenVMS V7.3-2.
The disk is an MSA1000 unit, and the shadow set is mounted on both cluster members.

DIAGNOSE shows:
...
IOSB x000000000000005C
DATACHECK - write check error
...
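For readers unfamiliar with the IOSB value, the following Python sketch splits an OpenVMS condition value into its standard fields (bits 0-2 severity, bits 3-15 message number, bits 16-27 facility, bits 28-31 control). It assumes the status is held in the low-order word of the IOSB shown above; the function name is hypothetical, and the only symbolic meaning taken from the thread is that DIAGNOSE labels 0x5C as DATACHECK.

```python
# Minimal sketch: decode an OpenVMS condition value into its fields.
# Assumed layout: bits 0-2 severity, bits 3-15 message number,
# bits 16-27 facility number, bits 28-31 control bits.

SEVERITY = {
    0: "W (warning)",
    1: "S (success)",
    2: "E (error)",
    3: "I (informational)",
    4: "F (severe/fatal)",
}


def decode_condition(value: int) -> dict:
    """Return the fields of a 32-bit OpenVMS condition value (hypothetical helper)."""
    sev = value & 0x7
    return {
        "success": bool(value & 1),  # low bit set means success
        "severity": SEVERITY.get(sev, f"reserved ({sev})"),
        "message": (value >> 3) & 0x1FFF,
        "facility": (value >> 16) & 0xFFF,
        "control": (value >> 28) & 0xF,
    }


# IOSB reported by DIAGNOSE; assume its low-order word is the I/O status.
iosb = 0x000000000000005C
status = iosb & 0xFFFF
fields = decode_condition(status)
print(hex(status), fields)
```

The low bit is clear and the severity is F, so the I/O completed with a severe error status rather than a warning.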

Any suggestions?
Jon Pinkley
Honored Contributor

Re: Shadowing problem

dschwarz,

Please provide output (in plain text attachment) from the following:

$ mcr sysman set env/cluster
SYSMAN> do show device/ful dsa100

(Cut and paste the output into Notepad, save it with a .txt extension, and attach the file to your reply.)

Jon
it depends
dschwarz
Frequent Advisor

Re: Shadowing problem

Hi Jon,

Attached you will find the SYSMAN output
(at least I hope so).
Jon Pinkley
Honored Contributor

Re: Shadowing problem

I don't see any obvious problems, and I can't explain the errors on the DSA virtual unit, at least assuming that DSA100 has been a single-member shadow set since the system was booted.

Are other members being added and removed, for instance for "backup snapshots"?

What device did the DATACHECK write check error appear under?

Jon

P.S. I had no trouble reading the attachment.
dschwarz
Frequent Advisor

Re: Shadowing problem

During the period in question, we did not add members to or remove members from the shadow set.

All errors appear under DSA100 as you can see in the attached DIAGNOSE extract.

This system has seven(!) other single-member shadow sets mounted. None of them has ever shown errors on its DSAxxx device.

Hoff
Honored Contributor

Re: Shadowing problem

Have you tried ANALYZE /DISK /SHADOW on the shadow set?

IIRC, this command arrived via a shadowing ECO kit for V7.3-2. And on the topic of ECOs: if you're not current on your ECOs and your firmware, get there.

If you're current on your patches and firmware, it might also pay off to engage your hardware support organization here, as there's clearly something a little wonky here.

Volker Halle
Honored Contributor
Solution

Re: Shadowing problem

You've set your DSA100: shadow set virtual unit to DATA CHECK all WRITE operations (SET VOLUME/DATA_CHECK=WRITE DSA100:), which is quite unusual. Did you change that recently? Why?

Do those errors disappear if you turn it off? Does shadowing handle this setting correctly?
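Conceptually, a write data check means that each write is followed by a read-back of the same blocks and a comparison against the data in the caller's buffer; a mismatch completes the I/O with a write check (DATACHECK) error. The Python sketch below models this with a hypothetical in-memory disk. The class and function names are illustrative assumptions, not the OpenVMS driver or shadowing code. It also shows the main cost of the option: every checked write needs an extra read.

```python
class SimulatedDisk:
    """Hypothetical in-memory disk with 512-byte blocks and an I/O counter."""

    BLOCK = 512

    def __init__(self, nblocks: int):
        self.blocks = [bytes(self.BLOCK) for _ in range(nblocks)]
        self.io_count = 0

    def write(self, lbn: int, data: bytes) -> None:
        self.io_count += 1
        self.blocks[lbn] = bytes(data)  # the disk keeps its own copy

    def read(self, lbn: int) -> bytes:
        self.io_count += 1
        return self.blocks[lbn]


class DataCheckError(Exception):
    """Stands in for a DATACHECK (write check error) completion status."""


def write_with_check(disk: SimulatedDisk, lbn: int, buf: bytearray) -> None:
    """Write buf, then read the block back and compare it with buf."""
    disk.write(lbn, buf)
    if disk.read(lbn) != bytes(buf):
        raise DataCheckError(f"write check error at LBN {lbn}")


disk = SimulatedDisk(nblocks=8)
buf = bytearray(b"x" * SimulatedDisk.BLOCK)
write_with_check(disk, 3, buf)
print(disk.io_count)  # one write plus one verification read
```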

Volker.
dschwarz
Frequent Advisor

Re: Shadowing problem

Hoff,

ANA/DISK/SHADOW just compares the disk with itself. As expected, there were no errors.
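To illustrate why a clean result is expected here: a shadow set comparison looks for blocks whose contents differ between members, and with only one member there is nothing to compare against. A minimal Python sketch, assuming members are modeled as lists of 512-byte blocks; the function name is hypothetical and this is not how ANALYZE implements the check.

```python
from typing import List, Sequence


def differing_blocks(members: Sequence[Sequence[bytes]]) -> List[int]:
    """Return the LBNs whose contents are not identical on every member."""
    if not members:
        return []
    nblocks = min(len(m) for m in members)
    return [
        lbn
        for lbn in range(nblocks)
        if any(m[lbn] != members[0][lbn] for m in members[1:])
    ]


disk = [b"a" * 512, b"b" * 512, b"c" * 512]
print(differing_blocks([disk]))         # single member: always []
other = [b"a" * 512, b"Z" * 512, b"c" * 512]
print(differing_blocks([disk, other]))  # two members: [1]
```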

We are not completely up to date with firmware and patches. I shall install the latest firmware and patches as soon as possible.

Volker,

DATA_CHECK on write operations was enabled in 2002 because of mysterious problems with Oracle Rdb. It didn't help solve those problems, but nobody ever turned it off. We will try turning it off now, because this device is in fact the only one with DATA_CHECK enabled.

But what's wrong with having DATA_CHECK enabled, apart from slowing down I/O?
Wim Van den Wyngaert
Honored Contributor

Re: Shadowing problem

Looks like a bug in shadowing to me. See
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=459922&prodTypeId=18964&prodSeriesId=459922&objectID=c01462103
So turn the write check off.

Also try an anal/dis/read of your disk (it will read all data on the disk!) to see whether there are parity errors (bad blocks). It shouldn't report any, since there are no hardware errors, but it could give more information about what is wrong.
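The idea behind a read check is simply to read every block once and note any that cannot be read. The Python sketch below does this for a regular file such as a disk image, in 512-byte blocks. It is only an illustration: the function name is hypothetical, it assumes a regular file (a raw block device may report a size of zero through fstat), and it is not how ANALYZE reads a volume.

```python
import os
import tempfile
from typing import List, Tuple

BLOCK = 512  # OpenVMS disk block size in bytes


def read_scan(path: str) -> Tuple[int, List[int]]:
    """Read every 512-byte block; return (blocks read, LBNs that failed)."""
    bad: List[int] = []
    count = 0
    with open(path, "rb", buffering=0) as f:
        size = os.fstat(f.fileno()).st_size  # assumes a regular file
        for lbn in range((size + BLOCK - 1) // BLOCK):
            try:
                f.seek(lbn * BLOCK)
                f.read(BLOCK)
                count += 1
            except OSError:
                bad.append(lbn)  # on real hardware, e.g. a parity error
    return count, bad


# Usage: scan a 10-block scratch image.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(BLOCK * 10))
    name = tmp.name
result = read_scan(name)
os.remove(name)
print(result)
```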

Wim
Jur van der Burg
Respected Contributor

Re: Shadowing problem

Ah, yes, I remember. An application can cause this by issuing an asynchronous data-check I/O and then clobbering (modifying) the buffer before the data check completes.
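The race described above can be illustrated with a small deterministic Python model: the data reaches the disk intact, but the application reuses its buffer before the read-back comparison runs, so the I/O completes with a write check error even though the media is fine. That would be consistent with errors counted on DSA100: but none on $1$DGA100:, although that mapping is an assumption. The class and names are hypothetical; this is not the actual driver or shadowing code.

```python
class DataCheckError(Exception):
    """Stands in for a DATACHECK (write check error) completion status."""


class AsyncWriteCheck:
    """Hypothetical model of an asynchronous write with data check.

    The transfer and the later verification are separate steps, so the
    caller's buffer is still live (and modifiable) in between.
    """

    def __init__(self, disk: dict, lbn: int, buf: bytearray):
        self.disk, self.lbn, self.buf = disk, lbn, buf
        disk[lbn] = bytes(buf)  # step 1: data transferred to the disk

    def complete(self) -> None:
        # Step 2: read back and compare against the *current* buffer contents.
        if self.disk[self.lbn] != bytes(self.buf):
            raise DataCheckError(f"write check error at LBN {self.lbn}")


disk: dict = {}
buf = bytearray(b"A" * 512)
io = AsyncWriteCheck(disk, 7, buf)  # I/O queued, not yet complete
buf[0:1] = b"B"                     # application reuses the buffer too early
try:
    io.complete()
    status = "success"
except DataCheckError:
    status = "DATACHECK"
print(status, disk[7][:1])          # the disk still holds the original data
```

Waiting for the I/O to complete before touching the buffer (or using a synchronous write) avoids the false error in this model.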

Fwiw,

Jur.