Operating System - OpenVMS

Shadowing on a system disk.

 
TKelly_1
Occasional Advisor


Hello all,

I've recently been made responsible for a DS25 running OpenVMS 7.3. The system disk in this machine 'was' shadowed.

Originally, two disks (DKA0 and DKA100) were in the shadowset DSA0:. Now, DKA100 is the only member of the DSA0 shadowset. It turns out the physical disk DKA0 was dead and had been for some time. I'd like to restore that shadowset.

I've replaced the physical disk with an identical disk (i.e. the exact same model, plugged into the slot from which the old DKA0 was removed). I can mount the new DKA0, copy files to it, verify it's working, etc. However, when I try to re-add it to the DSA0 shadowset using:

mount/system dsa0: /shadow=($1$dka0:) alphasys

The shadow set attempted to merge but clocked up a large number of errors. I've been using the "Volume Shadowing for OpenVMS" manual as a reference, but have yet to spot what I'm doing wrong.

It appears to me that shadowing the system disk is a 'special case' (SYSTARTUP_VMS.COM does not explicitly mount the system shadowset on startup; rather, OpenVMS just knows to do this). Is there a particular sequence of actions required to rebuild this system shadowset with a new disk?

Thanks and regards.
Karl Rohwedder
Honored Contributor

Re: Shadowing on a system disk.

I am surprised by the merge; I would expect a shadow copy. Perhaps you could tell us the actual error messages.
The system disk is a special case during booting, but once the system is up and running, adding and removing shadowset members is done just as with data disks.

regards Kalle
Volker Halle
Honored Contributor

Re: Shadowing on a system disk.

TKelly,

welcome to the OpenVMS ITRC forum.

OpenVMS shadowed system disks do NOT need to be mounted in SYSTARTUP_VMS.COM, because OpenVMS will form the system disk shadowset with the same members as existed prior to the boot.

Your MOUNT command is correct. I prefer to always use /CONFIRM when doing manual MOUNTs. This shows you the label of the disk to be added (and overwritten) and still lets you say NO if you have specified the wrong disk.


The shadow set attempted to merge ...


The operation you've described is called a SHADOW COPY, not a MERGE. During a shadow copy, all blocks from the existing members are copied to the new member. A MERGE compares all blocks of all members and only updates the blocks that differ, copying them from the 'master member'.
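To make the copy/merge distinction concrete, here is a toy Python model. It is an illustration only, not how HBVS is implemented: each disk is assumed to be a plain list of equal-size blocks, the function names are hypothetical, and only the number of block writes is counted. Concurrent user I/O and the choice of master member are not modeled.

```python
from typing import List

Disk = List[bytes]  # toy disk: one bytes object per logical block (LBN)


def shadow_copy(source: Disk, new_member: Disk) -> int:
    """Toy full copy: write every source block to the new member.

    Returns the number of block writes issued.
    """
    writes = 0
    for lbn, block in enumerate(source):
        new_member[lbn] = block
        writes += 1
    return writes


def shadow_merge(master: Disk, others: List[Disk]) -> int:
    """Toy merge: compare each block on every other member with the
    master member and rewrite only the blocks that differ.

    Returns the number of block writes issued.
    """
    writes = 0
    for lbn, block in enumerate(master):
        for member in others:
            if member[lbn] != block:
                member[lbn] = block
                writes += 1
    return writes


if __name__ == "__main__":
    master = [b"A", b"B", b"C", b"D"]
    blank = [b"\x00"] * len(master)
    print(shadow_copy(master, blank))     # 4: every block written
    stale = [b"A", b"X", b"C", b"D"]
    print(shadow_merge(master, [stale]))  # 1: only LBN 1 differed
```

In this model a copy onto a brand-new member always writes every block, which is why a replacement disk gets a full copy while a member that merely fell out of step only needs its differing blocks fixed.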

Did you look at the errors being reported? You will need DECevent (DIAGNOSE) or WEBES (SEA) to decode them.

Try an INIT/ERASE DKA0: SCRATCH first, this will write to all blocks of the new disk and should report any problems at that time. This operation will take a while.

Volker.
The Brit
Honored Contributor

Re: Shadowing on a system disk.

Did you initialize the disk before you attempted to mount it into the shadowset?

$ Init/Sys $1$DKA0: ALPHASYS

(The label is actually irrelevant.) It is possible that the new disk is not actually "new". However, the fact that the disk is accepted into the shadowset and begins to copy suggests that you used the correct commands, and that the problem is more likely with the new disk.

Anyway, as requested by Kalle, it would be best to cut and paste the actual error messages.

Dave
TKelly_1
Occasional Advisor

Re: Shadowing on a system disk.

Hello and thanks for the quick replies. Ditto the welcome, I've been reading the posts on here for quite a while. It's an invaluable knowledgebase.

First off, yep, I'm 100% clear that "OpenVMS shadowed system disks do NOT need to be mounted in SYSTARTUP_VMS.COM". That's as per what I've read and my/your comments above.

:) Thanks for the note on /CONFIRM - I had actually used that myself. I too like the assurance of getting the target of the copy confirmed.

Apologies if I was loose with my terminology ('use of merge instead of copy'). You are correct, I am referring to the initial copy when adding a new member to the DSA0: shadowset.

Yup - I initialised the disk (a couple of times!) while playing with it.

I'll go and use DIA to dig out the exact errors from the day I first tried this and then post them shortly.
Volker Halle
Honored Contributor

Re: Shadowing on a system disk.

TKelly,

did you do just an INIT DKA0: label or did you also use /ERASE - that makes a big difference here...

Volker.
Jim_McKinney
Honored Contributor

Re: Shadowing on a system disk.

> shadow set attempted to merge but clocked up a large number of errors

Copy, not merge, right? Errors on the source or destination disk? If the source disk has forced error flags set on any blocks those will need to be remedied prior to any shadow copy. In any case, you'll probably want to use one of the error log translation tools (DECEvent, WEBES, etc) to decode the event.
TKelly_1
Occasional Advisor

Re: Shadowing on a system disk.

I did not use /ERASE. So I'll add that to my to-do list too. Many thanks for highlighting the difference!
Robert Gezelter
Honored Contributor

Re: Shadowing on a system disk.

TKelly,

Just a small note of clarification. SYSTARTUP_VMS.COM is not the only possible place where a MOUNT can be placed.

The system residence volume is indeed a special case, and the command files invoked during startup are not germane to it. In other cases, however, where the MOUNT is placed does make a great difference.

- Bob Gezelter, http://www.rlgsc.com
Jim_McKinney
Honored Contributor

Re: Shadowing on a system disk.

> use /ERASE

I actually find it preferable to use BACKUP to make a /PHYSICAL copy of the source disk onto the target, follow that with an INITIALIZE or MOUNT/OVERRIDE=SHADOW to destroy the SCB, and then initiate the shadow copy. The shadow copy is then much faster, as most of the data on the target disk will not have to be updated, and the initial copy was performed with no decision-making code involved.
Hoff
Honored Contributor

Re: Shadowing on a system disk.

TKelly, please post the errors.

Correctly-functioning shadowing does not log errors, and initializing disks for inclusion into a shadowset is not particularly relevant to the generation of errors.

Please post the errors.

If the errors are on the system disk or a critical data disk, please ensure you have a restorable copy created from an off-line system disk; from the distro media or a spare system disk.

Please post the errors.

This could be a simple case of an incorrectly-terminated SCSI bus or a bad cable or such, or there could be a disk error or controller error or other lower-level issue lurking. Shadowing does put a substantial I/O load on the hardware, and does tend to expose lower-level hardware problems.

Please post the errors.

Volker Halle
Honored Contributor

Re: Shadowing on a system disk.

Jim,

BACKUP/PHYSICAL requires downtime, as the source disk needs to be mounted /FOREIGN.

TKelly,

INIT/ERASE is not generally necessary before adding a new disk to a shadowset, although an INIT disk: SCRATCH before adding it to a shadowset can always make sense.

In this case, where you no longer trust DKA0:, INIT/ERASE DKA0: SCRATCH would at least write to ALL the blocks of the replacement disk to make sure the disk is OK.

Volker.
Jim_McKinney
Honored Contributor

Re: Shadowing on a system disk.

> BACKUP/PHYSICAL requires downtime, as the source disk needs to be mounted /FOREIGN.

Are you sure about this? Seems to work ok here on a 7.3-2 system...

A$ moun/fore $1$DKE300
A$ back/phys/igno=inte sys$sysdevice: $1$DKE300:
^T
XXXA::MCKINNEYJ 08:21:49 BACKUP CPU=00:00:03.06 PF=5197 IO=13918 MEM=960
Input: physical LBN 241727 (of 17773524)
Output: saveset volume 0, block 0 (33040 byte blocks)


Volker Halle
Honored Contributor

Re: Shadowing on a system disk.

Jim,

in your case, $1$DKE300: was not mounted and you therefore can mount it foreign. That works.

But TKelly wanted to add a new member to the system disk shadowset. Doing a BACKUP/PHYSICAL to that new member first would require the system disk to be dismounted to allow a MOUNT/FOREIGN.

Or is there some misunderstanding?

Volker.
Jim_McKinney
Honored Contributor

Re: Shadowing on a system disk.

Only the target disk needs to be mounted foreign. The currently mounted system disk DSA device can be used as the source and the foreign-mounted new member as the target. Once the backup is complete, the target disk can be dismounted and then placed into the shadow set (after scrambling the SCB). The following works for me on VMS 7.3-2.

$ sh dev sys$sysdevice

Device Device Error Volume Free Trans Mnt
Name Status Count Label Blocks Count Cnt
DSA0: Mounted 0 ALPHA73 12188700 615 1
$1$DKA100: (XXXX) ShadowSetMember 0 (member of DSA0:)

$ mou/for $1$dka200:
$ back/phys/ign=inter dsa0: $1$dka200:
^T
XXXX::MCKINNEYJ 09:22:26 BACKUP CPU=00:00:07.93 PF=6103 IO=182807 MEM=1024
Input: physical LBN 23551 (of 17773524)
Output: saveset volume 0, block 0 (33040 byte blocks)
Volker Halle
Honored Contributor

Re: Shadowing on a system disk.

Jim,

sorry, you're right.

The manual says: For a BACKUP/PHYSICAL operation, the input disk needs to be mounted /FOREIGN or you must have LOG_IO or PHY_IO privilege.

I tried MOUNT/FOR on the already mounted input disk and this - of course - failed.

Volker.
TKelly_1
Occasional Advisor

Re: Shadowing on a system disk.

Hello all,

Thanks for the valuable input. I only have a limited window each day allocated to this so apologies for my delays replying.

OK, first a small clarification, as I can see getting all the detail 100% correct is very important. The machine is a DS20E running OpenVMS 7.3-1. Apologies if that makes a material difference.

Re init/ERASE:
I ran an init/ERASE on DKA0 yesterday. It took around 30 minutes. No errors were reported and the error count on the disk did not increase. So I guess that's positive.

Re "Posting the Errors":

This morning I ran the following mount command (and received the OS responses that follow):


$ mount/system/confirm dsa0: /shadow=($1$dka0:) alphasys
%MOUNT-F-SHDWCOPYREQ, shadow copy required
Virtual Unit - _DSA0: Volume Label - ALPHASYS
Member Volume Label Owner UIC
_$1$DKA0: (T1) ALPHASYS1 [IT,TAYLODGE]
Allow FULL shadow copy on the above member(s)? [N]:y
%MOUNT-I-MOUNTED, ALPHASYS mounted on _DSA0:
%MOUNT-I-SHDWMEMCOPY, _$1$DKA0: (T1) added to the shadow set with a copy operatn
%MOUNT-I-ISAMBR, _$1$DKA100: (T1) is a member of the shadow set

...so that part looks good.

The copy operation starts, gets to about 3% and then the Error Count on the disk starts rising rapidly (in the first couple of minutes the Error Count on the disk jumps from 46407 to 55000). See below:

T1:SYS> sh dev dka0

Device Device Error Volume Free Trans Mnt
Name Status Count Label Blocks Count Cnt
DSA0: Mounted 0 ALPHASYS 5143917 729 1
$1$DKA0: (T1) ShadowCopying 55036 (copy trgt DSA0: 3% copied)
$1$DKA100: (T1) ShadowSetMember 0 (member of DSA0:)

The Error Count continues to rise...

I then ran "dia/cont" while the errors were being registered and chose one of the many DKA0 errors I saw (they all appear to be more or less the same). Here it is:

**** V3.4 ********************* ENTRY 42 ********************************


Logging OS 1. OpenVMS
System Architecture 2. Alpha
OS version V7.3-1
Event sequence number 26472.
Timestamp of occurrence 13-NOV-2009 09:54:21
Time since reboot 24 Day(s) 19:20:08
Host name T1

System Model COMPAQ AlphaServer DS20E 666 MH

Entry Type 1. Device Error


---- Device Profile ----
Unit $1$DKA0
Product Name BD0096826B
Vendor COMPAQ

-- Driver Supplied Info -
Device Firmware Revision HPB4
VMS SCSI Error Type 5. Extended Sense Data from Device
SCSI ID x00
SCSI LUN x00
SCSI SUBLUN x00
Port Status x00000001 NORMAL - normal successful completion
SCSI Command Opcode x2A Write (10 byte command)
Command Data
x00
x00
x08
x3E
xE1
x00
x00
x7F
x00

SCSI Status x02 Check Condition
Remaining Byte Length 18.

--- Device Sense Data ---

Error Code x70 Current Error
Segment # x00
Information Byte 3 x00
Byte 2 x00
Byte 1 x00
Byte 0 x00
Sense Key x0B Aborted Command
Additional Sense Length x0A
CMD Specific Info Byte 3 x00
Byte 2 x00
Byte 1 x00
Byte 0 x00
ASC & ASCQ x4700 ASC = x0047
ASCQ = x0000
SCSI Parity Error
FRU Code x03
Sense Key Specific Byte 0 x00 Sense Key Data NOT Valid
Byte 1 x00
Byte 2 x00

----- Software Info -----
UCB$x_ERTCNT 16. Retries Remaining
UCB$x_ERTMAX 16. Retries Allowable
IRP$Q_IOSB x0000000000000000
UCB$x_STS x18020810 Online
Software Valid
Volume is Valid on the local node
Unit supports the Extended Function bit
IRP$L_PID x8288A910 Requestor "PID"
IRP$x_BOFF 0. Byte Page Offset
IRP$x_BCNT 65024. Transfer Size In Byte(s)
UCB$x_ERRCNT 63212. Errors This Unit
UCB$L_OPCNT 404302. QIO's This Unit
ORB$L_OWNER x00010004 Owners UIC
UCB$L_DEVCHAR1 x1C4D4008 Directory Structured
File Oriented
Sharable
Available
Mounted
Error Logging
Capable of Input
Capable of Output
Random Access


I can see various pieces of information that look like errors here, i.e.:
VMS SCSI Error Type 5
SCSI Parity Error
Sense Key Data NOT Valid
... but unfortunately those errors don't mean a lot to me at the moment.

Thanks and regards,
Tom.
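The sense bytes in the entry above follow the SCSI fixed-format sense layout, so they can be decoded by hand. Below is a minimal Python sketch, assuming the 18 bytes are exactly those DIAGNOSE printed (response code x70, sense key x0B, additional length x0A, ASC/ASCQ x47/x00, FRU x03, all other bytes zero). The function and table names are hypothetical, and the ASC/ASCQ table is deliberately cut down to the one pair seen in this thread. It shows that "Sense Key Data NOT Valid" is not an error in itself: it only means the SKSV bit is clear, so bytes 15-17 carry no extra information. "VMS SCSI Error Type 5" is just the log's category for extended sense data returned by the device. The actual error is sense key Aborted Command with ASC/ASCQ 47/00, a SCSI parity error.

```python
from dataclasses import dataclass

# Sense key names (low nibble of byte 2) per the SCSI primary command set.
SENSE_KEYS = {
    0x0: "No Sense", 0x1: "Recovered Error", 0x2: "Not Ready",
    0x3: "Medium Error", 0x4: "Hardware Error", 0x5: "Illegal Request",
    0x6: "Unit Attention", 0x7: "Data Protect", 0x8: "Blank Check",
    0x9: "Vendor Specific", 0xA: "Copy Aborted", 0xB: "Aborted Command",
    0xD: "Volume Overflow", 0xE: "Miscompare",
}

# Cut-down table: only the ASC/ASCQ pair that appears in this thread.
ASC_ASCQ = {(0x47, 0x00): "SCSI Parity Error"}


@dataclass
class Sense:
    current: bool     # response code 0x70 = current error, 0x71 = deferred
    sense_key: str
    asc: int
    ascq: int
    meaning: str
    fru: int
    sks_valid: bool   # SKSV bit; False => "Sense Key Data NOT Valid"


def decode_fixed_sense(data: bytes) -> Sense:
    """Decode SCSI fixed-format sense data (response code 0x70 or 0x71)."""
    if len(data) < 18:
        raise ValueError("fixed-format sense data needs at least 18 bytes")
    response_code = data[0] & 0x7F          # strip the VALID bit
    if response_code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense: 0x{response_code:02X}")
    key = data[2] & 0x0F
    asc, ascq = data[12], data[13]
    return Sense(
        current=(response_code == 0x70),
        sense_key=SENSE_KEYS.get(key, f"Other (0x{key:X})"),
        asc=asc,
        ascq=ascq,
        meaning=ASC_ASCQ.get((asc, ascq), "see the SPC ASC/ASCQ table"),
        fru=data[14],
        sks_valid=bool(data[15] & 0x80),
    )


# The 18 sense bytes from ENTRY 42, as printed by DIAGNOSE.
entry42 = bytes([0x70, 0x00, 0x0B, 0, 0, 0, 0, 0x0A,
                 0, 0, 0, 0, 0x47, 0x00, 0x03, 0, 0, 0])

if __name__ == "__main__":
    print(decode_fixed_sense(entry42))
```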
Volker Halle
Honored Contributor

Re: Shadowing on a system disk.

Tom,

the error is on the destination disk (DKA0), so that explains why your shadow copy gets stuck there! It looks like a SCSI parity error on a WRITE command.

So there is some hardware problem with this disk or the SCSI path to this disk.

Volker.
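The "Command Data" lines in the error entry can also be decoded to see where the failing WRITE was aimed. The sketch below assumes the nine bytes DIAGNOSE lists after opcode x2A are CDB bytes 1 through 9 of a standard WRITE(10) command, and that the disk uses 512-byte blocks; that reading is an interpretation, not something the log states, and `decode_write10` is a hypothetical helper. The transfer length it yields (127 blocks, 65024 bytes) does match IRP$x_BCNT in the same entry, which supports the interpretation.

```python
def decode_write10(cdb: bytes, block_size: int = 512) -> dict:
    """Decode a SCSI WRITE(10) CDB (opcode 0x2A).

    Layout: byte 0 opcode, byte 1 flags, bytes 2-5 LBA (big-endian),
    byte 6 group number, bytes 7-8 transfer length in blocks, byte 9 control.
    """
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("expected a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}


if __name__ == "__main__":
    # Opcode x2A followed by the nine "Command Data" bytes from ENTRY 42.
    cdb = bytes([0x2A, 0x00, 0x00, 0x08, 0x3E, 0xE1, 0x00, 0x00, 0x7F, 0x00])
    print(decode_write10(cdb))  # {'lba': 540385, 'blocks': 127, 'bytes': 65024}
```

If DKA0 has the same 17773524 blocks as the disks in Jim's BACKUP output (an assumption), LBN 540385 is about 3% of the way into the disk, which would be consistent with the copy stalling at 3%.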
TKelly_1
Occasional Advisor

Re: Shadowing on a system disk.

Gah - hardware! Folks, many thanks for taking the time to read through this. I've learned a bit (thanks Jim/Volker for the discussion on the 'mount' above).

I'll pass the errors etc. on to the lad that installed the disk in the first place.

Thanks,
Tom.
TKelly_1
Occasional Advisor

Re: Shadowing on a system disk.

APOLOGIES!

Shoot. We run a points system on a forum in work, and it's "points out of ten". Total. For a thread. I assumed that was the same here...

I allocated points above to 'divide up my ten available points'... only afterwards when I saw that I could still assign points to the remaining posts did I think I may have made a mistake.

One quick search later reveals the ITRC points system:
o N/A: The answer was simply a point of clarification to my original question
o 1-3: The answer didn't really help answer my question, but thanks for your assistance!
o 4-7: The answer helped with a portion of my question, but I still need some additional help!
o 8-10: The answer has solved my problem completely! Now I'm a happy camper!

I'm sorry. I really appreciate you all taking the time to answer. Allocating 1 point here and 2 there above wasn't meant to indicate a lack of appreciation for the replies, I just misunderstood the points system.

Regards to all,
Tom.