backup command issue

Bence Richter
Occasional Advisor

Good morning Ladies and Gentlemen,

I've got two pieces of news.

1) Introduction:
-----------------
This is my first thread, as I just registered here after browsing the site for 2 years while learning OpenVMS (V8.2) in my free time. I am originally a UNIX engineer (medium level), and I started playing with my little AlphaServer 1200 about 2 years ago from zero. I know a few things about it now: I do backups, restores, networking and user admin tasks, so the basic level I guess.
So I'm new here, but I'm really glad to join this "club".

2) The issue itself:
---------------------
After I've done a successful backup in the same way as always, I wanted to restore it on my baby (Alphaserver 1200).
I've done this many times before, but now I got a weird error message and I can't get past it.

Error message:
$$$ backup/image/verify mka400:sybck6/rewind dra0:/init
%SYSTEM-I-MOUNTVER, DKA0: is offline. Mount verification in progress.

%SYSTEM-I-MOUNTVER, DKA0: has completed mount verification.

%BACKUP-F-LABELERR, error in tape label processing on MKA400:[000000]SYBCK6.;
-BACKUP-F-NOTANSI, tape is not valid ANSI format

Can you advise please?

Regards,
Bence
13 REPLIES
John Gillings
Honored Contributor
Solution

Re: backup command issue

Bence,
Errors on BOTH the disk AND the tape drive?

I'd be checking that the SCSI chain is correct. Check for length and termination. Make sure there is ONE terminator on each end.

Note that SCSI is notorious for "working" under normal circumstances, but failing when put under load (like restoring a backup). Just because it seems to work, doesn't mean it's correct. You may need to open the box to check properly.

It may also help to see how the backup was taken. Although the backup was apparently successful, the only real test is to make sure it can be restored. Have your backups passed that test?
A crucible of informative mistakes
Hein van den Heuvel
Honored Contributor

Re: backup command issue

1) Welcome to OpenVMS Bence! And congrats on 'coming out of the closet!'. :-)

2)
You are running this restore on the CONSOLE, right? The mount verification is probably NOT from the backup command itself, but an async console message, TRIGGERED by an error after the backup IOs started.

This input is really MKA400 (some scsi tape), and the output DRA0: (some SWXCR / KZPCC logical drive ) right?
And DKA0 is the system disk perhaps?

Then I'm sorry to report that I suspect you got hardware trouble.
Check: $SHOW ERROR
Is the tape-drive in the box or external?
The PKA0 controller is onboard, right? (I don't remember the AS1200 right away.)
Did you make any disk/tapedrive configuration changes recently?
Anyway, I suggest you jiggle (unplug and reseat) any and all SCSI cable or terminator you can get at, and try again.


The tape problem is probably another result of the IO problem. But you could just try to MOUNT /FOR and DUMP/BLOC=COUNT=5 the tape for further insight.
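
Spelled out in full, that tape check might look like the lines below. This is only a sketch: MKA400: is the drive named in the original post, the count of 5 is just the value suggested above, and the comments are added for illustration.

```
$ MOUNT/FOREIGN MKA400:          ! mount the tape without label processing
$ DUMP/BLOCKS=COUNT=5 MKA400:    ! dump the first five blocks on the tape
$ DISMOUNT MKA400:               ! release the drive afterwards
```

On a tape written by INITIALIZE and BACKUP, the first blocks should be 80-byte ANSI labels beginning with VOL1 and HDR1. If those don't show up, or the DUMP itself reports I/O errors, that would point at the drive, cabling or media rather than at the saveset.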

Hope this helps,
Hein
Bence Richter
Occasional Advisor

Re: backup command issue

Good afternoon Gents,

Thanks for your replies.
Lots of questions here. I'll try to answer all of them.

> Errors on BOTH the disk AND the tape drive?
I don't get this one.

> It may also help to see how the backup was taken. Although the backup was apparently successful, the only real test is to make sure it can be restored. Have your backups passed that test?
First of all, I have 2 servers. One of them is production, the other one is test. I take a system volume backup from the prod box (DS25) every month and restore it on the test server (AS1200).
The way I back up the system: I put a DDS3 tape into the external DAT24 drive of the DS25 Alpha box. Then:
1. I shut down production
2. p>>> b dqa0 !(this is the CD ROM)
3. $$$ initialize mka500: sybck6 !(mka500 is the external DAT24 tape drive, sybck6 is the label of the initialized tape)
4. $$$ mount/override=identification dkb0 !(dkb0 is the system volume)
5. $$$ backup/image/record/verify dkb0: mka500:sybck6
6. $$$ lo !(I log out after the backup process has finished, then I reboot the server in normal mode.)

I take the tape and I put it into the Alphaserver 1200:
I start the AS1200 server, CD in and:
1. p>>> b dka0 !(boot from CD)
2. $$$ initialize dra0: /structure=2 /erase=init alphasys !(dra0 is the system volume and I erase it)
3. $$$ mount dra0: /foreign
4. $$$ backup/image/verify mka400:sybck6/rewind dra0:/init

Except for this month, I have had a bootable test system after every restore. Of course I need to update sysecurity.com and systartup_vms.com and set up the right logicals for the new HW. But after all that it works, so I can say: yes, the test is usually successful. The problem is that this month I couldn't even start the restoration.

Hein,
- Yes, I work from a console (both serial and graphics available)
- Yes, my mka400 is the input and is the internal tape device on AS1200
- Yes, DRA0 is the output and is a KZPCC. It is also my system disk volume.
- Yes, I suspected I had a HW error after all :(
- No, tape drive is internal in the AS1200. I have a spare one. Should I swap them for a try?
- No, I did not make any changes in the disk/tape drive recently


Thanks to both of you.
I'll check the HW for plug errors and cable errors, and I'll re-plug the SCSI cables for a start.

Cheers,
Bence
Joseph Huber_1
Honored Contributor

Re: backup command issue

>>

> Errors on BOTH the disk AND the tape drive?
I don't get this one.
<<

MKA and DKA devices are connected to the same SCSI controller/bus.
The fact that the disk DKA0 got errors at the same time as backup from MKA400 started indicates a problem affecting the whole SCSI chain.
Therefore a check of the SCSI chain is needed.
In the end it may turn out that the DAT drive is broken and is disturbing the SCSI bus.
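
To see what actually sits on that shared bus and whether error counts are climbing, something along these lines could be tried. It is a sketch only: the device names are the ones from this thread, and the output will differ from system to system.

```
$ SHOW DEVICE DKA              ! disks and CD-ROM on SCSI controller A
$ SHOW DEVICE MKA              ! tape drives on the same controller
$ SHOW DEVICE/FULL MKA400:     ! full display, including the error count
```

The controller letter is the part to watch: DKA0: and MKA400: both carry the "A", so they hang off the same SCSI controller, while DRA0: belongs to a different one.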
http://www.mpp.mpg.de/~huber
Shriniketan Bhagwat
Trusted Contributor

Re: backup command issue

Hi,

Can you try the BACKUP with the /IGNORE=LABEL qualifier?

Regards,
Ketan
Volker Halle
Honored Contributor

Re: backup command issue

Bence,

to check whether the MKA400: tape drive has a SCSI problem and/or the backup tape is correctly readable, just try:

$$$ BACKUP/LIST MKA400:/SAVE/REWIND

and see whether the saveset on tape is readable and the files in the saveset are being listed. You can abort the backup list operation after some time by pressing CTRL-Y.

Volker.
Hoff
Honored Contributor

Re: backup command issue

DKA0: is your system disk? If you're not sure, you can use:

$$$ SHOW LOGICAL SYS$SYSDEVICE:

The usual command would be:

$$$ backup/image/verify -
mka400:sybck6/save/rewind -
dra0:/init

Look at the termination, length and devices on the SCSI "A" chain here.

Once you've looked at the primary "A" SCSI chain here (which is the usual culprit), you'll probably want to configure the SWCC software for this DRA0: device and confirm all of the disks are working. The DRA0: device is usually indicative of a SWXCR Mylex DAC 960 PCI RAID controller, and that series of RAID controller is known to be a bit fussy around its configuration. SWCC links:

http://labs.hoffmanlabs.com/node/564


>Can you try the BACKUP with /IGNORE=LABEL qualifier?

Eh? Would you mind elaborating on that suggestion? (My experience points to several issues with that qualifier and corruptions, and that qualifier isn't AFAIK relevant for an image saveset restoration.)
Bence Richter
Occasional Advisor

Re: backup command issue

Hi all,

Thank you for all of your replies.
I was on a long sick leave for 2 weeks with my lungs and had no chance to take care of this issue till now.
What I've done so far since I'm back is I cleaned the HW properly and I replugged the SCSI connectors as many referred to a HW issue around SCSI channels.
I started to run a full system restore from tape and it works!! I cannot confirm success yet as it hasn't finished, but hopefully tomorrow morning I can have some news.

Thanks again all, see you tomorrow.

Bence
Bence Richter
Occasional Advisor

Re: backup command issue

Gentlemen,

Thank you very much for all of your help in this issue.
The server finished the restoration yesterday and works really well again.

Shall I close the thread? I guess so...

Regards,
Bence
Kumar.R
Occasional Visitor

Re: backup command issue

Hello Bence,

Welcome to the OpenVMS discussion forums.

The first two errors don't look related to the last two. It looks like dka0: is the system disk and went into mount verification for a while. Or are you still getting the dka0 errors?

Coming to the last two errors, it looks like your tape drive could not read from the tape on MKA400. Two possibilities:

* The tape drive needs cleaning with a cleaning cartridge.

* The tape drive is bad and needs to be replaced.

Hope this solves your problems.

Thanks and regards,
Kumar
Bence Richter
Occasional Advisor

Re: backup command issue

Thanks Kumar.

Actually you are right. It is a fact that I can take backups after the general cleaning of the box + re-plugging the SCSI cables, but DKA0 (CD-ROM) is mounting for too long sometimes. So probably that's another issue.
Maybe my CD-ROM is dying... That's the only thing I can imagine, as I re-checked the cables.

Cheers,
Bence
Hoff
Honored Contributor

Re: backup command issue

I've not seen a dirty tape drive or dirty media or unclean tape drive heads cause disk mount verification errors. The degree of disruption here tends to involve a fairly substantial disruption to what appears to be the system disk, and typically beyond filthy tape heads.

A bad disk or bad tape drive or a spotty SCSI connection or a bad cable or bad termination or such, yes. That can cause bus disruptions.

DKA0: is a fairly unusual unit setting for a SCSI CD-ROM and a fairly common setting for a disk drive. Ensure you don't have duplicate units on the bus, whether with one of the SCSI devices present or with the host SCSI controller (which is usually at unit 6 or 7) in this configuration.

It's easily feasible that the CD-ROM drive or the CD media is tossing errors, and (if that's the device at DKA0:) then that could explain most or all of what you're seeing here.

And gear of this vintage (old SCSI CD-ROM widgets, old SCSI disks, old Mylex DAC 960 controllers, old SCSI cables, etc) can and do fail.
Jon Pinkley
Honored Contributor

Re: backup command issue

Bence,

You may find that your backups and restores are faster if you use a larger blocking factor on your backups. By default backup uses 8192 byte blocks for tape drives, and that is tiny.

See help backup/block for more info.

I would suggest replacing

5. $$$ backup/image/record/verify dkb0: mka500:sybck6

with

5. $$$ backup/image/record/verify dkb0: mka500:sybck6/block=65024 !(or at least 32256)
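
As a quick sanity check on those numbers (a sketch added for illustration, not taken from the BACKUP documentation): both suggested values, and the 8192-byte default, are whole multiples of the 512-byte disk block. DCL can do the arithmetic:

```
$ WRITE SYS$OUTPUT 127*512    ! 65024
$ WRITE SYS$OUTPUT 63*512     ! 32256
$ WRITE SYS$OUTPUT 16*512     ! 8192
```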

The %SYSTEM-I-MOUNTVER, DKA0: messages may just be a result of the SCSI bus being really busy with the tape backup. Do they occur when backup isn't active? I.e. if you do

$$$ analyze/disk/read DKA0:

do you get MOUNTVER errors? If not, then it is probably just backup hogging the adapter with a deep queue of pending operations.

If you have the 8.3 distribution CD, that has an updated version of BACKUP that may give you better results. I am assuming you are using the 8.2 distribution.

Jon
it depends