Re: %BACKUP-F-CLUSTER, unsuitable cluster factor redux

BrianT_1 · ‎10-28-2008

In another thread, I posted:

I'm still using a VAX with OpenVMS V7.3. I'm getting this same error whether I init the disk first myself or try to allow BACKUP to init it. I have a cluster factor of 8 and the disk had been running for _years_ with no problem (until it failed). The total size is 53294505, so that should allow 53294505/9=5921611 files. I specified only /maximum_files=1500000. No matter what I do, I cannot restore this backup with /IMAGE. I can, of course, restore it without /IMAGE, but how can BACKUP create the saveset without complaint and then not let me restore it exactly as it was?

Hoff responded to that and his responses and my answers to his questions are interspersed.

H> Are you current on your ECOs for OpenVMS
H> VAX V7.3? BACKUP and SCSI and UPDATE would
H> be the obvious targets. (If not, load
H> these now and try again.)

I am up to date on these, according to the master list of ECOs, with the exception of a couple of ECOs that don't apply to me.

H> How big was the old disk, and how big is
H> the new disk? (If I've done the math
H> correctly, it looks like it might be an
H> RZ23 disk? And is this really a 104 MB
H> disk?)

It is an RZ1D, 9GB disk. Both the old disk and the new disk are physically identical. It's an HSD-hosted raidset of four RZ1Ds and three of the four building devices are physically the same devices. I had to replace one.

H> Which VAX? (There are other disk-capacity
H> issues, depending on the VAX model and VAX
H> console.)

VAX 7730. It's the same system that was hosting the disk before the restore and with which the disk was initialized originally and on which the backup was taken.

H> Is this BACKUP /IMAGE a system disk?

No.

H> How many files were on the old disk?

I really don't know. I didn't count.

H> What command(s) did you use to restore the
H> disk?

$ backup/image tape:saveset disk:/init

I also initialized the disk manually with

$ init/sys/own=system/max=1500000 -
/head=3000000/clust=8 disk: label

(which appears to be the original INIT command) and used

$ backup/image tape:saveset disk:/noinit

H> To INITIALIZE the disk?

I tried the above and I also tried just

$ init/sys/own=system disk: label

H> What was the original BACKUP command? (You
H> can get this from the BACKUP /LIST -- just
H> post the whole header.)

$ BACKUP/IMAGE/RECORD/IGNORE=(INTERL,LABEL) -
disk: tape:saveset/BLOCK=32256

H> There are cases where BACKUP cannot
H> restore a disk image due to conflicts in
H> the structures, too.

And this would be a bug. BACKUP should always be able to restore a disk that was running successfully and which it had no problem backing up in the first place. BACKUP should always be able to initialize a disk with the same values INITIALIZE found acceptable.

H> As for the specified maximum file count
H> here, I've yet to encounter an OpenVMS
H> system that has a disk anywhere near full
H> of one-cluster files. Is that really the
H> case here?

No, it's not.

Here's the header information from the BACKUP saveset.

Save set: $1$DUA103.40
Written by: BACKUP
UIC: [000010,000040]
Date: 11-OCT-2008 13:18:52.88
Command: BACKUP/IMAGE/RECORD/IGNORE=(INTERLOCK,LABEL) $1$DUA103: $10$M
UA16:$1$DUA103.40/BLOCK=32256
Operating system: OpenVMS VAX version V7.3
BACKUP version: V7.3
CPU ID register: 13000202
Node name: _CASS::
Written on: _$10$MUA16:
Block size: 32256
Group size: 10
Buffer count: 503

Image save of volume set
Number of volumes: 1

Volume attributes
Structure level: 2
Label: AIRBUS_DISK
Owner:
Owner UIC: [000001,000004]
Creation date: 19-APR-1999 15:29:04.89
Total blocks: 53294505
Access count: 3
Cluster size: 8
Data check: No Read, No Write
Extension size: 5
File protection: System:RWED, Owner:RWED, Group:RE, World:
Maximum files: 2960805
Volume protection: System:RWCD, Owner:RWCD, Group:RWCD, World:RWCD
Windows: 7
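
The file-count figures in this header can be cross-checked with a minimal Python sketch. It assumes the rule as I understand it from the INITIALIZE /MAXIMUM_FILES description: the upper limit is the volume size divided by (cluster factor + 1), and the default is half of that. The function names are made up for illustration.

```python
# Sketch only: assumes the ODS-2 rules described for INITIALIZE /MAXIMUM_FILES.
#   upper limit = total_blocks // (cluster_size + 1)
#   default     = total_blocks // ((cluster_size + 1) * 2)
# Function names are hypothetical, chosen for this example.

def max_files_limit(total_blocks: int, cluster_size: int) -> int:
    """Largest /MAXIMUM_FILES value the volume could accept (assumed rule)."""
    return total_blocks // (cluster_size + 1)


def default_max_files(total_blocks: int, cluster_size: int) -> int:
    """Value INITIALIZE would pick with no /MAXIMUM_FILES (assumed rule)."""
    return total_blocks // ((cluster_size + 1) * 2)


TOTAL_BLOCKS = 53294505   # "Total blocks" from the saveset header
CLUSTER_SIZE = 8          # "Cluster size" from the saveset header

print(max_files_limit(TOTAL_BLOCKS, CLUSTER_SIZE))    # 5921611
print(default_max_files(TOTAL_BLOCKS, CLUSTER_SIZE))  # 2960805
```

Under that assumption, the 2960805 in the header is what a default INITIALIZE with a cluster factor of 8 would produce, and the 1500000 requested for the restore is well under the limit.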

H> CLUSTER, unsuitable cluster factor
H> for 'device-name'
H>
H> Facility: BACKUP, Backup Utility
H>
H> Explanation: During an attempt to
H> initialize an output volume, the Backup
H> utility found that the cluster factor was
H> too large or too small for the specified
H> device.
H>
H> User Action: If the input is a save set,
H> use the BACKUP/LIST command to determine
H> the volume initialization parameters of
H> the input volumes. Refer to the
H> description of the DCL command
H> INITIALIZE, determine a suitable cluster
H> factor, and initialize the output volumes
H> using the INITIALIZE command. Then,
H> reenter the command specifying
H> the /NOINITIALIZE qualifier.

So, based on the volume initialization parameters, what might I be doing wrong, if anything? I may need to restore this or another disk again.

A side question: how does /DIRECTORIES enter into this, if at all? That just controls preallocation of 000000.dir, correct?

Hoff · ‎10-28-2008

This BACKUP saveset is already potentially corrupt, given the command used for its original creation.

The host controller (in another discussion) was reporting errors. I will presume those have been resolved, though it is not clear whether those errors could also have contributed to saveset corruptions.

If the following DCL command:

BACKUP /IMAGE ddcu:saveset/SAVE ddcu:

doesn't resolve this, I'd try another approach, one not involving this particular OpenVMS VAX version and ECO or this particular OpenVMS VAX box. I'd ask that the BACKUP command not be edited, adjusted or altered; that no command qualifiers nor tweaks be applied to the syntax.

As a particular alternative, try OpenVMS Alpha V8.3 or OpenVMS I64 V8.3; both of these releases have newer BACKUP bits.

Or you can call in some help.

BrianT_1 · ‎10-28-2008

H> This BACKUP saveset is already
H> potentially corrupt, given the command
H> used for its original creation.

And why is that? What portion of the original BACKUP command would lead to this corruption? I would contend that the saveset is not corrupt because I was able to restore the data by leaving off the /IMAGE qualifier. It just took much longer to restore.

H> The host controller (in another
H> discussion) was reporting errors. I will
H> presume those have been resolved, though
H> it is not clear whether those errors
H> could also have contributed to saveset
H> corruptions.

I have no way of knowing what the errors even mean, so I can't determine if they've been resolved.

H> If the following DCL command:
H>
H> BACKUP /IMAGE ddcu:saveset/SAVE ddcu:
H>
H> doesn't resolve this,

And since I stated that I already used this command (/SAVE is implied when using a tape device for the saveset and I included both /INIT and /NOINIT in trials - it's got to be one or the other), we know that it doesn't. For completeness, I tried it with neither /INIT nor /NOINIT and, of course, it didn't change anything.

H> I'd try another
H> approach and an approach not involving
H> this particular OpenVMS VAX version and
H> ECO or this particular OpenVMS VAX box.

Will you give me a VMS system with which to do this? I have no access to anything but what I have.

H> I'd ask that the BACKUP command not be
H> edited, adjusted or altered; that no
H> command qualifiers nor tweaks be applied
H> to the syntax.

With or without tweaks, BACKUP in OpenVMS VAX V7.3 is clearly broken.

H> As a particular alternative, try OpenVMS
H> Alpha V8.3 or OpenVMS I64 V8.3; both of
H> these releases have newer BACKUP bits.

Could you suggest a way to do this? I have no access to any of that hardware or software here. I do have OpenVMS Alpha V7.3-a on an AlphaServer 4/233 but it's a member of a cluster not connected to the one where I must restore the data.

H> Or you can call in some help.

I tried that. I sent HP a message four days ago via the www.openvms.compaq.com website asking that someone contact me. I received a robo-response, but nothing else.

Robert Gezelter · ‎10-28-2008

Brian,

If I am reading the posts correctly, Hoff is referring to the fact that file updates could be in progress during the BACKUP operation. IO errors during the BACKUP could result in an inherently corrupted Save Set.

Restoring a BACKUP Save Set without image could cause a problem if there are any aliased files on the volume (which is why people are warned to be careful when restoring system volumes).

Personally, I would attempt to recreate this with a very small test case and a non-tape Save Set. If there is a small reproducer, then it is far easier to get attention, something I learned many years ago when dealing with various support organizations at client's behest.

- Bob Gezelter, http://www.rlgsc.com

BrianT_1 · ‎10-28-2008

BG> If I am reading the posts correctly,
BG> Hoff is referring to the fact that file
BG> updates could be in progress during the
BG> BACKUP operation. IO errors during the
BG> BACKUP could result in an inherently
BG> corrupted Save Set.

If it were inherently corrupt, it would not restore at all, with or without /IMAGE, I would think.

BG> Restoring a BACKUP Save Set without
BG> image could cause a problem if there
BG> are any aliased files on the volume
BG> (which is why people are warned to be
BG> careful when restoring system volumes).

No aliased files.

BG> Personally, I would attempt to recreate
BG> this with a very small test case and a
BG> non-tape Save Set. If there is a small
BG> reproducer, then it is far easier to
BG> get attention, something I learned many
BG> years ago when dealing with various
BG> support organizations at client's
BG> behest.

Unfortunately, I don't have the luxury of doing this, since I have no spare drives of the size the raidset creates and I'm in a disaster recovery situation where I MUST get this data restored for a multimillion dollar project.

GuentherF · ‎10-28-2008

Brian,

I have that feeling your HSD-hosted RAID set is hosed. One factor in checking the cluster factor is the total blocks count of the output device obtained in BACKUP by SYS$GETCHN.

Check with DCL-SHOW DEVICE/FULL or in SDA-FORMAT UCB (let us know if you need the full details here). Ah, and before you do that, mount the disk /FOREIGN, which forces VMS to update the disk geometry info in the UCB.

/Guenther

Hoff · ‎10-28-2008

Ok; so the simple BACKUP command blows up. Then it's likely something within BACKUP itself that is at fault here, or (for whatever reason) the output device doesn't quite match the input.

None of which is news, and none of which is new information.

The saveset structures are likely fine, it's the data in the saveset that is at risk. The file system interlocks that were ignored here are intended to provide for either consistency or an indication of inconsistent data, and not to require folks to add qualifiers on BACKUP. The data corruptions that can arise here can be entirely silent, per discussions with one of the long-time BACKUP maintainers. In other words, it is the contents of the files archived within the saveset that can be suspect. Whether there is a problem here depends on what activity was underway (if any) when BACKUP captured each input file.

I have VAX, Alpha and Itanium systems and storage available, if you'd like to discuss this offline.

Hein van den Heuvel · ‎10-28-2008

[This topic appears to be a cross-posting of topic 1587 in EISNER::VMS.NOTE]

I have a hard time making the numbers add up... unless this is a raid-5 with 4 members and an effective storage of 3*9,000,000*2 blocks and even then, that seems just close, not exact.

For a hardware raid set, the individual member disks are invisible, and thus irrelevant to the OS. As long as the raid set is happy.

>> 53294505/9=5921611 ... I specified only /maximum_files=1500000.

That still sounds like a lot of files.
How many files do you think there actually were?
How many blocks used in INDEXF.SYS?

That would be for an average file size of about 3 clusters = 24 blocks. Small, but ok fine.

backup header>> Maximum files: 2960805
Ok, more, smaller files on the original.

>>> $ init/sys/own=system/max=1,500,000 -
/head=3,000,000

I added those commas.
I sure hope you actually used /head=300,000

If you do specify more than /MAX, then the maximum is silently set.

I seem to recall a note about an odd error with backup 7.3 when the /MAX and /HEAD was set to the absolute max, resulting in immediate failure on restore.

After the manual init, what was the resulting /MAX with $SHOW DEV/FULL?

I suggest trying another manual $INIT but backing out the /HEAD by at least one under /MAX, and realistically probably 1/2.
Then retry the $BACKUP/NOINIT

Use DFU REPORT on a similar disk to see actual usage?

hth,
Hein.

BrianT_1 · ‎10-28-2008

HV> I have a hard time making the numbers
HV> add up... unless this is a raid-5 with
HV> 4 members and an effective storage of
HV> 3*9,000,000*2 blocks and even then,
HV> that seems just close, not exact.

It's whatever ADD RAIDSET of four RZ1Ds gives on the HSJ. The number is almost exactly 3X the size of one RZ1D as shown by VMS. (17769177 = RZ1D. 17769177*3=53307531. Device on VMS shows total blocks=53294505.)
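
The capacity arithmetic above can be checked with a short Python sketch. The layout of three data members' worth of capacity (a RAID-5 style set of four, with one member's worth of parity) is an assumption based on the numbers quoted, and the helper name is made up for illustration.

```python
# Sketch only: checks the raidset size figures quoted in this post.
# Assumption: usable capacity is about three members' worth (RAID-5 style).

RZ1D_BLOCKS = 17769177        # one RZ1D as shown by VMS
DATA_MEMBERS = 3              # assumed: 4 members minus one member's worth of parity
VMS_TOTAL_BLOCKS = 53294505   # total blocks VMS reports for the raidset
BLOCK_BYTES = 512             # VMS disk block size


def raidset_overhead(member_blocks: int, data_members: int, reported_blocks: int):
    """Return (raw data blocks, blocks not visible to VMS)."""
    raw = member_blocks * data_members
    return raw, raw - reported_blocks


raw_blocks, overhead_blocks = raidset_overhead(
    RZ1D_BLOCKS, DATA_MEMBERS, VMS_TOTAL_BLOCKS
)

print(raw_blocks)       # 53307531
print(overhead_blocks)  # 13026
print(round(VMS_TOTAL_BLOCKS * BLOCK_BYTES / 1e9, 1))  # 27.3 (GB)
```

The 13026-block difference is small relative to the volume. What the controller does with those blocks is not stated anywhere in this thread.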

HV> For a hardware raid set, the
HV> individual member disks are invisible,
HV> and thus irrelevant to the OS. As long
HV> as the raid set is happy.

Of course.

HV> That still sounds like a lot of files.
HV> How many files do you think there
HV> actually were?
HV> How many blocks used in INDEXF.SYS?

There are 1283598 files on the disk. INDEXF.SYS is 1285765/1500760. According to DFU, I have 303 free headers. On another RAIDSET device that's supposed to be identical to this one, DFU says there are 27192 free headers.

HV> That would be for an average file size
HV> of about 3 clusters = 24 blocks. Small,
HV> but ok fine.

Actually, the average file size is about 29 blocks.

HV> backup header>> Maximum files: 2960805
HV> Ok, more, smaller files on the original.
HV>
HV> $ init/sys/own=system/max=1,500,000 -
HV> /head=3,000,000
HV>
HV> I added those commas.
HV> I sure hope you actually
HV> used /head=300,000
HV>
HV> If you do specify more than /MAX, then
HV> the maximum is silently set.

I did specify that because I thought /HEADERS specified the size of INDEXF.SYS and I have what's supposed to be an identical RAIDSET device that has INDEXF.SYS at the 2,960,805 value.

HV> I seem to recall a note about an odd
HV> error with backup 7.3 when the /MAX
HV> and /HEAD was set to the absolute max,
HV> resulting in immediate failure on
HV> restore.

Suggest some reasonable values to me and I can see if I can try them (although I'm not sure I'll be able to do that, since I'd have to restore data that's two weeks old - not something the project wants to hear).

HV> After the manual init, what was the
HV> resulting /MAX with $SHOW DEV/FULL?

With or without any qualifiers? I believe that without any qualifiers, I got a cluster size of 51 and fewer than 500,000 max files.

HV> I suggest trying another manual $INIT
HV> but backing out the /HEAD by at least
HV> one under /MAX, and realistically
HV> probably 1/2. Then retry the
HV> $BACKUP/NOINIT

It's doubtful whether I'll be able to do this, since if it doesn't work, the project suffers because it will take another four days to restore. That said, I may have to do this anyway because of the forced error messages I've been getting from many of the files on the disk. I think I have multiple problems. I'm trying to restore a portion of the files that are showing this error to another disk to see if the saveset contains them or if they developed after the restore. The original backup and the subsequent restore didn't log any errors.

GuentherF · ‎10-28-2008

In this old BACKUP code there are only two locations where this error is issued. In both cases the total block count is involved in the calculation. So I would first check the total blocks count the way VMS sees it.

Btw. using /IGNORE=INTERLOCK is never-ever a cause for a corrupted save set. It may save the disk in an inconsistent state, but that's a whole different thing.

/Guenther
