Operating System - OpenVMS

Re: Decnet Phase V Question

 
Jon Pinkley
Honored Contributor

Re: Decnet Phase V Question

Warren,

In thread http://forums13.itrc.hp.com/service/forums/questionanswer.do?threadId=1311866 you stated "I've got an Integrity Cluster (2 rx6600 nodes) running VMS 8.3 and DecNet Phase V and TCPIP 5.6-9"

My guess is that you have a much worse problem than you realize. As Volker pointed out, you have a serious inconsistency in the disk structure of the disk. What you are seeing is a symptom of the problem.

Hopefully you can determine a specific point in time that the problem was created. As stated by Volker, the most likely cause is having that disk mounted by two systems that were not part of the same cluster. That could be done by booting one of the systems with the cluster related sysgen parameters set incorrectly, or by booting from the installation DVD and mounting a disk that the other member has mounted.

If you don't know when it happened, restoring from an image backup won't necessarily do you any good, since the problem may have been present when the backup was made. When the files are restored, you will no longer have multiply allocated blocks, but some files may have blocks that were overwritten by other files.

My point is that a disk restored from an image backup of a disk that has multiply allocated blocks may pass ANALYZE/DISK/LOCK_VOLUME without any errors detected, but the damage to the files will still be there.

The following thread is one of several that discuss multiply allocated blocks.

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1267096

Good Luck,

Jon
it depends
Hakan Zanderau ( Anders
Trusted Contributor

Re: Decnet Phase V Question

I fully agree with Jon.

Make sure the disk is OK before trying to repair DECnet.

Hakan
Don't make it worse by guessing.........
Warren G Landrum
Frequent Advisor

Re: Decnet Phase V Question

Thanks All,

Do any of you feel/think that an upgrade to VMS 8.3-1 will fix the probable system disk corruption problem that it sounds like I may have?

w
Hoff
Honored Contributor

Re: Decnet Phase V Question

> Do any of you feel/think that an upgrade to VMS 8.3-1 will fix the probable system disk corruption problem that it sounds like I may have?

A V8.3-1H1 upgrade may well cure this, but only for those system disk files that get replaced as part of the upgrade.

Such an upgrade won't address any (corrupted) files that might be outside the files replaced by the upgrade. The DECnet databases are typically not replaced by an upgrade, for instance, and local startup and configuration files are also typically maintained.

And this approach presumes that any lurking corruption does not somehow also destabilize the upgrade. That's unlikely, but if something wanted to rummage in the DECnet databases as part of the upgrade processing...


Colin Butcher
Esteemed Contributor

Re: Decnet Phase V Question

Having been there a few months back, I'll guess that the system disc became inadvertently corrupted when you added the second node.

One of the tricky little problems with OpenVMS on Integrity is that, in order to have BOOT_OPTIONS.COM set up the boot paths for you, the target disc you want to boot from has to be mounted.

However, at that point, your system is not a member of the cluster, so the disc gets corrupted unless you mount the target disc read-only (and preferably no cache too) while booted from a local disc before you run BOOT_OPTIONS.COM to set up the boot paths.

It's not (as far as I recall) described in the install guide. I did discuss this with a couple of folks from Engineering at the last bootcamp after I'd got bitten in a similar manner.

So, you probably need to restore back to the point before you added the second node to the cluster.

Hope that helps explain things.

Cheers, Colin (http://www.xdelta.co.uk).
Entia non sunt multiplicanda praeter necessitatem (Occam's razor).
Warren G Landrum
Frequent Advisor

Re: Decnet Phase V Question

Colin, Hoff, et al:

Well, that bites about the undocumented steps that should have been done when originally setting up an Integrity cluster. Sounds like I probably got bitten as Colin suggested.

Only a few multiply allocated files show up, as pasted below:

BHBO1>anal/disk sys$sysdevice
Analyze/Disk_Structure for _$1$DGA1419: started on 9-FEB-2009 09:08:55.34

%ANALDISK-I-OPENQUOTA, error opening QUOTA.SYS
-SYSTEM-W-NOSUCHFILE, no such file
%ANALDISK-W-MULTALLOC, file (5659,1,0) [SYS0.SYSMGR]ACCOUNTNG.DAT;1
multiply allocated blocks
VBN 1425 to 1440
LBN 2569104 to 2569119, RVN 1
%ANALDISK-W-MULTALLOC, file (8362,146,0) [SYSLOST]TCPIP$FTP_RUN.LOG;112
multiply allocated blocks
VBN 1 to 16
LBN 2569104 to 2569119, RVN 1
%ANALDISK-W-MULTALLOC, file (11969,24,0) [SYS0.SYSEXE]DNS$CACHE.0000004505;1
multiply allocated blocks
VBN 2145 to 2240
LBN 3297792 to 3297887, RVN 1
%ANALDISK-W-MULTALLOC, file (12045,2,0)
multiply allocated blocks
VBN 513 to 608
LBN 3297792 to 3297887, RVN 1
%ANALDISK-W-MULTALLOC, file (10106,9,0) [VMS$COMMON.SYSEXE]RIGHTSLIST.DAT;5
multiply allocated blocks
VBN 673 to 688
LBN 3298496 to 3298511, RVN 1
%ANALDISK-W-MULTALLOC, file (11969,24,0) [SYS0.SYSEXE]DNS$CACHE.0000004505;1
multiply allocated blocks
VBN 2305 to 2320
LBN 3298496 to 3298511, RVN 1
%ANALDISK-W-MULTALLOC, file (10380,14,0) [SYS0.SYSMGR]NET$ROUTING_STARTUP.NCL;1
multiply allocated blocks
VBN 1 to 16
LBN 3298544 to 3298559, RVN 1
%ANALDISK-W-MULTALLOC, file (12038,2,0) [SYS1.SYSMGR]USB$UCM_EVENTS.LOG;1
multiply allocated blocks
VBN 97 to 112
LBN 3298544 to 3298559, RVN 1

One of them DOES look as though it is related to DECnet, and RIGHTSLIST.DAT is also in that state. I backed up the rightslist to a higher-version copy so that users pick up that copy as they log in, and I also rebooted one of the nodes this weekend.

Here are possible options I see that may or may not help:

1) Back out Decnet Phase V and install Decnet Phase IV - Decnet is really not even being used on this system by the application or the users.

2) Install latest consolidated update (#8) for Integrity 8.3

3) Upgrade to VMS 8.3-1H1

I understand that if options 2 and 3 don't replace the corrupted files, I could conceivably still have the same problem.

Based on the files above that I show being multiply allocated, what do you guys think? As I said, I can't even reconfigure Phase V, as stated above, because of the checksum problem.
Volker Halle
Honored Contributor

Re: Decnet Phase V Question

Warren,

to recover from those MULTALLOC errors, you need to delete ALL the files that are reporting MULTALLOC errors. First check whether you have good backups of those files.

Be aware that you may have other corrupted files on your system disk if you have run ANAL/DISK/REPAIR in the past and deleted just ONE of the involved files.
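
To see which files are tangled together before deleting anything, the pasted report can be grouped by the LBN range each file claims. The following is only a minimal Python sketch meant to be run off-system on a saved copy of the output. It assumes the message layout is exactly as pasted in this thread; the name group_multalloc and the placeholder text for entries without a file name are hypothetical, not part of any OpenVMS tool.

```python
import re
from collections import defaultdict

# Patterns are assumptions based only on the message layout pasted in this thread.
FILE_RE = re.compile(r"%ANALDISK-W-MULTALLOC, file \((\d+),(\d+),(\d+)\)\s*(.*)")
LBN_RE = re.compile(r"LBN (\d+) to (\d+), RVN (\d+)")


def group_multalloc(report: str) -> dict:
    """Map each (first LBN, last LBN, RVN) range to the files that claim it."""
    groups = defaultdict(list)
    current = None  # file named by the most recent MULTALLOC message
    for line in report.splitlines():
        line = line.strip()
        m = FILE_RE.match(line)
        if m:
            fid = "({},{},{})".format(*m.group(1, 2, 3))
            # Some entries carry only a file ID and no path.
            name = m.group(4) or "<no file name reported>"
            current = (fid, name)
            continue
        m = LBN_RE.match(line)
        if m and current is not None:
            key = (int(m.group(1)), int(m.group(2)), int(m.group(3)))
            groups[key].append(current)
            current = None
    return dict(groups)


# A subset of the report pasted earlier in this thread.
SAMPLE = """\
%ANALDISK-W-MULTALLOC, file (5659,1,0) [SYS0.SYSMGR]ACCOUNTNG.DAT;1
multiply allocated blocks
VBN 1425 to 1440
LBN 2569104 to 2569119, RVN 1
%ANALDISK-W-MULTALLOC, file (8362,146,0) [SYSLOST]TCPIP$FTP_RUN.LOG;112
multiply allocated blocks
VBN 1 to 16
LBN 2569104 to 2569119, RVN 1
%ANALDISK-W-MULTALLOC, file (11969,24,0) [SYS0.SYSEXE]DNS$CACHE.0000004505;1
multiply allocated blocks
VBN 2145 to 2240
LBN 3297792 to 3297887, RVN 1
%ANALDISK-W-MULTALLOC, file (12045,2,0)
multiply allocated blocks
VBN 513 to 608
LBN 3297792 to 3297887, RVN 1
"""

if __name__ == "__main__":
    for (first, last, rvn), files in sorted(group_multalloc(SAMPLE).items()):
        print(f"LBN {first} to {last}, RVN {rvn}:")
        for fid, name in files:
            print(f"  {fid} {name}")
```

Each group lists every file that claims the same blocks, so none of the involved files gets overlooked. The sketch only matches identical ranges; it does not detect ranges that partially overlap, and it is no substitute for checking the files themselves.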

RIGHTSLIST.DAT is certainly the most critical file involved in those MULTALLOC errors reported. At least check with MC AUTHORIZE SHOW/ID/FULL * whether all existing data records seem to be readable.

You can restore NET$ROUTING_STARTUP.NCL from another system disk or from the DVD; this file is NOT system-specific.

From the options given, upgrade to V8.3-1H1 and install the latest patches; this will help you most. Whether the upgrade will really work cannot be predicted, as your system disk is in a questionable state.

Good luck,

Volker.
Colin Butcher
Esteemed Contributor

Re: Decnet Phase V Question

You could spend a lot of time trying to fix things and finding stuff for months to come.

I'd be exceedingly tempted to build a new system disc from scratch, starting with V8.3-1H1. If you choose to copy some files across from the corrupted disc - check each and every file's contents as you go.

Build the new system disc offline by borrowing an IA64 system if you can, or simply drop one node out of the existing cluster. Work quickly by first making a bootable disc copy of the installation DVD, then add all the layered products etc. you need, then add the command files and so on you'll be using. Also use command files to create the accounts, set up the proxy database, etc. This way you'll also have a complete set of all the pieces you need to create the system. Don't forget the EFI settings backup & restore for the console too.

It's probably quicker in the long run - and you'll know you're in good shape from here on. Yes, it's painful to contemplate - but probably not as painful as having a major production problem later on that causes you to have to start again, but under severe pressure.

Cheers, Colin (http://www.xdelta.co.uk).
Entia non sunt multiplicanda praeter necessitatem (Occam's razor).
Hoff
Honored Contributor

Re: Decnet Phase V Question

Build a new system disk. No question.

Chasing an unstable system disk or chasing a system disk that's been hacked around on is not worth the effort; attempting to repair such a disk is the more expensive strategy.

A fresh install is also the opportunity to understand what's on your disk, and how it's set up. Which can be valuable from the perspective of recovery, and around such things as support contracts -- do you really need XYZ product? -- and software versions.

Sure. This hurts. So does chasing yet more DECnet-Plus weirdness, and whatever new permutations of weirdness arise if (when?) additional corrupt blocks are encountered.
Warren G Landrum
Frequent Advisor

Re: Decnet Phase V Question

Thanks Guys,

Much as it hurts, I'll go ahead and make plans to build a new system disk, to ensure that no more goblins come up and bite us when we least expect it !!!

Warren