Decnet Phase V Question

 
Warren G Landrum
Frequent Advisor

Decnet Phase V Question

Guys,

I've got DECnet Phase V installed on a 2-node Integrity cluster running VMS 8.3, and I'm having a problem with Set Host. 'Set Host 0', or Set Host to any node, works fine from one node. But if I run the same commands from the other node, this is what I get:

BHBO2>set h bhbo1
%RMS-F-DEV, error in device name or inappropriate device type for operation

BHBO2>set h 0
%RMS-F-DEV, error in device name or inappropriate device type for operation

I'm thinking I've got Decnet config screwed up somehow.

I just now saw that the only NET$PROXY.DAT I had was in the root of the node that is NOT allowing me to set host. I copied it to the sys$common:[sysexe] directory and redefined that logical on both nodes to point explicitly to sys$common:[sysexe]net$proxy.dat instead of sys$system:net$proxy.dat, since I couldn't delete the copy in the root while it was in use.

So I may have multiple problems going on here caused by improper configuration by me.

Any help will be greatly appreciated.

Thx,

Warren

20 REPLIES
Hakan Zanderau ( Anders
Trusted Contributor

Re: Decnet Phase V Question

It's Friday evening in Sweden right now, so
my fast answer is to run

$ @sys$manager:net$configure basic
( option number 1 ( entire configuration ))

to reconfig DECnet.

It will use the current configuration as the default answer to all questions.

Hakan

ps. DECnet is like Printers.....no one wants to touch it ;-) ds
Don't make it worse by guessing.........
Warren G Landrum
Frequent Advisor

Re: Decnet Phase V Question

Thanks Hakan,

When I do that, I get:

BHBO2>@sys$manager:net$configure basic
Copyright 2006 Hewlett-Packard Development Company, L.P.

DECnet-Plus for OpenVMS BASIC network configuration procedure

This procedure will help you create or modify the management scripts
needed to operate DECnet on this machine. You may receive help about
most questions by answering with a question mark '?'.

You have chosen the BASIC configuration option. This option enables
you to quickly configure your system by answering a few questions and
using most of the default answers. If you would rather do some specific
tailoring of your system's network configuration, you should invoke
NET$CONFIGURE.COM with the ADVANCED configuration option, ie:
@SYS$MANAGER:NET$CONFIGURE ADVANCED

* Do you want to continue? [YES] :
%NET$CONFIGURE-E-CKSREADERR, error reading checksum file


So this does not sound good. Possibly I have a corrupt checksum file. Any ideas how to correct or get around this?
Bill Hall
Honored Contributor

Re: Decnet Phase V Question

I don't recall ever having to do this, but I'd look at SYS$SPECIFIC:[SYSEXE]NET$CHECKSUM_NCL_LOCAL.DAT and NET$CHECKSUM_NCL_LOCAL_SAVED.DAT in the same directory. I'd compare the file creation and modification dates to see if they correspond to the last known reconfiguration of DECnet.

Might be a good idea to do an ANAL/DISK on this system disk just to get an idea of what kind of shape it's in...

Then I might try to rename the *_SAVED.DAT and then run NET$CONFIGURE.COM again. Do you know for sure that your DECnet configuration requires using the p1 parameter of BASIC or ADVANCED? Not that I think it's coded in the checksum file, just wondering if you knew the details of your DECnet configuration.

Bill
Bill Hall
Warren G Landrum
Frequent Advisor

Re: Decnet Phase V Question

Bill,

Thanks, I didn't know about those checksum files.

Anyhow, I checked the dates and times. The date is right, but I truly don't remember the exact time. The LOCAL and COMMON versions of the checksum file both have the same date and time, for what that's worth.

I did the ANAL/DISK of the sysdisk, and a lot of files, including NET$CHECKSUM_NCL_COMMON.DAT, had multiply allocated blocks.

I then did a repair.

Then I copied the local SAVED checksum file to a higher version as the new LOCAL file and ran NET$CONFIGURE, with the same results.

When I initially did the NET$CONFIGURE after installing the system, I did NOT use the BASIC or ADVANCED params.

Looks like I might have to call HP to get me out of this one, huh? Unless you have any more ideas - please, please !!!!
Hein van den Heuvel
Honored Contributor

Re: Decnet Phase V Question

Yeah, just checked my system and found, with a little help from $SET WATCH FILE/CLA=MAJOR

SYS$SYSROOT:[SYSEXE]NET$CHECKSUM_NCL_LOCAL.DAT

It is a VFC (DCL WRITE) file created at the time I did the DECnet Phase V setup for the box.

It starts out like:

$ type/page sys$system:NET$CHECKSUM_NCL_LOCAL.DAT
6%Local%SYSTEM%26-JAN-2009 22:14:03.61
NODEINFO:1.53,ENDNODE,49::,0,2,63.1023,576
ADAPTER:EIA0:CSMACD-0:CSMACD-0:1:0:0:EI/82558:1:1:0:0:0:0
ADAPTER:EWA0:CSMACD-1:CSMACD-1:1:0:0:DExxx/TULIP:1:0:0:0:0:0
:

There's that date.
For yucks I just randomly changed a few values, but it did not return a checksum read error.

Still, I'd say rename that away and try again.

fwiw,
Hein.



Hoff
Honored Contributor

Re: Decnet Phase V Question

SET WATCH? Um, Ok. I just looked at the text of the DCL command procedure. :-)
Hein van den Heuvel
Honored Contributor

Re: Decnet Phase V Question

>> SET WATCH? Um, Ok. I just looked at the text of the DCL command procedure.

:-)

Right. What was I thinking? I wasn't. Somehow I assumed we were dealing with a program.
It's just a crappy command file that does that 'make believe' error message stuff, while hiding the real underlying error.

"%NET$CONFIGURE-E-CKSREADERR, error reading checksum file"

Hah! Just a piece of string that looks like an error.

What it really means is that there was an EOF error on the first read... an empty file.

The file in question can be F$parse("NET$CHECKSUM_NCL_COMMON","SYS$COMMON:[SYSEXE].DAT")
or
F$parse("NET$CHECKSUM_NCL_LOCAL","SYS$SPECIFIC:[SYSEXE].DAT")
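A minimal Python sketch of that "first read" check, for anyone who wants to confirm an empty checksum file without reading the command procedure. It assumes the file has been copied somewhere Python can open it as plain text. The function names are hypothetical, and splitting the header on '%' is only a guess based on the sample header shown earlier in this thread, not a documented format.

```python
def first_record(path):
    """Return the first line of the file, or None if the file is empty."""
    with open(path, "r", errors="replace") as f:
        line = f.readline()
    # readline() returns "" only at end-of-file, i.e. for an empty file.
    return line.rstrip("\r\n") if line else None


def describe(path):
    """Summarize what a first read of the checksum file would see."""
    record = first_record(path)
    if record is None:
        return "empty file: the first read would hit end-of-file"
    # Assumption: header fields are '%'-separated, as in the sample above.
    fields = record.split("%")
    return f"first record has {len(fields)} '%'-separated fields: {fields}"
```

Calling describe() on a copy of each of the two candidate files should show which one, if either, is empty.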

Cheers,
Hein.

Volker Halle
Honored Contributor

Re: Decnet Phase V Question

Warren,


> I did the ANAL/DISK of the sysdisk, and a lot of files, including NET$CHECKSUM_NCL_COMMON.DAT, had multiply allocated blocks.


Is your system disk a common system disk for both systems? If so, you may have run both systems from the same disk at the same time without proper cluster communications. This is the most likely cause of getting multiply allocated blocks on a shared system disk!

To clean up the MULTALLOC errors, you need to delete all the files sharing the multiple allocated blocks and then do ANAL/DISK/REPAIR.

Consider re-installing OpenVMS or restoring the disk from the last good backup, as you cannot trust the contents of ANY file that had the MULTALLOC error.

Volker.
Hakan Zanderau ( Anders
Trusted Contributor

Re: Decnet Phase V Question

Rename the NET$CHECKSUM files and do a NET$CONFIGURE again.

The configuration will start from scratch.

If you run NET$CONFIGURE without an option, "BASIC" will be used.

Hakan
Don't make it worse by guessing.........
Jon Pinkley
Honored Contributor

Re: Decnet Phase V Question

Warren,

In thread http://forums13.itrc.hp.com/service/forums/questionanswer.do?threadId=1311866 you stated "I've got an Integrity Cluster (2 rx6600 nodes) running VMS 8.3 and DecNet Phase V and TCPIP 5.6-9"

My guess is that you have a much worse problem than you realize. As Volker pointed out, you have a serious inconsistency in the disk structure of the disk. What you are seeing is a symptom of the problem.

Hopefully you can determine a specific point in time that the problem was created. As stated by Volker, the most likely cause is having that disk mounted by two systems that were not part of the same cluster. That could be done by booting one of the systems with the cluster related sysgen parameters set incorrectly, or by booting from the installation DVD and mounting a disk that the other member has mounted.

If you don't know when it happened, restoring from an image backup won't necessarily do you any good, since the problem may have been present when the backup was made. When the files are restored, you will no longer have multiply allocated blocks, but some files may have blocks that were overwritten by other files.

My point is that a disk restored from an image backup of a disk that has multiply allocated blocks may pass ANALYZE/DISK/LOCK_VOLUME without any errors detected, but the damage to the files will still be there.

The following thread is one of several that discuss multiply allocated blocks.

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1267096

Good Luck,

Jon
it depends
Hakan Zanderau ( Anders
Trusted Contributor

Re: Decnet Phase V Question

I fully agree with Jon.

Make sure the disk is OK before trying to repair DECnet.

Hakan
Don't make it worse by guessing.........
Warren G Landrum
Frequent Advisor

Re: Decnet Phase V Question

Thanks All,

Do any of you feel/think that an upgrade to VMS 8.3-1 will fix the probable system disk corruption problem that it sounds like I may have?

w
Hoff
Honored Contributor

Re: Decnet Phase V Question

> Do any of you feel/think that an upgrade to VMS 8.3-1 will fix the probable system disk corruption problem that it sounds like I may have?

A V8.3-1H1 upgrade might or will cure this for those system disk files that get replaced as part of the upgrade.

Such an upgrade won't address any (corrupted) files that might be outside the files replaced by the upgrade. The DECnet databases are typically not replaced by an upgrade, for instance, and local startup and configuration files are also typically maintained.

And this approach presumes that any lurking corruption does not somehow also destabilize the upgrade. That's unlikely, but if something wanted to rummage in the DECnet databases as part of the upgrade processing...


Colin Butcher
Esteemed Contributor

Re: Decnet Phase V Question

Having been there a few months back, I'll guess that the system disc became inadvertently corrupted when you added the second node.

One of the tricky little problems with OpenVMS on Integrity is that in order to have BOOT_OPTIONS.COM set up the boot paths for you - the target disc you want to boot from has to be mounted.

However, at that point, your system is not a member of the cluster, so the disc gets corrupted unless you mount the target disc read-only (and preferably no cache too) while booted from a local disc before you run BOOT_OPTIONS.COM to set up the boot paths.

It's not (as far as I recall) described in the install guide. I did discuss this with a couple of folks from Engineering at the last bootcamp after I'd got bitten in a similar manner.

So, you probably need to restore back to the point before you added the second node to the cluster.

Hope that helps explain things.

Cheers, Colin (http://www.xdelta.co.uk).
Entia non sunt multiplicanda praeter necessitatem (Occam's razor).
Warren G Landrum
Frequent Advisor

Re: Decnet Phase V Question

Colin, Hoff, et al:

Well, that bites about the undocumented steps that should have been done during the original Integrity cluster set-up. Sounds like I probably got bitten as Colin suggested.

Only a few multiply allocated files show up, as pasted below:

BHBO1>anal/disk sys$sysdevice
Analyze/Disk_Structure for _$1$DGA1419: started on 9-FEB-2009 09:08:55.34

%ANALDISK-I-OPENQUOTA, error opening QUOTA.SYS
-SYSTEM-W-NOSUCHFILE, no such file
%ANALDISK-W-MULTALLOC, file (5659,1,0) [SYS0.SYSMGR]ACCOUNTNG.DAT;1
multiply allocated blocks
VBN 1425 to 1440
LBN 2569104 to 2569119, RVN 1
%ANALDISK-W-MULTALLOC, file (8362,146,0) [SYSLOST]TCPIP$FTP_RUN.LOG;112
multiply allocated blocks
VBN 1 to 16
LBN 2569104 to 2569119, RVN 1
%ANALDISK-W-MULTALLOC, file (11969,24,0) [SYS0.SYSEXE]DNS$CACHE.0000004505;1
multiply allocated blocks
VBN 2145 to 2240
LBN 3297792 to 3297887, RVN 1
%ANALDISK-W-MULTALLOC, file (12045,2,0)
multiply allocated blocks
VBN 513 to 608
LBN 3297792 to 3297887, RVN 1
%ANALDISK-W-MULTALLOC, file (10106,9,0) [VMS$COMMON.SYSEXE]RIGHTSLIST.DAT;5
multiply allocated blocks
VBN 673 to 688
LBN 3298496 to 3298511, RVN 1
%ANALDISK-W-MULTALLOC, file (11969,24,0) [SYS0.SYSEXE]DNS$CACHE.0000004505;1
multiply allocated blocks
VBN 2305 to 2320
LBN 3298496 to 3298511, RVN 1
%ANALDISK-W-MULTALLOC, file (10380,14,0) [SYS0.SYSMGR]NET$ROUTING_STARTUP.NCL;1
multiply allocated blocks
VBN 1 to 16
LBN 3298544 to 3298559, RVN 1
%ANALDISK-W-MULTALLOC, file (12038,2,0) [SYS1.SYSMGR]USB$UCM_EVENTS.LOG;1
multiply allocated blocks
VBN 97 to 112
LBN 3298544 to 3298559, RVN 1
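Since every file that shares a block range is suspect, it helps to group the report above by LBN range rather than read it file by file. The following Python sketch does that for a saved copy of the ANALYZE/DISK output. The function name and the regular expressions are assumptions fitted to the output format pasted here, not to any documented specification of that format.

```python
import re
from collections import defaultdict

# Start of a MULTALLOC report: file ID, then an optional file name.
FILE_RE = re.compile(r"%ANALDISK-W-MULTALLOC, file \((\d+),(\d+),(\d+)\)\s*(\S*)")
# The logical block range line that follows each report.
LBN_RE = re.compile(r"LBN (\d+) to (\d+)")


def group_by_lbn(report):
    """Map each (first_lbn, last_lbn) range to the files reported for it."""
    groups = defaultdict(list)
    current = None
    for line in report.splitlines():
        m = FILE_RE.search(line)
        if m:
            fid = "({},{},{})".format(m.group(1), m.group(2), m.group(3))
            name = m.group(4) or "<no name reported>"
            current = f"{fid} {name}"
            continue
        m = LBN_RE.search(line)
        if m and current is not None:
            groups[(int(m.group(1)), int(m.group(2)))].append(current)
            current = None
    return dict(groups)
```

Run against the listing above, each LBN range should come back with two entries. For example, LBN 3298544 to 3298559 is claimed by both NET$ROUTING_STARTUP.NCL and USB$UCM_EVENTS.LOG, and LBN 3298496 to 3298511 by both RIGHTSLIST.DAT and the DNS$CACHE file.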

One of them DOES look as though it is related to DECnet, and RIGHTSLIST.DAT shows up in that state too. I backed up the rightslist to a higher-version copy so that users can grab that copy as they log in, and I also rebooted one of the nodes this weekend.

Here are possible options I see that may or may not help:

1) Back out Decnet Phase V and install Decnet Phase IV - Decnet is really not even being used on this system by the application or the users.

2) Install latest consolidated update (#8) for Integrity 8.3

3) Upgrade to VMS 8.3-1H

I understand that for options 2 and 3 that if they don't replace the corrupted files, then I could conceivably have the same problem.

Based on the files above that I show being multiply allocated, what do you guys think? As I said, I can't even reconfig Phase V as stated above, because of the checksum problem.
Volker Halle
Honored Contributor

Re: Decnet Phase V Question

Warren,

To recover from those MULTALLOC errors, you need to delete ALL the files that are reporting MULTALLOC errors. First check whether you have good backups of those files.

Be aware that you may have other corrupted files on your system disk if you have done ANAL/DISK/REPAIR in the past and just deleted ONE of the involved files.

RIGHTSLIST.DAT is certainly the most critical file involved in those MULTALLOC errors reported. At least check with MC AUTHORIZE SHOW/ID/FULL * whether all existing data records seem to be readable.

You can restore NET$ROUTING_STARTUP.NCL from another system disk or from the DVD; this file is NOT system-specific.

From the options given, use the upgrade to V8.3-1H1 plus install the latest patches. This will help you most. Whether the upgrade really works cannot be predicted, as your system disk is in a questionable state.

Good luck,

Volker.
Colin Butcher
Esteemed Contributor

Re: Decnet Phase V Question

You could spend a lot of time trying to fix things and finding stuff for months to come.

I'd be exceedingly tempted to build a new system disc from scratch, starting with V8.3-1H1. If you choose to copy some files across from the corrupted disc - check each and every file's contents as you go.

Build the new system disc offline by borrowing an IA64 system if you can, or simply drop one node out of the existing cluster. Work quickly by first making a bootable disc copy of the installation DVD, then add all the layered products etc. you need, then add the command files and so on you'll be using. Also use command files to create the accounts, set up the proxy database, etc. This way you'll also have a complete set of all the pieces you need to create the system. Don't forget the EFI settings backup & restore for the console too.

It's probably quicker in the long run - and you'll know you're in good shape from here on. Yes, it's painful to contemplate - but probably not as painful as having a major production problem later on that causes you to have to start again, but under severe pressure.

Cheers, Colin (http://www.xdelta.co.uk).
Entia non sunt multiplicanda praeter necessitatem (Occam's razor).
Hoff
Honored Contributor

Re: Decnet Phase V Question

Build a new system disk. No question.

Chasing an unstable system disk or chasing a system disk that's been hacked around on is not worth the effort; attempting to repair such a disk is the more expensive strategy.

A fresh install is also the opportunity to understand what's on your disk, and how it's set up. Which can be valuable from the perspective of recovery, and around such things as support contracts -- do you really need XYZ product? -- and software versions.

Sure. This hurts. So does chasing yet more DECnet-Plus weirdness, and whatever new permutations of weirdness arise if (when?) additional corrupt blocks are encountered.
Warren G Landrum
Frequent Advisor

Re: Decnet Phase V Question

Thanks Guys,

Much as it hurts, I'll go ahead and make plans to build a new system disk, to ensure that no more goblins come up and bite us when we least expect it !!!

Warren
Warren G Landrum
Frequent Advisor

Re: Decnet Phase V Question

Will build new system disk, per recommendations of my peers (and I use that word loosely to even DARE put myself in the same class with you guys :-)

w