Operating System - OpenVMS

Memory Channel error in VMS 7.2-2 cluster boot

SOLVED
Antonio Gonzalez_4
Regular Advisor

Memory Channel error in VMS 7.2-2 cluster boot

We are migrating a VMS 7.2-2 CI Cluster to Memory Channel & Fibre Channel.
There are two MC rails (with fiber-optic extensions).
Booting one node is OK, but when I boot the second, I get this error:

%MCA0 CPU00: 21-AUG-2006 14:48:49 Could not allocate SVAs for transmit space.
%MCA0 CPU00: 21-AUG-2006 14:48:49 Port going permanently offline.

After that, MCA0 & MCB0 are offline with errors. The nodes come up as a cluster but don't see each other as members. MC doesn't work and they seem to be ignoring the quorum disk communication entirely.

MC_diag & MC_cable are fine.
VMS722_MEM_CHAN-V0200 installed.
Disks are in one MA8000 at each site and shadowed (all but the qdsk).
Boxes are ES40.

BTW: updating the OS version is not possible at all (I'm sorry).


Any idea so far ??
thanks
Antonio
10 REPLIES
Heinz W Genhart
Honored Contributor

Re: Memory Channel error in VMS 7.2-2 cluster boot

Hi Antonio

how does cluster communication work? Did you enable NI? Otherwise I cannot really understand how the second machine joined the cluster (did it?)
There is the possibility of a partitioned cluster !!!

How is your interconnect defined in MODPARAMS.DAT? Is MC defined there?

What circuits are displayed with
$show cluster/cont
add circ

Regards

Heinz
Antonio Gonzalez_4
Regular Advisor

Re: Memory Channel error in VMS 7.2-2 cluster boot

Hi Heinz, thanks for the reply.

Actually the two boxes are not attached to LAN, only MC.

I have the disk from the old CI cluster (roots SYS0 & SYS1 are OK and I'd rather not recreate them). Now I'm reconfiguring the cluster to use the MC links instead of the CI. These 2 boxes have no CI in them; just MC & FC.
Booting only NODE V, I created the cluster (a new one) ... OK. If I add the other node with CLUSTER_CONFIG from here, I'm afraid the script will overwrite SYS1 for the other node (Node M), so I did this:
Booting only NODE M, I created the cluster (a new one) ... OK
Same boot disk, different roots (SYS0 & SYS1), same Qdisk. Then I tried to boot both at once and got the error.

Nodes can only talk through MC (and sort of via the Qdisk). Both nodes come up as clusters BUT in a PARTITIONED fashion !!

I can see the MC error, and then problems with the VMS shadow disks, mount verifications, ...

I attach the info from node V only. I'll also do that for node M tomorrow.

thxs
Volker Halle
Honored Contributor

Re: Memory Channel error in VMS 7.2-2 cluster boot

Antonio,

the cluster data from Node V look o.k. - especially EXPECTED_VOTES=3, VOTES=1 and QDSKVOTES=1.

Make sure the other node also has EXPECTED_VOTES=1, otherwise you'll get a partitioned cluster.

Also make sure that both nodes define the SAME value for DISK_QUORUM. If so, they should also detect via the quorum disk whether a partitioned cluster exists.

Volker.
Heinz W Genhart
Honored Contributor

Re: Memory Channel error in VMS 7.2-2 cluster boot

Hi Antonio

I think Volker meant to write:

'Make sure the other node also has EXPECTED_VOTES=3, otherwise you'll get a partitioned cluster.'

Expected Votes should be 3 on both nodes (and not one), if each of them has one vote and the quorum disk has one vote.
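
To see why the value matters, here is a small sketch of the quorum arithmetic. It assumes the documented rule that quorum is (EXPECTED_VOTES + 2) / 2, rounded down; the function names are only illustrative, and the check is a simplification of what the connection manager really does at boot.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def can_form_cluster_alone(expected_votes: int, own_votes: int, qdisk_votes: int = 0) -> bool:
    """True if a single node reaches quorum with the votes it can count by itself.

    Simplified model: own VOTES plus QDSKVOTES (if the quorum disk is counted)
    compared against the quorum computed from that node's EXPECTED_VOTES.
    """
    return own_votes + qdisk_votes >= quorum(expected_votes)


if __name__ == "__main__":
    # Setup from this thread: VOTES=1 per node, QDSKVOTES=1.
    print("quorum with EXPECTED_VOTES=3:", quorum(3))
    print("quorum with EXPECTED_VOTES=1:", quorum(1))
    # With EXPECTED_VOTES=1 a node is satisfied by its own vote and never
    # needs the other node or the quorum disk: the partition risk.
    print("alone, EXPECTED_VOTES=1:", can_form_cluster_alone(1, own_votes=1))
    print("alone, EXPECTED_VOTES=3, no qdisk:", can_form_cluster_alone(3, own_votes=1))
    print("alone, EXPECTED_VOTES=3, with qdisk:", can_form_cluster_alone(3, own_votes=1, qdisk_votes=1))
```

With EXPECTED_VOTES=3 a single node still needs the quorum disk's vote (or the other node) before it has quorum, which is what gives the quorum disk a chance to arbitrate.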

The starting position is:

- You have a Disk from a 2 node CI cluster
- You want to use this disk for another configuration (or same configuration) with a MC

I would do the following (assuming all other parameters are as in your attachment):

- I would boot node 1, then change the interconnect in its modparams.dat to "MC", run AUTOGEN and then shut down

- I would do the same with the other node, but with a reboot.

- After the second node is up, it should be possible to boot the first node into the cluster as well.

- After both nodes are up, I would recommend changing the cluster password and group.

Is there a reason why you load MSCP?

Regards

Heinz


Volker Halle
Honored Contributor

Re: Memory Channel error in VMS 7.2-2 cluster boot

Antonio,

Heinz is right, I meant to say: EXPECTED_VOTES=3 !

You may still want to configure another (redundant) cluster interconnect (i.e. LAN) and not rely solely on memory channel.

MCDRIVER issues this error message, if it cannot allocate enough contiguous system virtual addresses to map the transmit space. Let's hope that AUTOGEN would take the right steps here.

Volker.

Heinz W Genhart
Honored Contributor

Re: Memory Channel error in VMS 7.2-2 cluster boot

Hi all

I agree with Volker.

But I would really recommend changing the cluster password and group as soon as possible.

If you enable NI for cluster communications, it is possible to boot your node into another existing cluster. This happens only if the other cluster also has NI enabled and uses the same cluster group and password.

It may sound a little bit strange... but we have already had a problem like this ...

Regards

Heinz
Antonio Gonzalez_4
Regular Advisor

Re: Memory Channel error in VMS 7.2-2 cluster boot

I'll check:

1.- EXPECTED_VOTES = 3 on both nodes
2.- Both nodes have the same Qdisk
3.- I guess that AUTOGEN went wrong on one of the nodes (the other node doesn't show the MC error !!). I'll double-check AUTOGEN again.
4.- In fact I don't need/want MSCP; I'll disable MSCP_LOAD.
5.- Regarding cluster/password ... I thought that this was used just for NI clusters. In MC/CI this is not used. Right ?? Anyway I'll update/synchronize that.
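
Checks 1 and 2 can be scripted off-node against copies of each root's MODPARAMS.DAT. The following is only a sketch: it assumes plain NAME=value lines with "!" comments, the helper names are made up, and the sample values (including the device name $1$DGA10) are hypothetical. It compares file contents only, not the active SYSGEN values.

```python
def parse_params(text: str) -> dict:
    """Parse NAME=value lines; '!' starts a comment (simplified: no '!' inside quotes)."""
    params = {}
    for line in text.splitlines():
        line = line.split("!", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip().upper()] = value.strip().strip('"')
    return params


def mismatches(a: dict, b: dict, names=("EXPECTED_VOTES", "DISK_QUORUM", "QDSKVOTES")) -> dict:
    """Return {name: (value_a, value_b)} for every listed parameter that differs."""
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Hypothetical excerpts from the two roots' MODPARAMS.DAT files.
NODE_V = """
EXPECTED_VOTES=3
VOTES=1
DISK_QUORUM="$1$DGA10"   ! hypothetical quorum disk name
QDSKVOTES=1
"""
NODE_M = """
EXPECTED_VOTES=1
VOTES=1
DISK_QUORUM="$1$DGA10"
QDSKVOTES=1
"""

if __name__ == "__main__":
    for name, (v, m) in mismatches(parse_params(NODE_V), parse_params(NODE_M)).items():
        print(f"{name}: node V has {v}, node M has {m}")
```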

And now one easy question: is an MC cluster a shared SCSI cluster ?? I think that the answer to this CLUSTER_CONFIG question is YES. Right ??


My problem is that MC is not working at all on one of the nodes, and all the other ugly stuff came from this (I guess !!).

I'll update.
Thanks !
Antonio
Heinz W Genhart
Honored Contributor

Re: Memory Channel error in VMS 7.2-2 cluster boot

Hi Antonio

Regarding cluster/password ...
You are right, it's only used for NI clusters.

But... Volker recommended enabling NI and I agree with him, but what happens if you use this system disk for another cluster too (with another node name and other addresses) and this other cluster also has NI enabled and the same cluster group and password.......

Changing this cluster authorization takes 10 seconds, and then you can forget about it.


I think this is not a shared SCSI cluster. (I have not used cluster_config for a long time)

Regards

Heinz
Volker Halle
Honored Contributor
Solution

Re: Memory Channel error in VMS 7.2-2 cluster boot

Antonio,

the cluster group number and password are only used for cluster communication via the LAN (PEDRIVER).

The CLUSTER_CONFIG question regarding the shared SCSI bus is only to be answered with YES, if the two nodes share a directly attached SCSI bus. This answer determines further questions for disk and/or port allocation classes. Fibre Channel (although it is also a SCSI protocol) is not relevant in this context.

If AUTOGEN somehow failed on the other node, which is now giving you the %MCA0 errors during startup, that's probably the reason for those errors. AUTOGEN explicitly does calculations for MC (see symbol AGEN$MC_PAGES).

Volker.
Antonio Gonzalez_4
Regular Advisor

Re: Memory Channel error in VMS 7.2-2 cluster boot

AUTOGEN was run on both nodes but the problem remained.

After some help from HP support, the problem is solved now. Some parameters were VERY misaligned, especially:

WSMAX
MAXPROCESSCNT
BALSETCNT
NPAGEVIR

After some reasonable adjustments to them, the MC driver can find 128 MB of address space to load its buffers.

If I find something else I'll post it.

Thank you all
antonio