How to set up TruCluster.

Tim_SW
Advisor

Can anybody help with the following problem?

My two ES40 nodes cannot join a cluster properly. The console output from both nodes is below.
If someone knows what response one node sends to the other, please tell me what it is. I mean the ACK (acknowledge).

Thanks to all for any help.

First node:


P00>>>b dkb300 -fl A
(boot dkb300.3.0.2.1 -flags A)
jumping to bootstrap code

Digital UNIX boot - Mon Apr 12 12:39:50 EDT 1999

Loading vmunix ...
Loading at 0xfffffc0000230000
Current PAL Revision <0x4006800010162>
Switching to OSF PALcode Succeeded
New PAL Revision <0x400690002015c>

Sizes:
text = 4082528
data = 799808
bss = 2812688
Starting at 0xfffffc0000467a80

No B-cache detected
Alpha boot: available memory from 0xdea000 to 0xfffe000
Digital UNIX V4.0F (Rev. 1229); Wed Jun 28 02:02:55 MET DST 2000
physical memory = 256.00 megabytes.
available memory = 242.09 megabytes.
using 975 buffers containing 7.61 megabytes of memory
Firmware revision: 7.0-2
PALcode: Digital UNIX version 1.92-105
AlphaServer ES40
pci1 at nexus
isp0 at pci1 slot 1
isp0: QLOGIC ISP1020B/V2
isp0: Firmware revision 5.57 (loaded by console)
scsi0 at isp0 slot 0
isp1 at pci1 slot 2
isp1: QLOGIC ISP1020B/V2
isp1: Firmware revision 5.57 (loaded by console)
scsi1 at isp1 slot 0
rz11 at scsi1 target 3 lun 0 (LID=0) (DEC RZ28 (C)DEC 0200) (Wide32)
isp2 at pci1 slot 3
isp2: QLOGIC ISP1020B/V2
isp2: Firmware revision 5.57 (loaded by console)
scsi2 at isp2 slot 0
rz20 at scsi2 target 4 lun 0 (LID=1) (DEC RZ28 (C)DEC 0200) (Wide32)
gpc0 at isa0
pci0 at nexus
mchan0: sg_tlb_ptr = fffffc000049e9a0
mchan0: pct base io_handle = fffffd0020000000
mchan0: xmt base = fffffd0020010800
mchan0: sg_ctl = fffffc000f4aaf00
mchan0: softc structure address = fffffc000f462000
mchan0: xmt_phys_addr = 80020000000
mchan0: window size = 20000000
mchan0: Module revision = 65E
mchan0: supported 16 nodes
mchan0: Resetting controller, come_online = 1
mchan0: jumpered as HUB configuration
mchan0: Probe successful
mchan0 at pci0 slot 3
mchan0: controller_state = 10 node_state = 0 node_id = ffffffff
isa0 at pci0
ace0 at isa0
ata0 at pci0 slot 15
ata0: ACER M1543C
scsi3 at ata0 slot 0
rz24 at scsi3 target 0 lun 0 (LID=2) (_NEC CD-ROM CD-3002B C500)
mchan0: State change mcport = 52400001 mcerr= 0 lcsr = c078 hbeats = 0
mchan0: controller is coming online
scsi4 at ata0 slot 1
fru_table_binlog: can't allocate buffer for FRU table packet
kernel console: ace0
dli: configured
clubase: configured
mchan0: Scatter gather space allocated at c0000000
mchan0: mchan_delay_deassert_put - mcport_ptr = 52400001
mchan0: mchan_delay_deassert_put - initial lcsr = 0
mchan0: mchan_delay_deassert_put has deasserted PUT lcsr = 0
mchan0: mchan_delay_deassert_put - after enable interrupts lcsr = 78
mchan0: mchan_delay_deassert_put - after setting OLEN mcport_ptr = 72400001 lcsr_ptr = 18c078
mchan0: State change mcport = 72400001 mcerr= 0 lcsr = 18c078 hbeats = 0
mchan0: controller is coming online
dlmsl: configured
drd: configured.
cnxagent: configured
dlm: configured.
ADVFS: using 2322 buffers containing 18.14 megabytes of memory
mchan0: heartbeat started
mchan0: lcsr = 180000 mcport = 72400001
mchan0: mchan_delay_ctlr_online - node id = 0
mchan0 - mchan_delay_ctlr_online - mcport = 72400001
mchan0: old node state = 0 new node_state = 3
mchan0: Log to Binary Error log ... error = 1801
mchan0: node 1 has come online
mchan0: mchan_delay_ctlr_online - call's online handler
mchan0: lcsr = 180078 mcport = 72400001
mchan0: mchan_delay_ctlr_online - node id = 0
mchan0 - mchan_delay_ctlr_online - mcport = 72400001
mchan0: lcsr = 180078 mcport = 72400001
mchan0: mchan_delay_ctlr_online - node id = 0
mchan0 - mchan_delay_ctlr_online - mcport = 72400001
vm_swap_init: vm_swapon for /dev/rz19d device failed
vm_swap_init: swap is set to lazy (over commitment) mode
Checking local filesystems
/sbin/ufs_fsck -p
Mounting / (root)
user_cfg_pt: reconfigured
Mounting local filesystems
exec: /sbin/mount_advfs -F 81920 root_domain#root /
root_domain#root on / type advfs (rw)
/proc on /proc type procfs (rw)
exec: /sbin/mount_advfs -F 16384 usr_domain#usr /usr
usr_domain#usr on /usr type advfs (rw)
rm_sw_init: begin MC initialization.
rm_boot_am_i_alone: entered
checking for existing memory channel nodes
Jul 5 15:08:41 update: started
/dev/rz20d or an overlapping partition is open.
Quitting ....
unresponsive mc nodes - waiting for node mask 2
The system is coming up. Please wait...
Checking for crash dumps
unresponsive mc nodes - waiting for node mask 2
Initializing paging space
/dev/rz20d or an overlapping partition is open.
Quitting ....
unresponsive mc nodes - waiting for node mask 2
Mounting Memory filesystems
Streams autopushes configured
Configuring network
hostname: axp004
/sbin/rc3.d/S00inet: ifconfig failed - ifconfig: ioctl (SIOCGIFFLAGS): no such interface: tu0
Loading LMF licenses
unresponsive mc nodes - waiting for node mask 2
System error logger started
Binary error logger started



Second node (slave):


P00>>>b dkc0 -fl A
(boot dkc0.0.0.3.1 -flags A)
jumping to bootstrap code

Digital UNIX boot - Mon Apr 12 12:39:50 EDT 1999

Loading vmunix ...
Loading at 0xfffffc0000230000
Current PAL Revision <0x4006800010162>
Switching to OSF PALcode Succeeded
New PAL Revision <0x400690002015c>

Sizes:
text = 3541936
data = 684368
bss = 2729440
Starting at 0xfffffc00003e5c60

No B-cache detected
Alpha boot: available memory from 0x1a32000 to 0x3fffc000
Digital UNIX V4.0F (Rev. 1229); Sun Jun 14 17:55:06 MET DST 2009
physical memory = 1024.00 megabytes.
available memory = 997.80 megabytes.
using 3924 buffers containing 30.65 megabytes of memory
Firmware revision: 7.0-2
PALcode: Digital UNIX version 1.92-105
AlphaServer ES40
pci1 at nexus
isp0 at pci1 slot 1
isp0: QLOGIC ISP1020B/V2
isp0: Firmware revision 5.57 (loaded by console)
scsi0 at isp0 slot 0
isp1 at pci1 slot 2
isp1: QLOGIC ISP1020B/V2
isp1: Firmware revision 5.57 (loaded by console)
scsi1 at isp1 slot 0
isp2 at pci1 slot 3
isp2: QLOGIC ISP1020B/V2
isp2: Firmware revision 5.57 (loaded by console)
scsi2 at isp2 slot 0
rz16 at scsi2 target 0 lun 0 (LID=0) (DEC RZ29 (C)DEC 0200) (Wide32)
gpc0 at isa0
pci0 at nexus
mchan0: sg_tlb_ptr = fffffc000041cb80
mchan0: pct base io_handle = fffffd0020000000
mchan0: xmt base = fffffd0020010800
mchan0: sg_ctl = fffffc003db46b80
mchan0: softc structure address = fffffc003db00000
mchan0: xmt_phys_addr = 80020000000
mchan0: window size = 20000000
mchan0: Module revision = 65E
mchan0: supported 16 nodes
mchan0: Resetting controller, come_online = 1
mchan0: jumpered as HUB configuration
mchan0: Probe successful
mchan0 at pci0 slot 2
mchan0: controller_state = 10 node_state = 0 node_id = ffffffff
isa0 at pci0
ace0 at isa0
ata0 at pci0 slot 15
ata0: ACER M1543C
scsi3 at ata0 slot 0
rz24 at scsi3 target 0 lun 0 (LID=1) (TSSTcorpCDDVDW SH-S223Q SB00)
scsi4 at ata0 slot 1
Created FRU table binary error log packet
kernel console: ace0
dli: configured
clubase: configured
mchan0: Scatter gather space allocated at c0000000
mchan0: mchan_delay_deassert_put - mcport_ptr = 52410001
mchan0: mchan_delay_deassert_put - initial lcsr = 78
mchan0: mchan_delay_deassert_put has deasserted PUT lcsr = 78
mchan0: mchan_delay_deassert_put - after enable interrupts lcsr = 78
mchan0: mchan_delay_deassert_put - after setting OLEN mcport_ptr = 72410001 lcsr_ptr = 18c078
mchan0: State change mcport = 72410001 mcerr= 400 lcsr = 18c078 hbeats = 0
mchan0: controller is coming online
dlmsl: configured
drd: configured.
cnxagent: configured
dlm: configured.
mchan0: heartbeat started
Checking local filesystems
/sbin/ufs_fsck -p
mchan0: lcsr = 180000 mcport = 72410001
mchan0: mchan_delay_ctlr_online - node id = 1
mchan0 - mchan_delay_ctlr_online - mcport = 72410001
mchan0: old node state = 0 new node_state = 3
mchan0: Log to Binary Error log ... error = 1800
mchan0: node 0 has come online
mchan0: mchan_delay_ctlr_online - call's online handler
mchan0: lcsr = 180078 mcport = 72410001
mchan0: mchan_delay_ctlr_online - node id = 1
mchan0 - mchan_delay_ctlr_online - mcport = 72410001
/dev/rrz16a: UNREF FILE I=2442 OWNER=root MODE=100644
/dev/rrz16a: SIZE=0 MTIME=Jun 22 21:04 2009 (CLEARED)
/dev/rrz16a: FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/rrz16a: BLK(S) MISSING IN BIT MAPS (SALVAGED)
/dev/rrz16a: SUMMARY INFORMATION BAD (SALVAGED)
/dev/rrz16a: 1460 files, 68449 used, 60302 free (54 frags, 7531 blocks, 0.0% fragmentation)
rm_sw_init: begin MC initialization.
rm_boot_am_i_alone: entered
checking for existing memory channel nodes
rm_slave_init
slave unit boot phase 0: checking cables
slave unit boot phase 1: request data ...
/dev/rrz16e: 18031 files, 391832 used, 873502 free (1494 frags, 109001 blocks, 0.1% fragmentation)
/dev/rrz16f: FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/rrz16f: BLK(S) MISSING IN BIT MAPS (SALVAGED)
/dev/rrz16f: SUMMARY INFORMATION BAD (SALVAGED)
/dev/rrz16f: 1261 files, 31949 used, 1233385 free (737 frags, 154081 blocks, 0.1% fragmentation)
Mounting / (root)
user_cfg_pt: reconfigured
Mounting local filesystems
/dev/rz16a on / type ufs (rw)
/proc on /proc type procfs (rw)
/dev/rz16e on /usr type ufs (rw)
/dev/rz16f on /var type ufs (rw)
Jun 22 21:24:24 update: started
swapon: added /dev/rz16d as swap device.
The system is coming up. Please wait...
Checking for crash dumps
Initializing paging space
Mounting Memory filesystems
Streams autopushes configured
Configuring network
hostname: axp005
/sbin/rc3.d/S00inet: ifconfig failed - ifconfig: ioctl (SIOCGIFFLAGS): no such interface: tu0
Loading LMF licenses
System error logger started
Binary error logger started
writing to routing socket: Network is unreachable
add net default: gateway 192.168.149.1: ioctl returns 51
Network is unreachable
imc_init: MEMORY CHANNEL API - get_RM_information() failed with status 212
MEMORY CHANNEL API initialization pending
Setting kernel timezone variable
rm_boot_signal_any_node: node 0 seen as up but not rsponding
crashing node 0, caller = 0xfffffc00004caca0
ONC portmap service started
NFS IO service started
Mounting NFS filesystems
Preserving editor files
security configuration set to default (BASE).
Successful SIA initialization

Clearing temporary files
Unlocking ptys
SMTP Mail Service started.
Environmental Monitoring Subsystem Configured.
Using snmp service entry port 161.
Extensible SNMP master agent started
Base O/S sub-agent started
Server System sub-agent started
Server Management sub-agent started
Compaq Management sub-agent started
Insight Manager Agent started
Environmental Monitoring Daemon did not start...trying again
Environmental Monitoring Daemon did not start...trying again
rm_boot_signal_any_node: node 0 seen as up but not rsponding
crashing node 0, caller = 0xfffffc00004caca0
Environmental Monitoring Daemon did not start after 3 tries.
Internet services provided.
Cron service started
SuperLAT. Copyright 1994 Meridian Technology Corp. All rights reserved.
Can not add network adapter
Printer service started
The system is ready.



Digital UNIX Version V4.0 (axp005) console
8 REPLIES
Rob Leadbeater
Honored Contributor

Re: How to set up TruCluster.

Hi,

It looks like something is probably wrong with the memory channel configuration.

With both machines at the SRM (P00>>>) prompt, try running the commands

mc_diag
and
mc_cable

These will test that the cluster communication is working correctly.

Cheers,

Rob
Pieter 't Hart
Honored Contributor

Re: How to set up TruCluster.

Both nodes report "mchan0: jumpered as HUB configuration",
so there must be an MC hub (a PC-like box) connecting the MC interfaces of both nodes.
Check that it is powered and functioning.

If no hub is present, one adapter must be jumpered as vhub0 and the other as vhub1.
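
To make that check repeatable, here is a minimal Python sketch that scans saved console output for the jumper mode each adapter reported at boot. It assumes the console text has been captured to a string; the function names `jumper_modes` and `all_in_hub_mode` are made up for illustration. Only the "HUB" wording is known from the logs in this thread. How a virtual hub adapter reports itself is not shown here, so the sketch captures whatever word appears and does not interpret it.

```python
import re

# Matches lines such as "mchan0: jumpered as HUB configuration" (wording taken from the logs above).
JUMPER_RE = re.compile(r"mchan(\d+): jumpered as (\S+) configuration")


def jumper_modes(console_log: str) -> dict:
    """Map each adapter name (e.g. 'mchan0') to the jumper mode it printed at boot."""
    return {f"mchan{m.group(1)}": m.group(2) for m in JUMPER_RE.finditer(console_log)}


def all_in_hub_mode(logs) -> bool:
    """True when at least one adapter was found and every adapter reported 'HUB'."""
    modes = [mode for log in logs for mode in jumper_modes(log).values()]
    return bool(modes) and all(mode == "HUB" for mode in modes)


# Excerpts copied from the two boot logs in the first post.
node_a = "mchan0: supported 16 nodes\nmchan0: jumpered as HUB configuration\n"
node_b = "mchan0: jumpered as HUB configuration\nmchan0: Probe successful\n"

print(jumper_modes(node_a))
print(all_in_hub_mode([node_a, node_b]))
```

If `all_in_hub_mode` returns True and there is no physical hub between the nodes, the jumpers are the first thing to look at.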
Tim_SW
Advisor

Re: How to set up TruCluster.

Rob, I want to ask just one question. Do you know what value is sent as the ACK when messages are transmitted over the cable? I heard it is 0x80, is that right?
Vladimir Fabecic
Honored Contributor

Re: How to set up TruCluster.

Tim,
Do you have an MC hub or not?
As far as I can see, both cards are jumpered for HUB configuration:
"mchan0: jumpered as HUB configuration"
If you do not have an MC hub, do what Pieter said.
In vino veritas, in VMS cluster
Tim_SW
Advisor

Re: How to set up TruCluster.

Gentlemen, may I ask whether you know how the adapter behaves or not? That is what I want to ask, nothing else.

Thank you.
Tim_SW
Advisor

Re: How to set up TruCluster.

Sorry, I forgot to add: I mean the adapter's activity during booting/initialization, especially what acknowledge response one node sends to the other.
I'm interested in programming this, not in administration. My cables are connected properly, and everything is set up correctly. I would like help with low-level control.
Pieter 't Hart
Honored Contributor

Re: How to set up TruCluster.

I looked at Tim's other posts.
He is not trying to recover a broken cluster; he is working at the level of programming communication over the MC interface.

So, from his first post, his question is:
>>> If someone knows what response send one node to other, please, say what it is. I mean ACK - acknowledge.<<<<
to prevent the "node 0 seen as up but not rsponding" message at startup

But Tim,
even in this situation the MC hardware must be correctly configured first.
Choosing between hub mode and "virtual hub" mode is comparable to choosing between crossed and straight UTP cables.
A wrong connection won't be solved by programming!

But this is also discussed in your other posts. Those threads are not closed yet, so are they not solved?

Pieter
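
For anyone comparing the two boot logs, here is a rough Python sketch that pulls out what each node says about itself and its peer. It rests on an assumption that is not confirmed anywhere in this thread: that the number in "waiting for node mask 2" is a bitmask where bit N stands for MC node N, so mask 2 would mean node 1. The mask is parsed as hexadecimal, which is also a guess (the value 2 reads the same in decimal). The names `nodes_in_mask` and `summarize` are made up for illustration.

```python
import re

# Message wordings below are copied from the boot logs in the first post.
MASK_RE = re.compile(r"waiting for node mask ([0-9a-fA-F]+)")
SELF_ID_RE = re.compile(r"mchan_delay_ctlr_online - node id = (\d+)")
PEER_RE = re.compile(r"mchan\d+: node (\d+) has come online")


def nodes_in_mask(mask: int) -> list:
    """ASSUMPTION: bit N of the mask stands for MC node N."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def summarize(console_log: str) -> dict:
    """Collect this node's id, the peers it saw come online, and the nodes it is waiting on."""
    waiting = set()
    for text in MASK_RE.findall(console_log):
        waiting.update(nodes_in_mask(int(text, 16)))  # hex is an assumption
    return {
        "self": sorted({int(n) for n in SELF_ID_RE.findall(console_log)}),
        "peers_online": sorted({int(n) for n in PEER_RE.findall(console_log)}),
        "waiting_on": sorted(waiting),
    }


first_node = """mchan0: mchan_delay_ctlr_online - node id = 0
mchan0: node 1 has come online
unresponsive mc nodes - waiting for node mask 2
"""
print(summarize(first_node))
```

Under that reading, the first node (id 0) sees node 1 come online at the hardware level and then keeps waiting for node 1 to respond, which mirrors the second node's "node 0 seen as up but not rsponding" message.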
Rob Leadbeater
Honored Contributor

Re: How to set up TruCluster.

Hi,

> My cables are connected properly. And
> everything set up correctly.

How do you know? From re-reading your previous posts, it sounds as though you don't have a memory channel hub; however, the output above shows that your cards are configured in HUB mode.

mchan0: jumpered as HUB configuration

As has been stated numerous times now, the output of the SRM commands mc_diag and mc_cable should give you some insight into what's wrong.

As to the specifics of ACK messages, I haven't got a clue, and I don't really see why you would be interested either...

The other question that springs to mind after re-reading your posts is "exactly what are you trying to achieve?"

Cheers,

Rob