Re: Problem with Cluster

Goriik · ‎05-05-2010

Hi. I have a two-node Serviceguard cluster.

I changed the IP address of an interface on NODE1.

I did it step by step:

1. Stop the running cluster before reconfiguring it from the old IP address to the new one.

# cmhaltcl -f

2. Change the heartbeat IP address in /etc/cmcluster/cluster.ascii (the cluster configuration file).

# vi /etc/cmcluster/cluster.ascii

--------------------------------------------
NETWORK_INTERFACE lan1
HEARTBEAT_IP xxx.xxx.xxx.xxx <- new

FIRST_CLUSTER_LOCK_PV /dev/dsk/c7t0d4

--------------------------------------------

3. Check the changed cluster configuration file.

# cmcheckconf -C /etc/cmcluster/cluster.ascii

4. Apply the configuration so that the binary configuration file is generated and the change takes effect.

# cmapplyconf -C /etc/cmcluster/cluster.ascii

5. Run the cluster and monitor the logs.

# cmruncl

When the cluster starts, both nodes reboot.

After that, NODE1 reboots once more.

Log from NODE1 (after I changed the IP address):

17:54 Tue May 04 2010. Reboot after panic: SafetyTimer expired, INIT, IIP:0x00000707fc4a2b60 IFA:0xe0000001205cfd28
18:04 Tue May 04 2010. Reboot after panic: SafetyTimer expired, INIT, IIP:0x00000707fc4b0910 IFA:0xe0000001205cfd28
_______________________________________________

Message from syslogd@NODE1 at Tue May 4 17:49:54 2010 ...
vparcher cmcld[9565]: Halting vparcher to preserve data integrity
May 4 17:49:54 vparcher cmcld[9565]: Reason: A crucial package failed
May 4 17:49:54 vparcher cmcld[9565]: Reason: A crucial package failed

Message from syslogd@NODE1 at Tue May 4 17:49:54 2010 ...
NODE1 cmcld[9565]: Reason: A crucial package failed

INIT occurs.
INIT: make crash event table.
INIT: Waiting for processors to save state.
INIT: Invoking callbacks.
Calling function e00000000160c700 for Shutdown State 9 type 0x10
Calling function e0000000020304e0 for Shutdown State 9 type 0x10
SafetyTimer expired, INIT, IIP:0x00000707fc4a2b60 IFA:0xe0000001205cfd28
INIT: Executing platform dependent procedures.
INIT: Begin crashdump.
i 0 pfn 0x1080000 pages 0x7cdd4
i 1 pfn 0x10fce7c pages 0x172
i 2 pfn 0x1100000 pages 0x180000
i 3 pfn 0x1780000 pages 0x200000
*** Not enough CPUS for a compressed dump ***

*** A system crash has occurred. (See the above messages for details.)
*** The system is now preparing to dump physical memory to disk, for use
*** in debugging the crash.

*** The dump will be a SELECTIVE dump with
compression OFF and concurrency ON: 2067 of 16350 megabytes.
*** To change this dump type, press any key within 10 seconds.
*** Proceeding with selective dump, with compression off and concurrency on.

Primary Dump Header Location :
Device details:
Major number: 31 Minor number:0x30100
Offset: 2349920.
*** The dump may be aborted at any time by pressing ESC.
*** Dumping: 100% complete (2067 of 2067 MB)
time: 35 seconds, Number of Dump units: 1
INIT[0]: OS_INIT ends. Resetting the system.
Initializing IO Devices ...
LBA Cell 03 (12): Occupied PCI-X 133MHz
Scan PCI:
Rope Slot Seg Bus Dev Fun Card
====================================================================
12 08 0x39 0x00 0x01 0x00 PCI Bridge (0x01a7,0x1014)
12 08 0x39 0x01 0x04 0x00 Ethernet (0x1079,0x8086)
12 08 0x39 0x01 0x04 0x01 Ethernet (0x1079,0x8086)
12 08 0x39 0x01 0x06 0x00 Ethernet (0x1079,0x8086)
12 08 0x39 0x01 0x06 0x01 Ethernet (0x1079,0x8086)
LBA Cell 03 (04): Occupied PCIe x8
Scan PCI:
Rope Slot Seg Bus Dev Fun Card
====================================================================
04 03 0x33 0x00 0x00 0x00 PCIe Root Port (0x403b,0x103c)
04 03 0x33 0x01 0x00 0x00 Fibre Channel (0x2532,0x1077)
LBA Cell 03 (02/03): Occupied PCIe x8
Scan PCI:
Rope Slot Seg Bus Dev Fun Card
====================================================================
02 02 0x32 0x00 0x00 0x00 PCIe Root Port (0x403b,0x103c)
02 02 0x32 0x01 0x00 0x00 Fibre Channel (0x2532,0x1077)
LBA Cell 03 (00): Occupied PCI 33MHz
Scan PCI:
Rope Slot Seg Bus Dev Fun Card
====================================================================
00 00 0x30 0x00 0x01 0x00 Network (0xb921,0x1133)
Complete

Log NODE2

17:57 Tue May 04 2010. Reboot after panic: SafetyTimer expired, INIT, IIP:0x00000707fc4a2b60 IFA:0xe0000001205cfd28

S-M-S · ‎05-05-2010

Hi Goriik,
I think the monitored subnet is lost.

Please check the subnet entry in the .ascii file and whether it is correct for your new IP.
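
To make that check less error-prone, it may help to list every heartbeat address per node instead of reading the file by eye. Below is a minimal Python sketch, assuming the file uses the NODE_NAME / NETWORK_INTERFACE / HEARTBEAT_IP keywords shown in the excerpt above. The function name heartbeat_entries, the sample text, the node names, and the addresses are all made up for illustration and are not taken from Goriik's cluster.

```python
def heartbeat_entries(config_text):
    """Return (node, interface, heartbeat_ip) tuples found in the config text."""
    entries = []
    node = None
    interface = None
    for raw in config_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) < 2:
            continue  # blank line or keyword without a value
        key, value = parts[0], parts[1]
        if key == "NODE_NAME":
            node, interface = value, None
        elif key == "NETWORK_INTERFACE":
            interface = value
        elif key == "HEARTBEAT_IP":
            entries.append((node, interface, value))
    return entries


# Hypothetical sample; replace with the contents of your own .ascii file.
SAMPLE = """
NODE_NAME node1
  NETWORK_INTERFACE lan1
    HEARTBEAT_IP 192.168.10.1   # new address
NODE_NAME node2
  NETWORK_INTERFACE lan1
    HEARTBEAT_IP 192.168.10.2
"""

for node, interface, ip in heartbeat_entries(SAMPLE):
    print(node, interface, ip)
```

If the addresses printed for the two nodes do not belong to the same subnet, that would be the first thing to look at.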

Vishu · ‎05-05-2010

Hi Goriik,

As you can see, the panic occurred due to "SafetyTimer expired", which calls INIT to reboot the server. The safety timer expires when cmcld is not able to communicate with the other cluster nodes.

1) Is your new IP address in the same subnet as the one on your second node?
2) I agree with S-M-S: check your monitored subnet. It may have been lost.
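
For question 1, a quick way to compare the two addresses is sketched below in Python using the standard ipaddress module. This is only an illustration: the helper name same_subnet, the addresses, and the netmask are assumptions, so substitute the real heartbeat IPs and the netmask configured on your lan interfaces.

```python
import ipaddress


def same_subnet(ip_a, ip_b, netmask):
    """Return True if both addresses fall in the same network for this netmask."""
    # strict=False lets us pass a host address and get its enclosing network.
    net_a = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{netmask}", strict=False)
    return net_a == net_b


# Hypothetical addresses, not the ones from this thread.
print(same_subnet("192.168.10.1", "192.168.10.2", "255.255.255.0"))  # True
print(same_subnet("192.168.20.1", "192.168.10.2", "255.255.255.0"))  # False
```

If this returns False for the heartbeat addresses of NODE1 and NODE2, the nodes are not on the same heartbeat subnet.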
