CMU 8.0: clone fails with its own image, CERBST_ERROR

brs
Occasional Contributor

Hi, I recently installed CMU 8.0 on our new cluster.

Everything seems to work as intended and I have successfully created a backup image of the golden node.

To test the image, I'm attempting to clone it back to the node I just created it from, but this fails with the following output (the reconf.sh script doesn't do anything, by the way):

[11-Nov-2016_11:44:54] [CerbereInquisitor] connected nodes: 1 (pn101): still to be cloned:0
[11-Nov-2016_11:44:58] [CerbereServer] Update Status
[11-Nov-2016_11:44:58] [CerbereDB] Checking node named pn101
[11-Nov-2016_11:44:58] [CerbereDB] Node pn101 checked
[11-Nov-2016_11:44:58] Node pn101 changing status: CERBST_EXTRACTING_BACKUP_IMAGE -> CERBST_EXECUTING_POST_RECONF
[11-Nov-2016_11:44:59] [CerbereServer] Update Status
[11-Nov-2016_11:44:59] [CerbereDB] Checking node named pn101
[11-Nov-2016_11:44:59] [CerbereDB] Node pn101 checked
[11-Nov-2016_11:44:59] Node pn101 changing status: CERBST_EXECUTING_POST_RECONF -> CERBST_ERROR
[11-Nov-2016_11:44:59] [CerbereServer] Node pn101 is disconnected
[11-Nov-2016_11:44:59] Node pn101 changing status: CERBST_ERROR -> CERBST_DISCONNECTED
[11-Nov-2016_11:44:59] [CerbereServer] Node pn101 has closed its main connection
[11-Nov-2016_11:44:59] (information),two gettimeofday measures into the same second
[11-Nov-2016_11:44:59] [CerbereInquisitor] connected nodes: 0: still to be cloned:0
[11-Nov-2016_11:44:59] [CerbereInquisitor] No consumer, no client. Time to stop
[11-Nov-2016_11:44:59] [CerbereNB] Netboot cleaning (/etc)
[11-Nov-2016_11:44:59] [CerbereNB] Netboot opens /etc/exports
[11-Nov-2016_11:44:59] [CerbereNB] Removing netboot tags in file /etc/exports
[11-Nov-2016_11:44:59] [CerbereDB] Database report:
[11-Nov-2016_11:44:59] [CerbereDB] 	                               | cloned | error | unknown
[11-Nov-2016_11:44:59] [CerbereDB] 	                      prod1_p0 |      0 |     1 |       0
[11-Nov-2016_11:44:59] [CerbereDB] 	                         Total |      0 |     1 |       0
[11-Nov-2016_11:44:59] [CerbereDB] List of nodes in error: pn101
[11-Nov-2016_11:44:59] [CerbereServer] Delete "/opt/cmu/ntbt/rp/x86_64/etc/rc.d/auto/cmucerbere.sh-21309"
[11-Nov-2016_11:44:59] [CerbereTypes] Delete Cerbere Data
[11-Nov-2016_11:44:59] [CerbereTypes] Delete Cerbere Data - Stage 0
[11-Nov-2016_11:44:59] [CerbereTypes] Delete Cerbere Data - Stage 1
[11-Nov-2016_11:44:59] [CerbereTypes] Delete Cerbere Data - Stage 2
[11-Nov-2016_11:44:59] [CerbereTypes] Delete Cerbere Data - Stage 3
[11-Nov-2016_11:44:59] Cerbere is terminating with status 0

The node is left powered on. If I log into the console, I see the following (screenshot attached: pn101.png).

The only thing I had to do to get the backup to work was to specify the root partition (partition 3), because the automatic mode couldn't detect it correctly. But I don't see how that should influence the cloning.

Otherwise the system is rather straightforward (everything is CentOS 7.2).

Has anyone experienced something similar, or do you have any good ideas on how to debug the problem further?

Cheers,

Brian Højen-Sørensen

3 REPLIES
Abhishekc
Advisor

Hello,

What is the NIC interface name on the backup node? Does it follow the systemd-based consistent naming convention (like enp0s1)? You can check this in the 'ip addr' output on the backup node.
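If it helps, here is a rough sketch for listing the interface names and flagging whether each one looks like legacy kernel naming (ethN) or systemd-style naming. The helper names classify_nic and list_nics are made up for illustration and are not part of CMU, and the prefix check is only an approximation of the systemd scheme.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of CMU.

# Classify one interface name by its prefix.
# Assumption: "eth<digit>" means legacy kernel naming, and "en" followed by
# o/p/s/x/P means systemd predictable naming. Anything else is "other".
classify_nic() {
    case "$1" in
        eth[0-9]*)   echo "legacy" ;;
        en[opsxP]*)  echo "predictable" ;;
        *)           echo "other" ;;
    esac
}

# Print every interface found under /sys/class/net with its classification.
list_nics() {
    for path in /sys/class/net/*; do
        [ -e "$path" ] || continue
        name=${path##*/}
        printf '%s %s\n' "$name" "$(classify_nic "$name")"
    done
}
```

Running list_nics on the backup node should show at a glance whether an eth0 exists at all. Names such as ib0 or lo fall into "other" with this simple check.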

Are the backup node and the cloned nodes homogeneous?

It looks like cloning failed at the post-reconf stage, while reconfiguring the network on the cloned node. Please send the cmucerbere log file for the failed node (cmucerbere-<nodename>-<pid>.log), found in /opt/cmu/log on the head node.

Also, for more details, please go through section "5.21 Admin NIC configuration may fail on heterogeneous compute nodes that are cloned with the same RHEL 7 backup image" of the HPE Insight CMU 8.0 release notes.

BTW, is this a customer cluster? If so, please send us the customer details and also raise an issue with your local HPE support center.

brs
Occasional Contributor

Hello,

I also came to the conclusion that it must be related to the NICs (see the attached cmucerbere-pn101-21309.log).

As you say, the node is systemd-based and is indeed using the consistent naming convention (eno1, eno2, ens1f0, ens1f1, ib0, ib1), which means that there is no eth0.

The backup node and the clone node are one and the same (and all other nodes are identical). The failed clone attempt targets the exact same node (the golden node) that I used to create the image (pn101 -> pn101). So there are no heterogeneous nodes.

All I do is the following (the GUI gives the same result):

/opt/cmu/bin/cmu_backup -l prod_node1 -n pn101 -r 3

and then:

/opt/cmu/bin/cmu_clone -n pn101 -i prod_node1 -s /opt/cmu/log/pn101_clone.log

Yes, it is a customer cluster, but we are managing it ourselves. What customer details are needed?

Extract from cmucerbere-pn101-21309.log (I can only attach image files):

+ filepath=/etc/sysconfig/network-scripts/ifcfg-eth0
+ getopts i:n:t:b:p:h:d:f: Option
+ '[' -z 10.230.98.64 ']'
+ '[' -z 255.255.255.0 ']'
+ '[' -z 10.230.98.0 ']'
+ '[' -z 10.230.98.255 ']'
+ '[' -z 24 ']'
+ '[' -z pn101 ']'
+ '[' -z /opt/cmu/mnt/sda3 ']'
+ '[' -z /etc/sysconfig/network-scripts/ifcfg-eth0 ']'
+ NIC_FILE=/opt/cmu/mnt/sda3//etc/sysconfig/network-scripts/ifcfg-eth0
+ '[' '!' -e /opt/cmu/mnt/sda3//etc/sysconfig/network-scripts/ifcfg-eth0 ']'
+ echo 'file /opt/cmu/mnt/sda3//etc/sysconfig/network-scripts/ifcfg-eth0 doesn'\''t exists'
file /opt/cmu/mnt/sda3//etc/sysconfig/network-scripts/ifcfg-eth0 doesn't exists
+ exit 1
[11-Nov-2016_11:44:58] [CerbereRcfg] Error while reconfiguring /opt/cmu/mnt/sda3/etc/sysconfig/network-scripts/ifcfg-eth0
[11-Nov-2016_11:44:58] [CerbereRcfg] Error while reconfiguring client files
[11-Nov-2016_11:44:58] [CerbereClient] Error while extracting Backup Image
[11-Nov-2016_11:44:58] [CerbereClient] Error in Cerbere Secondary Client
[11-Nov-2016_11:44:58] [CerbereClient] Sends status CERBST_ERROR for node pn101
[11-Nov-2016_11:44:58] [CerbereServer] Error in Cerbere Secondary Server Boot phase
[11-Nov-2016_11:44:58] [CerbereNB] Netboot cleaning (/etc)
[11-Nov-2016_11:44:58] [CerbereNB] Netboot opens /etc/exports
[11-Nov-2016_11:44:58] [CerbereNB] Removing netboot tags in file /etc/exports
[11-Nov-2016_11:44:58] [CerbereServer] Error while registering server2 as producer
[11-Nov-2016_11:44:58] [CerbereNB] Netboot cleaning (/etc)
[11-Nov-2016_11:44:58] [CerbereNB] Netboot opens /etc/exports
[11-Nov-2016_11:44:58] [CerbereNB] Removing netboot tags in file /etc/exports
[11-Nov-2016_11:44:58] [CerbereDB] Database report:
[11-Nov-2016_11:44:58] [CerbereDB] | cloned | error | unknown
[11-Nov-2016_11:44:58] [CerbereDB] prod1_p0 | 0 | 0 | 0
[11-Nov-2016_11:44:58] [CerbereDB] Total | 0 | 0 | 0
[11-Nov-2016_11:44:58] [CerbereDB] no node in error
[11-Nov-2016_11:44:58] [CerbereTypes] Delete Cerbere Data
[11-Nov-2016_11:44:58] [CerbereTypes] Delete Cerbere Data - Stage 0
[11-Nov-2016_11:44:58] [CerbereTypes] Delete Cerbere Data - Stage 1
[11-Nov-2016_11:44:58] [CerbereTypes] Delete Cerbere Data - Stage 2
[11-Nov-2016_11:44:58] [CerbereTypes] Delete Cerbere Data - Stage 3
[11-Nov-2016_11:44:58] Cerbere is terminating with status 0
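For anyone digging through a similar log: a small filter like the one below can make the first failure easier to spot. This is only a sketch. cerbere_errors is a made-up helper name, not a CMU tool, and the patterns are guessed from the messages in the extract above, so they may need adjusting for other failures.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; not a CMU tool.
# Prints, with line numbers, the lines of a cmucerbere log that either
# carry a "[CerbereXxx] Error ..." message or the "doesn't exists" message
# emitted by the reconfiguration script.
# $1: path to a cmucerbere-<nodename>-<pid>.log file
cerbere_errors() {
    grep -n -E '\] Error |doesn.t exists' "$1"
}
```

Usage would be something like: cerbere_errors /opt/cmu/log/cmucerbere-pn101-21309.log | head -n 1. In the extract above, the first hit is the line complaining that ifcfg-eth0 does not exist, and every later "Error" line follows from it.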

Abhishekc
Advisor

Hello,

Thank you for the log. According to the log, CMU is trying to reconfigure the eth0 interface, which is not present in the golden image.

Have you configured the network on the backup node? If so, the NIC interface file to be reconfigured is recorded when the backup image is captured. Can you please send the backup log cmudolly-<nodename>-<pid>.log so that we can see the cmu_find_nic output?

You can also check this by opening the image:

# /opt/cmu/bin/cmu_image_open -i <imagename>

Please open the golden image and send the output of "cat etc/sysconfig/network-scripts/CMU_FILE_TO_RECONF".

Also, copy the cmucerbere log snippet starting from the line "[CerbereClient] Reconfiguring".

E.g., on my system:

[root@n13 image]# /opt/cmu/bin/cmu_image_open -i rhel7_a
/opt/cmu/bin/tar --xattrs --xattrs-include=* --acls
checking that the filesystem has acl and user_xattr support
unpacking root partition
unarchiving cmu_pci0000:00_0000:00:1f.2_ata3_host2_target2:0:0_2:0:0:0-part1 into /boot

image <rhel7_a> untarred into </opt/cmu/image/rhel7_a/image_mountpoint>
after editing the image, commit changes using:
/opt/cmu/bin/cmu_image_commit -i rhel7_a

[root@n13 image]# cd /opt/cmu/image/rhel7_a/image_mountpoint

[root@n13 image_mountpoint]# cat etc/sysconfig/network-scripts/CMU_FILE_TO_RECONF 

/etc/sysconfig/network-scripts/ifcfg-enp2s2f0
[root@n13 image_mountpoint]#

In my case /etc/sysconfig/network-scripts/ifcfg-enp2s2f0 is the file to reconfigure.
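Once the image is open, a quick way to confirm whether the recorded file actually exists inside it is sketched below. check_reconf_target is a made-up helper name, not part of CMU, and it assumes that CMU_FILE_TO_RECONF holds a single absolute path as in my output above.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; not part of CMU.
# Assumption: CMU_FILE_TO_RECONF contains one absolute path (blank lines
# are ignored), relative to the root of the opened image.
# $1: image mountpoint, e.g. /opt/cmu/image/<imagename>/image_mountpoint
check_reconf_target() {
    mp=$1
    marker="$mp/etc/sysconfig/network-scripts/CMU_FILE_TO_RECONF"
    if [ ! -f "$marker" ]; then
        echo "no CMU_FILE_TO_RECONF found under $mp" >&2
        return 2
    fi
    # Take the first non-blank line as the file CMU will try to reconfigure.
    target=$(grep -v '^[[:space:]]*$' "$marker" | head -n 1)
    if [ -z "$target" ]; then
        echo "CMU_FILE_TO_RECONF is empty" >&2
        return 2
    fi
    if [ -e "$mp$target" ]; then
        echo "ok: $target exists in the image"
    else
        echo "missing: $target is not in the image" >&2
        return 1
    fi
}
```

If this reports "missing" for ifcfg-eth0, that would match the error in the cmucerbere log: the image records a file to reconfigure that was never captured in it.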