- Re: CMU v.8.0 clone fail on Apollo r2200 with XL17...
08-17-2016 02:35 AM
Hi. We have problems cloning from the golden image on node1 in an Apollo r2200 LFF chassis with XL170r nodes to the rest of the nodes in the chassis. All the nodes have the same hardware configuration: a Smart Array H240 and two 300 GB SAS LFF drives configured as RAID 1. The cloning fails during disk setup/provisioning. We use "/opt/cmu/bin/cmu_backup -l <logical_group> -n <node>" to back up node1, and "/opt/cmu/bin/cmu_clone -i <imagename> -n <nodeslist>" to clone onto the other nodes.
The backup from node1 is set to disk "sda", but when I run hpssa on the other nodes in the chassis, their disks come up as sdb, sdc and sdd. We suspect this is because of the shared disk shelf in the front of the chassis. If we clone to node1 in other chassis, it works fine. We have patched CMU to the latest release and tried both the UEFI and Legacy BIOS options.
Any ideas?
I could not attach the log files, since only jpg, gif and png are valid extensions.
CMU log:
node list found in backup group : node02 { 1 node }
using single-stage cloning
***
*** the clone disk selection mode is set to STRICT in cmuserver.conf
*** in STRICT mode, clone operation expects that the nodes-to-be-cloned
*** and the backup node are exactly similar in their hardware and
*** disk controller configuration, otherwise the clone operation
*** may fail to select a disk
*** this is a new setting introduced in v8.0 to avoid inadvertent data
*** loss by cloning a wrong disk
***
*** for cloning nodes with different h/w config to that of backup node,
*** set CMU_CLONE_DISK_SELECTION_MODE=FLEXIBLE in cmuserver.conf, which
*** directs the cloning engine to heuristically select the most suitable disk
*** however, there is a risk that a wrong disk is selected by the
*** FLEXIBLE mode, especially while cloning nodes with multiple disks/luns,
*** resulting in data loss
***
making node(s) reservation(s) for cloning ( id: 13590 )
cleaning /etc/dhcpd.conf
cleaning boot directory
configuring the system
copying ssh settings
sending power off to selected nodes
rebuilding network-boot image
starting cloning # 13590
cloning started on [17-Aug-2016_11:18:16]
+-------------------------------------+--------+
| 1 x PREPARING_DISK ==> ERROR | node02 |
+-------------------------------------+--------+
cloning process finished on 2016-08-17 at [17-Aug-2016_11:23:21]
[CerbereDB] Database report:
[CerbereDB] | cloned | error | unknown
[CerbereDB] ComputeNodes_p0 | 0 | 1 | 0
[CerbereDB] Total | 0 | 1 | 0
[CerbereDB] List of nodes in error: node02
[CerbereServer] Delete "/opt/cmu/ntbt/rp/x86_64/etc/rc.d/auto/cmucerbere.sh-135
[CerbereTypes] Delete Cerbere Data
[CerbereTypes] Delete Cerbere Data - Stage 0
[CerbereTypes] Delete Cerbere Data - Stage 1
[CerbereTypes] Delete Cerbere Data - Stage 2
[CerbereTypes] Delete Cerbere Data - Stage 3
Cerbere is terminating with status 0
detailed logs are in /opt/cmu/log/cmucerbere-13590.log and
/opt/cmu/log/cmucerbere-*.log
releasing node(s) reservation(s) for cloning ( id: 13590 )
logout
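The log above refers to the CMU_CLONE_DISK_SELECTION_MODE setting in cmuserver.conf. A minimal sketch for checking which mode is currently set, assuming the setting is stored as a plain KEY=VALUE line. The function name clone_disk_mode is hypothetical, and the location of cmuserver.conf is passed as an argument because the log does not give its full path:

```shell
#!/bin/bash
# Print the value of CMU_CLONE_DISK_SELECTION_MODE from a cmuserver.conf-style file.
# Assumption: the setting is a plain KEY=VALUE line; commented-out lines are ignored.
# If the key is absent, nothing is printed and the function returns 1.
clone_disk_mode() {
    local conf="$1" value
    # keep the last uncommented assignment, in case the key appears more than once
    value=$(sed -n 's/^[[:space:]]*CMU_CLONE_DISK_SELECTION_MODE[[:space:]]*=[[:space:]]*//p' "$conf" | tail -n 1)
    # drop anything after the first whitespace, then optional surrounding quotes
    value=${value%%[[:space:]]*}
    value=${value#\"}; value=${value%\"}
    [ -n "$value" ] && printf '%s\n' "$value"
}
```

Calling clone_disk_mode with the path to cmuserver.conf on the management node should print STRICT or FLEXIBLE. As the log warns, FLEXIBLE can pick the wrong disk on nodes with several disks or LUNs, so this only reports the setting and does not change it.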
- Tags:
- xl170r:apollo:cmu
08-17-2016 04:17 AM - edited 08-17-2016 04:19 AM
Re: CMU v.8.0 clone fail on Apollo r2200 with XL170r nodes
Hi,
Is it a customer cluster?
Please provide the following from the management node:
#cat /opt/cmu/image/<image name>/header.txt
#cat /opt/cmu/log/cmucerbere-node02-13590.log
Please try changing the file extension to jpg etc., or copy/paste the entire log here.
Regards,
Pradeep
08-17-2016 04:55 AM
It's a customer cluster, but we run it. CMU is under support.
#cat /opt/cmu/image/XLnodesGen9/header.txt
date:00h54m06s 17-Aug-2016
ostype:CentOS Linux release 7.2.1511 (Core)
imagename:XLnodesGen9
hostname:node01
root:disk/by-path/cmu_pci0000:00_0000:00:03.0_0000:03:00.0_host0_target0:0:0_0:0:0:0-part4
rootctrlvendorid:103c:3239
rootctrlbusid:0000:03:00.0
disk:disk/by-path/cmu_pci0000:00_0000:00:03.0_0000:03:00.0_host0_target0:0:0_0:0:0:0
partition:disk/by-path/cmu_pci0000:00_0000:00:03.0_0000:03:00.0_host0_target0:0:0_0:0:0:0-part4
partition:disk/by-path/cmu_pci0000:00_0000:00:03.0_0000:03:00.0_host0_target0:0:0_0:0:0:0-part1
partition:disk/by-path/cmu_pci0000:00_0000:00:03.0_0000:03:00.0_host0_target0:0:0_0:0:0:0-part3
terminated:noerror
timespent:77sec
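The header above records the disk, root partition and controller that the clone expects. A small sketch for pulling one field out of header.txt so it can be compared with what a target node reports, assuming every line has the key:value form shown in the listing. The function name header_field is hypothetical:

```shell
#!/bin/bash
# Print every value recorded for one key in a CMU image header.txt.
# Assumption: each line has the form key:value, as in the listing above.
# Only the first colon separates key from value, so values may contain colons.
header_field() {
    local key="$1" file="$2"
    # match lines that start with "key:" and print what follows the first colon
    awk -v k="$key" 'index($0, k ":") == 1 { print substr($0, length(k) + 2) }' "$file"
}
```

For example, header_field disk /opt/cmu/image/XLnodesGen9/header.txt prints the by-path name stored for the backup disk, and header_field rootctrlbusid prints the controller's PCI bus ID. How CMU maps the cmu_-prefixed name onto the udev entries on the target node is not shown in this thread.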
08-17-2016 05:02 AM
Renamed the log file to cmucerbere-node02-13590.txt.
08-17-2016 06:23 AM
Hi
cmucerbere-node002-13590.log
......
+ grep -qE 'disk:.*by-path.*|root:.*by-path.*' /opt/cmu/image/XLnodesGen9/header.txt
+ rc=0
+ '[' 0 -ne 0 ']'
+ /opt/cmu/tools/cmu_wait_dev -d /dev/disk/by-path
+ echo 'error: cannot find /dev/disk/by-path, exiting...'
error: cannot find /dev/disk/by-path, exiting...
+ exit 1
The cloning log shows that the disk's by-path entries are not visible during cloning, which is strange.
How many times have you tried cloning onto this problematic node?
Please try again with the same image. If it fails, log in to node02 from the management node and run the command below to see whether the by-path entries are visible in the CMU netbooted environment:
#ls -l /dev/disk/by-path
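Since the other nodes in the chassis report their disks as sdb, sdc and sdd, it may also help to see which kernel device each by-path entry points to. A minimal sketch, assuming readlink -f is available in the netboot environment. The function name list_by_path is hypothetical:

```shell
#!/bin/bash
# List each symlink in a by-path style directory with the device it resolves to.
# Defaults to /dev/disk/by-path; returns 1 if the directory does not exist.
list_by_path() {
    local dir="${1:-/dev/disk/by-path}" link
    [ -d "$dir" ] || { echo "missing: $dir" >&2; return 1; }
    for link in "$dir"/*; do
        # skip anything that is not a symlink (including an unmatched glob)
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}
```

Running list_by_path on node02 and on the backup node gives two lists that can be compared side by side.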
As a last resort, copy the code below into the "custom code" section of /opt/cmu/image/XLnodesGen9/pre_reconf.sh and retry the cloning. The code waits up to about two minutes (25 checks, 5 seconds apart) for /dev/disk/by-path to appear, which gives udev time to create the entries in the CMU netboot environment.
Example:
[root@mn-head1 ~]# cat /opt/cmu/image/lg_rh6u6_13_7/pre_reconf.sh
#!/bin/bash
#cmu_begin_interface
#do not change anything in this section
#add custom code after this section
CMU_PRE_RECONF_VERSION=1
#starting from cmu version 4.2 this script is dedicated to custom code
#it is running at cloning time after netboot is done and before the
#filesystems or even the partitioning is created.
# this script is invoked by cmu_pre_cloning stored on the management node
# into /opt/cmu/ntbt/rp/<arch>/opt/cmu/tools/
#cmu_end_interface
# - custom code starts here -
echo "running pre_reconf script ....loading by-path entries"
for((i=1;i<=25;i++)); do
if [ -d /dev/disk/by-path ]; then
echo "found dev/disk/by-path dir"
break;
fi;
sleep 5;
done
exit 0
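The loop above always falls through to "exit 0", so a directory that never appears is not reported. A sketch of a variant that returns non-zero on timeout, assuming it is acceptable for the clone to stop in that case (the header comment in the pre_reconf.sh listing later in this thread says cloning fails if the script returns a non-zero exit code). The function name wait_for_dir and its parameters are hypothetical:

```shell
#!/bin/bash
# Poll for a directory; succeed as soon as it exists, fail after the last attempt.
# Usage: wait_for_dir DIR [ATTEMPTS] [INTERVAL_SECONDS]
wait_for_dir() {
    local dir="$1" attempts="${2:-25}" interval="${3:-5}" i
    for ((i = 1; i <= attempts; i++)); do
        if [ -d "$dir" ]; then
            echo "found $dir after $i check(s)"
            return 0
        fi
        # no need to sleep after the final check
        [ "$i" -lt "$attempts" ] && sleep "$interval"
    done
    echo "error: $dir did not appear after $attempts checks" >&2
    return 1
}
```

In the custom code section this could be called as wait_for_dir /dev/disk/by-path || exit 1. Keeping "exit 0" instead preserves the original behaviour of continuing regardless.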
Note: Please contact the HPE Support Center for official CMU support to debug the issue. The HPE forum is not meant for customer issues.
Pradeep
08-18-2016 02:26 AM
Hi, and thanks for the reply. We will continue this with HPE support. The script change did not work. CMU is configured by the HPE Core HPC stack using the Cluster Setup Tool. The pre_reconf.sh file looks like this:
#!/bin/bash
#cmu_begin_interface
#do not change anything in this section
#add custom code after this section
CMU_PRE_RECONF_VERSION=1
#starting from cmu version 4.2 this script is dedicated to custom code
#it is running at cloning time after netboot is done and before the
#filesystems or even the partitioning is created.
# this script is invoked by cmu_pre_cloning stored on the management node
# into /opt/cmu/ntbt/rp/<arch>/opt/cmu/tools/
#cloning will fail if this script returns non-zero exit code
#cmu_end_interface
# - custom code starts here -
#Added by CST start
CN_BASE_DIR=/tmp
HEAD_NODE=skadi.ngu.no
SHARED_DIR=/share/apps
DISK_ARRAY_CONFIGURATION_FILE=/share/apps/diskarray/XLnodesGen9_disk_array_configuration
declare -r mounted_dir="$CN_BASE_DIR$SHARED_DIR"
declare -r hpssascripting=$mounted_dir/diskarray/hpssascripting
umount -f $mounted_dir >/dev/null 2>&1
grep -q $HEAD_NODE:$SHARED_DIR /etc/fstab
if [ $? -eq 0 ]; then
grep -v $HEAD_NODE:$SHARED_DIR /etc/fstab > /etc/fstab.new
mv -f /etc/fstab /etc/fstab.orig
mv -f /etc/fstab.new /etc/fstab
fi
cat >> /etc/fstab <<-CST_MOUNT
$HEAD_NODE:$SHARED_DIR $mounted_dir nfs defaults 0 0
CST_MOUNT
mkdir -p $mounted_dir
mount $mounted_dir
$hpssascripting -reset -i "$CN_BASE_DIR$DISK_ARRAY_CONFIGURATION_FILE"
#Added by CST end
exit 0
and
cat /share/apps/diskarray/XLnodesGen9_disk_array_configuration
; Date captured: Tue Aug 16 22:06:57 2016
; Version: 2.50.1.0
Action= Configure
Method= Custom
; __________________________ Controller Specifications SLOT 1 ________________________________
;
; Controller HPE Smart HBA H240, FirmwareVersion 3.56, License Keys Supported
; SerialNumber PDNNK0BRH240VH
; DriverName hpsa
; DriverVersion 3.4.10
; SSDSmartPath Supported
Controller= SLOT 1
; PowerMode= MaxPerformance
RebuildPriority= High
ExpandPriority= Medium
ParallelSurfaceScanCount= 1
SurfaceScanMode= Idle
SurfaceScanDelay= 3
Latency= Disable
DriveWriteCache= Disabled
MNPDelay= 60
IRPEnable= Disabled
DPOEnable= Disabled
ElevatorSortEnable= Enabled
QueueDepth= Automatic
PredictiveSpareActivation= Disable
; Array Specifications
Array= A
; Array Drive Type is SAS
; Array Free Space 0 GBytes
; 1I:1:1 (300 GB, SAS), 1I:1:2 (300 GB, SAS)
Drive= 1I:1:1, 1I:1:2
OnlineSpare= No
; Logical Drive Specifications
LogicalDrive= 1
RAID= 1
Size= 286070
; SizeBlocks= 585871964
Sectors= 32
StripSize= 256
Caching= Disabled
; VolumeUniqueID= 600508B1001C55A3FF6AE21FB36E5B50
08-18-2016 03:17 AM
Solution
Hi,
Please raise an issue with the HPE Core HPC stack team to debug this.
Maybe you should have tried copying our script at the end of the CST customization in the pre_reconf.sh script, as shown below:
#!/bin/bash
#cmu_begin_interface
#do not change anything in this section
#add custom code after this section
CMU_PRE_RECONF_VERSION=1
#starting from cmu version 4.2 this script is dedicated to custom code
#it is running at cloning time after netboot is done and before the
#filesystems or even the partitioning is created.
# this script is invoked by cmu_pre_cloning stored on the management node
# into /opt/cmu/ntbt/rp/<arch>/opt/cmu/tools/
#cloning will fail if this script returns non-zero exit code
#cmu_end_interface
# - custom code starts here -
#Added by CST start
CN_BASE_DIR=/tmp
HEAD_NODE=skadi.ngu.no
SHARED_DIR=/share/apps
DISK_ARRAY_CONFIGURATION_FILE=/share/apps/diskarray/XLnodesGen9_disk_array_configuration
declare -r mounted_dir="$CN_BASE_DIR$SHARED_DIR"
declare -r hpssascripting=$mounted_dir/diskarray/hpssascripting
umount -f $mounted_dir >/dev/null 2>&1
grep -q $HEAD_NODE:$SHARED_DIR /etc/fstab
if [ $? -eq 0 ]; then
grep -v $HEAD_NODE:$SHARED_DIR /etc/fstab > /etc/fstab.new
mv -f /etc/fstab /etc/fstab.orig
mv -f /etc/fstab.new /etc/fstab
fi
cat >> /etc/fstab <<-CST_MOUNT
$HEAD_NODE:$SHARED_DIR $mounted_dir nfs defaults 0 0
CST_MOUNT
mkdir -p $mounted_dir
mount $mounted_dir
$hpssascripting -reset -i "$CN_BASE_DIR$DISK_ARRAY_CONFIGURATION_FILE"
#Added by CST end
echo "running pre_reconf script ....loading by-path entries"
for((i=1;i<=25;i++)); do
if [ -d /dev/disk/by-path ]; then
echo "found dev/disk/by-path dir"
break;
fi;
sleep 5;
done
exit 0
Pradeep