HPE Ezmeral Software platform

Multiple Zookeeper and CLDB Installation Fails at Data Fabric

 
msaidbilgehan
Advisor

We have been collecting logs for every attempt at installing the Data Fabric Customer Edition. I have attached a few of them for your review.
 
After you have examined the logs and screenshots, could we discuss the issues we are encountering and possible solutions?
 
Here are the details of our environment:

  • OS Features
      • Ubuntu 18.04
      • FIPS not installed
      • Java 11 installed
      • No firewall activated
      • VMware local VLAN
      • No public hostname (IPs and hostnames are configured in the hosts file)

  • Storage (900 GB)
      • / - 275 GB
      • /var - 175 GB
      • /srv - 150 GB
      • /opt - 150 GB
      • swap - 30 GB
      • Unformatted - 120 GB

  • Resources
      • Memory - 65 GB to 100 GB
      • CPU - 20 cores

  • MapR versions we have tried to install: 7.3, 7.2, 7.0
  • Node counts tried when installing or expanding the cluster: 3, 5, 6
 
Here are some key points regarding the issues:
 
  • Multiple Zookeeper and CLDB node installation attempts have failed.
  • The Zookeeper service went down without any output during installation, and the installer reported a timeout.
  • The Warden service was not active during installation, and the Warden logs indicate that the CLDBs are not responding. RPC responses are also missing from the CLDB and Zookeeper hosts.
  • While running mapr-setup.sh, the installer script displayed an error, even though the service appears to be functioning properly.
  • We also tried the mapr-installer container recommended in the HPE documentation, but the container seems to ship an older version of the installer. We could build our own Docker image for mapr-installer and the other services, but that may fail if the system requirements do not match or for other reasons.
  • The Extend Cluster operation fails when adding new nodes to the cluster; this action also disrupts the nodes that have already been established.
  • After setting up the cluster, some services shut down and do not restart, even when manually forced to start.
  • The list goes on...
Perhaps we should initially focus on the problems with the installation of multiple Zookeeper and CLDB nodes, and the issues with these services crashing. We can also proceed in accordance with your advice. We would also like to set up high availability (HA) for our installation. Therefore, multiple instances of ZK and CLDB are necessary at this stage.
 
okalinin
HPE Pro
Solution

Re: Multiple Zookeeper and CLDB Installation Fails at Data Fabric

Hello,

The logs seem to indicate a local hostname resolution problem: remote nodes appear to resolve correctly, but each node's own hostname resolves to 127.0.1.1.

node3:
2023-07-26 04:08:51,698 [myid:0] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@913] - My election bind port: node_3.cluster/127.0.1.1:3888

node4:
2023-07-26 04:08:51,512 [myid:1] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@913] - My election bind port: node_4.cluster/127.0.1.1:3888

node5:
2023-07-26 04:08:51,970 [myid:2] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@913] - My election bind port: node_5.cluster/127.0.1.1:3888

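This is very likely the reason for errors like the following, because a peer that listens only on 127.0.1.1 refuses connections arriving on its real address: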
2023-07-26 04:08:53,046 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@681] - Cannot open channel to 0 at election address node_3.cluster/10.34.2.156:3888 java.net.ConnectException: Connection refused
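
To spot this pattern across many log files, the check can be automated. The following is a minimal Python sketch, not part of the original reply: it assumes the log line format quoted above, and the names `BIND_RE` and `loopback_binds` are made up for illustration.

```python
import ipaddress
import re

# Matches lines such as:
#   ... My election bind port: node_3.cluster/127.0.1.1:3888
BIND_RE = re.compile(
    r"My election bind port: (?P<host>[^/\s]+)/(?P<ip>[0-9.]+):(?P<port>\d+)"
)


def loopback_binds(log_lines):
    """Return (host, ip, port) for every election bind on a loopback address."""
    hits = []
    for line in log_lines:
        match = BIND_RE.search(line)
        if match and ipaddress.ip_address(match.group("ip")).is_loopback:
            hits.append(
                (match.group("host"), match.group("ip"), int(match.group("port")))
            )
    return hits


# Two of the lines quoted in this thread.
sample = [
    "2023-07-26 04:08:51,698 [myid:0] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@913] - My election bind port: node_3.cluster/127.0.1.1:3888",
    "2023-07-26 04:08:53,046 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@681] - Cannot open channel to 0 at election address node_3.cluster/10.34.2.156:3888 java.net.ConnectException: Connection refused",
]

for host, ip, port in loopback_binds(sample):
    print(f"{host} binds its election port {port} on loopback address {ip}")
```

Only the first sample line is reported; the second is the resulting connection error on another node and contains no bind message.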

Please make sure each hostname resolves to the correct IP address, one that can be used for internal cluster communication, and then retry.
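
One way to run that check on each node is sketched below in Python. It is an illustration rather than an HPE tool: it asks the local resolver (which consults the hosts file first on a typical Ubuntu setup) for the node's own name and flags loopback results. The helper names `resolve_all` and `is_usable_for_cluster` are assumptions made for this example.

```python
import ipaddress
import socket


def resolve_all(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return {info[4][0] for info in infos}


def is_usable_for_cluster(addresses):
    """True if at least one address was found and none of them is loopback."""
    return bool(addresses) and not any(
        ipaddress.ip_address(addr).is_loopback for addr in addresses
    )


if __name__ == "__main__":
    # Run on every node: the node's own name is the one that resolved
    # to 127.0.1.1 in the logs above.
    name = socket.gethostname()
    try:
        addrs = resolve_all(name)
    except OSError as exc:
        print(f"{name}: does not resolve ({exc})")
    else:
        verdict = "ok" if is_usable_for_cluster(addrs) else "resolves to loopback"
        print(f"{name}: {sorted(addrs)} -> {verdict}")
```

A node reporting "resolves to loopback" for its own name would be expected to show the same election bind address as in the logs above.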

Best Regards.



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
msaidbilgehan
Advisor

Re: Multiple Zookeeper and CLDB Installation Fails at Data Fabric

Thanks for your support. We fixed the misconfiguration. For now, the services at least appear to be working, or are no longer going down because of the incorrect name resolution entries in the hosts file.
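
For readers facing the same symptom, the hosts file itself can be checked for the offending entry. The Python sketch below is illustrative only: the sample hosts content is hypothetical (the actual file was not posted in this thread; only the node_3.cluster name and the 10.34.2.156 address come from the logs), and `loopback_hosts` is a name invented for this example.

```python
import ipaddress


def loopback_hosts(hosts_text, cluster_names):
    """Return {name: ip} for cluster names that a hosts file maps to loopback."""
    bad = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # not a parseable address, skip the line
        if not addr.is_loopback:
            continue
        for name in names:
            if name in cluster_names:
                bad[name] = ip
    return bad


# Hypothetical hosts file content for node_3 (an assumption, not from the thread).
sample_hosts = """\
127.0.0.1   localhost
127.0.1.1   node_3.cluster node_3
10.34.2.156 node_3.cluster node_3
"""

print(loopback_hosts(sample_hosts, {"node_3.cluster"}))
```

On a real node, the text would be read from /etc/hosts and the set would contain every cluster hostname; any name reported here is a candidate for removal from the loopback line.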

Sunitha_Mod
Moderator

Re: Multiple Zookeeper and CLDB Installation Fails at Data Fabric

Hello @msaidbilgehan,

We are glad to know the problem has been resolved. 



Thanks,
Sunitha G
I'm an HPE employee.