HPE Ezmeral Software platform

 
SergioMarquez
Occasional Advisor

DataFabric CLDB still not started

Hi, I've started Warden and ZooKeeper on my cluster, and both are up and running. CLDB appears to be running (according to jps), but when I try to check the node (maprcli node cldbmaster) it returns "ERROR (10009) Couldn't connect to the CLDB service". I also checked the CLDB status (systemctl status mapr-cldb) and it appears as "inactive (dead)".

 

Any ideas? Thanks again

9 REPLIES
Dave Olker
Neighborhood Moderator

Re: DataFabric CLDB still not started

Is this a single-node cluster, where all services are running on this node?  Is this a secure cluster, and if so have you generated a valid security ticket via maprlogin?  Even if the CLDB is up, you'd need a valid ticket to talk to a secure cluster.  If you suspect the CLDB is not coming up you can start by checking the log files /opt/mapr/logs/cldb.log and /opt/mapr/logs/cldb.out to see if they give you any clues as to why the CLDB is not starting.  Typically I start looking at the end of the logs, as that is where you'd usually find any hard error message stopping the CLDB from running.
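As a rough illustration of the "start at the end of the logs" advice, here is a minimal shell sketch. The helper name last_errors is hypothetical, and it assumes that error lines in the log contain the keyword ERROR or FATAL, which may not hold for every message.

```shell
#!/bin/sh
# Print the most recent error-level lines from a log file.
# Assumption: relevant lines contain the keyword ERROR or FATAL.
# Usage: last_errors FILE [TAIL_LINES] [MAX_MATCHES]
last_errors() {
    log_file=$1
    tail_lines=${2:-500}     # how much of the end of the file to scan
    max_matches=${3:-20}     # how many matching lines to show

    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi

    # Look only at the end of the file, then keep the last few matches.
    tail -n "$tail_lines" "$log_file" | grep -E 'ERROR|FATAL' | tail -n "$max_matches"
}
```

It could be run as last_errors /opt/mapr/logs/cldb.log and then again for /opt/mapr/logs/cldb.out. If nothing is printed, reading the last few hundred lines by hand is still worthwhile.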



I work at HPE
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
SergioMarquez
Occasional Advisor

Re: DataFabric CLDB still not started

Thank you Dave, 

Once again you are right. I've successfully generated a new ticket for each node in the cluster. What I see in cldb.log is a series of rejected communications from the master node: "Rejecting RPC xxx from yyy, sequence num"...

Thank you

JulianDA
New Member

Re: DataFabric CLDB still not started

Hello, 

 

Any updates? I have the same issue. My cluster consists of 4 nodes.

Regards

SergioMarquez
Occasional Advisor

Re: DataFabric CLDB still not started

Hi, unfortunately not yet.

The only finding I have is in cldb.log, which shows the CLDB shutting down because it cannot become "Master" while waiting for the local KV Store.

So apparently the KV Store is related to the root cause, but so far I haven't found any documentation on this point.

 

ldarby
HPE Pro

Re: DataFabric CLDB still not started

Hi @JulianDA and @SergioMarquez

Just a guess but try running this command:

maprcli volume modify -name mapr.cldb.internal -minreplication 1

The issue could be that it's stuck in a state where it won't start up because there aren't enough copies of CID:1 available (the default minimum is 3), and there aren't enough copies of CID:1 available because it won't start up. That command sets the minimum required to 1, which should allow it to start up. Also, if the issue is that no CLDBs are running at all, which prevents that command from running, you would need to restart Warden (systemctl restart mapr-warden). This will restart the CLDB (which will shut down again), but keep running that command repeatedly to try to catch the CLDB while it is briefly alive.
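The "keep running that command" step can be scripted rather than typed by hand. Below is a minimal sketch of a generic retry helper. The function name retry_until_success and the attempt and delay values are assumptions for illustration, not part of any MapR tooling.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempt budget runs out.
# Usage: retry_until_success MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_success() {
    max_attempts=$1
    delay=$2
    shift 2                      # the remaining arguments are the command to run
    attempt=1

    while :; do
        if "$@"; then
            echo "succeeded on attempt $attempt" >&2
            return 0
        fi
        # Stop once the budget is used up; otherwise wait and try again.
        [ "$attempt" -ge "$max_attempts" ] && break
        attempt=$((attempt + 1))
        sleep "$delay"
    done

    echo "gave up after $max_attempts attempts" >&2
    return 1
}
```

For example, right after systemctl restart mapr-warden, one might run retry_until_success 300 1 maprcli volume modify -name mapr.cldb.internal -minreplication 1, which tries once per second for roughly five minutes. Those numbers are only a guess at a reasonable window.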

If you have a support contract then please raise a case for further investigation.

Regards,
Laurence Darby

 

I'm an HPE employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
SergioMarquez
Occasional Advisor

Re: DataFabric CLDB still not started

Thank you Laurence. I've tried, but the cluster is down and it is not possible to modify the replication.

hiteshingole
HPE Pro

Re: DataFabric CLDB still not started

Hello,

"awaiting for local KV Store" means that the CLDB is waiting for the local MFS to come up. Check whether MFS is up and the storage pool(s) are online using the mrconfig command (https://docs.ezmeral.hpe.com/datafabric-customer-managed/70/ReferenceGuide/mrconfig-sp-list.html).
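A small sketch of how that check might be scripted is below. It rests on several assumptions: that mrconfig lives at /opt/mapr/server/mrconfig, that mrconfig sp list prints one line per storage pool, and that an offline pool's line contains the word "Offline". The function names are hypothetical, so compare against the real output on your node before relying on it.

```shell
#!/bin/sh
# Filter stdin down to lines that mention an offline pool.
# Assumption: an offline pool's line contains "offline" (any letter case).
offline_pools() {
    grep -i 'offline'
}

# Hypothetical wrapper around "mrconfig sp list".
# Returns 0 if no offline pool is reported, 1 if one is, 2 if mrconfig failed.
check_local_pools() {
    # Assumed default install path; override with the MRCONFIG variable.
    mrconfig_bin=${MRCONFIG:-/opt/mapr/server/mrconfig}

    if ! sp_output=$("$mrconfig_bin" sp list 2>&1); then
        echo "mrconfig sp list failed; MFS may not be running" >&2
        return 2
    fi

    if printf '%s\n' "$sp_output" | offline_pools; then
        echo "at least one storage pool looks offline" >&2
        return 1
    fi

    echo "no offline storage pools reported" >&2
    return 0
}
```

Calling check_local_pools on the CLDB node gives a quick first answer. A return code of 2 suggests looking at MFS itself before looking at the pools.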

I'm an HPE employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
SergioMN
Visitor

Re: DataFabric CLDB still not started

@hiteshingole @ldarby Thanks, the storage pool was unable to initialize, but on another node I observed similar behaviour with an available pool.

hiteshingole
HPE Pro

Re: DataFabric CLDB still not started

>> I observed similar behavior with an available pool.

Ideally, the CLDB always waits for the local MFS to come up, and there is a timeout.

If the MFS is not available for any reason within this timeout, the CLDB will crash and try again.

 

I'm an HPE employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]