Re: DataFabric CLDB still not started

SergioMarquez · ‎08-22-2023

Hi, I've started Warden and ZooKeeper on my cluster, and both are up and running. CLDB appears to be running (it shows up in jps), but when I check the CLDB master (maprcli node cldbmaster) it returns "ERROR (10009) Couldn't connect to the CLDB service". I also checked the CLDB status (systemctl status mapr-cldb), and it shows "inactive (dead)".

Any ideas? Thanks again

Dave Olker · ‎08-23-2023

Is this a single-node cluster, where all services are running on this node? Is this a secure cluster, and if so have you generated a valid security ticket via maprlogin? Even if the CLDB is up, you'd need a valid ticket to talk to a secure cluster. If you suspect the CLDB is not coming up you can start by checking the log files /opt/mapr/logs/cldb.log and /opt/mapr/logs/cldb.out to see if they give you any clues as to why the CLDB is not starting. Typically I start looking at the end of the logs, as that is where you'd usually find any hard error message stopping the CLDB from running.
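
As a rough illustration of checking the end of those logs, here is a minimal sketch. The function name scan_log_tail and the search pattern (ERROR, FATAL, Exception) are assumptions made for this example, not a documented CLDB log format, so adjust the pattern to whatever your logs actually contain.

```shell
# Print the lines from the end of a log file that look like hard errors.
# ASSUMPTION: the pattern ERROR|FATAL|Exception is a guess at what a hard
# error looks like; it is not taken from the CLDB documentation.
scan_log_tail() {
    local log_file="$1"
    local lines="${2:-200}"   # how many trailing lines to inspect

    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi

    # Only the last $lines lines are searched, since that is where a
    # fatal startup error would normally show up.
    tail -n "$lines" "$log_file" | grep -E 'ERROR|FATAL|Exception'
}

# Example calls (paths as given in the post above):
#   scan_log_tail /opt/mapr/logs/cldb.log
#   scan_log_tail /opt/mapr/logs/cldb.out 500
```

The function returns non-zero when nothing matches, which only means the pattern found nothing in that window; it is still worth reading the tail of the file by eye.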

I work at HPE
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]

SergioMarquez · ‎08-23-2023

Thank you Dave,

Once again you are right. I've successfully generated a new ticket on each node in the cluster. What I now see in cldb.log is a series of rejected RPCs from the master node: "Rejecting RPC xxx from yyy, sequence num"...

Thank you

JulianDA · ‎08-25-2023

Hello,

Any updates? I have the same issue. My cluster consists of 4 nodes.

Regards

SergioMarquez · ‎08-25-2023

Hi, unfortunately not yet.

The only finding I have is in cldb.log, which shows CLDB shutting down because it cannot become "Master" while awaiting the local KV Store.

So the KV Store appears to be related to the root cause, but so far I haven't found any documentation on this point.

ldarby · ‎08-25-2023

Hi @JulianDA and @SergioMarquez,

Just a guess but try running this command:

maprcli volume modify -name mapr.cldb.internal -minreplication 1

The issue could be that it's stuck in a state where it won't start up because there aren't enough copies of CID:1 available (default minimum is 3), and there aren't enough copies of CID:1 available because it won't start up. That command sets the minimum required to 1, which should allow it to start up. If no CLDBs are running at all, which would prevent that command from running, you would need to restart warden (systemctl restart mapr-warden). This will restart CLDB (which will shut down again), so keep running the command repeatedly to try to catch CLDB while it is briefly alive.
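
Re-running the command by hand is tedious, so a small retry loop can do it instead. This is only a sketch: retry_until_ok is a hypothetical helper written for this example, and the attempt count and delay in the usage comment are arbitrary values, not recommendations.

```shell
# retry_until_ok MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it exits 0 or the attempts run out.
# Hypothetical helper for illustration; not part of any MapR tooling.
retry_until_ok() {
    local max_attempts="$1"
    local delay="$2"
    shift 2

    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            echo "succeeded on attempt $attempt"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done

    echo "gave up after $max_attempts attempts" >&2
    return 1
}

# Usage with the commands from this post (counts and delay are arbitrary):
#   systemctl restart mapr-warden
#   retry_until_ok 120 5 maprcli volume modify -name mapr.cldb.internal -minreplication 1
```

The loop relies on the command exiting non-zero when it fails to reach the CLDB, so it is worth checking that a manual failed run really does return a non-zero exit status on your version.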

If you have a support contract then please raise a case for further investigation.

Regards,
Laurence Darby

I'm an HPE employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]

SergioMarquez · ‎08-28-2023

Thank you Laurence. I've tried, but the cluster is down and it is not possible to modify the replication.

hiteshingole · ‎08-30-2023

Hello,

"awaiting for local KV Store" means that the CLDB is waiting for the local MFS to come up. Check whether the MFS is up and the storage pool(s) are online using the mrconfig command (https://docs.ezmeral.hpe.com/datafabric-customer-managed/70/ReferenceGuide/mrconfig-sp-list.html).
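
A quick way to spot pools that are not online is to filter the output of that command. The sketch below assumes that each storage pool is reported on a line starting with "SP <number>:" and that healthy pools include the word "Online" on that line; that layout, the helper name offline_pools, and the /opt/mapr/server path in the usage comment are assumptions, so compare against the real output on your node first.

```shell
# Reads storage pool listing output on stdin and prints the pool lines
# that are NOT marked Online.
# ASSUMPTION: one line per pool, starting with "SP <number>:", containing
# the word "Online" when the pool is healthy. Verify on your own node.
offline_pools() {
    grep -E '^SP [0-9]+:' | grep -v 'Online'
}

# Usage (path is an assumption about the install location):
#   /opt/mapr/server/mrconfig sp list | offline_pools
# Exit status 0 means at least one pool line without "Online" was printed;
# exit status 1 means no such line was found.
```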

I'm an HPE employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]

SergioMN · ‎09-22-2023

@hiteshingole @ldarby Thanks, the storage pool was unable to initialize, but on another node I observed similar behaviour with an available pool.

hiteshingole · ‎09-25-2023

>> I observed similar behavior with an available pool.

Ideally, the CLDB always waits for the local MFS to come up, and there is a timeout.

If the MFS is not available for any reason within this timeout, the CLDB will crash and try again.

I'm an HPE employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
