HPE Ezmeral Software platform

cldb.pid exists with pid 11461 but no CLDB

 
msaidbilgehan
Advisor


After shutting down two nodes at the OS level and restarting them, the cluster cannot communicate with CLDB. I have included a few command outputs below; the full cluster logs are available at the Drive link at the end of this post.

> jps

16643 FsShell
25748 Jps
16390 FsShell
24621 CentralConfigCopyHelper
11533 WardenMain
2189 QuorumPeerMain
13327 CLDB

> sudo /etc/init.d/mapr-cldb status

/opt/mapr/pid/cldb.pid exists with pid 11461 but no CLDB.
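
The two outputs above disagree: jps lists a CLDB JVM with PID 13327, while the PID file records 11461. The following is a minimal sketch for making that comparison in one step. The function names and the "match"/"mismatch" messages are hypothetical helpers written for this illustration, not part of any MapR tooling.

```shell
#!/bin/sh
# Print the PID of every JVM that jps labels exactly "CLDB".
# Reads jps-style output ("<pid> <main class>") on stdin.
cldb_pids_from_jps() {
    awk '$2 == "CLDB" { print $1 }'
}

# Compare the PID recorded in a PID file with jps-style output on stdin.
# Prints "match: <pid>" or "mismatch: file=<pid> jps=<pid>".
# (Illustrative helper; the message format is an assumption.)
compare_pid_file() {
    pid_file=$1
    recorded=$(cat "$pid_file" 2>/dev/null)
    running=$(cldb_pids_from_jps)
    if [ -n "$recorded" ] && [ "$recorded" = "$running" ]; then
        echo "match: $recorded"
    else
        echo "mismatch: file=${recorded:-none} jps=${running:-none}"
    fi
}
```

With the functions loaded, a call such as `jps | compare_pid_file /opt/mapr/pid/cldb.pid` would report whether the PID file still points at the CLDB process that jps can see.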

 

> tail -n 25 /opt/mapr/logs/cldb.log

mapr@node0:~$ tail -n 25 /opt/mapr/logs/cldb.log
2023-09-26 08:36:05,591 INFO  ZooKeeperClient [main-EventThread]: Setting Cldb Info in ZooKeeper, external Port:7222
2023-09-26 08:36:05,598 INFO  CLDBServer [main-EventThread]: The CLDB received notification that a ZooKeeper event of type None occurred on path null
2023-09-26 08:36:05,603 INFO  CLDBServer [ZK-Connect]: Previous CLDB was not a clean shutdown waiting for 20000ms before attempting to become master
2023-09-26 08:36:05,614 INFO  ECTierManager [main]: Subscribed for EC gateway registration notifications. Current gateways...
2023-09-26 08:36:05,619 INFO  ClusterGroup [Thread-7]: isInited: isClusterGroupDbInited:false, isExternalServerDbInited:false
2023-09-26 08:36:05,619 INFO  ClusterGroup [Thread-7]: isInited: isClusterGroupDbInited:false, isExternalServerDbInited:false
2023-09-26 08:36:07,115 INFO  HttpServer [main]: Disabled algorithms are : TLS_AES_128_GCM_SHA256
2023-09-26 08:36:07,115 INFO  HttpServer [main]: Disabled protocols are : TLSv1.3
2023-09-26 08:36:07,157 INFO  CLDB [main]: CLDBState: CLDB State change : WAIT_FOR_FILESERVERS
2023-09-26 08:36:07,160 INFO  CLDBWatchdog [main]: CLDB memory threshold(heap + non heap) is set to : 8096 MB. Xmx: 4000, Configured Non-Heap: 4096
2023-09-26 08:36:07,161 INFO  CLDB [main]: [Starting RPCServer] port: 7222 num threads: 10 heap size: 4000MB IPGutsShm 32768 startup options: -Xms2400m -Xmx4000m -XX:ErrorFile=/opt/cores/hs_err_pid%p.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/cores -XX:ThreadStackSize=1024
2023-09-26 08:36:07,164 INFO  CLDB [main]: Starting 2 RPC Instances for CLDB
2023-09-26 08:36:07,169 ERROR CLDB [main]: Exception in RPC init
2023-09-26 08:36:07,169 ERROR CLDB [main]: Could not initialize RPC.. aborting
2023-09-26 08:36:22,953 INFO  CLDB [main]: Loading properties file : /opt/mapr/conf/cldb.conf
2023-09-26 08:36:23,297 INFO  CLDBMetrics [main]: Initializing CLDB Metrics with serviceName: cldbServer
2023-09-26 08:36:23,301 INFO  CLDB [main]: CLDBInit: Using hostname file /opt/mapr/hostname and hostid file /opt/mapr/hostid
2023-09-26 08:36:23,302 INFO  CLDB [main]: CLDB Properties from configuration file: num.volmirror.threads=1cldb.numthreads=10cldb.web.https.port=7443cldb.port=7222cldb.detect.dup.hostid.enabled=falsecldb.min.fileservers=1cldb.zookeeper.servers=node0.cluster:5181cldb.web.port=7221enable.replicas.invariant.check=falsecldb.jmxremote.port=7220hadoop.version=3.3.4
2023-09-26 08:36:23,302 INFO  CLDB [main]: CLDB Command line args: /opt/mapr/conf/cldb.conf
2023-09-26 08:36:23,302 INFO  CLDB [main]: CLDBInit: Initializing CLDB
2023-09-26 08:36:23,303 INFO  CLDB [main]: MapR BuildVersion: 7.4.0.0.20230728133744.GA
2023-09-26 08:36:23,303 INFO  CLDB [main]: $Id: mapr-version: 7.4.0.0.20230728133744.GA 9601b852f5443980b667e1a7910c66d39cb84c77 $
2023-09-26 08:36:23,303 INFO  CLDB [main]: CLDBInit: Start CLDBServer
2023-09-26 08:36:23,349 INFO  CLDBServer [main]: CLDBInit: HostName: node0.cluster ServerId: 3977623316996854432
2023-09-26 08:36:23,444 ERROR CLDBServer [main]: Username in ticket file /opt/mapr/conf/maprserverticket doesn't match with cluster owner's username. ticket's user: mapr cluster owner: root

mapr@node0:~$ sudo maprlogin password
[Password for user 'root' at cluster 'cluster.treo.com.tr': ]
MapR credentials of user 'root' for cluster 'cluster.treo.com.tr' are written to '/tmp/maprticket_0'

mapr@node0:~$ sudo /etc/init.d/mapr-cldb stop
CLDB not running.

mapr@node0:~$ sudo /etc/init.d/mapr-cldb start
Starting CLDB, logging to /opt/mapr/logs/cldb.log

mapr@node0:~$ sudo /etc/init.d/mapr-cldb status
/opt/mapr/pid/cldb.pid exists with pid 30042 but no CLDB.

mapr@node0:~$ tail -n 15 /opt/mapr/logs/cldb.log
2023-09-26 08:36:23,303 INFO  CLDB [main]: $Id: mapr-version: 7.4.0.0.20230728133744.GA 9601b852f5443980b667e1a7910c66d39cb84c77 $
2023-09-26 08:36:23,303 INFO  CLDB [main]: CLDBInit: Start CLDBServer
2023-09-26 08:36:23,349 INFO  CLDBServer [main]: CLDBInit: HostName: node0.cluster ServerId: 3977623316996854432
2023-09-26 08:36:23,444 ERROR CLDBServer [main]: Username in ticket file /opt/mapr/conf/maprserverticket doesn't match with cluster owner's username. ticket's user: mapr cluster owner: root
2023-09-26 08:39:28,657 INFO  CLDB [main]: Loading properties file : /opt/mapr/conf/cldb.conf
2023-09-26 08:39:29,008 INFO  CLDBMetrics [main]: Initializing CLDB Metrics with serviceName: cldbServer
2023-09-26 08:39:29,012 INFO  CLDB [main]: CLDBInit: Using hostname file /opt/mapr/hostname and hostid file /opt/mapr/hostid
2023-09-26 08:39:29,013 INFO  CLDB [main]: CLDB Properties from configuration file: num.volmirror.threads=1cldb.numthreads=10cldb.web.https.port=7443cldb.port=7222cldb.detect.dup.hostid.enabled=falsecldb.min.fileservers=1cldb.zookeeper.servers=node0.cluster:5181cldb.web.port=7221enable.replicas.invariant.check=falsecldb.jmxremote.port=7220hadoop.version=3.3.4
2023-09-26 08:39:29,013 INFO  CLDB [main]: CLDB Command line args: /opt/mapr/conf/cldb.conf
2023-09-26 08:39:29,013 INFO  CLDB [main]: CLDBInit: Initializing CLDB
2023-09-26 08:39:29,014 INFO  CLDB [main]: MapR BuildVersion: 7.4.0.0.20230728133744.GA
2023-09-26 08:39:29,014 INFO  CLDB [main]: $Id: mapr-version: 7.4.0.0.20230728133744.GA 9601b852f5443980b667e1a7910c66d39cb84c77 $
2023-09-26 08:39:29,014 INFO  CLDB [main]: CLDBInit: Start CLDBServer
2023-09-26 08:39:29,059 INFO  CLDBServer [main]: CLDBInit: HostName: node0.cluster ServerId: 3977623316996854432
2023-09-26 08:39:29,154 ERROR CLDBServer [main]: Username in ticket file /opt/mapr/conf/maprserverticket doesn't match with cluster owner's username. ticket's user: mapr cluster owner: root

 

 

 

> tail -n 25 /opt/mapr/logs/cldb.out

fs/common/daremgr.cc:189: HSM enabled, but DARE key not found on HSM. Check log for details
2023-09-26 09:00:05,8499 :2732 Listen: 2732: bind: error 98 port 7222
java.io.IOException: Could not intialize RPC java.io.IOException: Exception in RPC init
        at com.mapr.fs.cldb.CLDB.initializeRpcInstances(CLDB.java:179)
        at com.mapr.fs.cldb.CLDB.<init>(CLDB.java:95)
        at com.mapr.fs.cldb.CLDB.main(CLDB.java:411)
CLDBShm: ***shmget with key 7222, size: 70848
CLDBShm created rpc guts shared memory, size 70848

2023-09-26 09:00:06,4176 :1606 Obtained CLDB key from PKCS#11 file store
CLDBJNI: Initializing cldb jni with memory 838860800 estContainerSize:144 maxContainersInCache:5825422 mapr-version: $Id: mapr-version: 7.4.0.0.20230728133744.GA 9601b852f5443980b667e1a7910c66d39cb84c77 $
fs/common/daremgr.cc:189: HSM enabled, but DARE key not found on HSM. Check log for details
2023-09-26 09:00:08,5433 :2732 Listen: 2732: bind: error 98 port 7222
java.io.IOException: Could not intialize RPC java.io.IOException: Exception in RPC init
        at com.mapr.fs.cldb.CLDB.initializeRpcInstances(CLDB.java:179)
        at com.mapr.fs.cldb.CLDB.<init>(CLDB.java:95)
        at com.mapr.fs.cldb.CLDB.main(CLDB.java:411)
CLDBShm: ***shmget with key 7222, size: 70848
CLDBShm created rpc guts shared memory, size 70848

2023-09-26 09:00:09,1707 :1606 Obtained CLDB key from PKCS#11 file store
CLDBJNI: Initializing cldb jni with memory 838860800 estContainerSize:144 maxContainersInCache:5825422 mapr-version: $Id: mapr-version: 7.4.0.0.20230728133744.GA 9601b852f5443980b667e1a7910c66d39cb84c77 $
fs/common/daremgr.cc:189: HSM enabled, but DARE key not found on HSM. Check log for details
2023-09-26 09:00:11,2179 :2732 Listen: 2732: bind: error 98 port 7222
java.io.IOException: Could not intialize RPC java.io.IOException: Exception in RPC init
        at com.mapr.fs.cldb.CLDB.initializeRpcInstances(CLDB.java:179)
        at com.mapr.fs.cldb.CLDB.<init>(CLDB.java:95)
        at com.mapr.fs.cldb.CLDB.main(CLDB.java:411)
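
On Linux, errno 98 is EADDRINUSE ("Address already in use"), so the repeated `bind: error 98 port 7222` lines suggest that something was already listening on the CLDB port each time a new CLDB tried to start. The following is a minimal sketch, assuming a Linux host, that checks for a listener by reading the kernel's socket tables. The function name `tcp_port_listening` is a hypothetical helper introduced here.

```shell
#!/bin/sh
# Return 0 if a TCP socket is in LISTEN state on the given port.
# Usage: tcp_port_listening PORT [TABLE_FILE...]
# Defaults to the Linux kernel tables /proc/net/tcp and /proc/net/tcp6.
# (Hypothetical helper; only sees the current network namespace.)
tcp_port_listening() {
    port=$1
    shift
    [ "$#" -gt 0 ] || set -- /proc/net/tcp /proc/net/tcp6
    hex=$(printf '%04X' "$port")          # e.g. 7222 -> 1C36
    for table in "$@"; do
        [ -r "$table" ] || continue       # skip tables that do not exist
        # Field 2 is local_address (HEXIP:HEXPORT); field 4 is the socket
        # state, where 0A means LISTEN.
        if awk -v suffix=":$hex" '
            $4 == "0A" && substr($2, length($2) - 4) == suffix { found = 1 }
            END { exit !found }' "$table"; then
            return 0
        fi
    done
    return 1
}
```

For example, `tcp_port_listening 7222 && echo "port 7222 is already taken"`. This only answers whether the port is taken; where the tools are installed, running `ss -ltnp` or `lsof -i :7222` as root would also show which process owns the socket.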

 

 

I tried the command below to create a new server ticket:

mapr@node0:~$ sudo /opt/mapr/server/configure.sh -N cluster.treo.com.tr -C node0.cluster -Z node0.cluster -secure

Node setup configuration:  apiserver cldb collectd drill-bits drill-internal fileserver gateway grafana hadoop-client hadoop-util hbase hbaserest hbmaster hbregionserver httpfs mastgateway nodemanager resourcemanager s3server spark spark-historyserver spark-thriftserver zookeeper
Log can be found at:  /opt/mapr/logs/configure.log
CLDB node list: node0.cluster:7222
Zookeeper node list: node0.cluster:5181
External Zookeeper node list:
FIPS is not enabled. Verifying JKS, P12 and PEM key and trust stores
ERROR: Required /opt/mapr/conf/ssl_truststore.pem not present. Please copy from first CLDB node.
Configuring nodemanager
Configuring hbase
Configuring collectd
find: paths must precede expression: `/opt/mapr/lib/slf4j-api-1.7.36.jar'
find: possible unquoted pattern after predicate `-regex'?
awk: not an option: -r
awk: not an option: -r
awk: not an option: -r
awk: not an option: -r
awk: not an option: -r
awk: not an option: -r
awk: not an option: -r
Configuring resourcemanager
Configuring hadoop-util
Configuring httpfs
Configuring spark
Configuring hadoop-client
Configuring grafana
usage: /opt/mapr/grafana/grafana-7.5.10/bin/configure.sh [-help] [-nodeCount <cnt>] [-nodePort <port>] [-grafanaPort <port>]
        [-loadDataSourceOnly] [-customSecure] [-secure] [-unsecure] [-EC <commonEcoOpts>]
        [-password <pw>] [-R] -OT "ip:port,ip1:port,"
Configuring drill
OTNodesList:
Configuring apiserver
Running restart script /opt/mapr/conf/restart/hbaserest-1.4.14.restart
Running restart script /opt/mapr/conf/restart/hbmaster-1.4.14.restart
Running restart script /opt/mapr/conf/restart/hbregionserver-1.4.14.restart

 

Cluster Logs: https://drive.google.com/open?id=199KEwyPeXtmXI1zCR4XkHV0usd7krcZp&usp=drive_fs

msaidbilgehan
Advisor
Solution

Re: cldb.pid exists with pid 11461 but no CLDB

With the help of @Mirza12332, the solution is:

  1. Stop all MapR processes on the node by stopping mapr-warden. If necessary, kill any remaining stuck or unresponsive processes.

    1. cd /opt/mapr/initscripts

    2. Stop the warden and CLDB services.

  2. Clean up all PID files:

    1. sudo rm -f /opt/mapr/pid/*.pid

  3. Then start mapr-warden.

Keep in mind that the only initscripts MapR supports are mapr-warden and mapr-zookeeper. The others are wrappers for warden.
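
The PID cleanup in step 2 can also be done more conservatively. The sketch below is a variant of that step, not the exact command used above: instead of deleting every PID file, it removes only those whose recorded process no longer exists. The function names are hypothetical, and the `/proc` fallback assumes a Linux host.

```shell
#!/bin/sh
# Return 0 if a process with the given PID currently exists.
# kill -0 sends no signal; the /proc check covers processes owned by
# another user, for which kill -0 fails with a permission error.
pid_alive() {
    kill -0 "$1" 2>/dev/null || [ -d "/proc/$1" ]
}

# Delete every *.pid file in DIR whose recorded PID is not running.
# Prints one "removed <file>" line per deleted file.
# (Hypothetical helper written for this illustration.)
remove_stale_pid_files() {
    dir=$1
    for f in "$dir"/*.pid; do
        [ -f "$f" ] || continue           # glob matched nothing
        pid=$(cat "$f")
        case $pid in
            ''|*[!0-9]*) continue ;;      # skip empty or non-numeric content
        esac
        if ! pid_alive "$pid"; then
            rm -f "$f" && echo "removed $f"
        fi
    done
}
```

With the functions loaded in a root shell, and after mapr-warden has been stopped, `remove_stale_pid_files /opt/mapr/pid` would clear the leftover files before mapr-warden is started again. PID files that still point at a live process are left in place, which is a hint that the process in question was not actually stopped.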

Sunitha_Mod
Honored Contributor

Re: cldb.pid exists with pid 11461 but no CLDB

Hello @msaidbilgehan,

That's awesome!

We are extremely glad to know that you were able to find the solution and we appreciate you for keeping us updated.