HPE Ezmeral Software platform

Re: cldb.pid exists with pid 11461 but no CLDB

 
msaidbilgehan
Advisor

cldb.pid exists with pid 11461 but no CLDB

After shutting down two nodes at the OS level and restarting them, the cluster cannot communicate with CLDB. A few command outputs are below, and the full cluster logs are at the Drive link at the end of this post.

> jps

16643 FsShell
25748 Jps
16390 FsShell
24621 CentralConfigCopyHelper
11533 WardenMain
2189 QuorumPeerMain
13327 CLDB

> sudo /etc/init.d/mapr-cldb status

/opt/mapr/pid/cldb.pid exists with pid 11461 but no CLDB.

 

> tail -n 25 /opt/mapr/logs/cldb.log

 

 

mapr@node0:~$ tail -n 25 /opt/mapr/logs/cldb.log
2023-09-26 08:36:05,591 INFO  ZooKeeperClient [main-EventThread]: Setting Cldb Info in ZooKeeper, external Port:7222
2023-09-26 08:36:05,598 INFO  CLDBServer [main-EventThread]: The CLDB received notification that a ZooKeeper event of type None occurred on path null
2023-09-26 08:36:05,603 INFO  CLDBServer [ZK-Connect]: Previous CLDB was not a clean shutdown waiting for 20000ms before attempting to become master
2023-09-26 08:36:05,614 INFO  ECTierManager [main]: Subscribed for EC gateway registration notifications. Current gateways...
2023-09-26 08:36:05,619 INFO  ClusterGroup [Thread-7]: isInited: isClusterGroupDbInited:false, isExternalServerDbInited:false
2023-09-26 08:36:05,619 INFO  ClusterGroup [Thread-7]: isInited: isClusterGroupDbInited:false, isExternalServerDbInited:false
2023-09-26 08:36:07,115 INFO  HttpServer [main]: Disabled algorithms are : TLS_AES_128_GCM_SHA256
2023-09-26 08:36:07,115 INFO  HttpServer [main]: Disabled protocols are : TLSv1.3
2023-09-26 08:36:07,157 INFO  CLDB [main]: CLDBState: CLDB State change : WAIT_FOR_FILESERVERS
2023-09-26 08:36:07,160 INFO  CLDBWatchdog [main]: CLDB memory threshold(heap + non heap) is set to : 8096 MB. Xmx: 4000, Configured Non-Heap: 4096
2023-09-26 08:36:07,161 INFO  CLDB [main]: [Starting RPCServer] port: 7222 num threads: 10 heap size: 4000MB IPGutsShm 32768 startup options: -Xms2400m -Xmx4000m -XX:ErrorFile=/opt/cores/hs_err_pid%p.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/cores -XX:ThreadStackSize=1024
2023-09-26 08:36:07,164 INFO  CLDB [main]: Starting 2 RPC Instances for CLDB
2023-09-26 08:36:07,169 ERROR CLDB [main]: Exception in RPC init
2023-09-26 08:36:07,169 ERROR CLDB [main]: Could not initialize RPC.. aborting
2023-09-26 08:36:22,953 INFO  CLDB [main]: Loading properties file : /opt/mapr/conf/cldb.conf
2023-09-26 08:36:23,297 INFO  CLDBMetrics [main]: Initializing CLDB Metrics with serviceName: cldbServer
2023-09-26 08:36:23,301 INFO  CLDB [main]: CLDBInit: Using hostname file /opt/mapr/hostname and hostid file /opt/mapr/hostid
2023-09-26 08:36:23,302 INFO  CLDB [main]: CLDB Properties from configuration file: num.volmirror.threads=1cldb.numthreads=10cldb.web.https.port=7443cldb.port=7222cldb.detect.dup.hostid.enabled=falsecldb.min.fileservers=1cldb.zookeeper.servers=node0.cluster:5181cldb.web.port=7221enable.replicas.invariant.check=falsecldb.jmxremote.port=7220hadoop.version=3.3.4
2023-09-26 08:36:23,302 INFO  CLDB [main]: CLDB Command line args: /opt/mapr/conf/cldb.conf
2023-09-26 08:36:23,302 INFO  CLDB [main]: CLDBInit: Initializing CLDB
2023-09-26 08:36:23,303 INFO  CLDB [main]: MapR BuildVersion: 7.4.0.0.20230728133744.GA
2023-09-26 08:36:23,303 INFO  CLDB [main]: $Id: mapr-version: 7.4.0.0.20230728133744.GA 9601b852f5443980b667e1a7910c66d39cb84c77 $
2023-09-26 08:36:23,303 INFO  CLDB [main]: CLDBInit: Start CLDBServer
2023-09-26 08:36:23,349 INFO  CLDBServer [main]: CLDBInit: HostName: node0.cluster ServerId: 3977623316996854432
2023-09-26 08:36:23,444 ERROR CLDBServer [main]: Username in ticket file /opt/mapr/conf/maprserverticket doesn't match with cluster owner's username. ticket's user: mapr cluster owner: root

mapr@node0:~$ sudo maprlogin password
[Password for user 'root' at cluster 'cluster.treo.com.tr': ]
MapR credentials of user 'root' for cluster 'cluster.treo.com.tr' are written to '/tmp/maprticket_0'

mapr@node0:~$ sudo /etc/init.d/mapr-cldb stop
CLDB not running.

mapr@node0:~$ sudo /etc/init.d/mapr-cldb start
Starting CLDB, logging to /opt/mapr/logs/cldb.log

mapr@node0:~$ sudo /etc/init.d/mapr-cldb status
/opt/mapr/pid/cldb.pid exists with pid 30042 but no CLDB.

mapr@node0:~$ tail -n 15 /opt/mapr/logs/cldb.log
2023-09-26 08:36:23,303 INFO  CLDB [main]: $Id: mapr-version: 7.4.0.0.20230728133744.GA 9601b852f5443980b667e1a7910c66d39cb84c77 $
2023-09-26 08:36:23,303 INFO  CLDB [main]: CLDBInit: Start CLDBServer
2023-09-26 08:36:23,349 INFO  CLDBServer [main]: CLDBInit: HostName: node0.cluster ServerId: 3977623316996854432
2023-09-26 08:36:23,444 ERROR CLDBServer [main]: Username in ticket file /opt/mapr/conf/maprserverticket doesn't match with cluster owner's username. ticket's user: mapr cluster owner: root
2023-09-26 08:39:28,657 INFO  CLDB [main]: Loading properties file : /opt/mapr/conf/cldb.conf
2023-09-26 08:39:29,008 INFO  CLDBMetrics [main]: Initializing CLDB Metrics with serviceName: cldbServer
2023-09-26 08:39:29,012 INFO  CLDB [main]: CLDBInit: Using hostname file /opt/mapr/hostname and hostid file /opt/mapr/hostid
2023-09-26 08:39:29,013 INFO  CLDB [main]: CLDB Properties from configuration file: num.volmirror.threads=1cldb.numthreads=10cldb.web.https.port=7443cldb.port=7222cldb.detect.dup.hostid.enabled=falsecldb.min.fileservers=1cldb.zookeeper.servers=node0.cluster:5181cldb.web.port=7221enable.replicas.invariant.check=falsecldb.jmxremote.port=7220hadoop.version=3.3.4
2023-09-26 08:39:29,013 INFO  CLDB [main]: CLDB Command line args: /opt/mapr/conf/cldb.conf
2023-09-26 08:39:29,013 INFO  CLDB [main]: CLDBInit: Initializing CLDB
2023-09-26 08:39:29,014 INFO  CLDB [main]: MapR BuildVersion: 7.4.0.0.20230728133744.GA
2023-09-26 08:39:29,014 INFO  CLDB [main]: $Id: mapr-version: 7.4.0.0.20230728133744.GA 9601b852f5443980b667e1a7910c66d39cb84c77 $
2023-09-26 08:39:29,014 INFO  CLDB [main]: CLDBInit: Start CLDBServer
2023-09-26 08:39:29,059 INFO  CLDBServer [main]: CLDBInit: HostName: node0.cluster ServerId: 3977623316996854432
2023-09-26 08:39:29,154 ERROR CLDBServer [main]: Username in ticket file /opt/mapr/conf/maprserverticket doesn't match with cluster owner's username. ticket's user: mapr cluster owner: root

 

 

 

> tail -n 25 /opt/mapr/logs/cldb.out

 

fs/common/daremgr.cc:189: HSM enabled, but DARE key not found on HSM. Check log for details
2023-09-26 09:00:05,8499 :2732 Listen: 2732: bind: error 98 port 7222java.io.IOException: Could not intialize RPC java.io.IOException: Exception in RPC init        at com.mapr.fs.cldb.CLDB.initializeRpcInstances(CLDB.java:179)
        at com.mapr.fs.cldb.CLDB.<init>(CLDB.java:95)
        at com.mapr.fs.cldb.CLDB.main(CLDB.java:411)
CLDBShm: ***shmget with key 7222, size: 70848
CLDBShm created rpc guts shared memory, size 70848

2023-09-26 09:00:06,4176 :1606 Obtained CLDB key from PKCS#11 file store
CLDBJNI: Initializing cldb jni with memory 838860800 estContainerSize:144 maxContainersInCache:5825422 mapr-version: $Id: mapr-version: 7.4.0.0.20230728133744.GA 9601b852f5443980b667e1a7910c66d39cb84c77 $
fs/common/daremgr.cc:189: HSM enabled, but DARE key not found on HSM. Check log for details
2023-09-26 09:00:08,5433 :2732 Listen: 2732: bind: error 98 port 7222java.io.IOException: Could not intialize RPC java.io.IOException: Exception in RPC init        at com.mapr.fs.cldb.CLDB.initializeRpcInstances(CLDB.java:179)
        at com.mapr.fs.cldb.CLDB.<init>(CLDB.java:95)
        at com.mapr.fs.cldb.CLDB.main(CLDB.java:411)
CLDBShm: ***shmget with key 7222, size: 70848
CLDBShm created rpc guts shared memory, size 70848

2023-09-26 09:00:09,1707 :1606 Obtained CLDB key from PKCS#11 file store
CLDBJNI: Initializing cldb jni with memory 838860800 estContainerSize:144 maxContainersInCache:5825422 mapr-version: $Id: mapr-version: 7.4.0.0.20230728133744.GA 9601b852f5443980b667e1a7910c66d39cb84c77 $
fs/common/daremgr.cc:189: HSM enabled, but DARE key not found on HSM. Check log for details
2023-09-26 09:00:11,2179 :2732 Listen: 2732: bind: error 98 port 7222java.io.IOException: Could not intialize RPC java.io.IOException: Exception in RPC init        at com.mapr.fs.cldb.CLDB.initializeRpcInstances(CLDB.java:179)
        at com.mapr.fs.cldb.CLDB.<init>(CLDB.java:95)
        at com.mapr.fs.cldb.CLDB.main(CLDB.java:411)
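
On Linux, error 98 is EADDRINUSE, so the repeated bind failures above mean something was already listening on port 7222 each time CLDB tried to start. The jps output earlier lists a CLDB process (13327) whose PID differs from the one in cldb.pid, which suggests, but does not prove, that an orphaned CLDB was still holding the port. A minimal sketch for pulling the distinct bind failures out of cldb.out follows; the function name bind_errors is hypothetical, and it assumes a grep that supports -o.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct "bind: error N port P" failure.
# Reads the named log file(s), or stdin when no file is given.
# Assumes grep supports -o (GNU, BSD and busybox grep do).
bind_errors() {
    grep -o 'bind: error [0-9]* port [0-9]*' "$@" | sort -u
}

# To see which process holds the port, assuming the ss utility from
# iproute2 is installed (run as root to see PIDs of other users):
#   ss -ltnp | grep ':7222 '
```

Usage: bind_errors /opt/mapr/logs/cldb.out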

 

 

I tried the command below to create a new server ticket:

mapr@node0:~$ sudo /opt/mapr/server/configure.sh -N cluster.treo.com.tr -C node0.cluster -Z node0.cluster -secure

Node setup configuration:  apiserver cldb collectd drill-bits drill-internal fileserver gateway grafana hadoop-client hadoop-util hbase hbaserest hbmaster hbregionserver httpfs mastgateway nodemanager resourcemanager s3server spark spark-historyserver spark-thriftserver zookeeper
Log can be found at:  /opt/mapr/logs/configure.log
CLDB node list: node0.cluster:7222
Zookeeper node list: node0.cluster:5181
External Zookeeper node list:
FIPS is not enabled. Verifying JKS, P12 and PEM key and trust stores
ERROR: Required /opt/mapr/conf/ssl_truststore.pem not present. Please copy from first CLDB node.
Configuring nodemanager
Configuring hbase
Configuring collectd
find: paths must precede expression: `/opt/mapr/lib/slf4j-api-1.7.36.jar'
find: possible unquoted pattern after predicate `-regex'?
awk: not an option: -r
awk: not an option: -r
awk: not an option: -r
awk: not an option: -r
awk: not an option: -r
awk: not an option: -r
awk: not an option: -r
Configuring resourcemanager
Configuring hadoop-util
Configuring httpfs
Configuring spark
Configuring hadoop-client
Configuring grafana
usage: /opt/mapr/grafana/grafana-7.5.10/bin/configure.sh [-help] [-nodeCount <cnt>] [-nodePort <port>] [-grafanaPort <port>]
        [-loadDataSourceOnly] [-customSecure] [-secure] [-unsecure] [-EC <commonEcoOpts>]
        [-password <pw>] [-R] -OT "ip:port,ip1:port,"
Configuring drill
OTNodesList:
Configuring apiserver
Running restart script /opt/mapr/conf/restart/hbaserest-1.4.14.restart
Running restart script /opt/mapr/conf/restart/hbmaster-1.4.14.restart
Running restart script /opt/mapr/conf/restart/hbregionserver-1.4.14.restart

 

Cluster Logs: https://drive.google.com/open?id=199KEwyPeXtmXI1zCR4XkHV0usd7krcZp&usp=drive_fs

2 REPLIES
msaidbilgehan
Advisor
Solution

Re: cldb.pid exists with pid 11461 but no CLDB

With the help of @Mirza12332, the solution is:

  1. Stop all MapR processes on the node by stopping mapr-warden, then kill any remaining stuck or unresponsive processes if necessary.

    1. cd /opt/mapr/initscripts

    2. Stop the warden and CLDB services.

  2. Clean up all pid files:

    1. sudo rm -f /opt/mapr/pid/*.pid

  3. Then start mapr-warden.

Keep in mind that the only initscripts MapR supports are mapr-warden and mapr-zookeeper. The others are wrappers for warden.
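
A more cautious variant of the pid cleanup step can be sketched as below. It is not part of the original solution: instead of deleting every pid file, it removes only those whose recorded process no longer exists and reports the ones that are still running, so they can be stopped or killed first. The function name remove_stale_pids is hypothetical, and the sketch assumes each *.pid file holds a single numeric PID on its first line and that the node runs Linux (it falls back to /proc for processes owned by other users).

```shell
#!/bin/sh
# Hypothetical helper: remove pid files whose recorded process is gone.
# Assumes each *.pid file contains one numeric PID on its first line.
remove_stale_pids() {
    pid_dir="${1:-/opt/mapr/pid}"
    for f in "$pid_dir"/*.pid; do
        [ -e "$f" ] || continue    # the glob matched nothing
        pid=$(head -n 1 "$f" | tr -d '[:space:]')
        case "$pid" in
            ''|*[!0-9]*) echo "skipping $f: no numeric PID"; continue ;;
        esac
        # kill -0 only tests for existence; the /proc check covers
        # processes we lack permission to signal (Linux only).
        if kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; then
            echo "still running: $f (pid $pid)"
        else
            rm -f "$f" && echo "removed stale: $f (pid $pid)"
        fi
    done
}
```

Run it as root after mapr-warden has been stopped, for example remove_stale_pids /opt/mapr/pid, and deal with any "still running" entries before starting mapr-warden again.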

Sunitha_Mod
Honored Contributor

Re: cldb.pid exists with pid 11461 but no CLDB

Hello @msaidbilgehan,

That's awesome!

We are extremely glad to know that you were able to find the solution, and we appreciate you keeping us updated.