HPE Ezmeral Software platform

MapR 7.0 sandbox doesn't work next day after successful installation

 
SOLVED
rbukarev
Advisor


I installed a MapR 7.0 docker-based sandbox and enabled security in the cluster. I tested that it works by using a hadoop command from a client machine. I stopped and started the docker container and tested that it still works. Then I stopped the container and the whole Azure VM it was running on.

The next day I started the docker container, and the logs show that MapR didn't start (see below). Is there some "expiry time" on the cluster? Or does something break when I enable security?

 * Starting RPC port mapper daemon rpcbind
ln: failed to create symbolic link '/run/sendsigs.omit.d/rpcbind': No such file or directory
   ...fail!
Tue Nov  7 03:45:37 UTC 2023
0+0 records in
0+0 records out
0 bytes copied, 2.7301e-05 s, 0.0 kB/s
create /opt/mapr/conf/conf.old
Node setup configuration:  AWSMarketplace apiserver cldb data-access-gateway drill-bits drill-internal fileserver gateway hadoop-client hadoop-util hive nfs s3server spark zookeeper
Log can be found at:  /opt/mapr/logs/configure.log
CLDB node list: 172.17.0.2:7222
Zookeeper node list: 172.17.0.2:5181
External Zookeeper node list: 172.26.0.10:5181
ls: cannot access '/opt/mapr/AWSMarketplace': No such file or directory
Configuring hive
Configuring data-access-gateway
create root-ca conf file
create signing-ca conf file
creating cert dirs
root-ca generated
creating cert dirs
signing-ca generated
Adding root CA to trust store
Adding root signing CA to trust store
Creating 100 year certificate with subjectDN='CN=*.mapr.io' and alias maprdemo.mapr.io
Importing cluster certificate into the keystore
Importing key pair into the user keystore
Creating 100 year certificate with subjectDN='CN=admin' and alias admin
Creating 100 year certificate with subjectDN='CN=fluentd' and alias fluentd
Creating 100 year certificate with subjectDN='CN=kibana' and alias kibana
Creating 100 year certificate with subjectDN='CN=kibanaserver' and alias kibanaserver
Creating 100 year certificate with subjectDN='CN=grafana' and alias grafana
Creating 100 year certificate with subjectDN='CN=monet' and alias monet
Creating 100 year certificate with subjectDN='CN=*.mapr.io' and alias moss
ssl.server.keystore.password has been successfully created.
Provider localjceks://file/opt/mapr/conf/maprkeycreds.jceks has been updated.
ssl.server.keystore.keypassword has been successfully created.
Provider localjceks://file/opt/mapr/conf/maprkeycreds.jceks has been updated.
ssl.server.truststore.password has been successfully created.
Provider localjceks://file/opt/mapr/conf/maprtrustcreds.jceks has been updated.
ssl.client.truststore.password has been successfully created.
Provider localjceks://file/opt/mapr/conf/maprtrustcreds.jceks has been updated.
ssl.client.keystore.password has been successfully created.
Provider localjceks://file/opt/mapr/conf/maprkeycreds.jceks has been updated.
ssl.client.keystore.keypassword has been successfully created.
Provider localjceks://file/opt/mapr/conf/maprkeycreds.jceks has been updated.
==================================================================================
The new passwords have been saved to /opt/mapr/conf/store-passwords.txt.
You will need the passwords for various operations such as manipulating key and
trust stores using keytool. Copy this file to a safe place, and then delete it
from /opt/mapr/conf
==================================================================================
Configuring drill
OTNodesList:
Configuring spark
Configuring hadoop-util
Configuring hadoop-client
Configuring apiserver
/opt/mapr/disks/docker.disk added.
Run "service mapr-zookeeper start" in order to start the zookeeper node and then run "service mapr-warden start" in order to start this node
Tue Nov  7 03:49:28 UTC 2023
sed: can't read /opt/mapr/spark/spark-2.4.5/conf/spark-env.sh: No such file or directory
sed: can't read /opt/mapr/spark/spark-2.4.5/conf/spark-env.sh: No such file or directory
Changing /opt/mapr/warden.conf
JMX disabled by user request
Using config: /opt/mapr/zookeeper/zookeeper-3.5.6/conf/zoo.cfg
Starting zookeeper ... STARTED
 * Starting SMP IRQ Balancer: irqbalance
   ...done.
Starting WARDEN, logging to /opt/mapr/logs/warden.log.
.
For diagnostics look at /opt/mapr/logs/ for createsystemvolumes.log, warden.log and configured services log files
/etc/init.d/mapr-warden: line 508: /sys/fs/cgroup/systemd/system.slice/mapr-warden.service/cgroup.procs: No such file or directory
before cldb Tue Nov  7 03:49:31 UTC 2023
ERROR (10009) -  Couldn't connect to the CLDB service
ERROR (10009) -  Couldn't connect to the CLDB service
after cldb Tue Nov  7 03:49:44 UTC 2023
 drill /apps  Tue Nov  7 03:52:43 UTC 2023
Node setup configuration:  AWSMarketplace apiserver cldb data-access-gateway drill-bits drill-internal fileserver gateway hadoop-client hadoop-util hive nfs s3server spark zookeeper
Log can be found at:  /opt/mapr/logs/configure.log
ls: cannot access '/opt/mapr/AWSMarketplace': No such file or directory
Configuring hive
Configuring data-access-gateway
Configuring drill
OTNodesList:
Configuring spark
Configuring hadoop-util
Configuring hadoop-client
Configuring apiserver
 * Restarting warden daemon warden
WARDEN not running.
looking to stop mapr-core processes not started by warden
 * Starting SMP IRQ Balancer: irqbalance
 * . Already running
   ...done.
Starting WARDEN, logging to /opt/mapr/logs/warden.log.
.
For diagnostics look at /opt/mapr/logs/ for createsystemvolumes.log, warden.log and configured services log files
/etc/init.d/mapr-warden: line 508: /sys/fs/cgroup/systemd/system.slice/mapr-warden.service/cgroup.procs: No such file or directory
   ...done.
clnt_create: RPC: Unknown host
clnt_create: RPC: Unknown host
 * Starting RPC port mapper daemon rpcbind
ln: failed to create symbolic link '/run/sendsigs.omit.d/rpcbind': No such file or directory
   ...fail!
WARDEN not running.
looking to stop mapr-core processes not started by warden
Using config: /opt/mapr/zookeeper/zookeeper-3.5.6/conf/zoo.cfg
/opt/mapr/zookeeper/zookeeper-3.5.6/bin/zkServer.sh: line 386: kill: (79447) - No such process
Stopping zookeeper ... STOPPED
Node setup configuration:  AWSMarketplace apiserver cldb data-access-gateway drill-bits drill-internal drill-qs fileserver gateway hadoop-client hadoop-util hive nfs s3server spark zookeeper
Log can be found at:  /opt/mapr/logs/configure.log
CLDB node list: 172.17.0.2:7222
Zookeeper node list: 172.17.0.2:5181
External Zookeeper node list:
FIPS is not enabled. Verifying JKS, P12 and PEM key and trust stores
ls: cannot access '/opt/mapr/AWSMarketplace': No such file or directory
Configuring hive
Configuring data-access-gateway
Configuring drill
OTNodesList:
Configuring spark
Configuring hadoop-util
Configuring hadoop-client
Configuring apiserver
Run "service mapr-zookeeper start" in order to start the zookeeper node and then run "service mapr-warden start" in order to start this node
JMX disabled by user request
Using config: /opt/mapr/zookeeper/zookeeper-3.5.6/conf/zoo.cfg
Starting zookeeper ... STARTED
 * Starting SMP IRQ Balancer: irqbalance
   ...done.
Starting WARDEN, logging to /opt/mapr/logs/warden.log.
.
For diagnostics look at /opt/mapr/logs/ for createsystemvolumes.log, warden.log and configured services log files
/etc/init.d/mapr-warden: line 508: /sys/fs/cgroup/systemd/system.slice/mapr-warden.service/cgroup.procs: No such file or directory
Found 4 items
drwxrwxrwt   - mapr mapr          1 2023-11-07 03:53 /apps/kafka-streams
drwxrwxrwt   - mapr mapr          1 2023-11-07 03:53 /apps/ksql
drwxrwxrwt   - mapr mapr          0 2023-11-07 03:53 /apps/schema-registry
drwxrwxrwx   - root root          0 2023-11-07 03:52 /apps/spark
clnt_create: RPC: Unknown host
clnt_create: RPC: Unknown host
This container IP : 172.17.0.2
 * Starting RPC port mapper daemon rpcbind
ln: failed to create symbolic link '/run/sendsigs.omit.d/rpcbind': No such file or directory
   ...fail!
WARDEN not running.
looking to stop mapr-core processes not started by warden
Using config: /opt/mapr/zookeeper/zookeeper-3.5.6/conf/zoo.cfg
Stopping zookeeper ... no zookeeper to stop (could not find file /opt/mapr/zkdata/zookeeper_server.pid)
JMX disabled by user request
Using config: /opt/mapr/zookeeper/zookeeper-3.5.6/conf/zoo.cfg
/opt/mapr/zookeeper/zookeeper-3.5.6/bin/zkServer.sh: line 342: /opt/mapr/zkdata/zookeeper_server.pid: Permission denied
Starting zookeeper ... FAILED TO WRITE PID
/opt/mapr/zkdata/zookeeper_server.pid is not created - systemd will exit with no pidfile issue
 * Starting SMP IRQ Balancer: irqbalance
   ...done.
sysctl: cannot stat /proc/sys/net/ipv4/tcp_mem: No such file or directory
Starting WARDEN, logging to /opt/mapr/logs/warden.log.
.............................
cat: /opt/mapr/pid/warden.pid.tmp: No such file or directory
Error: warden can not be started. See /opt/mapr/logs/warden.log for details

 

4 REPLIES
Dave Olker
Neighborhood Moderator

Re: MapR 7.0 sandbox doesn't work next day after successful installation

It looks like multiple subsystems are having problems starting, especially Zookeeper and Warden.  Have you checked the logs for those specific services to see what is stopping them from starting?  Are you certain this VM is in a good state - i.e. all disk resources are present and available, etc?  Are there any errors logged in the dmesg output of the Azure VM itself that might cause the container to not come up as it had previously?  



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
rbukarev
Advisor

Re: MapR 7.0 sandbox doesn't work next day after successful installation

Hi @Dave Olker, yes, the VM is in a good state: 20G available on disk, and I believe there should be enough available RAM, since the VM has 16GB total and this docker image is the only thing running. In fact, it had been running fine for 2 weeks; then I had to shut down the VM for a couple of days, and now MapR can't restart.

Here's the warden log in full (it's relatively short because there was practically no activity on the cluster): https://pastebin.com/A2AA9xeJ

Are lines like these OK, or do they point to an error?

in mfsinit.log:
/opt/mapr/server/initaudit.sh: line 17: /opt/mapr/pid/initaudit.sh.pid: Permission denied
INFO removing /opt/mapr/pid/initaudit.sh.pid

in warden.log:
sh: 1: cannot create /opt/mapr/pid/warden.pid.tmp: Permission denied

(I can send over the whole contents of the /opt/mapr/logs directory; it's very small, as I said.)
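A small sketch for triaging messages like the ones quoted above: it pulls every path reported as "Permission denied" out of a chunk of log text and lists the parent directories, which are the places whose ownership and mode would be worth checking inside the container. The function names are made up for illustration, and the sample text is just the three lines already quoted in this thread; nothing here is a MapR tool, and it does not explain why the permissions changed.

```python
import posixpath
import re

# A path immediately followed by ": Permission denied", as in the quoted log lines.
DENIED = re.compile(r"(/[^\s:]+): Permission denied")

def denied_paths(log_text):
    """Return distinct paths reported as 'Permission denied', in first-seen order."""
    seen = []
    for match in DENIED.finditer(log_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen

def denied_dirs(log_text):
    """Return the sorted parent directories of the denied paths."""
    return sorted({posixpath.dirname(p) for p in denied_paths(log_text)})

# Sample input: lines copied from the logs quoted in this thread.
sample = """\
/opt/mapr/server/initaudit.sh: line 17: /opt/mapr/pid/initaudit.sh.pid: Permission denied
sh: 1: cannot create /opt/mapr/pid/warden.pid.tmp: Permission denied
/opt/mapr/zookeeper/zookeeper-3.5.6/bin/zkServer.sh: line 342: /opt/mapr/zkdata/zookeeper_server.pid: Permission denied
"""

for directory in denied_dirs(sample):
    print(directory)
```

For the quoted lines this narrows things down to two directories, /opt/mapr/pid and /opt/mapr/zkdata, both of which the services need to write pid files into.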

 

Dave Olker
Neighborhood Moderator
Solution

Re: MapR 7.0 sandbox doesn't work next day after successful installation

I don't think the dev sandbox was designed to gracefully recover from a docker restart or container restart.  There is an engineering ticket requesting this feature dated from 2018 that has not been worked on.  My guess is engineering sees this sandbox as just that - a temporary environment that is only designed for experimentation and development work.  

I actually ran into this same problem in my lab last week when I was experimenting with the dev sandbox for another customer: after rebooting the VM the container was deployed on, it would not come back clean. I spent an hour or so trying different things and eventually gave up and simply re-deployed the sandbox.

One thing I didn't try is re-running the sandbox setup script with the container running.  When it detects there is a running sandbox it prompts you with the following options:

MapR sandbox container is already running.
1. Kill the earlier run and start a fresh instance
2. Reconfigure the client and the running container for any network changes
Please enter choice 1 or 2 :

You could try option 2 and see if it heals your existing sandbox.  If that doesn't work, you can use option 1 to clean it and start a new one.

Regards,

Dave



rbukarev
Advisor

Re: MapR 7.0 sandbox doesn't work next day after successful installation

Thank you @Dave Olker, yeah, if it's a known issue I'm not sure there's much sense in pursuing it.

Interestingly, when I just stop and start the container, it resumes fine. It's when the whole VM is rebooted that the container crashes. That might be due to the VM changing its external IP address, but why would it matter, as I use the eth0 NIC during installation?
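One rough way to test the IP-change idea against the startup log from the first post is to compare the addresses printed on the "CLDB node list" and "Zookeeper node list" lines with the one on the "This container IP" line. The sketch below does only that; the function names are invented for illustration, it reads nothing but pasted log text, and it says nothing about the VM's external address, which that log does not show.

```python
import re

def configured_hosts(log_text):
    """Collect host addresses from 'CLDB node list' and 'Zookeeper node list' lines."""
    hosts = set()
    pattern = r"^(?:CLDB|Zookeeper) node list:[ \t]*(\S+)"
    for match in re.finditer(pattern, log_text, re.MULTILINE):
        for entry in match.group(1).split(","):
            hosts.add(entry.rsplit(":", 1)[0])  # drop the :port suffix
    return hosts

def container_ip(log_text):
    """Return the address from the 'This container IP' line, or None if absent."""
    match = re.search(r"^This container IP[ \t]*:[ \t]*(\S+)", log_text, re.MULTILINE)
    return match.group(1) if match else None

# Sample input: lines copied from the startup log in the first post.
sample = """\
CLDB node list: 172.17.0.2:7222
Zookeeper node list: 172.17.0.2:5181
External Zookeeper node list:
This container IP : 172.17.0.2
"""

current = container_ip(sample)
stale = configured_hosts(sample) - {current}
print("container IP:", current)
print("configured hosts not matching it:", sorted(stale))
```

On the lines quoted in the first post the two agree (both are 172.17.0.2), so at least the container-internal address in that excerpt looks unchanged. The pasted log covers several configure runs, though, so this is a hint rather than proof.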

Option 2 wouldn't work for me, as I reconfigure the cluster to make it secure after installation (yeah, I know later versions of the MapR sandbox are secured, but I have to use v.7.0 specifically).

Anyway, as there's not much to do here, let's close this. Thank you for your help!