HPE Ezmeral Software platform

Ezmeral installation woes

 
CBENilsson
Occasional Contributor

I'm trying to set up a new Ezmeral cluster and migrate from an old MapR cluster used mainly for data storage over NFS and HBase. I'm configuring it as secure, with data encrypted at rest, but otherwise with as few features and extras as possible. We intend to use MapR-DB behind Thrift, so no real HBase will be installed.

The first problem that required us to read through the installation scripts was that "hostname -f" didn't return a fully qualified address. Updating /etc/hostname took care of that.
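For anyone hitting the same thing, a quick pre-flight check along these lines can be run on each node before starting the installer. This is only a sketch: `looks_fully_qualified` is a hypothetical helper, and Python's `socket.getfqdn()` is not guaranteed to return exactly what `hostname -f` does, so treat it as a first hint rather than a replacement for the installer's own check.

```python
import socket


def looks_fully_qualified(name: str) -> bool:
    """Return True if name has at least two non-empty dot-separated labels."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


if __name__ == "__main__":
    # socket.getfqdn() resolves the local host name; the result may differ
    # from `hostname -f` depending on resolver configuration.
    fqdn = socket.getfqdn()
    status = "OK" if looks_fully_qualified(fqdn) else "NOT fully qualified"
    print(f"{fqdn}: {status}")
```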

The machines have four NICs connected to different networks: one admin network, one data network for external access to the APIs, and two networks that can be used for communication between nodes. All networks are in different IP ranges and have different domain suffixes. We suspected this setup was a problem for certificates, so in the web installer we've configured the installation to use the external data network for both access and internal communication.

Where we are right now: when test-installing a two-node cluster, the installer stops with a "Failed to format" error on the node the installer isn't running on. The disksetup.0.log on that node, however, shows the actual formatting going fine, but then it fails with:

2022-08-30 14:55:00,628 25603 ExitDiskSetup:294 ERROR Error 110, Connection timed out. SP make /dev/sdl['  File "/opt/mapr/server/disksetup", line 1636, in <module>\n    RunDiskSetup();\n', '  File "/opt/mapr/server/disksetup", line 1510, in RunDiskSetup\n    GroupDisksAndCreateSPs(force);\n', '  File "/opt/mapr/server/disksetup", line 924, in GroupDisksAndCreateSPs\n    primary = FormatSPOnDisks(diskList, cid, force);\n', '  File "/opt/mapr/server/disksetup", line 892, in FormatSPOnDisks\n    MakeSP(primary, cid, force);\n', '  File "/opt/mapr/server/disksetup", line 535, in MakeSP\n    RunCmd(cmd, msg);\n', '  File "/opt/mapr/server/disksetup", line 350, in RunCmd\n    AbortWithError(rc, msg);\n', '  File "/opt/mapr/server/disksetup", line 324, in AbortWithError\n    stack_trace = traceback.format_stack(frame)\n']
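The useful part of that line is buried in front of the stack trace. On Linux, errno 110 is ETIMEDOUT, so the failure is a network timeout during the "SP make" step rather than a problem with the disk itself. A small sketch for pulling the error code, message, and failing step out of such a line follows; the regular expression is derived only from the single line quoted above, and `parse_disksetup_error` is a hypothetical helper, not part of any MapR tooling.

```python
import re

# Pattern inferred from the one log line in this post; other disksetup
# messages may be formatted differently.
ERROR_RE = re.compile(
    r"ERROR Error (?P<code>\d+), (?P<message>[^.]+)\. (?P<step>.*?)\["
)


def parse_disksetup_error(line):
    """Return (errno, message, failing step), or None if the line does not match."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    return int(match["code"]), match["message"], match["step"].strip()
```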

 

We'll continue to chip away at this, but the installer is doing its best to shield us from the actual failure messages. Any insights are appreciated, of course.

8 REPLIES 8
Dave Olker
Neighborhood Moderator

Re: Ezmeral installation woes

Are both nodes CLDB nodes or is one a CLDB and the other node a non-CLDB node?  There are different steps for formatting storage with DARE enabled between CLDB and non-CLDB nodes.  This is described here: 

https://docs.datafabric.hpe.com/70/AdvancedInstallation/InstallingMapRSoftware-config-storage-DARE-enabled.html

Might have something to do with the dare.master.key.



I work at HPE
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Erdincka
HPE Pro
Solution

Re: Ezmeral installation woes

First of all, a two-node installation is not a good idea: ZooKeeper needs at least three nodes (or just one for test/dev deployments).

The problem you are seeing is most probably a race condition. The second node is trying to communicate with CLDB, but CLDB is not yet up and running on the first node. So if you install on a single node first and then add the second node after a few minutes, I expect it to go through. That's what happened for me when automating the installation (and even when using the Installer).
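If you are scripting this, "wait a few minutes" can be replaced by polling until the first node accepts TCP connections on the CLDB port. The sketch below assumes the default CLDB port 7222 and a placeholder host name; `wait_for_port` is a hypothetical helper. An open port does not prove that CLDB is fully ready, so a short extra delay may still be needed.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=600.0, interval_s=5.0):
    """Poll until host:port accepts a TCP connection.

    Returns True once a connection succeeds, False if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Example (assumed values): block before adding the second node.
# if not wait_for_port("cldb-node.example.com", 7222):
#     raise SystemExit("CLDB port never opened; not adding second node")
```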

Hope that helps.

Erdinc
- I work for HPE
CBENilsson
Occasional Contributor

Re: Ezmeral installation woes

Yes, installing it on one node and then expanding it to two nodes takes me past the disk formatting error, and I have what appears to be two working nodes. After adding a license key, NFS still won't start, though. Only one NFS gateway is configured (and obviously mfs runs on both machines).

2022-08-31 16:54:37,3124 INFO nfs:496614 fs/nfsd/main.cc:1091 NFS server started ... pid=496614, uid=5000
2022-08-31 16:54:37,3142 INFO nfs:496614 fs/nfsd/nfsha.cc:1087 exiting: license only allows 1 NFS/mfs server(s), currently alive=1
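Read literally, the gateway exits because the number of servers it counts as alive has already reached the number the license allows. Whether "alive" counts NFS gateways, mfs instances, or both is not clear from the log alone. A small sketch for extracting the two numbers when scanning the NFS server log follows; the pattern is taken only from the line above, and `parse_license_exit` is a hypothetical helper.

```python
import re

# Pattern inferred from the single log line quoted in this post.
LICENSE_EXIT_RE = re.compile(
    r"license only allows (?P<allowed>\d+) NFS/mfs server\(s\), "
    r"currently alive=(?P<alive>\d+)"
)


def parse_license_exit(line):
    """Return (allowed, alive) from a license-exit log line, or None."""
    match = LICENSE_EXIT_RE.search(line)
    if match is None:
        return None
    return int(match["allowed"]), int(match["alive"])


def at_license_limit(line):
    """True if the line reports that the alive count has reached the limit."""
    parsed = parse_license_exit(line)
    return parsed is not None and parsed[1] >= parsed[0]
```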

 

Erdincka
HPE Pro

Re: Ezmeral installation woes

It says 1 NFS service is already running. Did you check the service on the management console (MCS) or CLI? It might be already running on the other node. 

PS: I didn't have to do any troubleshooting for NFS, so not sure what might be the problem.

Erdinc
- I work for HPE
CBENilsson
Occasional Contributor

Re: Ezmeral installation woes

The MCS shows 0 running nodes, no standby nodes, 1 failed node, and 0 stopped nodes for the NFS V3 Gateway.

I've looked through the process list and there are no processes with "nfs" in them on the first node. The second node has two mapr-loopbacknfs services (and nfsiod) running, but stopping that service still doesn't allow the mapr NFS gateway to start.

If I remove the second node and go back to one node, NFS works. Re-adding the second node with DATA (hbase REST), CLIENT (libkafka, HBase client, Apache Kafka Java Client, Async HBase) and DEFAULT (File Server, Mastgateway, Collectd, Core Services, Objectstore) breaks the NFS server again.

tdunning
HPE Pro

Re: Ezmeral installation woes

Is the old cluster under support?
I work for HPE
tdunning
HPE Pro

Re: Ezmeral installation woes

You don't, by chance, have the native Linux NFS server running, do you?

(that's what I often do)
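One quick way to check, sketched here under the assumption that the standard ports are in use (111 for rpcbind/portmapper, 2049 for NFS): with the MapR NFS gateway stopped, see whether anything on the node still accepts connections on those ports. `is_listening` is a hypothetical helper; a hit only says that something is bound there, not what it is.

```python
import socket


def is_listening(host, port, timeout_s=1.0):
    """Return True if host:port accepts a TCP connection within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Standard port assignments; run this while the MapR NFS gateway is stopped.
    for name, port in (("rpcbind/portmapper", 111), ("nfs", 2049)):
        state = "in use" if is_listening("127.0.0.1", port) else "free"
        print(f"{name} (tcp/{port}): {state}")
```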

I work for HPE
CBENilsson
Occasional Contributor

Re: Ezmeral installation woes

(Getting time to work on this again)

@tdunning 

No, the old cluster isn't under support, unfortunately.

The old cluster is exporting NFS on the same network, but it would be very aggressive license enforcement if that prevented any other computer on the network from running NFS.

There are no running NFS services on the machines running the new version.