HPE Ezmeral Software platform

filip_novak
Occasional Advisor

How to safely remove disk from MaprFS?

I want to remove a disk from MaprFS to replace it with a bigger one. How am I supposed to do this without data loss, given that I have some volumes with a replication factor of 1?
My target disk is the only one in its Storage Pool, so I was thinking about managing the Storage Pool with the mrconfig sp command.
There are plenty of subcommands:
mrconfig sp offline - takes the SP offline so you can run fsck on it
mrconfig sp unload - the docs don't mention what it does to the SP
mrconfig sp flush - not listed in the documentation at all

So the question is: how do I tell MFS that the storage pool is decommissioned and that all its replicas should be redistributed among the other SPs?

So far, I'm thinking about changing all volumes that have a replication factor of 1 to 2 (or 3, to be safe), waiting until replication is done, and then just running mrconfig sp offline /dev/sde && mrconfig disk remove /dev/sde
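Assuming a hypothetical volume name, that plan might be sketched as follows (the -replication flag of maprcli volume modify sets the desired replication factor; /dev/sde stands in for the SP's device as in the question):

```shell
# Raise replication on each RF-1 volume first (volume name is a placeholder).
maprcli volume modify -name scratch.volume -replication 2

# After re-replication completes, take the SP offline and remove the disk.
mrconfig sp offline /dev/sde
mrconfig disk remove /dev/sde
```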

Also, there is some useful info in the docs about mrconfig disk remove - it suggests that I can take the Storage Pool offline and then run /opt/mapr/server/fsck -r to repair lost data.

P.S. I have a single-CLDB-node cluster, and the target disk is located on a data node, so there is no risk of losing container 1.

4 REPLIES
tdunning
HPE Pro

Re: How to safely remove disk from MaprFS?

I am sure you are aware of this, but running with replication=1 is not a great idea. Any loss of a disk containing such data will cause data loss (which is the crux of your question ... you essentially want to cause the disk to fail by removing it). All administration tasks become much more difficult as well, as you are demonstrating.

Having a single CLDB node is also problematic. In my own tiny, hobby-grade clusters, I put the CLDB on all three nodes to massively simplify life. Even if you don't have the right license, you can still get container #1 to replicate, and you can survive the loss of a node with only a delay while you manually change over which node handles CLDB tasks.

So, with that out of the way, your general outline of moving the data on that disk onto other storage devices and then decommissioning the small drive is fairly sound. I can't comment on the detailed commands you are using, but I would suggest adding the new drive to the storage pool before changing the replication. If you do that, I think you can simply have the system evacuate the deprecated disk and be done with the problem in fewer steps.
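That alternative order of operations might look like the sketch below, with a hypothetical hostname and device names (maprcli disk add/remove handle SP creation and container re-replication):

```shell
# Add the new, larger disk first (hostname and devices are placeholders).
maprcli disk add -host data-node-1 -disks /dev/sdf

# Then remove the old disk; MFS re-replicates its containers elsewhere.
maprcli disk remove -host data-node-1 -disks /dev/sde
```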

 

I work for HPE
support_s
System Recommended

Query: How to safely remove disk from MaprFS?

System recommended content:

1. HPE Ezmeral Data Fabric – Customer-Managed 7.7.0 Documentation | Recovering from Disk Failure

2. HPE Ezmeral Data Fabric – Customer-Managed 7.8.0 Documentation | Recovering from Disk Failure


ParvYadav
HPE Pro

Re: How to safely remove disk from MaprFS?

If preserving data is critical, I strongly recommend increasing the replication factor (RF) first. This will help safeguard against data loss in worst-case scenarios. Additionally, consider adding at least one more CLDB instance on a different node. If the node hosting CLDB crashes, your entire cluster could go down, so having redundancy is essential.

Regarding draining data from the storage pool (SP) you plan to remove, the best practice is to increase the RF to 3 and then take the SP offline. This ensures that data is replicated properly, reducing the risk of data loss. If there are a few containers with RF 1 on the SP that needs to be decommissioned, you can consider moving those containers manually, but that is not the recommended approach.
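To see which volumes are still at RF 1 before taking the SP offline, something like the following should work (the filter field name is an assumption; verify it against maprcli volume list output on your cluster):

```shell
# List volumes whose desired replication is 1 (filter syntax per maprcli docs).
maprcli volume list -columns volumename,numreplicas -filter '[numreplicas==1]'

# Bump each one to RF 3 (volume name is a placeholder).
maprcli volume modify -name scratch.volume -replication 3
```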

The fsck -r command is primarily useful when you want to repair a bad disk within the current SP without recreating the entire SP. If the SP is offline due to a bad disk or minor inconsistencies, running fsck -r may help recover it.
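A minimal sketch of that repair path, assuming /dev/sde identifies the SP (fsck must run while the SP is offline; the -d flag selecting the device is taken from the fsck utility docs):

```shell
# Take the SP offline, run a repair pass, then bring it back online.
mrconfig sp offline /dev/sde
/opt/mapr/server/fsck -r -d /dev/sde
mrconfig sp online /dev/sde
```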

I'm an HPE employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
filip_novak
Occasional Advisor

Re: How to safely remove disk from MaprFS?

Thanks for your answer!
I have set numreplicas to 6 for the mapr.cldb.internal volume. The RF-1 volume is basically a trash can, so losing some data from it isn't a big deal, but I increased its replication factor for now. I will wait for replication to finish, take the SP offline, wait for the replication alarms to clear, and then remove the disk.
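For reference, one way to confirm that replication has caught up before deleting the disk is to watch the alarm list (a sketch; alarm names and the -type values can vary by release):

```shell
# No VOLUME_ALARM_DATA_UNDER_REPLICATED entries should remain before removal.
maprcli alarm list -type volume
```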