Re: Allocated snapshot space exceeded the configured limit

AllanClark · ‎11-26-2019

Just started replication between local & remote site - both MSA2052s

The remote site will be used 80% for DR purposes, so 80% of the space on the remote MSA2052 will be used for replication volumes.

The volumes are set to 4TB at the source MSA2052. I have been running the initial sync at the source end but have now received "Allocated snapshot space exceeded the configured limit" error messages. However, replication still seems to be running. At the remote site the settings are:

We have 1 pool of 21.5TB, with 16.5TB free.

Pool overcommit is set to True

Low threshold - 50%, medium threshold - 75%, high threshold - 99%

allocated pages - 1154195

snapshot pages - 576044

available pages - 3990718

Do I need to modify settings, given that I will soon need to enable replication on another 2 * 4TB volumes at the source site?

Shawn_K · ‎11-26-2019

Hello Allan,

The information you have provided is not really detailed enough to determine where your problem lies. But let's start by looking at some basic numbers.

Your destination (remote) site has a 21.5TB pool with 16.5TB free. If the used space is all replication data, then 5TB has replicated from a 4TB volume on the source. You may have more than one 4TB source volume replicating, which would explain the difference. Without more data from both systems and their volume usage stats, it is hard to say more.
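The page counts quoted in the first post can be turned into sizes for a rough cross-check of these numbers. The sketch below assumes a virtual-pool page is 4 MiB and that the array reports sizes in decimal GB; neither is stated in the thread, so treat both as assumptions. The function name is illustrative only.

```python
# Assumption: one virtual-pool page is 4 MiB and sizes are reported in decimal GB.
PAGE_BYTES = 4 * 1024 * 1024


def pages_to_gb(pages):
    """Convert a page count to decimal gigabytes."""
    return pages * PAGE_BYTES / 1e9


# Page counts quoted for the remote pool in the first post.
allocated_pages = 1154195
snapshot_pages = 576044
available_pages = 3990718

total_pages = allocated_pages + available_pages

for label, pages in [
    ("pool total", total_pages),
    ("allocated", allocated_pages),
    ("snapshot", snapshot_pages),
    ("available", available_pages),
]:
    print(f"{label:<10} {pages:>8} pages  ~{pages_to_gb(pages):9.1f} GB")

# Share of the whole pool currently held by snapshot pages.
print(f"snapshot share of pool: {snapshot_pages / total_pages:.1%}")
```

Under that page-size assumption the pool total comes out close to the 21.5TB reported, and the snapshot pages alone account for roughly 11% of the pool.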

The other explanation could be that you have multiple snapshots you are attempting to replicate. Remember that for every volume you replicate, internal snapshots are also replicated to maintain consistency. I suggest you review the SMU Guide starting around page 117. https://support.hpe.com/hpsc/doc/public/display?docId=a00017707en_us

Regardless, on the destination array you have the pool configured to overcommit (Pool overcommit is set to True). Please review what this feature provides and understand some of the boundaries of having overcommit enabled. The SMU Guide covers this starting on page 24.

Your error message might have occurred as you passed the Low Threshold. This is normal and if you review the system space use and have enough space for all the volumes + internal snapshots you should be good.

Cheers,
Shawn

I work for Hewlett Packard Enterprise. The comments in this post are my own and do not represent an official reply from HPE. No warranty or guarantees of any kind are expressed in my reply.

AllanClark · ‎11-27-2019

Shawn,

Thanks for your reply. This was exactly the document that I was looking for; all the other guides I had found were the advanced ones.

If I am reading this correctly, then if we are replicating a 4TB volume we need at least 3 * the size of the primary volume, i.e. 12TB of free space for the volume plus internal snapshots? Is this at both primary & secondary sites?

The error message was for the High threshold being breached.

Many Thanks

Allan

AllanClark · ‎11-28-2019

The initial replication of the 4TB LUN completed successfully yesterday.

I ran a scheduled subsequent replication to catch up on all the changes.

After 20 mins it logged a warning message:

Allocated snapshot space exceeded the high threshold of 99%. (pool: A, SN: 00c0ff3cf170000017c7a15c01000000) (snapshot space used: 509347 of 514491 pages, or 99% of the snapshot space)

EVENT ID:#A4385

EVENT CODE:571

EVENT SEVERITY:Warning

EVENT TIME:2019-11-27 18:21:07

Then seconds later

Allocated snapshot space exceeded the configured limit. (pool: A, SN: 00c0ff3cf170000017c7a15c01000000) (snapshot space used: 514491 of 514491 pages, or 100% of the snapshot space)

EVENT ID:#A4386

EVENT CODE:571

EVENT SEVERITY:Error

EVENT TIME:2019-11-27 18:21:07

Then, 2 secs later, it logged event code 572 messages: snapshot space below threshold.
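The page counts in the two 571 events can be checked against the 99% high threshold and the 100% limit with a few lines of arithmetic. This is only a sketch: the function name and the status strings are made up for illustration and are not part of the MSA firmware or CLI.

```python
def snapshot_space_status(used_pages, limit_pages, high_pct=99):
    """Classify snapshot-space usage against the high threshold and the limit.

    Integer arithmetic is used so that values sitting right on the
    threshold are not affected by floating-point rounding.
    """
    if used_pages >= limit_pages:
        return "limit reached"
    if used_pages * 100 >= high_pct * limit_pages:
        return "above high threshold"
    return "below high threshold"


limit_pages = 514491  # from the event text: "... of 514491 pages"

for used in (509347, 514491):  # page counts from the Warning and the Error
    pct = used / limit_pages * 100
    print(f"{used} of {limit_pages} pages ({pct:.2f}%): "
          f"{snapshot_space_status(used, limit_pages)}")

# Headroom between the high-threshold warning and the limit itself.
print("pages between the two events:", limit_pages - 509347)
```

The gap between the warning and the error is only about 1% of the snapshot space, which is consistent with both events being logged within the same second during a busy replication.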

Should I be worried about this?

Many Thanks

Allan

AllanClark · ‎11-28-2019

It is the remote system that is showing these errors. From the CLI, show snapshot-space gives us:

login as: manage
Using keyboard-interactive authentication.
Password:

HPE MSA Storage MSA 2050 SAN
System Name: Santry-MSA2052
System Location: Santry
Version: VL270R001-01
#
#
#
# show snapshots
Pool Name Creation Date/Time Status Status-Reason Parent Volume Base Vol Snaps TreeSnaps Snap-Pool Snap Data Unique Data Shared Data Retention Priority
-------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Success: Command completed successfully. (2019-11-28 11:15:27)
#
#
#
# show snapshot-space
Snapshot Space
--------------
Pool: A
Limit (%Pool): 10%
Limit Size: 2157.9GB
Allocated (%Pool): 0.1%
Allocated (%Snapshot Space): 0.9%
Allocated Size: 18.7GB
Low Threshold (%Snapshot Space): 75%
Middle Threshold (%Snapshot Space): 90%
High Threshold (%Snapshot Space): 99%
Limit Policy: Notify Only

Success: Command completed successfully. (2019-11-28 11:15:43)
#

Do we need to modify the pool sizes? Especially as I have a further 2 * 4TB that will need to be replicated.

SUBHAJIT KHANBARMAN_1 · ‎11-28-2019

It's really difficult to answer your queries as a lot of data is missing for both MSA systems: the pool size on each array, how many volumes there are, the size of each volume, how much space is set as the snapshot space limit, etc.

You saw event code 571 logged earlier while replication was in progress. At that stage the primary volume's current snapshot data is being copied to the secondary volume's current snapshot, and during this time the allocated snapshot space exceeded the configured percentage limit of the virtual pool. The moment replication completed, which means the secondary volume was rolled back to its current snapshot, event code 572 was logged, meaning the indicated virtual pool has dropped below one of its snapshot space thresholds.

I would suggest trying the command below:

# show replication-snapshot-history

Hope this helps!
Regards
Subhajit

I am an HPE employee

AllanClark · ‎11-28-2019

Many thanks for your help so far.

The output from show replication-snapshot-history is:

# show replication-snapshot-history
Name                   Snapshot History   Count      Snapshot Basename           Retention Priority
------------------------------------------------------------------------------------------------------
repSet-vol004-dub-san  disabled           1                                      never-delete
repSet0001             disabled           1                                      never-delete
------------------------------------------------------------------------------------------------------
Success: Command completed successfully. (2019-11-28 15:59:23)
#

The first one is the real replication one.

This is the 4TB LUN which we initially replicated successfully and then subsequently ran a second scheduled replication on as a catch-up.

We got 571 errors and subsequently a few more 571 events in the same time frame (all logged at 18:21:07).

The pools at the remote site are:

show pools
Name Serial Number                    Blocksize Total Size Avail Snap Size OverCommit Disk Groups Volumes Low Thresh Mid Thresh High Thresh Sec Fmt
Health     Reason Action
------------------------------------------------------------------------------------------------------------------------------------------------------------
A    00c0ff3cf170000017c7a15c01000000 512       21.5TB     15.8TB 18.7GB    Enabled     2           4        50.00 %     75.00 %     99.00 %      512e
OK
------------------------------------------------------------------------------------------------------------------------------------------------------------
Success: Command completed successfully. (2019-11-28 16:08:35)

The volumes that are available are:

show volumes
Pool Name               Total Size Alloc Size Type Large Virtual Extents Health Reason Action
-----------------------------------------------------------------------------------------------
A    Vol0001            99.9GB     763.3MB    base Disabled               OK
A    sa-vol0001         3999.9GB   2375.2GB   base Disabled               OK
A    santry-vol0001-dr  99.9GB     49.5GB     base Disabled               OK
A    santry-vol0004-dr  4299.9GB   3271.7GB   base Disabled               OK
-----------------------------------------------------------------------------------------------
Success: Command completed successfully. (2019-11-28 16:07:49)
#

The volume santry-vol0004-dr is the volume at the remote site that is being replicated.

The output from show snapshot-space is:

show snapshot-space
Snapshot Space
--------------
Pool: A
Limit (%Pool): 10%
Limit Size: 2157.9GB
Allocated (%Pool): 0.1%
Allocated (%Snapshot Space): 0.9%
Allocated Size: 18.7GB
Low Threshold (%Snapshot Space): 75%
Middle Threshold (%Snapshot Space): 90%
High Threshold (%Snapshot Space): 99%
Limit Policy: Notify Only

Success: Command completed successfully. (2019-11-28 16:11:44)

SUBHAJIT KHANBARMAN_1 · ‎11-28-2019

It will be difficult to explain everything here but let me try.

As per the output of "show replication-snapshot-history", there are two replication sets, of which repSet-vol004-dub-san is responsible for the volume santry-vol0004-dr.

The volume santry-vol0004-dr has a total size of 4299.9GB but an allocated size of 3271.7GB, which means the actual data size is 3271.7GB.

Now look at the output of "show snapshot-space": the snapshot Limit Size is 2157.9GB, which is far less than the volume's allocated size of 3271.7GB. When you ran the first replication, the entire 3271.7GB of data was replicated, which is more than the snapshot limit. This is why you got the 571 event. The moment replication completed, this space was cleaned up as the current snapshot data was rolled back to the secondary volume, and the 572 event was logged. You can customize the snapshot limit size to suit your requirements in order to avoid this type of alert.
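The comparison above can be written out as arithmetic. The sketch below takes the figures from the CLI output in this thread and assumes, as this reply describes, that the whole allocated size of the replicated volume has to fit inside snapshot space during the first replication. The pool size is derived from the 10% limit rather than read from the array, and the helper name is hypothetical.

```python
import math

# Figures taken from the CLI output posted in this thread.
limit_pct = 10.0         # Limit (%Pool)
limit_size_gb = 2157.9   # Limit Size
vol_alloc_gb = 3271.7    # Alloc Size of santry-vol0004-dr

# Derived, not read from the array: pool size implied by the 10% limit.
pool_size_gb = limit_size_gb / (limit_pct / 100)


def min_limit_pct(data_gb, pool_gb):
    """Smallest whole-number %Pool limit that would hold data_gb."""
    return math.ceil(data_gb / pool_gb * 100)


print(f"implied pool size          : {pool_size_gb:.1f} GB")
print(f"shortfall at current limit : {vol_alloc_gb - limit_size_gb:.1f} GB")
print(f"limit needed for this copy : {min_limit_pct(vol_alloc_gb, pool_size_gb)}% of pool")
```

This only sizes the limit for the one volume already replicated; further volumes replicating at the same time would need their allocated data added to the total.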

I would suggest going through commands like show snapshot-space, set snapshot-space, show replication-snapshot-history and show replication-sets. These should clear up your doubts.

https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00017709en_us
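When checking show snapshot-space repeatedly, for example after each replication run, it can help to pull the "Key: Value" lines into a dictionary. The following is a minimal sketch that parses the output pasted earlier in this thread; it assumes the output keeps that exact layout, and the function names are illustrative rather than part of any HPE tool.

```python
SAMPLE = """\
Snapshot Space
--------------
Pool: A
Limit (%Pool): 10%
Limit Size: 2157.9GB
Allocated (%Pool): 0.1%
Allocated (%Snapshot Space): 0.9%
Allocated Size: 18.7GB
Low Threshold (%Snapshot Space): 75%
Middle Threshold (%Snapshot Space): 90%
High Threshold (%Snapshot Space): 99%
Limit Policy: Notify Only

Success: Command completed successfully. (2019-11-28 16:11:44)
"""


def parse_fields(text):
    """Collect 'Key: Value' lines, skipping the trailing status line."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        key, sep, value = line.partition(": ")
        if sep and not line.startswith("Success:"):
            fields[key] = value
    return fields


def percent(value):
    """Turn a string such as '0.9%' into the float 0.9."""
    return float(value.rstrip("%"))


fields = parse_fields(SAMPLE)
allocated = percent(fields["Allocated (%Snapshot Space)"])
low = percent(fields["Low Threshold (%Snapshot Space)"])

print("policy:", fields["Limit Policy"])
print("allocated below low threshold:", allocated < low)
```

Because the snapshot space is released once a replication completes, a reading taken afterwards (0.9% here) says little about the peak reached while the replication was running.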

Hope this helps!
Regards
Subhajit

I am an HPE employee

AllanClark · ‎11-29-2019

Many thanks for this.

It is starting to make sense now. As I will need to start replicating approx. another 5TB over an additional 2 * 4TB LUNs, I am guessing that we should adjust the limit setting to be about 40% of the pool.

One other point: at the source MSA2052 I am surprised that we don't get any of the allocated snapshot space errors, as I assume the same snapshotting is happening there?

If I adjust the snapshot space at the target side, do I need to adjust it at the source side too?

Many thanks again

SUBHAJIT KHANBARMAN_1 · ‎11-29-2019

On the source array, not much snapshot space is required initially for the first replication, because the volume data and the snapshot point to the same LBAs in the same pool. That's why no extra space is required on the source array to keep the snapshot. However, as a rule you should keep 3 times the size of the volume free inside the pool, just to be on the safe side for the future.

During the first replication the entire data set is copied from the source to the destination array, which is why you need more snapshot space dedicated on the secondary array to accommodate the replicated source volume data.

In your case the first replication carried 3271.7GB of source volume data, which is why you need more snapshot space to accommodate it on the destination array. After the replicated data has been copied, the current snapshot is rolled back to the secondary volume.
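The "3 times the size of the volume" rule can be applied to the volumes discussed in this thread as a quick sanity check. The thread does not say whether the rule is meant for provisioned or allocated size, so the sketch below simply computes both. The 4000GB figures for the two planned volumes and the 5000GB of expected data are assumptions based on Allan's "2 * 4TB" and "approx another 5Tb", and the pool size is the one implied by the 10% limit of 2157.9GB.

```python
RULE_MULTIPLIER = 3  # "3 times size of the volume", as stated in this reply


def space_needed_gb(volume_sizes_gb, multiplier=RULE_MULTIPLIER):
    """Pool space suggested by the rule of thumb for a set of volumes."""
    return multiplier * sum(volume_sizes_gb)


pool_gb = 21579.0  # implied by Limit Size 2157.9GB at 10% of the pool

# santry-vol0004-dr plus two planned 4TB volumes (planned sizes assumed).
provisioned_gb = [4299.9, 4000.0, 4000.0]
# Data already on santry-vol0004-dr plus roughly 5TB expected on the new LUNs.
allocated_gb = [3271.7, 5000.0]

for label, sizes in [("provisioned", provisioned_gb), ("allocated", allocated_gb)]:
    need = space_needed_gb(sizes)
    print(f"{label:<11}: rule suggests {need:8.1f} GB, pool has {pool_gb:.1f} GB")
```

Either reading of the rule suggests more space than the remote pool provides, so it is worth confirming with the guides how the rule is intended to apply when overcommit is enabled.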

Hope it's clear now why you get the 571 event on the destination array, and the 572 event as well.

Hope this helps!
Regards
Subhajit

I am an HPE employee
