MSA Storage

Scott Huston
Occasional Contributor

MSA 2052 configuration sanity check

I'm looking to replace an aging P2000 G3 SAS with an MSA 2052 SAS. The P2000 has a wild mix of drives assembled over 10 years in a 24 SFF enclosure + 25 SFF add-on shelf. It's been flawless. Never a support issue. Rarely a drive failure, even though many of the drives have been refurbs.

For the 2052 I'm looking at adding 3 x 800GB SSD + 17 x 2.4TB HDD to build two identical volumes with auto-tiering:

  • 2 x 800GB SSD RAID 1 (Performance tier)
  • 8 x 2.4TB HDD RAID6 (Other tier)

And 1 spare of each type.

Does this sound OK? Any compelling reason to go 8 x 2.4TB RAID10 instead? Total capacity isn't critical, nor is performance, as the two 800GB RAID1 groups are enough to handle all of the hot data in the systems (10-15 VMs).

Is there sufficient risk of a failure during a RAID6 rebuild of an 8 x 2.4TB array to make avoiding it worth considering?


Re: MSA 2052 configuration sanity check

@Scott Huston 

I am assuming you want to implement everything under one Pool.

When choosing a RAID level, you need to weigh a few factors between RAID10 and RAID6:

Disk utilization
---------------------
RAID 10: No matter how many disks the RAID 10 comprises, it only delivers half of the raw disk space as usable capacity, because the other half of a RAID 10 array is always dedicated to the mirror copies (protection).
RAID 6: Its disk utilization is 50% or better and improves as the group grows. If the RAID 6 array comprises four disks, only 50% of the space is usable capacity, but the proportion of usable space increases as you add more drives. For example, if the number of disks in a RAID 6 array is increased to eight, the space utilization rises to 75% (only 25% of the space is used for parity data).
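
As a rough sketch for the disk counts in question (raw capacity only, ignoring formatting and metadata overhead):

```python
# Rough usable-capacity comparison for RAID 10 vs RAID 6.
# Raw capacity only; formatting/metadata overhead is ignored.
def raid10_usable(num_disks, disk_tb):
    # Half of the disks hold mirror copies.
    return (num_disks // 2) * disk_tb

def raid6_usable(num_disks, disk_tb):
    # Two disks' worth of capacity goes to the dual parity.
    return (num_disks - 2) * disk_tb

for n in (4, 8):
    print(f"{n} x 2.4TB  RAID10: {raid10_usable(n, 2.4):.1f} TB   "
          f"RAID6: {raid6_usable(n, 2.4):.1f} TB")
# 4 x 2.4TB  RAID10: 4.8 TB   RAID6: 4.8 TB
# 8 x 2.4TB  RAID10: 9.6 TB   RAID6: 14.4 TB
```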

Reliability
----------------
RAID 10: Whether RAID 10 can handle two simultaneous disk failures depends on where they occur. If both failed disks are located in the same mirrored set, the data is lost and the Pool goes offline. If the failed disks are not located in the same mirrored set, the MSA stays operational (even if up to half of the disks fail) and the RAID 10 can then be rebuilt.
RAID 6: RAID 6 keeps the MSA operating normally when any two disks fail simultaneously. You can then replace the failed disks and rebuild the RAID 6 group.
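
A small illustration of that difference for an 8-disk group (the mirror-pair layout below is hypothetical, purely to show which double failures are survivable):

```python
from itertools import combinations

# Which two-disk failures does an 8-disk group survive?
# Hypothetical RAID 10 layout: disks 0..7 paired into mirror sets.
mirror_pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]

def raid10_survives(failed):
    # Data is lost only if both members of the same mirror pair fail.
    return not any(set(pair) <= set(failed) for pair in mirror_pairs)

def raid6_survives(failed):
    # Dual parity tolerates any two simultaneous failures.
    return len(failed) <= 2

double_failures = list(combinations(range(8), 2))
raid10_ok = sum(raid10_survives(f) for f in double_failures)
print(f"RAID10 survives {raid10_ok} of {len(double_failures)} double failures")
print(f"RAID6  survives {len(double_failures)} of {len(double_failures)}")
# RAID10 survives 24 of 28 double failures
# RAID6  survives 28 of 28
```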

Performance
---------------
RAID 10: Write speed is always faster than RAID6 because there is no parity calculation overhead, and read speed is also faster. Rebuild speed is faster than RAID6 as well, because RAID 10 only needs to copy data from the surviving mirror, whereas RAID6 has to recalculate parity to rebuild the data.

Now, one important point to keep in mind since you are planning to build the RAID1 performance tier with 2 SSD drives: the moment one SSD fails, the system will automatically drain all pages from the SSD tier down to the next configured spinning-drive tier, provided there is sufficient space available in that tier. This occurs because similar wear across the SSDs is likely, so more failures may be imminent.

 

Hope this helps!
Regards
Subhajit

I am an HPE employee

If you feel this was helpful please click the KUDOS! thumb below!

SanjeevGoyal
HPE Pro

Re: MSA 2052 configuration sanity check

Hello,

You can also refer to the link below for more details.

https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-a00017822en_us

If you feel this was helpful please click the KUDOS! thumb below!   

Regards,


I am an HPE Employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]


Scott Huston
Occasional Contributor

Re: MSA 2052 configuration sanity check

I understand all the RAID levels very well, hence the question, if not direct enough.

The question was intended to ask about RAID5/6 reliability with very large disks. Does the rebuild time/number of IOs required, versus the UER (unrecoverable error rate) of the drives, create undue risk with 2.4TB drives? I could not find a spec for the 2.4TB SAS drives' error rate; 1 in 10^16? 10^15?

What's HPE's guidance on RAID5/6 on multi-TB HDDs with many spindles?  Where's the edge of the cliff?
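
For context, the back-of-envelope math I'm worried about looks roughly like this (the UER values are assumptions, which is exactly the spec I'm trying to confirm; I realize that with RAID6 a single URE during a one-drive rebuild can still be recovered from the second parity, unlike RAID5):

```python
import math

# Rough odds of hitting at least one unrecoverable read error (URE) while
# reading every surviving drive end to end during a rebuild. The UER values
# below are assumptions, not the actual spec for these 2.4TB SAS drives.
def p_ure_during_rebuild(surviving_drives, drive_tb, uer_per_bit):
    bits_read = surviving_drives * drive_tb * 1e12 * 8  # TB -> bits read
    return 1.0 - math.exp(bits_read * math.log1p(-uer_per_bit))

# 8 x 2.4TB group with one failed drive: 7 surviving drives to read.
for uer in (1e-15, 1e-16):
    p = p_ure_during_rebuild(7, 2.4, uer)
    print(f"UER {uer:.0e} per bit: ~{p:.0%} chance of at least one URE")
# UER 1e-15 per bit: ~13% chance of at least one URE
# UER 1e-16 per bit: ~1% chance of at least one URE
```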

And perhaps my terminology was inexact.  I'm looking to have both controllers actively working.  Each controller owning a virtual volume with 2 disk groups:

  1. 2 x 800GB SSD performance tier disk group
  2. 8 x 2.4TB HDD standard tier disk group

So, is that 2 pools, each with one virtual volume comprising 2 disk groups?

If I have a spare SSD and HDD (I noted the spares), wouldn't the MSA avoid bleeding out the pages on an SSD failure, because it would have a spare to incorporate? Or is there no sparing for SSDs?

Scott Huston
Occasional Contributor

Re: MSA 2052 configuration sanity check

Can I get a reply on the following?

"If I have a spare SSD and HDD (I noted the spares) then wouldn't the MSA not bleed out the pages on an SSD failure because it would have a spare to incorporate?  Or is there no sparing for SSDs?"

Shawn_K
HPE Pro

Re: MSA 2052 configuration sanity check

Hi Scott,

I am not sure I am fully understanding your question. But let me see if we can continue this conversation and get your questions answered fully.

Your question - Does the rebuild time/number of IOs required vs the UER (unrecoverable error rate) of the drives, with 2.4TB drives, create undue risk? The rebuild time is not dependent on the number of UREs but rather on factors which include the criticality of the RAID level, host IO load, utilities (scrub, rebuild, priority configuration, etc.) and the RAID level itself. For example, if you have a RAID5 DG and lose a drive, the rebuild criticality will be higher than a rebuild of a RAID6 with a single drive failure. (Hopefully that makes sense.)

Now to address the UER rate. Every drive can withstand a certain number of UREs. There is a "Bad Block List" internal to the drive that is used to internally "spare" out UREs within the drive itself. Once that spare block list has been consumed, the drive will fail. The array will then fail the drive out of the DG and, if a spare is available, a rebuild begins. This is a high-level overview of the internal processes that happen within a drive and the array. If you have a DG made of SSDs, the same sort of internal process happens when an SSD fails.

Your rebuild time is dependent on the drive, the size of the drive, the speed of the drive, IO/utilities on the array, and the amount of data that needs to be relocated. An 800GB SSD will rebuild faster than a 2.4TB HDD, all other factors being exactly the same, just due to the speed/performance of the SSD. However, there are reasonable situations where a 2.4TB HDD could rebuild quicker than an 800GB SSD.
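
As a very rough illustration of the size/speed trade-off (the sustained rebuild rates below are assumed figures for illustration, not MSA specifications; real rates vary with host IO and utility priority):

```python
# Very rough rebuild-time estimate: capacity divided by a sustained rate.
# The MB/s figures are assumptions, not measured MSA rebuild rates.
def rebuild_hours(capacity_gb, sustained_mb_per_s):
    return capacity_gb * 1024 / sustained_mb_per_s / 3600

print(f"800GB SSD @ 400 MB/s: ~{rebuild_hours(800, 400):.1f} h")
print(f"2.4TB HDD @ 100 MB/s: ~{rebuild_hours(2400, 100):.1f} h")
print(f"2.4TB HDD @  25 MB/s (busy array): ~{rebuild_hours(2400, 25):.1f} h")
# 800GB SSD @ 400 MB/s: ~0.6 h
# 2.4TB HDD @ 100 MB/s: ~6.8 h
# 2.4TB HDD @  25 MB/s (busy array): ~27.3 h
```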

Now let me try to answer the rest of your query as I think you are asking it. If you have an SSD performance tier, that tier will be handling the most recently accessed pages, or "hot" pages. This means that data which is frequently accessed is on the performance tier in order to provide quick access by the hosts. Pages that are not as frequently accessed are migrated down to "cold" regions, which are stored on the standard tier disk group. There is an internal algorithm that determines when a page is moved between the two tiers. If you have an SSD failure and a spare SSD available, the rebuild will begin. This is an activity that occurs below the level of the page tiering. You would have to have an extremely busy array (host IO, large data writes/reads, full performance tier, etc.) to see a noticeable impact to performance. You would not encounter a situation where the performance tier would not be active.
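
A highly simplified sketch of the tiering idea, purely illustrative and not the MSA's actual internal algorithm (the page IDs and SSD-tier size below are made up for the example):

```python
from collections import Counter

# Track how often each page is accessed; the hottest pages are placed on
# the SSD tier, the rest stay on the spinning tier. Purely illustrative.
access_counts = Counter()

def record_io(page_id):
    access_counts[page_id] += 1

def plan_tiers(ssd_capacity_pages):
    ranked = [page for page, _ in access_counts.most_common()]
    hot = set(ranked[:ssd_capacity_pages])    # promoted to performance tier
    cold = set(ranked[ssd_capacity_pages:])   # demoted to standard tier
    return hot, cold

for page in [1, 1, 1, 2, 2, 3, 4, 4, 4, 4]:
    record_io(page)

hot, cold = plan_tiers(ssd_capacity_pages=2)
print("performance tier pages:", sorted(hot))   # [1, 4]
print("standard tier pages:   ", sorted(cold))  # [2, 3]
```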

Does this help to answer your query?

I work for Hewlett Packard Enterprise. The comments in this post are my own and do not represent an official reply from HPE. No warranty or guarantees of any kind are expressed in my reply.

Cheers,
Shawn



Scott Huston
Occasional Contributor

Re: MSA 2052 configuration sanity check

Shawn,

Thank you for your extended answer.

I've let go of the UER issue as there's not anything I can do about it shy of only having RAID 10 arrays.

Re: SSDs, Subhajit suggested that an SSD performance tier would be bled out after an SSD failure, and my current question was really "What if there's a spare?"

Thank you for answering that.

In relation to my configuration question in the OP, my thoughts have changed. I had thought that I'd implement two identical pools, but now I'm thinking 1 pool with performance tiering (as in the OP) and 1 pool with SSD read cache. This is because I have two distinctly different types of load: database-like (R/W = 60/40) and file server-like (R/W = 90/10).

Any obvious reason this isn't a good approach?

Finally, I'm struggling to find the endurance rating for the 2062's included 1.92TB RI drives.  Link?

Shawn_K
HPE Pro
Solution

Re: MSA 2052 configuration sanity check

Hi Scott,

I think we are getting closer to a complete answer for you, which is good.

To Subhajit's point, a performance tier will move pages down to a lower tier if it becomes degraded. "Degraded" is a widely used term that can refer both to drive/RAID failure and to the fullness of the performance tier. So we do need to remember that the swapping of pages happens at a different level than the RAID layer.

At a high level, whether you use performance tiering or read-cache tiering depends on whether you are looking for a pure performance boost or for transactional/response-time improvements. So if you are going to need high transactional counts, you would go with performance tiering. If you want your average I/O to have low latency, go for read-cache tiering.

As I do not know what role the database or file server will fulfill, it is hard to say which one should land on the performance tier and which on the read-cache tier. Hopefully, my answer helps.

I work for Hewlett Packard Enterprise. The comments in this post are my own and do not represent an official reply from HPE. No warranty or guarantees of any kind are expressed in my reply.

Cheers,
Shawn

