Risk Assessment of Disk Group Failure on EVA5000

SAKET_5 · 02-24-2005

Hi All,

I am trying to assess the risk of losing a single disk group on an EVA5000, as opposed to losing all disk groups on the same storage array.

We have a few EVA5000s, each configured with 3 disk groups. I understand the performance and sparing costs of multiple disk groups; however, this was configured to reduce service downtime should we lose all the LUNs on a single disk group.

On the other hand, if we lose 2 disks on different shelves at the same time (all our Vdisks are Vraid 5), or suffer a 2nd disk failure before the first has been rectified, we run the risk of losing all the LUNs in that disk group. So, if we happen to have a dodgy batch of disks in our EVAs with a very low MTBF and they all decide to fail at around the same time, it is likely that we would lose multiple disks (enough to bring the RSSs down, and hence the disk group) across different disk groups. In that event, we are likely to lose the whole EVA, i.e. all the disk groups on it. So I am wondering whether there is any advantage in configuring multiple disk groups from a manageable recovery time perspective should a disk group die, since most likely all the disk groups would die at the same time (?).

What are the forum's thoughts on this?

I tend to think that maybe a better way to approach fault tolerance is to distribute LUNs across different EVAs, such as the Oracle DB on a disk group from EVA 1 and the logs on a disk group from EVA 2. Any input?

What are the likely scenarios where only one disk group fails and all the other DGs survive?

Regards,
Saket.

Uwe Zessin · 02-24-2005

I don't have my notes with me, but according to Ken Bates@HP, a disk group's MTBF is measured in thousands of years. I've written down the exact number, but it is not really meaningful, as Ken avoided giving the exact configuration ;-)
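For a feel of where a figure like "thousands of years" can come from, here is a rough back-of-envelope sketch. It is not HP's calculation. It uses the textbook single-parity approximation MTTDL = MTBF^2 / (N * (N-1) * MTTR) for one redundancy set and divides by the number of sets in the group. Every number in it (500,000 h disk MTBF, 8 disks per RSS, 7 RSSs per group, 24 h to restore redundancy) is an assumption chosen for illustration, and the function names are hypothetical. The model also assumes independent disk failures, so it says nothing about the "dodgy batch" case, where failures are correlated and the real figure would be far worse.

```python
HOURS_PER_YEAR = 24 * 365


def raid5_set_mttdl_hours(disk_mtbf_h, disks_in_set, rebuild_h):
    """Mean time to data loss for one single-parity set, in hours.

    Textbook approximation: data is lost when a second disk in the
    same set fails while the first failure is still being repaired.
    Assumes independent, exponentially distributed disk failures.
    """
    n = disks_in_set
    return disk_mtbf_h ** 2 / (n * (n - 1) * rebuild_h)


def disk_group_mttdl_years(disk_mtbf_h, disks_per_rss, rss_count, rebuild_h):
    """Mean time to data loss for a whole disk group, in years.

    The group is treated as lost as soon as any one of its redundancy
    sets is lost, so the per-set figure is divided by the set count.
    """
    per_set_h = raid5_set_mttdl_hours(disk_mtbf_h, disks_per_rss, rebuild_h)
    return per_set_h / rss_count / HOURS_PER_YEAR


if __name__ == "__main__":
    # All inputs below are illustrative assumptions, not EVA5000 specs.
    years = disk_group_mttdl_years(
        disk_mtbf_h=500_000,  # assumed per-disk MTBF in hours
        disks_per_rss=8,      # assumed disks per redundancy set
        rss_count=7,          # assumed redundancy sets in the group
        rebuild_h=24,         # assumed hours to restore redundancy
    )
    print(f"Estimated disk group MTTDL: {years:,.0f} years")
```

With these assumed inputs the estimate lands in the low thousands of years. It is very sensitive to the rebuild time and to the per-disk MTBF (which enters squared), which is one reason a single quoted number means little without the configuration behind it.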

The EVA is supposed to store its meta data redundantly if multiple disk groups exist, so that a single group failure will not take down the entire array.

I agree with your allocation policy. For a single EVA, separate the database and the logs onto different disk groups. Whether it is a good idea to simply split a database over multiple EVAs, I am not so sure. Imagine you have two databases and split each of them over two EVAs. One EVA goes down and you have downtime on both databases. In that case I would rather use mirrored redo logs and host-mirror the archive logs on both EVAs in order not to lose any data.


SAKET_5 · 02-25-2005

G'day Uwe,

Thanks for the input.

I am more interested in dreaming up scenarios where the failure of a component causes downtime on just a single DG and not on all DGs on an EVA. Yes, I am sure you would discard the obvious stuff, such as a two-disk failure in one RSS within a single disk group.

My thinking is that if we ever encounter multiple disk failures at around the same time, we are likely to see them across the whole array, thus affecting all our disk groups.

Any thoughts?

Regards,
Saket.
