Aaron_134
New Member

RAID

Deciding on RAID for a server which has four hard drives.

I understand that one of the major advantages of RAID 5 is that you lose less hard drive space than with RAID 1.

But with RAID 1, if one of the hard drives fails, doesn't the server continue functioning by running off the second drive?

Steven Clementi
Honored Contributor

Re: RAID

Aaron:


With both RAID1 and RAID5 you can lose one disk and keep on going. The real question is what you need the space for. Will you be doing a lot of read I/O? A lot of write I/O? A lot of both, or a little of both?

Once you can answer that, then the decision is usually easier.


Steven
Steven Clementi
HP Master ASE, Storage, Servers, and Clustering
MCSE (NT 4.0, W2K, W2K3)
VCP (ESX2, Vi3, vSphere4, vSphere5, vSphere 6.x)
RHCE
NPP3 (Nutanix Platform Professional)
Mike Reznak
Trusted Contributor

Re: RAID

Hi,

Generally, RAID1 is faster for write operations, so it gives you better performance, but you lose capacity. It's always a matter of your needs and budget. ;o)

Mike
...and I think to myself, what a wonderful world ;o)
Aaron_134
New Member

Re: RAID

In RAID 1, I understand that there are two copies of the same file. If one drive fails, it uses the other copy of the file.

But if one of the drives fails in a RAID 5 array, where does it get the lost data from, and where does it store it?

If you have four 100GB disks in a RAID 5 array, that means that 300GB of data has to be available from two sources in the event that one drive fails.

How can it store 300GB + 300GB of data using 400GB of hard drive capacity?
Steven Clementi
Honored Contributor

Re: RAID

Aaron:


In a RAID5 array, if you lose a drive, the data for that drive is rebuilt on the fly from the parity information stored on the other drives. The data is already stored: it is written in stripes across all drives in the set, and parity information calculated from the other drives' data is written to each drive, so that if one disk is lost, the controller can read that parity information and rebuild the data from the lost drive.


"If you have four 100GB disks in a RAID 5 array, that means that 300GB of data has to be available from two sources in the event that one drive fails."

No, it is still available from 3 sources. 4 - 1 = 3

"How can it store 300GB + 300GB of data using 400GB of hard drive capacity?"

It does not do full mirroring like RAID1, it only stores enough information to be able to rebuild the lost data, which is spread out across all the drives.


I had a RAID presentation once. I will try to find it and post it. It did a very basic job of showing how RAID5 worked.


Steven
Vincent Fleming
Honored Contributor

Re: RAID

RAID 5 uses a parity scheme, based on using the XOR (Exclusive OR) of the bits in each word of the data on the disks.

XOR parity has an interesting property, where if you have N chunks of data, and you calculate the XOR of them all, it produces an XOR parity block. With this block, you can recover lost data by successively XOR'ing the parity with the remaining data blocks - after the last data block is XOR'ed out, you get the original data.
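A minimal sketch of that property, assuming the chunks are equal-length byte strings (the function names `xor_blocks` and `recover_missing` are hypothetical, chosen for illustration): XOR all chunks to get the parity, then XOR the parity with the surviving chunks to get the missing one back.

```python
from __future__ import annotations


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    out = bytearray(size)  # starts as all zeros; 0 XOR x == x
    for block in blocks:
        if len(block) != size:
            raise ValueError("all blocks must be the same length")
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def recover_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """XOR the parity with every surviving data block to get the lost one back."""
    return xor_blocks(surviving + [parity])


data = [b"first chunk!", b"second chunk", b"third chunk!"]
parity = xor_blocks(data)
recovered = recover_missing([data[0], data[2]], parity)
print(recovered)  # b'second chunk'
```

The order of the XORs does not matter, because XOR is associative and commutative; that is why the controller can fold the surviving blocks in whatever order it reads them.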

For example, if you use the following nibbles (in binary, for clarity):

1001
1101
0110

And you XOR each in turn:

1001 XOR 1101 = 0100 then XOR that with the next nibble:

0100 XOR 0110 = 0010 <-----this is your parity.

Now assume you lose some data (drive failure)... let's pick the middle nibble (1101).

You take the parity, 0010, and XOR it with the data you still have:

0010 XOR 1001 = 1011, then 1011 XOR 0110 = 1101 !!!!

Ta Da! The missing data.
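The same nibble example, written out step by step so it can be checked mechanically (plain integers stand in for the three disks' data):

```python
nibbles = [0b1001, 0b1101, 0b0110]  # one nibble per disk

# Build the parity by XOR'ing every nibble in turn.
parity = 0
for n in nibbles:
    parity ^= n
print(f"parity = {parity:04b}")  # parity = 0010

# Lose the middle nibble, then XOR the parity with what survives.
lost = nibbles[1]
recovered = parity
for n in (nibbles[0], nibbles[2]):
    recovered ^= n
print(f"recovered = {recovered:04b}")  # recovered = 1101
assert recovered == lost
```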

Note that although I used only 4 bits (a nibble) in my example, the data blocks may be of ANY size, and this still works.

Taking my above example, and assuming that each nibble is data from a different DISK, then if you sustain a disk failure, you can reproduce that lost data from the remaining drives and parity blocks.

Now, to better explain how this works in an array... let's think about RAID4 instead of RAID5 - it's easier to explain, and is VERY similar to RAID5.

RAID4 has a group of data drives and a dedicated parity drive, for example 4 data drives and 1 parity drive. Block 1 on drive 1 is XOR'ed with Block 1 on drive 2, which is XOR'ed with Block 1 on drive 3, which is XOR'ed with Block 1 on drive 4; the result is the Parity Block, which is written to Block 1 of the 5th drive.
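A rough model of one RAID4 stripe, assuming 4 data drives plus a dedicated parity drive as described above (the helper names are hypothetical). Any one drive, including the parity drive, can be rebuilt by XOR'ing the other four.

```python
from __future__ import annotations


def xor_blocks(blocks: list[bytes]) -> bytes:
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def raid4_write_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """One stripe: block N from each data drive, then the parity block for the parity drive."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def raid4_rebuild(drives: list[bytes | None]) -> list[bytes]:
    """Rebuild the single missing block (None) by XOR'ing all surviving blocks."""
    missing = [i for i, blk in enumerate(drives) if blk is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing drive")
    survivors = [blk for blk in drives if blk is not None]
    rebuilt = list(drives)
    rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


stripe = raid4_write_stripe([b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xff\x00"])
damaged = list(stripe)
damaged[1] = None  # drive 2 fails
assert raid4_rebuild(damaged) == stripe
```

The `ValueError` reflects the limit of single XOR parity: with two drives missing in the same stripe there is not enough information left to rebuild either one.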

The data is written to the drives in stripes to increase performance, but this does not affect the parity scheme.

Now - RAID5 - very similar to RAID4, except that the parity is ALSO striped across the drives - it "rotates", i.e., on the first stripe it's on Drive 1, on the 2nd stripe it's on Drive 2, on the 3rd stripe it's on Drive 3, and so on.

The reason for doing this is that in RAID4, the Parity drive becomes a bottleneck; in RAID5, there's no dedicated parity drive, so it eliminates that bottleneck.
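A small sketch of that rotation, following the simple scheme described above (parity on drive 1 for stripe 1, drive 2 for stripe 2, and so on, but counted from 0 here). Real controllers use several different rotation orders; this layout is an illustrative assumption, not any particular vendor's.

```python
from __future__ import annotations


def raid5_layout(num_drives: int, num_stripes: int) -> list[list[str]]:
    """Label each drive's block per stripe: Dk = data block k, Ps = parity of stripe s."""
    layout = []
    next_data = 0
    for stripe in range(num_stripes):
        parity_drive = stripe % num_drives  # parity moves one drive over per stripe
        row = []
        for drive in range(num_drives):
            if drive == parity_drive:
                row.append(f"P{stripe}")
            else:
                row.append(f"D{next_data}")
                next_data += 1
        layout.append(row)
    return layout


def parity_blocks_per_drive(layout: list[list[str]]) -> list[int]:
    """Count how many parity blocks land on each drive."""
    return [sum(row[d].startswith("P") for row in layout) for d in range(len(layout[0]))]


layout = raid5_layout(num_drives=4, num_stripes=4)
for row in layout:
    print(" ".join(f"{cell:>3}" for cell in row))
print(parity_blocks_per_drive(layout))  # [1, 1, 1, 1]
```

The grid it prints has P0 through P3 on the diagonal, and the parity count comes out equal on every drive; in RAID4 all four would sit on the same drive, which is the bottleneck described above.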

There... whew - that was a lot to explain!

Any questions?

Regards,

Vince
No matter where you go, there you are.
Aaron_134
New Member

Re: RAID

OK I see now. So if the data has to be rebuilt, doesn't that mean that the data will not be available until the rebuild is complete?
Vincent Fleming
Honored Contributor

Re: RAID

It can re-create the data on the fly - if you ask for a data block that's "missing", it just goes and calculates it.

Of course, it slows down quite a bit - to read a single data block from the failed drive of an 8-drive RAID5, you have to read a block from each of the 7 surviving drives and XOR them together!
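A minimal model of that degraded read for one stripe, assuming an 8-drive RAID5 (7 data blocks plus parity in the stripe; where the parity sits does not change the arithmetic). The function `degraded_read` is hypothetical and just counts drive reads; it ignores caching and everything a real controller does in hardware.

```python
from __future__ import annotations


def xor_blocks(blocks: list[bytes]) -> bytes:
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def degraded_read(stripe: list[bytes], failed: int, wanted: int) -> tuple[bytes, int]:
    """Return (block contents, number of drive reads needed) with drive `failed` offline."""
    if wanted != failed:
        return stripe[wanted], 1  # healthy drive: one ordinary read
    survivors = [blk for i, blk in enumerate(stripe) if i != failed]
    return xor_blocks(survivors), len(survivors)  # reconstruct on the fly


# One stripe of an 8-drive RAID5: 7 data blocks plus their parity block.
data = [bytes([n] * 4) for n in range(1, 8)]
stripe = data + [xor_blocks(data)]

block, reads = degraded_read(stripe, failed=3, wanted=3)
print(reads)  # 7
assert block == data[3]
```

Reads that land on healthy drives still cost a single I/O; only blocks that lived on the failed drive pay the 7x cost.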

Also, the overhead of rebuilding an entire drive can be very significant. How bad your performance gets is VERY hardware dependent, though. Some array hardware is MUCH better at rebuilds than others are.

Regards,

Vince
Stuart Abramson
Trusted Contributor

Re: RAID

The "tradeoff" is that:
o .. RAID 1 is the fastest, but you only get 50% usable from your disks (4 x 100 GB = 400 GB raw; 400 GB x 50% = 200 GB usable).
o .. RAID 5 is slower, but you get 75% usable from your raw disks in a 4-disk RAID group (4 x 100 GB = 400 GB raw; 400 GB x 75% = 300 GB usable). So you get more usable storage per $. You also get more usable storage per square foot, which can be an issue.
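The capacity arithmetic above, as a small helper. It assumes that "RAID 1" across four disks means two mirrored pairs (often called RAID 1+0 or RAID 10), and it ignores controller metadata and hot spares; `usable_gb` is a hypothetical name.

```python
def usable_gb(scheme: str, disks: int, disk_gb: int) -> int:
    """Usable capacity for equal-size disks, ignoring metadata and spares."""
    if scheme == "raid1":
        if disks < 2 or disks % 2:
            raise ValueError("mirroring needs disks in pairs")
        return disks * disk_gb // 2  # every block is stored twice
    if scheme == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb  # one disk's worth of space goes to parity
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("raid1", "raid5"):
    usable = usable_gb(scheme, disks=4, disk_gb=100)
    print(scheme, usable, f"{usable / 400:.0%}")  # raid1 200 50%, then raid5 300 75%
```

RAID 5 efficiency is (n - 1)/n, so it improves as the group grows (8 disks give 87.5%), while mirroring stays at 50% regardless of disk count.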

Also, how much performance degrades going from RAID 1 to RAID 5 depends on the ratio of writes to reads, the length of reads/writes, and other factors; it is almost impossible to predict in advance.
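One common back-of-the-envelope model for the write/read ratio effect is a "write penalty": for small random writes, RAID 1 writes each block to both mirrors (2 disk I/Os), while RAID 5 reads the old data and old parity, then writes the new data and new parity (4 disk I/Os). This sketch assumes exactly those penalties and ignores write caching, full-stripe writes, and sequential I/O, which is why real results vary so much by hardware.

```python
# Rule-of-thumb disk I/Os per host write for small random writes (assumption).
WRITE_PENALTY = {
    "raid1": 2,  # write the block to both mirrors
    "raid5": 4,  # read old data, read old parity, write new data, write new parity
}


def backend_ios(scheme: str, reads: int, writes: int) -> int:
    """Disk I/Os the array must perform for a given mix of host reads and writes."""
    return reads + WRITE_PENALTY[scheme] * writes


def rmw_parity(old_parity: int, old_data: int, new_data: int) -> int:
    """RAID 5 small-write parity update: XOR the old data out and the new data in."""
    return old_parity ^ old_data ^ new_data


for scheme in ("raid1", "raid5"):
    print(scheme, backend_ios(scheme, reads=700, writes=300))  # raid1 1300, raid5 1900

# Using the nibbles from the XOR example: change 1101 to 0000 without reading the other drives.
new_parity = rmw_parity(0b0010, 0b1101, 0b0000)
print(f"{new_parity:04b}")  # 1111
assert new_parity == 0b1001 ^ 0b0000 ^ 0b0110
```

With a read-only workload both schemes cost the same disk I/Os; the gap grows with the share of writes.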