09/16/2024 - RAID-10 is Obsolete

Many companies build their own All-Flash Arrays as a part of their offerings and infrastructure. If you need redundancy and speed, this array is usually configured RAID-10. The benefit of mirroring is that even simple software RAID in Linux works well, is stable, and does the job. Getting decent IOPS and bandwidth on reads and writes is also pretty easy, if you throw decent hardware at the problem.

There are still times when RAID-1 makes sense. If you want to mirror your boot drive, or have a small, non-demanding data set, mirrors are still cheap and easy. It is when the capacity gets larger, or the performance matters, or Flash endurance becomes an issue, that there are better options than RAID-10.

So what are the downsides of RAID-10? First and foremost, you are spending more on your storage than you need to. This becomes even more important when Flash endurance is also a consideration. But, until now, there was no real “other option”.

Enterprise Compressed RAID (ECR) re-thinks All-Flash Arrays. ECR is not some combination of open-source software configured in a unique way. Instead, ECR is a patented software layer that implements an FTL (Flash Translation Layer) on the host. This creates a scenario where an All-Flash Array only sees linear writes. And, when you add in block-level compression, the linear writes are smaller which creates bandwidth and capacity out of thin air.

The key here is that linear writes are special, both to the SSDs and to the parity array logic inherent in RAID-5 and RAID-6. Because writes are controlled, linear, and perfectly aligned, the overhead incurred by the SSDs goes way down. In fact, the SSD becomes a “tape” that linearly records data at near 1:1 efficiency. A key metric of Flash SSDs, the WAF (Write Amplification Factor), drops to 1:1, regardless of the SSD design. Finally, the RAID layer itself gets more efficient. Every parity RAID configuration has an “optimal IO size”. If you look in /sys/block/md0/queue/optimal_io_size, you will see the magic size and alignment that keeps the array happy. ECR does 100% of its writes at exactly this size and alignment, completely eliminating the typical read/modify/write overhead associated with RAID-5 and RAID-6. Put simply, the array itself streams perfect linear writes.
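The read/modify/write overhead is easy to see in a few lines of arithmetic. This is a sketch, not ECR's actual logic; the 512 KiB chunk size (md's default) and the 6-drive layout are assumptions for illustration, and on a real array you would read the value from /sys/block/md0/queue/optimal_io_size as described above.

```python
# Sketch: why full-stripe writes avoid RAID-5 read/modify/write.
# Assumes md's default 512 KiB chunk and a 6-drive RAID-5 (5 data + 1 parity).

CHUNK = 512 * 1024  # bytes per chunk (md default)

def full_stripe_size(data_drives: int) -> int:
    """For md RAID-5, optimal_io_size is chunk size x data drives."""
    return data_drives * CHUNK

def raid5_io_ops(write_bytes: int, stripe_bytes: int, aligned: bool) -> int:
    """Rough count of device IOs for a host write of write_bytes.

    Full-stripe aligned: write every data chunk plus parity, no reads.
    Unaligned small write: read old data + old parity, then write new
    data + new parity (4 device IOs per 4K block touched).
    """
    if aligned and write_bytes % stripe_bytes == 0:
        stripes = write_bytes // stripe_bytes
        return stripes * (stripe_bytes // CHUNK + 1)  # data chunks + 1 parity
    return (write_bytes // 4096) * 4  # read-modify-write path

stripe = full_stripe_size(data_drives=5)             # 6-drive RAID-5
print(raid5_io_ops(stripe, stripe, aligned=True))    # 6 device IOs for 2.5 MB
print(raid5_io_ops(4096, stripe, aligned=False))     # 4 device IOs for one 4K write
```

One full-stripe write moves 2.5 MB of data in 6 chunk-sized IOs, while a single misaligned 4K write costs 4 device IOs on its own, which is why forcing every write to the optimal size pays off.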

On top of this, ECR compresses blocks as they are written. The compression is simple and fast: based on the LZ4 algorithm, but tuned for 4K blocks. The compression ratios are not stellar, but any compression ratio is literally “free space” and “free bandwidth”, both of which are good things in a storage stack.
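The per-block accounting can be sketched in a few lines. LZ4 is not in the Python standard library, so this illustration uses zlib at its fastest setting as a stand-in; the store-raw-if-incompressible fallback is a common design for block-level compressors, not a statement about ECR's internals.

```python
import os
import zlib

BLOCK = 4096  # compression happens at 4K-block granularity

def compress_block(block: bytes) -> bytes:
    """Compress one 4K block; store it raw if compression does not help.
    zlib level 1 stands in for LZ4 (fast, modest ratio)."""
    out = zlib.compress(block, level=1)
    return out if len(out) < len(block) else block

# A sparse-filled block (mostly zero padding) compresses very well...
sparse = b"row-data" + b"\x00" * (BLOCK - 8)
print(len(compress_block(sparse)) < BLOCK // 10)    # True: >90% saved

# ...while already-random data stays at 1:1, stored uncompressed.
print(len(compress_block(os.urandom(BLOCK))) == BLOCK)  # True
```

Every block that shrinks is bandwidth the SSDs never see and capacity the array gets back; blocks that do not shrink cost nothing beyond the compression attempt.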


An Example Write Workload:

Let us suppose that your application is generating mixed writes. Some are 4K random blocks. Some are longer. Pretty much a jumbled mess. The easiest way to simulate this is to look at 16K random writes in a benchmark like FIO. Not perfect, but better than a blind guess.
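A fio job along these lines approximates that workload (a sketch, not the authors' actual script; the target device, runtime, and job layout are assumptions):

```ini
; Hypothetical fio job: 16K 100% random writes at an aggregate queue
; depth of ~120 using synchronous IO (no libaio), per the testing
; methodology described later in this article.
[random-16k-writes]
filename=/dev/md0        ; illustrative target array
rw=randwrite
bs=16k
ioengine=psync           ; synchronous engine, no libaio
numjobs=120              ; ~Q=120 via concurrent synchronous jobs
direct=1
time_based=1
runtime=60
group_reporting
```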

We tested 16K 100% random writes on an Amazon i4i.metal “bare metal” server. This is a 3rd gen dual-socket Xeon Scalable Platinum server with 8 x 3.75 TB locally attached NVMe SSDs. We tested with 6 drives to match what many web hosting providers use on their converged compute/storage nodes. Here are the baseline write performance numbers for various RAID levels:

RAID Level     16K Random Writes (no compression), MB/sec
ECR RAID-5     11,043
RAID-0          6,398
RAID-10         3,303
RAID-5            269

This shows how linear writes are “just faster”. At some level, we all know that linear writes are more efficient for both SSDs and arrays, but ECR is the first option you have to convert your random workload into the linear workload that just “works better”.

Mixed Read/Write Workloads:

The benefits of ECR extend to mixed read/write workloads. While ECR does not generally change read performance, lowering the overhead of writes gives the SSDs more time to serve reads. Only 100% read workloads tend not to benefit from ECR.


Compression:

Then you add in the benefits of compression. But do you have a compressible data set? Odds are, you do, at least to some extent. Most file systems themselves contain compressible blocks. Small files are also inherently compressible because their allocation gets rounded up to a whole block.

Then there are data types that compress well. Text compresses OK. What compresses really well is databases. Databases often use file structures that are “sparse filled”. By leaving extra space in block structures, databases run faster. This also means they waste space, but this is the trade-off they make. And this trade-off is good for storage layers that compress, like ECR. Most databases that are not storing binary “blobs” reach 60% compressibility or more (some users report 80%).

This means that databases, which are often some of the worst workloads for All-Flash Arrays in terms of write overhead, benefit most from compression. Compression lowers the bandwidth written to Flash. Compression lowers wear, which results in much longer drive life. Compression can even let you store more in the same amount of Flash.


Space and Cost:

Storage is all about space and cost, first and foremost. Speed is important, but once a storage solution is “fast enough”, more speed might not have a business case. ECR happens to be the fastest solution for many configurations, but some use cases are all about cost 1st, cost 2nd, and cost 3rd.

ECR can save you money, both when you build the array, and when you don’t need to replace worn out SSDs. The amount can be massive.

For an “apples to apples” comparison, consider an array that can tolerate a single SSD failure. This is RAID-10 / single mirror for traditional arrays, and RAID-5 for ECR (ECR also works with RAID-6 when this makes sense).

The SSD pricing is based on $1,152/SSD or about $150/TB (at the 1 DWPD level). Your vendor might be higher or lower, but the math still works out.

To get “mixed workload” endurance out of RAID-10, you generally need 3 DWPD SSDs. ECR is actually better suited to 1 DWPD media. The 3 DWPD SSDs get their extra endurance by lowering the WAF (Write Amplification Factor) by increasing the over-provisioning amount. With the ECR linear write structure, over-provisioning is no longer needed at the SSD level.

The cost/TB for the base array is about $338/TB usable. With ECR it is a little more complicated. If you choose not to “thin provision” the array, meaning that there is guaranteed space regardless of data compressibility, then the cost is $212/TB. With a database, it is generally conservative to thin provision at 150%, driving the cost down to $141/TB. This is an “inclusive” cost including the ECR license (at the 500+ server volume level). Either way, the $/TB is much lower with ECR.

If you try to save money by using 1 DWPD media with RAID-10, but then end up replacing drives in a few years, you are actually worse off. Your $/TB actually goes up to $600/TB or more, over time.

If you try to save money with RAID-5, the performance is terrible, plus your wear remains at RAID-10 levels (remember RAID-5 does two writes for every random write). Again, you are above $360/TB (over time) with performance levels that are truly sad.

So, if you want to cut your storage costs in half, ditch RAID-10 for ECR. But will you really save 50%? Some use cases are better and some are worse. The cost savings for ECR are usually a bit lower with non-compressible datasets. For use cases where SSDs wear out, the cost savings can be over 50% over time. Even retrofitting existing arrays with partially worn-out drives often makes sense. And remember that the cost savings are “all in” and don’t count intangibles like lower labor because drives don’t get replaced, or higher customer satisfaction because applications run faster.

Enterprise Compressed RAID breaks the rules, and the race is not even close. By implementing an FTL and block level compression in software, Enterprise Compressed RAID delivers performance, capacity, and low wear, all at the same time.

                               RAID-10               RAID-5 w/ ECR
SSDs                           6 x 6.80 TB           6 x 7.68 TB
SSD Type                       NVMe 3 DWPD           NVMe 1 DWPD
SSD Cost                       $1,152 x 6 = $6,912   $1,152 x 6 = $6,912
ECR Cost                       n/a                   $841
Total Cost                     $6,912                $7,753
Usable Capacity (base)         20.4 TB               36.5 TB
Usable Capacity (compressed)   n/a                   54.7 TB
Cost/TB (base)                 $338/TB               $212/TB
Cost/TB (compressed)           n/a                   $141/TB
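The $/TB figures in the table reduce to a few lines of arithmetic. This sketch uses the article's own prices and capacities; values are truncated to whole dollars, which matches the quoted figures.

```python
# Cost model behind the comparison table (article's figures).
ssd_cost, drives = 1_152, 6
ecr_license = 841

raid10_usable = 6.80 * drives / 2     # mirrored pairs -> 20.4 TB usable
ecr_usable = 36.5                     # RAID-5 over 6 x 7.68 TB after ECR overhead
thin_factor = 1.5                     # 150% thin provisioning for a database

raid10_cost_tb = ssd_cost * drives / raid10_usable
ecr_cost_tb = (ssd_cost * drives + ecr_license) / ecr_usable
ecr_thin_tb = (ssd_cost * drives + ecr_license) / (ecr_usable * thin_factor)

print(int(raid10_cost_tb))  # 338
print(int(ecr_cost_tb))     # 212
print(int(ecr_thin_tb))     # 141
```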

Some more Performance Numbers: 6 NVMe SSDs (AWS i4i.metal)

These are just a couple of data points. We have complete profiles for 4- and 6-drive RAID-0/5/10, and ECR wins every time. It is not just the linear writes, or the compression that reduces data transfers. ECR just keeps the SSDs happy. Most of the variability that you see in SSD performance is caused when the SSD has to deal with random writes. By eliminating these, the SSDs can sit there and just “do IO” without thinking a lot.

You should also note that ECR tends to do better at lower queue depths than stock RAID. Benchmarks often quote “hero numbers” at Q=1024 using libaio. We test at Q=120 without libaio, as this more closely matches real workloads.

RAID Level     70/30 Mix, 16K Blocks (no compression), MB/sec
ECR RAID-5     8,962
RAID-0         5,727
RAID-10        5,089
RAID-5           835

Capacity:

Usable capacity for 6 x 7.68 TB SSDs with minimal compressibility. Higher compression levels can increase ECR space further.

RAID Level     Usable Capacity
ECR RAID-0     43.8+ TB
ECR RAID-5     36.5+ TB
RAID-0         46.1 TB
RAID-5         38.4 TB
RAID-10        23.0 TB

SSD Wear:

This is also not even close. Using 1 DWPD SSDs and 30% compressibility, 6 x 7.68 TB SSDs under ECR can be expected to absorb 3X+ more writes over their life than with traditional RAID.

Often, this transforms a scenario where you need to replace SSDs every 3-4 years, into a scenario where the SSDs are obsolete before they wear out.

RAID Level     Total Bytes Written (TBW, 30% compression)
ECR RAID-0     174 PB
ECR RAID-5     145 PB
RAID-0          84 PB
RAID-5          42 PB
RAID-10         42 PB
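The endurance arithmetic behind the table can be sketched as follows. Assumptions: the DWPD rating is taken over the common 5-year warranty window, compression is the article's 30%, and the roughly 1.45x internal-WAF factor that ECR's linear writes eliminate is inferred from the table's ECR rows, not stated explicitly in the article.

```python
# TBW model for 6 x 7.68 TB SSDs rated at 1 DWPD over 5 years.
capacity_tb, drives, dwpd, years = 7.68, 6, 1, 5
raw_pb = capacity_tb * dwpd * 365 * years * drives / 1000  # rated Flash writes, ~84 PB

raid0_tbw = raw_pb                   # each host byte hits Flash once
raid10_tbw = raw_pb / 2              # mirrored: every byte written twice
raid5_tbw = raw_pb / 2               # random writes: data + parity per block
ecr0_tbw = raw_pb / 0.7 * 1.45       # 30% compression plus inferred WAF savings
ecr5_tbw = ecr0_tbw * 5 / 6          # one of six drives carries parity

for name, pb in [("ECR RAID-0", ecr0_tbw), ("ECR RAID-5", ecr5_tbw),
                 ("RAID-0", raid0_tbw), ("RAID-5", raid5_tbw),
                 ("RAID-10", raid10_tbw)]:
    print(f"{name:<10} {round(pb)} PB")
```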



Copyright © 2024 EasyCo LLC dba WildFire-Storage