Big Data update: Hadoop 3.0 HDFS erasure coding performance
There’s no question that Big Data is getting bigger every day: according to various reports, 90% of today’s data was created in the last 2 years [1]. Because of this, for most Big Data customers, $/TB is the most important purchasing decision factor. In this blog, we present a way of combining our HPE Elastic Platform for Analytics (EPA) Hadoop architecture with Hadoop 3.0 HDFS erasure coding and hardware compression to deliver a >6x increase in HDFS usable space without sacrificing performance or adding meaningful cost, which in turn lowers the $/TB metric by >6x.
It is said that the three most important factors for a Big Data platform are storage cost, storage cost and storage cost. Other factors matter too, of course, but customers are acutely aware of the need for cost-effective storage given the volumes of data being retained. In fact, the early genesis of Big Data was the creation of the Google File System as a means to manage large volumes of data reliably and more cost-effectively than traditional storage arrays. The cost pressure is exacerbated by the fact that most Big Data solutions store data using replication, maintaining 3 copies of the data. While this can improve read throughput, it is primarily done for fault tolerance. It is very unusual to have any kind of backup solution on a Big Data platform, not only for cost reasons but because the data often arrives at a rate that makes it infeasible to back up. For this reason, Big Data is at odds with itself: storage cost is paramount, but data durability requires us to overprovision our storage by 3x. And as Big Data becomes more structured, it will carry more auxiliary structures such as indices and aggregates, which increase write activity; with triple replication that means 3x the writes.
Our solution to improve the storage efficiency and to lower the $/TB metric is to leverage the HPE EPA Workload and Density Optimized (WDO) Big Data architecture, combining the Hadoop Distributed File System (HDFS) Erasure Coding (EC) feature introduced in Hadoop 3.0 with transparent hardware compression resulting in > 6x more HDFS usable space and lowering the $/TB metric by >6x. Find more details about HPE EPA architecture here: www.hpe.com/info/bigdata-ra.
[Diagram: the building blocks used to create an HPE EPA WDO solution.] The most important aspect is separating storage and compute resources onto different physical servers.
The default HDFS replication scheme uses 3 replicas, incurring 200% overhead, whereas a typical EC scheme has an overhead of only 40-50% and can store double the amount of data compared to a 3x replication scheme. This saving alone is substantial, but in addition, by making use of a hardware compression solution from MaxLinear (Model DX2040 [3]), you can gain another 3x improvement in storage efficiency, for a total of >6x. The processing overhead of EC encoding/decoding is mitigated by making use of Intel’s open source Intelligent Storage Acceleration Library (ISA-L), which exploits advanced processor instruction sets. The hardware compression solution integrates seamlessly into the HDFS stack and automatically compresses all files in a transparent manner.
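As a back-of-the-envelope check of the >6x figure, here is a quick sketch of the arithmetic (the 100 TB raw capacity is an arbitrary illustrative number, not a measured configuration):

```python
from fractions import Fraction

def usable_fraction(data_blocks, parity_blocks):
    """Fraction of raw capacity that ends up as usable space."""
    return Fraction(data_blocks, data_blocks + parity_blocks)

raw_tb = 100  # hypothetical raw HDFS capacity, for illustration only

rep_usable = raw_tb * usable_fraction(1, 2)  # 3x replication: 200% overhead
ec_usable = raw_tb * usable_fraction(6, 3)   # RS(6,3): 50% overhead

print(ec_usable / rep_usable)      # 2 -> EC alone doubles usable space
print(ec_usable * 3 / rep_usable)  # 6 -> with ~3x hardware compression
```

The last line is where the >6x claim comes from: the 2x erasure-coding gain and the ~3x compression gain multiply.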
In addition to the storage efficiency, erasure coding also improves performance by reducing the amount of I/O performed by applications. The solution also makes use of HPE’s innovative WDO architecture for Big Data which lends itself very well to address the drawbacks of EC scheme over replication, such as remote I/O and the potential interference of EC calculations with compute intensive workloads, thus allowing us to use this solution to store all HDFS data in a compressed form, not just COLD data.
Erasure coding schemes have long been used in other storage solutions, including RAID, to provide high availability while being more efficient with respect to storage space. EC schemes augment the data with additional parity data such that the original data can be reconstructed from the parity data in case of failures. There are a number of EC schemes available, Reed-Solomon (RS) being one of the most popular. RS uses linear algebra operations to generate multiple parity blocks and can tolerate multiple failures.
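To illustrate the encode-and-reconstruct idea, here is a toy sketch using a single XOR parity block (the sample blocks are made up); Reed-Solomon generalizes this with Galois-field linear algebra, producing multiple parity blocks so that multiple simultaneous failures can be survived:

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"block-1!", b"block-2!", b"block-3!"]  # made-up 8-byte data blocks
parity = xor_blocks(data)                       # encode: one parity block

# Lose any one data block; the XOR of the survivors reconstructs it.
survivors = [data[0], data[2], parity]
assert xor_blocks(survivors) == data[1]
```

With one XOR parity block you can survive exactly one loss; an RS(6,3) layout stores 3 parity blocks and so survives any 3 losses.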
Our solution makes use of the EC feature introduced in the upcoming Hadoop 3.0 release. For example, an RS(6,3) scheme generates 3 parity blocks for every 6 data blocks and protects against 3 failures. Thus, by using EC, our solution gains 2x usable HDFS space. Also, since the same data is now striped across 9 disks (6+3) instead of 3 disks with replication=3, we get more disk parallelism, which increases performance.
The drawback of an EC scheme is that it needs compute cycles to encode the data while writing and decode it while reading. To address this, our solution makes use of the open source ISA-L library, which accelerates the linear algebra calculations using advanced instruction sets (e.g. SSE, AVX, and AVX2) available on the Xeon E5-2660 v4 processors used in the Apollo 4200 storage nodes. The ISA-L EC implementation outperforms the Java implementation by more than 4x [2]. Because of this acceleration, and due to the increased I/O parallelization mentioned above, erasure coding outperforms standard Hadoop replication.
To improve the storage efficiency of HDFS further, we make use of hardware compression technology, which has the advantage of offloading compression/decompression from the host CPU. We are currently using PCIe cards from MaxLinear (Model DX2040 [3]), which can provide up to 5 GB/sec of throughput. The MaxLinear solution provides a transparent Linux filter driver that sits below the file system and automatically compresses/decompresses all files. The storage space gains from hardware compression vary with the data being stored, but we can expect a typical storage efficiency of ~5x with Hadoop data sets. The MaxLinear solution can also encrypt sensitive Hadoop data while compressing it, with no performance penalty; this is an area for future exploration. It should be emphasized that our approach is transparent to Hadoop software components and requires no application changes to reap its benefits. Moreover, since HPE EPA WDO uses separate storage nodes for HDFS, the hardware compression cards are needed only in those nodes, which saves cost.
Finally, the most important element of our solution is the HPE EPA WDO architecture for Big Data. First, by disaggregating storage nodes from compute nodes, it allows us to add the hardware compression cards only to the storage nodes, instead of to all nodes as in a traditional Hadoop architecture. Next, the EC striped block layout is exploited to the fullest because of the increased I/O parallelization offered by the “storage dense” nature (i.e. many drives working in parallel) of the WDO architecture. Further, our solution eliminates a major objection to the EC striped block layout compared to the replication model: that it foregoes replication’s data locality advantage. In a WDO design, storage is disaggregated from compute and sufficient network bandwidth is allocated to accommodate remote I/O access. This allows us to use the solution for all HDFS data, not just COLD storage. In traditional architectures, one is forced to revisit design decisions when EC usage is contemplated to improve storage efficiency.
We configured an HPE EPA WDO cluster in our lab using 7 storage nodes to store HDFS data. We compared both capacity utilization and performance (Teragen 3TB benchmark) between replication with 3 replicas and erasure coding with RS(5,2) and XOR(5,2), all environments being able to sustain the loss of 2 storage nodes without losing data. We chose these custom erasure coding schemes to showcase the custom EC capabilities available in Hadoop 3.0 and to better align with the number of nodes available for this test.
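A quick sketch of the raw-storage overhead implied by each scheme's definition (these figures follow directly from the data/parity counts, not from the benchmark measurements):

```python
# Raw bytes stored per usable byte for the schemes compared in the test.
# All three tolerate the loss of 2 storage nodes; the EC schemes do it
# with 2 parity blocks instead of 2 full extra copies.
schemes = {
    "replication x3": (1, 2),  # 1 original + 2 full copies
    "RS(5,2)":        (5, 2),  # 5 data blocks + 2 parity blocks
    "XOR(5,2)":       (5, 2),
}
for name, (data_units, extra_units) in schemes.items():
    overhead = (data_units + extra_units) / data_units
    print(f"{name}: {overhead:.1f}x raw storage per usable byte")
```

Replication works out to 3.0x raw storage per usable byte, while both 5+2 EC layouts need only 1.4x, before any compression is applied.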
Testing with the Teragen 3TB benchmark showed a >6x increase in usable space, achieved in a transparent manner: no code changes are necessary to take advantage of these new features, and all data is compressed transparently. The following table shows the amount of raw storage used for the 3TB dataset and the Teragen 3TB execution runtimes.
You’ll notice that the RS-encoded data doesn’t compress very well, mostly because the parity data generated by RS is not compressible. Adding the MaxLinear hardware compression cards increases the total solution list price by less than 1%, an excellent ROI given the significant decrease in $/TB they deliver.
The Teragen dataset uses randomly generated binary data, which is not very compressible. In general, text data compression ratios are in the 5x-10x range, while log data can get closer to 20x. Taking all this into consideration, an average compression ratio of ~5x for all data stored in HDFS seems reasonable, which in turn means this solution can increase HDFS usable space by ~10x (2x from erasure coding multiplied by 5x from compression).
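The data-dependence of compression ratios can be illustrated with Python's standard zlib module (software compression rather than the MaxLinear hardware card, but the effect of the data type on the ratio is the same; the log line below is made up):

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))

# Teragen-style random binary data: essentially incompressible (~1x)
random_data = os.urandom(1 << 20)

# Repetitive log-style text: compresses dramatically better
log_data = b"2017-12-01 12:00:00 INFO request served in 12 ms\n" * 20000

print(f"random binary: {compression_ratio(random_data):.2f}x")
print(f"log text:      {compression_ratio(log_data):.1f}x")
```

This is why the Teragen numbers understate the gains most production data sets would see from the compression layer.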
In conclusion, combining our HPE EPA WDO architecture with HDFS erasure coding and hardware compression allows our customers to use erasure-coded storage and compression for all data without any code changes and benefit from significantly reduced $/TB and improved performance.
Meet Around the Storage Block blogger Daniel Pol, Data and Analytics Architect, HPE.
[1] https://www.mediapost.com/publications/article/291358/90-of-todays-data-created-in-two-years.html
[2] http://blog.cloudera.com/blog/2016/02/progress-report-bringing-erasure-coding-to-apache-hadoop/