Grounded in the Cloud

Offloading and Accelerating Data Warehouse Processing Using Hadoop


Guest Author: Michele Nemschoff, VP Corporate Marketing, MapR


Over the years, the volume of transaction data has grown dramatically, and so has the volume of unstructured data. This has made data extraction, transformation and loading (ETL) increasingly complex. Efficient, scalable ETL processes must serve the needs not only of the data warehouse but of all analytical processes.


As a result, ETL has had to evolve.


An early evolution swapped the last two steps, turning ETL into ELT. This shift was usually implemented on an MPP SQL database: by transforming the data after loading it, the transformation work could be handed to the more scalable MPP SQL engine.
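To make the reordering concrete, here is a minimal sketch in Python. It uses the standard library's `sqlite3` module purely as a stand-in for an MPP SQL database (an assumption for illustration; SQLite is not an MPP engine), and the table names `sales_etl`, `staging_raw` and `sales_elt` and the sample rows are hypothetical. In the ETL path the cleanup happens in application code before loading; in the ELT path the raw rows are loaded first and the same cleanup runs as SQL inside the database.

```python
import sqlite3

# Hypothetical raw transaction records: (account_id, amount as text, currency)
RAW_ROWS = [
    ("a1", "10.50", "usd"),
    ("a2", "3.25", "USD"),
    ("a1", "4.00", "usd"),
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in application code, then load only the cleaned rows."""
    cleaned = [(acct, float(amt), cur.upper()) for acct, amt, cur in RAW_ROWS]
    conn.execute("CREATE TABLE sales_etl (account_id TEXT, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows as-is, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE staging_raw (account_id TEXT, amount TEXT, currency TEXT)")
    conn.executemany("INSERT INTO staging_raw VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute("CREATE TABLE sales_elt (account_id TEXT, amount REAL, currency TEXT)")
    # The transformation step now runs in the SQL engine, not in Python.
    conn.execute(
        "INSERT INTO sales_elt "
        "SELECT account_id, CAST(amount AS REAL), UPPER(currency) FROM staging_raw"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    query = "SELECT account_id, amount, currency FROM {} ORDER BY rowid"
    print(conn.execute(query.format("sales_etl")).fetchall())
    print(conn.execute(query.format("sales_elt")).fetchall())
```

Both paths end with the same cleaned table; what changes is where the transformation runs, which is what allows ELT to push that work onto a scalable SQL engine.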


As Mike Ferguson of Intelligent Business Strategies points out in the white paper Offloading and Accelerating Data Warehouse ETL Processing Using Hadoop, the next evolution of ETL gives Hadoop a central role. Hadoop is the perfect enterprise data hub: a low-cost yet powerful system ready to take on ETL operations for the modern data warehouse.


First, the cost advantages of using Hadoop for ETL are significant: according to industry benchmarks, savings average 20x and reach as high as 50x. Second, there is the analytical power behind Hadoop, which drives exploratory analysis of un-modeled, multi-structured data as well as extreme analytics, for example running numerous scoring models at the same time on millions of credit card accounts to detect fraud.


MapR is a Hadoop vendor that has enhanced its Hadoop distribution to handle new big data analytical workloads and to offload ETL processing from data warehouses. MapR's solutions can be used for larger datasets and advanced, scalable analytics. (Pages 7 to 12 of the white paper explain how MapR helps scale ETL and make the process more effective.)


Companies should consider offloading ETL processing to a lower-cost Hadoop platform, where it can scale to handle growing transaction volumes and integrate that data with newer, more complex, high-value data types such as clickstream data and un-modeled, multi-structured data.
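As a rough illustration of the kind of integration work that can move onto Hadoop, the sketch below simulates, in plain Python, the reduce-side join pattern often used in Hadoop MapReduce jobs: a map phase tags each record with its source, a shuffle groups records by account, and a reduce phase combines each account's transactions and clickstream events. It does not use any Hadoop API; the inputs, the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the output fields are hypothetical, and a production job would run on the cluster through a framework such as MapReduce or Hive.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical inputs keyed by account ID.
TRANSACTIONS = [("a1", 10.5), ("a2", 3.25), ("a1", 4.0)]          # (account, amount)
CLICKS = [("a1", "/checkout"), ("a1", "/home"), ("a3", "/home")]  # (account, page)

# A source-tagged value, e.g. ("txn", 10.5) or ("click", "/home").
Tagged = Tuple[str, object]


def map_phase() -> Iterator[Tuple[str, Tagged]]:
    """Emit (account, tagged_value) pairs so the reducer can tell sources apart."""
    for account, amount in TRANSACTIONS:
        yield account, ("txn", amount)
    for account, page in CLICKS:
        yield account, ("click", page)


def shuffle(pairs: Iterable[Tuple[str, Tagged]]) -> Dict[str, List[Tagged]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[Tagged]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(account: str, values: List[Tagged]) -> Tuple[str, float, int]:
    """Combine one account's records into total spend and page-view count."""
    total = sum(v for tag, v in values if tag == "txn")
    views = sum(1 for tag, _ in values if tag == "click")
    return account, total, views


def run_job() -> List[Tuple[str, float, int]]:
    """Run map -> shuffle -> reduce and return results sorted by account."""
    grouped = shuffle(map_phase())
    return sorted(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    for row in run_job():
        print(row)
```

Accounts that appear in only one source (here `a2` and `a3`) still show up in the output. Because all of an account's records meet in a single reduce call, the same structure can scale out by partitioning keys across many reducers.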


MapR's M5 and M7 Hadoop distributions include features such as Self Healing High Availability of all critical cluster services, MapR Direct Access NFS, snapshots for online point-in-time data recovery, automatic data compression, and disaster recovery. These features make MapR an ideal platform both for offloading ETL processing from data warehouses and for big data analytics on a variety of data.

Senior Manager, Cloud Online Marketing
About the Author


I manage the HPE Helion social media and website teams that promote HPE's enterprise cloud solutions for hybrid, public, and private clouds. I previously promoted cloud solutions at Dell, served as the open source community manager for OpenStack, and worked at Rackspace and Citrix Systems. While at Citrix Systems, I founded the Citrix Developer Network, developed global alliance and licensing programs, and even once added audio to the DOS ICA client in assembler. Follow me at @SpectorID
