
Top 5 reasons to consider a different SQL on Hadoop solution like HP Vertica

09-28-2015 01:37 PM - edited 11-03-2015 07:53 AM

Recently, HP has been working with partners like MapR and Hortonworks to offer its venerable SQL engine, the columnar-store database Vertica, on Hadoop. The partnership allows Vertica to be installed and run directly on Hadoop, bringing the full power of a mature engine to data that already lives there. But with so many SQL on Hadoop solutions on the market, like Hive, Stinger, and Impala, is there really a need for such a solution? Who would want this?

 

Let’s take a look at the big reasons why you should consider this solution.

 

1. Complete TPC-DS Benchmarks

The Vertica engine has been around for more than a decade, and when it comes to complete SQL standard capabilities, there is no compromise. The TPC-DS benchmark gives buyers a way to verify that a database can perform any and all analytics they will need. The benchmark queries represent a range of operational requirements and complexities (e.g., reporting, data mining) and are often the queries that challenge your CPU and I/O. As a result, typical SQL on Hadoop solutions can only run about 50-60% of these queries to completion. Vertica, on the other hand, can complete 100% of the benchmark queries, giving you a good indication of how it will behave under real-world workloads.

 

2. You Don’t Have to Move the Data

Vertica installs natively on the Hadoop nodes without the need for helper nodes. In working with Hadoop vendors, HP has developed some great integration points, including:

  • Very fast ways to read ORC files, a data format that you likely already use. We specifically have optimizations for ORC files, offering features like predicate pushdown, column pruning, partition pruning and segment pruning to boost performance.
  • Parquet, JSON, Avro, and other formats are supported via a Hive serializer/deserializer (SerDe) or our schema-on-read technology.
  • For MapR, Vertica sees very little difference between its own native storage and MapR’s NFS, so you can use a MapR NFS volume as the main Vertica storage location.

The bottom line is that you don’t have to move or transform the data in any way. If you want to perform big data analytics with Stinger and later use Vertica for something more complex, you can do so without the cost of moving the data. If you want to run joins across any of these formats, you can; the sketch below shows what querying data in place might look like.
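To make this concrete, here is a minimal sketch of an external table defined over ORC files already sitting in HDFS. The table name, columns, and path are made up for illustration, and the exact syntax can vary by Vertica version:

    -- Define an external table directly over ORC files in HDFS
    -- (no data movement); the schema and path below are examples only.
    CREATE EXTERNAL TABLE web_clicks (
        click_time  TIMESTAMP,
        user_id     INT,
        url         VARCHAR(2048)
    ) AS COPY FROM 'hdfs:///data/web_clicks/*.orc' ORC;

    -- Query it with ordinary SQL; predicate pushdown and column pruning
    -- limit how much of the ORC data actually has to be read.
    SELECT user_id, COUNT(*)
    FROM web_clicks
    WHERE click_time >= '2015-09-01'
    GROUP BY user_id;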

 

3. Sometimes a Database Is What You Need

We often hear that Hadoop is not a database. But when you are performing data analytics and democratizing your data for a larger group of data consumers, there are times when you need one:

  • Metadata management – Databases are better at providing vital information about your data: where to find it, how it got there, its lineage, and other important context.
  • Leveraging the power of a cluster to perform in-database transformations without a lot of Pig scripts and other messy code.
  • Using all the optimizations of Vertica to boost the performance of advanced analytics and meet your SLAs. An MPP, columnar database is built from the ground up to give you better performance through features like projections and live aggregate projections (see the sketch below).
  • Data governance – Providing detailed access control of data at the attribute level (that is, security), along with backup, restore, and other key governance procedures, tends to be more difficult without a database.

In these cases, moving data into a database like Vertica Enterprise Edition is a very quick way to boost performance and provide some of the must-haves of a database platform.
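As one illustration of the projection features mentioned above, here is a minimal sketch of a live aggregate projection. The table, columns, and names are hypothetical, and requirements such as segmentation and exact syntax can differ between Vertica versions:

    -- Example fact table
    CREATE TABLE clicks (
        user_id    INT,
        page_id    INT,
        click_time TIMESTAMP NOT NULL
    );

    -- Live aggregate projection: Vertica maintains the aggregated result
    -- as data loads, so matching GROUP BY queries read pre-aggregated rows.
    CREATE PROJECTION clicks_by_user AS
        SELECT user_id, COUNT(*) AS num_clicks
        FROM clicks
        GROUP BY user_id;

    -- A query with a matching GROUP BY can be answered from the projection.
    SELECT user_id, COUNT(*) FROM clicks GROUP BY user_id;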

 

4. Concurrency

The power of big data comes from its democratization, from offering it out to many users and data scientists who can think about the information in new ways. However, in our testing, when multiple applications query data in Hadoop, they tend to take severe performance hits. Many of our customers report that even two or three users running analytical queries can severely degrade performance and cause queries to fail.

 

On the other hand, we’ve built Vertica to handle concurrency. The control center of this operation is our own management console, which can see what queries are running and allocate the right resources for them. In addition, we’ve recently added the capability to monitor the health and status of the Hadoop cluster in the management console via Ambari. This makes it easier for administrators to see what’s going on in both the Hadoop and Vertica clusters and adjust resources when needed.
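Beyond the management console, administrators can also carve up cluster resources with Vertica resource pools so that concurrent groups of users don’t starve each other. The pool name, limits, and user below are illustrative examples only, not prescribed settings:

    -- A resource pool for ad hoc analyst queries (example limits)
    CREATE RESOURCE POOL analyst_pool
        MEMORYSIZE '8G'
        PLANNEDCONCURRENCY 8
        MAXCONCURRENCY 16;

    -- Queries from this user now draw from the analyst pool
    ALTER USER analyst1 RESOURCE POOL analyst_pool;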

 

5. Pricing and TCO

If you think that the economics of Hadoop are so much better than those of a database, you may be thinking of the old days, when database vendors charged by CPU, connected systems, number of users, and more. Today, HP Vertica for SQL on Hadoop is available at a simple per-node price, just like the Hadoop distributions. And when you factor in the MapReduce and Pig programmers you would otherwise need, Vertica is very competitive by comparison.

 

Test Drive

If you want to take a test drive of HP Vertica, click here.

 

About the Author

SteveSarsfield

Steve Sarsfield is a product evangelist and spokesperson in HPE’s Big Data Software business unit. He is also a big data enthusiast and the author of the book The Data Governance Imperative. Steve has many years of experience in big data analytics, information quality, and data governance.
