Transforming IT

Big Data — The plan comes together


By Donald Livengood, HP Distinguished Technologist/Strategist


I love it when a plan comes together…or, in the case I’m about to describe, when a plan is validated.


Sizing Hadoop: Don’t sweat it


A lot of work has gone into HP’s AppSystem for Apache Hadoop and into the Reference Architectures. The Reference Architectures were announced and discussed at HP Discover 2012 and, until this year, many customers were still sweating bullets over sizing servers for their Hadoop clusters. That is a natural worry: the MPP and BI systems, and other technologies they’ve worked with, really did need to be thought out closely and sized correctly; otherwise production problems surfaced quickly.


Hadoop isn’t the same. It scales out beautifully, in a linear fashion. Based on past experience, however, many customers had not bought into the concept of “Need to speed up your tasks? Just add more nodes!” Instead, they wanted to get it right the first time. I completely understood that mindset, having had relatively recent experience in a space that did require thinking through the end state up front (VDI), as well as a long history in the industry.
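To make “just add more nodes” concrete, here is a back-of-envelope sizing sketch. The replication factor, temp-space overhead, and per-node disk figures are illustrative assumptions for the example, not values from HP’s reference architectures:

```python
import math

def estimate_datanodes(raw_tb, replication=3, temp_overhead=0.25,
                       usable_tb_per_node=24.0):
    """Back-of-envelope Hadoop datanode count.

    raw_tb             -- raw (pre-replication) dataset size in TB
    replication        -- HDFS replication factor (3 is the usual default)
    temp_overhead      -- extra fraction of space for shuffle/temp data (assumed)
    usable_tb_per_node -- usable disk per worker node in TB (assumed)
    """
    needed_tb = raw_tb * replication * (1.0 + temp_overhead)
    return math.ceil(needed_tb / usable_tb_per_node)

# 100 TB raw -> 100 * 3 * 1.25 = 375 TB, / 24 TB per node = 16 nodes
print(estimate_datanodes(100))  # -> 16
```

The point of the exercise is that the answer doesn’t have to be exact: if the estimate comes in low, the linear scale-out model means the fix is simply more nodes, not a redesign.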


I say “had not bought into” because this year that seems to be changing. I’ve had multiple customers say they aren’t getting wrapped around the axle with detailed sizing. Instead, they take our reference architecture(s) as-is, or slightly modified, and buy as many servers as they believe will handle their initial workloads; if that turns out not to be perfect, they just add more nodes.


I had one customer tell me specifically that, in hindsight, they probably wasted 10-12 weeks tweaking their own reference architecture and performance-testing their design – only to find that their Hadoop cluster grew faster than they anticipated, and they’d have been better off buying our AppSystem for Apache Hadoop or implementing our reference architecture up front. They could then have focused on data ingestion and processing rather than spending so much time on system architecture. Like I said, it’s nice when a plan comes together.


Clearly there is still a need to monitor a Hadoop cluster closely. Most implementations do make several tweaks related to Hadoop configuration files, JVM configuration, Operating System configuration/tuning, and even some BIOS changes. That’s going to be the case for most Hadoop implementations and will be driven by specific workloads in the environment.  
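The configuration-file tweaks mentioned above typically land in files like mapred-site.xml. The property names below are standard Hadoop 1.x knobs, but the values are purely illustrative assumptions; the right settings depend on the workload and hardware:

```xml
<!-- mapred-site.xml (Hadoop 1.x): illustrative values only, tune per workload -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>  <!-- per-task JVM heap -->
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>12</value>         <!-- map slots per node; roughly track core count -->
</property>
<property>
  <name>io.sort.mb</name>
  <value>256</value>        <!-- map-side sort buffer; trades memory for spills -->
</property>
```

OS-level tuning (swappiness, open-file limits) and BIOS settings follow the same pattern: a handful of workload-driven adjustments, not a redesign of the cluster.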


Consider combining technologies to create Big Data solutions


There was another “Ah-Ha” moment at a recent roadshow event. The event had a Hadoop focus, but in side meetings the question of whether to replace everything with Hadoop or to mix and match came up. As it turns out, for most of the customers a combination of technologies is the way to go.


With Hadoop, customers have the option of loading new, irregular data sets into Hadoop, analyzing them and, perhaps, transforming those datasets into structured results that can be consumed by other systems. A similar use of Hadoop is to load more structured datasets, perform exploratory analytics on them and then, once interesting results are obtained, move the ongoing detailed analytics of that data to a specialized system.
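The first pattern – irregular data in, structured results out – can be sketched as a Hadoop Streaming mapper. The log layout, field names, and regex below are hypothetical, chosen only to show the shape of the transformation:

```python
#!/usr/bin/env python
"""Hypothetical Hadoop Streaming mapper: turn irregular web-log lines into
structured tab-separated (ip, status, bytes) records that a downstream
system, such as a columnar database, could load directly."""
import re
import sys

# Loose Apache-style layout (assumed): <ip> ... "<request>" <status> <bytes>
LINE_RE = re.compile(r'^(\S+) .*" (\d{3}) (\d+)$')

def to_record(line):
    """Return an 'ip<TAB>status<TAB>bytes' record for a parsable line, else None."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None  # skip malformed input instead of failing the whole job
    return "\t".join(m.groups())

def main():
    # In the actual streaming job, Hadoop feeds splits of the input on stdin
    # and collects whatever the mapper prints on stdout.
    for line in sys.stdin:
        rec = to_record(line)
        if rec:
            print(rec)

if __name__ == "__main__":
    main()
```

Hadoop runs one copy of this mapper per input split, which is exactly why the “just add more nodes” scaling story holds for this kind of transformation work.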


In fact, these two use cases were discussed by one of the customers at the roadshow and, in this particular case, the “other system” was Vertica. The “Ah-Ha” for me here was around Vertica creating integration points to Hadoop. Autonomy has provided similar connectivity to Hadoop and, interestingly, Vertica and Autonomy also have connectors to each other. Another validation of the plan.


Don’t forget the surrounding infrastructure

Big Data technologies, whether Open Source or commercial products, all need to behave as expected and function as proper tenants of the datacenter. Governance of the data in the solution, and ensuring its protection and compliance, are critical. The solutions will also need to be integrated into the operational model, including integration with Identity & Access Management systems and sensible, practical backup and recovery processes.


Monitoring the performance and capacity of these new systems at the application, server, storage, and network layers is essential as these technologies enter the data center. As I stated earlier, these systems need to behave as proper tenants of the datacenter and should be assessed and analyzed with the same critical eye.


How do you cover all of these topics? HP announced services at Discover 2012 that address some of these areas. (Read how HP can help you deploy a successful roadmap for Hadoop.) Discover 2013 will likely unveil more services in this space. Again…more of the plan coming together.


As HP Discover 2013 approaches, I expect you’ll see HP continue to provide hardware, software, and services that can be combined to solve Big Data challenges. The plan continues to evolve.


Feel free to chime in!



