
Analytics for Human Information: The New Top Ten Myths of Big Data - Myth #4

ChrisSurdak 10-18-2013 05:00 AM - edited 02-19-2015 01:39 PM

In this installment of our “New Myths” (check out Big Data Myth #3, if you missed it) I’m going to break ranks and point out that the emperor’s new clothes aren’t providing much cover.  In direct conversations with literally hundreds of colleagues, customers, analysts and pundits over the last six months, I’ve noticed that this myth is amongst the strongest out there, and it is one that needs to be put into proper perspective before the trough of disillusionment becomes too deep for all of us to climb out.


Big Data Myth #4: If You Are Doing Hadoop, You’re Doing Big Data


Every technical revolution has its starting point, its catalyst, and for Big Data that has certainly been Hadoop.  This brilliant technical platform was created just a few short years ago to address a very specific, very focused “Big Data” issue: how to distribute (then) terabytes of data across a network of computers in such a way that this large amount of information could be processed and the answer to a particular problem determined.


At its heart, Hadoop is a distributed file system paired with a MapReduce engine, and this name is properly descriptive.  Hadoop takes a huge chunk of data and distributes it across dozens, hundreds or thousands of computers, and keeps track of where all of the data resides. Then, each of these computers, or nodes, is given an analysis to perform, consistent across every node.  The first part of that analysis is the “Map” operation, where each node turns the records it holds into simple key-value pairs.  The second part is the “Reduce” operation, where all of the values that share a key are reduced, or collapsed, into a simple output.  Each node processes its own data and creates its own output, and the framework combines all of the results to come up with a final solution.
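
To make the flow concrete, here is a minimal single-process sketch of a word count, the customary first MapReduce example. It is plain Python, not Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample data are made up for illustration, and each inner list simply stands in for the data held on one node.

```python
from collections import defaultdict

def map_phase(chunk):
    """Per-node step: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]

def shuffle(pairs):
    """Group values by key, standing in for what the framework does between steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Collapse all values for one key into a single output."""
    return key, sum(values)

def run_job(chunks):
    """Run map on every chunk, group the pairs, then reduce each group."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical data: two "nodes", each holding one line of text.
chunks = [["big data is big"], ["hadoop is batch"]]
counts = run_job(chunks)
print(counts)
```

In a real cluster the map and reduce steps run in parallel on different machines; the sketch only shows the shape of the computation.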


Hadoop is powerful, scalable, and generally extremely fast, as long as the question that you are looking to have answered is fairly simple, is linear in nature (rather than iterative), and can be processed as a batch.  If you hold yourself within these limits, you’ll get a great deal of value from Hadoop.


These constraints, however, are the source of potentially significant limitations to Hadoop’s usefulness for many business needs.  Say that you are looking to ask fairly sophisticated, complex questions: something beyond “yes” or “no”, or what’s the biggest, smallest or average.  “Reduce” works great, as long as what you are asking is itself a simple reduction of the data.  More complex questions require much more sophisticated coding, and a lot more computational time.
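
What makes “biggest, smallest or average” such a good fit is that each node can boil its own data down to a tiny partial answer, and partial answers merge cleanly. A rough sketch in plain Python (not Hadoop code; `summarize`, `combine` and the sample numbers are illustrative assumptions):

```python
from functools import reduce

def summarize(chunk):
    """Per-node step: collapse a chunk into (min, max, sum, count)."""
    return (min(chunk), max(chunk), sum(chunk), len(chunk))

def combine(a, b):
    """Merge two partial summaries; the order of merging does not matter."""
    return (min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2], a[3] + b[3])

# Hypothetical data: three "nodes", each holding a few numbers.
chunks = [[3, 9, 4], [12, 1], [7, 7, 5]]
smallest, biggest, total, n = reduce(combine, map(summarize, chunks))
average = total / n
print(smallest, biggest, average)
```

Questions whose partial answers do not merge this neatly (an exact median, for example) are where the extra coding and computational time start to show up.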


Perhaps you need to consume large amounts of real-time data and create real-time results.  Hadoop would respond “Sorry, I’m a batching technology,” and at some point, you can batch only so frequently and only so fast.  So, real time is a real issue. You might notice this with websites such as LinkedIn or Klout, where updates come every day or so.  If you want to know who looked up your profile a second ago (and might still be on your page) you’ll have to wait for the next batch to finish, oh, sometime tonight!
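
The difference between the two styles can be sketched in a few lines of plain Python. Nothing here is Hadoop or any real streaming product; `batch_view_counts`, `StreamingViewCounts` and the event fields are hypothetical names chosen for the profile-view example.

```python
from collections import Counter

def batch_view_counts(event_log):
    """Batch style: rescan the whole log every time the job runs."""
    return Counter(event["profile"] for event in event_log)

class StreamingViewCounts:
    """Streaming style: update the answer as each event arrives."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["profile"]] += 1
        return self.counts[event["profile"]]

# Hypothetical events seen before last night's batch run.
log = [
    {"profile": "chris", "viewer": "ann"},
    {"profile": "chris", "viewer": "bob"},
]
nightly = batch_view_counts(log)  # the answer as of the last batch run

live = StreamingViewCounts()
for event in log:
    live.on_event(event)

# A new view arrives a second ago: the live counter sees it at once,
# while the batch result stays stale until the next run.
live.on_event({"profile": "chris", "viewer": "cat"})
print(nightly["chris"], live.counts["chris"])
```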


Finally, let’s say that you need to do an iterative analysis: taking multiple cuts across a data set in order to ferret out some more subtle relationships.  Sorry again, but MapReduce does what it does one “reduction” at a time, so every additional cut is another full batch job, and iterative or recursive analyses are impractical, at least for the time being.
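
A small clustering example shows why. In this plain-Python sketch (a simplified one-dimensional k-means, with `one_pass`, `iterate` and the sample points invented for illustration), every call to `one_pass` stands in for one complete map-and-reduce job over the whole data set, and the answer only settles after several of them.

```python
from collections import defaultdict

def one_pass(points, centers):
    """One map/reduce-style pass: assign each point to its nearest center
    (the map step), then average the points per center (the reduce step)."""
    groups = defaultdict(list)
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        groups[nearest].append(p)
    return [
        sum(groups[i]) / len(groups[i]) if groups[i] else centers[i]
        for i in range(len(centers))
    ]

def iterate(points, centers, max_passes=20):
    """Repeat full passes until the centers stop moving."""
    passes = 0
    for passes in range(1, max_passes + 1):
        new_centers = one_pass(points, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, passes

# Hypothetical data: two obvious clusters and deliberately poor starting guesses.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, passes = iterate(points, [0.0, 5.0])
print(centers, passes)
```

With six numbers, a few extra passes cost nothing. When each pass is a batch job over terabytes, the waiting adds up quickly.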


None of this is to slam Hadoop; rather, it is an acknowledgement of what it was built to do, what it is good for, and what it is not designed to do… yet.  This last point is important, because there is an army of developers out there expanding upon Hadoop’s capabilities, and they are working hard to deal with some of these inherent shortcomings. This new functionality is on the way, but in the interim, businesses need to look beyond Hadoop in addressing certain “Big Data” projects.


And so they are.  Many of our customers are using the technologies in HP’s HAVEn platform to extend Hadoop’s capabilities, and in so doing they are able to ask, and answer, questions that they otherwise could not. They recognized early on that Hadoop, like any technology, has both inherent benefits and limitations, and as a result they needed to embrace additional technologies that complement Hadoop.  In this, they are realizing significant business capabilities and outcomes that resonate with their customers.


In closing, if the depth and breadth of your “Big Data” efforts start and end with Hadoop, you very likely are not doing “Big Data”, or at least you’re doing it with one hand tied behind your back.  To improve your results, look to Hadoop’s inherent strengths and weaknesses, map those to the business problems that you’re trying to address, and build out a technology platform where there are functional gaps in what Hadoop is providing today. I’ve written more about Hadoop in a previous post.


Gotta run.  I have to batch-process a bunch of tweets and Facebook posts out to my friends and colleagues…Not!


In Myth #5 we will explore why the members of Motley Crue have signed up for ComSci courses online! (Just kidding, I think). 


About the Author


Chris Surdak is a Subject Matter Expert on Information Governance, analytics and eDiscovery for HP Autonomy. He has over 20 years of consulting and technology experience, and holds a Juris Doctor from Taft University, an MS from the Wharton School at the University of Pennsylvania, a CISSP Master's Certificate from Villanova and a BS in Mechanical Engineering from Penn State. Chris is author of the Big Data strategy book, "Data Crush," which was recently nominated as International Book of the Year for 2014, by GetAbstract. Chris is also contributing editor and columnist for European Business Review magazine.
