
The Goldilocks Scenario: Finding Big Data technologies which are “just right” for business

BigData_Guest on 04-13-2015 12:34 PM

Guest blog post by Walt Maguire, HP Software Big Data Chief Field Technologist

 

This week I had the pleasure of reading an interview with Dr. Michael Stonebraker by Barron’s.  I’ve met him several times, and am always impressed by what a truly sharp guy he is.  His commentary in the interview crystallized something about big data technologies that’s been on my mind for a while.  Goldilocks was onto something with the “just right” porridge. 

 

Let’s recap Dr. Stonebraker’s points and frame them up in “Goldi-speak”.

 

Row-oriented technologies (e.g., Oracle, DB2) are outdated – both in terms of their technology and in terms of their business models.  This insight is spot on.  We hear it every single day as we talk with our customers around the world.  No one would dispute that the rate of technological change is accelerating.  Technologies built for data measured in megabytes just can’t handle a petabyte-scale challenge.  And a vendor with thirty years of legacy code used by thousands of companies just can’t walk away from that revenue.
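To make the row-versus-column contrast concrete, here is a minimal, purely illustrative Python sketch (not how any particular engine is implemented): to aggregate a single attribute, a row layout has to touch every field of every record, while a columnar layout reads only the one array it needs.

```python
# Illustrative only: row layout vs. column layout for a single-column aggregate.
# The record shape and sizes are invented for the example.

rows = [
    {"order_id": i, "customer": f"c{i % 100}", "region": "EMEA", "amount": i * 1.5}
    for i in range(100_000)
]

# Row-oriented scan: every record (all fields) is visited to sum one attribute.
total_row_store = sum(r["amount"] for r in rows)

# Column-oriented layout: each attribute lives in its own array, so the same
# aggregate touches only the "amount" column.
columns = {
    "order_id": [r["order_id"] for r in rows],
    "amount":   [r["amount"] for r in rows],
}
total_column_store = sum(columns["amount"])

assert total_row_store == total_column_store
```

The answer is the same either way; the difference is how much data has to be read to get it, which is what separates an analytic column store from a row engine that has been "hacked into thinking it's a column engine."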

 

We in the big data space have something of a unique problem.  The first impact of rapid technology change has been to deliver devices which do good things… and also generate massive amounts of data.  The challenges faced by those of us in the big data space are driven by a secondary impact.  Mountains of new data have appeared blindingly fast – spewed out by smartphones, tablets, smart sensors, smart cars, smart appliances, and so on.  So the vendors of existing data management products have been behind the curve, trying to catch up by throwing hardware at the old database or hacking a row storage engine into thinking it’s a column engine.  But just like bolting a jet engine to a Yugo doesn’t make it a jet, hacking your software product after the fact doesn’t make it a big data platform.

 

So, that porridge is too cold.

 

NoSQL and some of the Hadoop technologies are trying to solve problems which are in many cases already solved.  Many of us in the industry have watched the NoSQL mantra morph from anti-SQL to SQL-is-coming-soon.  The NoSQL technologies generally appeared to solve fairly specific problems.  The thinking behind them seemed to be “let’s write this new thing because it targets an under-served use case,” such as dealing with schema-free (or schema-lite) data like JSON documents.  There was a brief period six or so years ago when that was true.  Similarly, the notion of putting the “divide and conquer” Map/Reduce framework on top of a distributed filesystem (HDFS) that could tolerate drive failures while scaling to very large volumes is what drove the early development of Hadoop, and that made a lot of sense.
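For readers who haven’t seen the pattern, here is a toy sketch of the map/shuffle/reduce shape in Python.  A real Hadoop job distributes these phases across a cluster and an HDFS-like filesystem; the documents and function names here are invented purely for illustration.

```python
# A toy "divide & conquer" word count in the map/shuffle/reduce style.
from collections import defaultdict
from itertools import chain

documents = [
    "big data needs big platforms",
    "platforms need data",
]

def map_phase(doc):
    # Emit (key, value) pairs independently for each input split.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Combine the values for each key into a final result.
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(shuffle(chain.from_iterable(map_phase(d) for d in documents)))
print(counts)  # e.g. {'big': 2, 'data': 2, 'needs': 1, 'platforms': 2, 'need': 1}
```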

 

These technologies are definitely making a move to go where the money is – enterprise business users.  And business users want their SQL.  NoSQL’s early uptake was largely driven by developers’ dislike of schemas.  Need to push an easy button to auto-schematize your JSON and query it?  That’s a great use case.  In fact, mature technology exists for that today.  And your analysts can use it with SQL.  While NoSQL gets points for cool, so does Esperanto.  But cool doesn’t mean that businesses will base their future on it.  Similarly, the Hadoop community has a number of threads of work underway which are trying to move it away from being a batch-oriented data processing platform towards becoming – wait for it – an interactive parallel-processing data platform.  Great idea.  So great, in fact, that there are already technologies which solve that exact problem and are years ahead in development.  The simple (not really so simple!) truth here is that it comes down to choosing the right tool for the job.  There are a lot of jobs for Hadoop, and success is choosing the right use cases to get the greatest value from the Hadoop innovations.
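As a rough illustration of the “auto-schematize your JSON and query it with SQL” idea, here is a hypothetical sketch that infers a flat schema from JSON records, loads them into an in-memory SQLite table, and runs ordinary SQL over the result.  Production engines do this at far greater scale and with richer typing; the table and field names below are invented.

```python
# Hypothetical sketch: infer a flat schema from JSON records, then query with SQL.
import json
import sqlite3

records = [
    json.loads(s)
    for s in (
        '{"device": "sensor-1", "temp_c": 21.5}',
        '{"device": "sensor-2", "temp_c": 19.0, "battery": 87}',
    )
]

# "Auto-schematize": the union of keys across records becomes the column set;
# records missing a key simply get NULL for that column.
columns = sorted({key for r in records for key in r})

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE readings ({', '.join(columns)})")
conn.executemany(
    f"INSERT INTO readings VALUES ({', '.join('?' for _ in columns)})",
    [tuple(r.get(c) for c in columns) for r in records],
)

# The analyst's SQL is untouched by the schema inference above.
for row in conn.execute("SELECT device, AVG(temp_c) FROM readings GROUP BY device"):
    print(row)
```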

 

The tempting question to ask at this point is: if technology change is accelerating, doesn’t that mean that the most recently built technology is better?  Not necessarily.  Big data and the accelerating overall rate of technology change haven’t changed certain fundamentals about data technologies.  Distributed processing is by nature a loosely defined problem.  Can we take Postgres, hack an interconnect onto it to make it think it’s a massively parallel database, and start doing stuff?  Yep.  Until the user tries something it’s not ready for.  These are “edge cases” – types of usage that haven’t been tried on the technology before.  The only way to solve most edge cases is to play “whack-a-mole” – something which takes years and a large investment of effort.  Can’t scale past eight racks because the interconnect can’t address that many nodes?  Time to rewrite the interconnect!  Can’t process that nested GROUP BY query efficiently?  Time to change the optimizer!  Optimizer too hard to modify with new execution operators?  Time to invent a new optimizer!  Again, there is no lack of tools and no lack of jobs, but there are a lot of mistakes made in matching the two together.

 

So this porridge is too hot.

 

In this case, “just right” for businesses means technology which meets all the needs of a business without creating undue risk.  Not too long ago, I was part of a group tasked with distilling out the key things big data technology needs for a “Goldilocks scenario”.  It boils down to a pretty short list:

  • Enable the organization to harness 100% of the data
  • Allow for the development, delivery and consumption of insights anywhere
  • Deliver scale, speed, stability and functionality without compromise
  • Be open, extensible and developer-friendly
  • Provide economics which match the use case

And we have technology which meets these needs right now.  Not three years from now.  Today.  Big data appeared quickly, but not overnight.  There were companies such as Vertica (founded by Michael Stonebraker) and Autonomy that began developing products in the early days of big data.  Not so long ago that the technology is antiquated.  Not so recently that they’re only halfway there.  But far enough into the big data era that smart engineers and visionary leadership could build platforms which both meet needs today and serve as foundations for the future.  And we fully recognize that there is and will continue to be a larger ecosystem of data processing technologies – Hadoop plays a key role, and it’s going to continue to find adoption for what it does well, which is why we’re invested.  We put a great deal of time and effort into making sure we play well with the ecosystem via partnerships, integration, engineering, and so on.  When a customer is already using Hadoop or a NoSQL technology (which they sometimes are), we bring Haven OnHadoop to the table to help them maximize value.  It’s a big and diverse world, and we aim to provide many options for our customers.

 

It’s too early to say what these technologies will look like in a decade, but it’s safe to say that things will be very different.  Data complexity and growth rates are only increasing, and advanced analytics is becoming a core competency for businesses around the world.  And at the end of the day, from the perspective of business leaders, the real question is: how do I get maximum positive impact with minimal risk to my business?

 

This is where HP is “just right”.  Not only are we committed to technologies which are core to big data such as columnar databases, knowledge platforms, predictive analytics, and so on, but we’re also investing in next generation memory technologies as well as database innovations to optimize their use.  And our product plans are oriented towards a big data platform, not just a collection of technologies. This will allow companies to solve big data problems comprehensively, and with a cost & effort equation which works for them. 

 

Learn more about HP Haven

 

Walter Maguire (@waltermaguire) has twenty-eight years of experience in analytics and data technologies.  He practiced data science before it had a name, worked with big data when "big" meant a megabyte, and has been part of the movement which has brought data management and analytic technologies from back-office, skunk-works operations to core competencies for the largest companies in the world.  Now, as Chief Field Technologist with HP’s Big Data Business Unit, Walt has the unique pleasure of addressing customer needs with Haven, the HP Big Data platform.

 

#HPBigData

 

Read more articles from the HP Big Data team:
The Big Data shift in a data-driven world by Colin Mahony
Big Data is changing everything by Andrew Joiner
How does Big Data change physical security? by Joe Leung
Big need in Big Data & Government: Forge stronger links between Federal & the best minds in business by Lewis Carr
