Behind the scenes at Labs

HP launches a new tool for predictive analytics, optimized for big data and powered by HP Labs

Guest_Blogger

Contributed by Simon Firth, freelance technology journalist

 


 

At this week’s Strata + Hadoop World conference in San Jose, California, HP unveiled HP Haven Predictive Analytics, an open source solution that massively reduces execution times for statistical analysis and allows users to analyze much larger data sets than is currently possible.

 

The offering is powered by Distributed R, a new extension to R, the most-used open source tool in data analysis. First conceived in HP Labs and then commercialized in a collaboration with HP Software, Distributed R addresses the need for analytics solutions that can handle modern, data-intensive environments, says Distributed R architect and HP Labs Principal Researcher Indrajit Roy.

 

“Millions of data scientists use R in everything from cancer research to graph analysis,” Roy notes. “But R doesn’t scale and it features almost no parallel algorithms, which has seriously inhibited R’s users from taking advantage of the power of distributed computing. Distributed R helps overcome those limitations.”

 

The HP Labs project to extend R grew out of research aimed at improving machine learning algorithms. Roy’s team realized that many machine learning and graph algorithms had matrix operations at their core – an insight that led them to focus on building a system that was good at matrix operations and could provide simple APIs to the user.

 

“A good example of a matrix operation is distributed regression, which is used widely by customers,” Roy notes. “We decided to implement this new capability by breaking down the problem into smaller matrix problems. R already provided very good libraries to perform matrix operations. So we were looking to find a way to both scale R to multiple nodes and reduce the overhead of sharing data on a single node – and Distributed R was the result.”
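
To make the idea of "breaking down the problem into smaller matrix problems" concrete, here is a minimal sketch in plain Python. It is not Distributed R code and does not claim to show how Distributed R implements regression. It assumes one common approach, in which each data partition computes its own small matrices (X^T X and X^T y) and the partial results are summed before a single solve. All function names are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]
Partition = Tuple[Sequence[Sequence[float]], Sequence[float]]


def partial_normal_equations(rows: Sequence[Sequence[float]],
                             targets: Sequence[float]) -> Tuple[Matrix, Vector]:
    """Compute X^T X and X^T y for one partition (hypothetical helper)."""
    p = len(rows[0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, y in zip(rows, targets):
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot candidate into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - known) / aug[i][i]
    return w


def partitioned_regression(partitions: Sequence[Partition]) -> Vector:
    """Least-squares fit that only ever touches one partition at a time."""
    p = len(partitions[0][0][0])
    total_xtx = [[0.0] * p for _ in range(p)]
    total_xty = [0.0] * p
    # In a distributed setting each iteration could run on a different node;
    # here the partitions are simply processed one after another.
    for rows, targets in partitions:
        xtx, xty = partial_normal_equations(rows, targets)
        for i in range(p):
            total_xty[i] += xty[i]
            for j in range(p):
                total_xtx[i][j] += xtx[i][j]
    return solve(total_xtx, total_xty)


# Toy data generated from y = 1 + 2x, split into two partitions.
# The first column of each row is the constant 1 for the intercept.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
rows = [[1.0, x] for x in xs]
targets = [1.0 + 2.0 * x for x in xs]
partitions = [(rows[:3], targets[:3]), (rows[3:], targets[3:])]

weights = partitioned_regression(partitions)
print(weights)  # intercept and slope, close to 1 and 2
```

The point of the sketch is that each partition contributes only a small p-by-p matrix and a length-p vector, so the data itself never has to be gathered in one place. Only the final solve runs on the combined result.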

 

Since data scientists must first extract their data from a database like HP Vertica before they can analyze it, the HP Labs and HP Software team also enabled super-efficient data loading from HP Vertica into Distributed R, as well as the ability to deploy machine learning models in HP Vertica, so that customers can use the database itself for predictive analytics.

 

In 2013, the HP Labs research was shared internally with HP Software executives Colin Mahony and Shilpa Lawande, who decided to fund a collaboration between Labs and HP Vertica to create a product offering that took full advantage of Distributed R.

 

HP Haven Predictive Analytics is a fruit of that collaboration – and the industry’s first open source version of a distributed platform for R explicitly designed to address predictive analytic tasks in very large data sets.

 

The offering works seamlessly within the HP Haven platform and in some cases boosts overall data access performance by more than 5x. It also brings the power of predictive analytics to a new, broader community of developers without requiring that they learn a new technology or tool, and it comes pre-loaded with parallel algorithms that maintain consistency with standard R packages and enable users to easily migrate scripts.

 

At the same time, Roy and the Distributed R team are continuing to work with the open source R community on extending the utility of R for the era of big data. They recently hosted a workshop on “Distributed Computing in R” at HP Labs, for example.

 

“Unlocking actionable insights from data is an increasingly complex undertaking,” suggests John Sontag, Vice President and Director of the Systems Research Lab. “But as we look ahead, data is only going to get bigger and come at us faster. Tackling that successfully requires developing systems as open platforms – so we’re looking forward to working with our peers around the world as we continue to make Distributed R a richer and more valuable tool.”

 


 

 



Comments

Well done. My congratulations especially to Indrajit Roy, the principal researcher. This new technological innovation will no doubt benefit many researchers, scientists, statisticians and others. I am looking forward to more such innovation.

Once again, a big 'CONGRATULATIONS'



 

 

 

This is indeed the future. Many are talking about the Big Data revolution that is about to happen or is already happening, and HP getting on board is a welcome move. Congratulations to the whole team.