Behind the scenes @ Labs

HP Labs and HP Vertica enhance R to simplify Big Data processing

03-06-2014 06:18 PM - edited 09-30-2015 07:02 AM

Contributed by Indrajit Roy, HP Labs principal researcher and technical lead for the Distributed R project


Editor’s Note: Distributed R began at HP Labs as a summer internship project in 2011. Over the last three years, a dedicated team of HP Labs researchers and HP Vertica developers has continued to work on the project, developing the technology to the point where it has now been transferred to HP Vertica’s marketplace for commercial use.



From left to right: HP Labs researchers Vanish Talwar, Alvin AuYoung, Rob Schreiber, and Indrajit Roy. Not in the photo: interns Shivaram Venkataraman, Erik Bodzsar, and Kyungyong Lee



Data scientists are key to unlocking actionable insights from data – a task that’s becoming increasingly complex as we tackle ever larger sets of both structured and unstructured information. At HP, we recognize the need to empower data scientists in the ‘Big Data’ era. To that end, last month HP Vertica announced the debut of Distributed R, a platform developed in HP Labs to run complex machine learning, statistical analysis, and graph processing at Big Data scale.


Every data scientist has his or her favorite analysis tool. For the last decade, the statistical programming language R has been a popular choice – it’s open source and used by millions. However, R has multiple limitations when applied to Big Data. The main issue: R is designed to work on data held in the memory of a single machine, so it does not scale, and it offers almost no parallel algorithms.


With Distributed R, we have overcome many of R’s limitations. Using the new platform, data scientists can continue to work in the familiar R environment while benefiting from parallel algorithms and a scalable, high-performance runtime. Even data scientists unfamiliar with distributed programming can use Distributed R to put a cluster of servers to work and complete analyses in a matter of minutes.


Distributed R started as an HP Labs summer internship project in 2011. Its aim was to run machine learning and graph algorithms on very large datasets, with billions of records and terabytes of data. We succeeded in doing that and more, and the technology is now being transferred to HP Vertica for commercial use.


HP customers can already use databases like HP Vertica to store and efficiently analyze data using SQL. With the addition of Distributed R, they can perform complex analyses on top of HP Vertica. For example, healthcare customers can use fast, ad-hoc queries in HP Vertica to perform patient analytics, discover business trends, and comply with regulations. To model patient health and predict complications, analysts may need to run clustering and classification algorithms that are not easily expressed in SQL. These algorithms can now be run using Distributed R.
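To make the idea of running a clustering algorithm over partitioned data more concrete, here is a minimal Python sketch of one data-parallel k-means iteration. Each partition independently computes per-cluster sums and counts, and a coordinator combines those small summaries into new cluster centers. This is a generic illustration of the partition-and-combine pattern under our own assumptions (in-memory partitions, a thread pool standing in for cluster workers); the function names are hypothetical, and this is not Distributed R's API or its actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
    )


def partial_sums(partition, centers):
    """Per-partition work: per-cluster coordinate sums and point counts."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in partition:
        j = nearest(point, centers)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def kmeans_step(partitions, centers, map_fn=map):
    """One k-means iteration: map partial_sums over partitions, then combine."""
    k, dim = len(centers), len(centers[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    # Only small per-cluster summaries come back, never the raw points.
    for sums, counts in map_fn(partial_sums, partitions, [centers] * len(partitions)):
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    # An empty cluster keeps its previous center instead of dividing by zero.
    return [
        tuple(s / total_counts[j] for s in total_sums[j]) if total_counts[j] else centers[j]
        for j in range(k)
    ]


if __name__ == "__main__":
    # Hypothetical toy data: two obvious clusters spread over three partitions.
    partitions = [
        [(0.0, 0.0), (0.5, 0.2)],
        [(9.0, 9.0), (0.1, 0.4)],
        [(8.5, 9.5), (9.5, 8.5)],
    ]
    centers = [(0.0, 0.0), (10.0, 10.0)]
    with ThreadPoolExecutor() as pool:
        for _ in range(5):
            centers = kmeans_step(partitions, centers, pool.map)
    print(centers)
```

The combine step only ever sees k sums and counts per partition, which is what lets this kind of algorithm scale with the number of workers. In CPython, threads do not speed up pure-Python number crunching; a process pool or separate machines would be needed for real parallelism, and any executor whose `map` accepts multiple iterables can be passed as `map_fn`.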


While Distributed R can be used as a standalone platform with any backend store, the combination of HP Vertica and Distributed R has multiple benefits. Users can perform SQL analysis and pre-processing in HP Vertica, do their complex modeling in Distributed R, and run predictions in-database. This integrated approach offers a convenient way to deploy and manage the full life-cycle of data analysis.
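The SQL-then-model-then-score workflow can be sketched in miniature. In this Python example, which rests entirely on our own assumptions, the standard-library `sqlite3` module stands in for HP Vertica, an ordinary least-squares line fit stands in for a model trained in Distributed R, and the `visits` table and its columns are hypothetical. SQL handles the pre-processing, the model is fitted outside the database, and the fitted coefficients are passed back as query parameters so that the database itself computes the predictions.

```python
import sqlite3


def fit_line(rows):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def run_pipeline(conn):
    # 1. Pre-process in SQL: keep only complete rows for training.
    training = conn.execute(
        "SELECT age, stay_days FROM visits "
        "WHERE age IS NOT NULL AND stay_days IS NOT NULL"
    ).fetchall()
    # 2. Model outside the database (stand-in for the Distributed R step).
    intercept, slope = fit_line(training)
    # 3. Score in-database: SQL evaluates the fitted model for unlabeled rows.
    return conn.execute(
        "SELECT patient_id, ? + ? * age AS predicted_stay FROM visits "
        "WHERE stay_days IS NULL AND age IS NOT NULL ORDER BY patient_id",
        (intercept, slope),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (patient_id INTEGER, age REAL, stay_days REAL)")
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?, ?)",
        [(1, 30, 2.0), (2, 40, 3.0), (3, 50, 4.0), (4, 60, 5.0),
         (5, 70, None), (6, None, None)],
    )
    print(run_pipeline(conn))
```

Binding the coefficients as `?` parameters keeps the scoring step in SQL, so predictions are produced where the data already lives instead of exporting rows for scoring.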


Distributed R is currently in beta and available for free on the HP Vertica marketplace. HP Vertica and HP Labs are working closely to improve the software and add more features.


Our vision is to continue to develop the system as an open platform for data mining. We look forward to community engagement and welcome your contributions as we develop Distributed R further.


Sign up for the webinar about HP Vertica Distributed R on March 11. 


Photography by Serge Vejvoda
