Grounded in the Cloud

GreenButton Simplifies Hadoop


Guest Author: Christian Smith, Solution Architect, GreenButton


As the quantity and complexity of unstructured data increase, so does the need to process it.  Businesses are finding clever and innovative ways of turning this data into a source of revenue, not directly, but through a better understanding of their business, their customers and their customers’ habits.


Because of this growth, traditional data analysis techniques are often no longer up to the task, and as a result Hadoop has become a leader in the Big Data and analytics spaces.


While the Hadoop ecosystem offers many valuable tools like MapReduce, HBase, Hive, Pig and Oozie, setting these up and running your jobs can be a daunting task.  The Hadoop components require significant capital for hardware and operations, and raise numerous questions like ‘how much hardware do I need?’  It would be nearly impossible to cater for all the potential workloads, and if you did you would probably end up with a lot of idle resources.  If you aim too low, you end up with contention over the limited resources.  What businesses really need is a managed Hadoop service that scales to their needs, enabling them to use only the resources required for a particular job.  This approach also reduces contention over resources.  Need to run more than one job?  No problem, just create a new cluster or scale up an existing one.  When you’re done, you can just delete the cluster.
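To make the sizing trade-off concrete, here is a minimal sketch in Python.  The hourly demand figures and the function name are made up for illustration and do not come from any real workload or GreenButton tool.  It simply counts how many node-hours a fixed-size cluster would leave idle, and how many node-hours of demand it could not serve.

```python
def fixed_cluster_tradeoff(demand_per_hour, cluster_size):
    """Return (idle, unmet) node-hours for a cluster of fixed size.

    demand_per_hour: nodes wanted in each consecutive hour (hypothetical data).
    idle:  capacity that was paid for but not used.
    unmet: demand that exceeded capacity, i.e. contention.
    """
    idle = sum(max(cluster_size - d, 0) for d in demand_per_hour)
    unmet = sum(max(d - cluster_size, 0) for d in demand_per_hour)
    return idle, unmet


# Hypothetical demand: nodes wanted in each of six consecutive hours.
demand = [2, 2, 10, 40, 4, 2]

for size in (4, 40):
    idle, unmet = fixed_cluster_tradeoff(demand, size)
    print(f"size={size}: idle={idle} node-hours, unmet={unmet} node-hours")

# A cluster that follows demand exactly would use this many node-hours.
print("on-demand usage:", sum(demand), "node-hours")
```

With these invented numbers, a cluster sized for the peak sits mostly idle, while a small cluster leaves most of the peak demand unserved.  A cluster that scales with demand avoids both.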


Another hurdle is job submission and the handling of job dependencies and data.  Currently the story around Hadoop’s job submission and monitoring is a little rough (although improving).  Typically you would execute MapReduce jobs directly on a Hadoop node or submit a workflow via Oozie, but this requires you to ensure all job dependencies are available on each of the slaves, which can be a tedious and error-prone process.
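As a rough illustration of why this is tedious, the sketch below checks which required files are missing on each node.  The node names, file names and the inventory itself are hypothetical; in practice you would have to collect that inventory from every node yourself.

```python
def missing_dependencies(required, node_inventory):
    """Map each node to the sorted list of required files it lacks.

    Nodes that already hold every required file are left out of the result.
    """
    required = set(required)
    report = {}
    for node, present in node_inventory.items():
        missing = sorted(required - set(present))
        if missing:
            report[node] = missing
    return report


# Hypothetical job dependencies and per-node file listings.
required = {"job.jar", "lookup.csv", "libparser.so"}
inventory = {
    "worker-1": {"job.jar", "lookup.csv", "libparser.so"},
    "worker-2": {"job.jar", "lookup.csv"},
    "worker-3": {"job.jar"},
}

print(missing_dependencies(required, inventory))
```

Even in this toy case, two of the three nodes would fail the job at run time, and the check has to be repeated whenever a dependency changes.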


Governance is another major problem.  How do I calculate the Hadoop resources used by my marketing department over the last month?  How many hours did Bob use?
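Answering these questions by hand means aggregating usage records yourself.  The following sketch shows one way to do that in Python, assuming you already have records of who ran what and when.  The record layout, names and dates are invented for the example and are not a GreenButton format.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical usage records: (user, department, start, end, nodes).
RECORDS = [
    ("bob", "marketing", datetime(2014, 3, 3, 9), datetime(2014, 3, 3, 12), 4),
    ("alice", "marketing", datetime(2014, 3, 10, 14), datetime(2014, 3, 10, 15), 8),
    ("carol", "finance", datetime(2014, 3, 12, 8), datetime(2014, 3, 12, 10), 2),
    ("bob", "marketing", datetime(2014, 2, 27, 9), datetime(2014, 2, 27, 10), 4),
]


def node_hours(records, window_start, window_end, key):
    """Sum node-hours inside a time window, grouped by key(record)."""
    totals = defaultdict(float)
    for record in records:
        _user, _dept, start, end, nodes = record
        # Only count the part of the run that falls inside the window.
        clipped_start = max(start, window_start)
        clipped_end = min(end, window_end)
        if clipped_end > clipped_start:
            hours = (clipped_end - clipped_start).total_seconds() / 3600
            totals[key(record)] += hours * nodes
    return dict(totals)


month_start, month_end = datetime(2014, 3, 1), datetime(2014, 4, 1)
by_department = node_hours(RECORDS, month_start, month_end, key=lambda r: r[1])
by_user = node_hours(RECORDS, month_start, month_end, key=lambda r: r[0])

print(by_department)
print(by_user)
```

The hard part in practice is not the arithmetic but collecting reliable records in the first place, which is why this belongs in the platform rather than in ad hoc scripts.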


This is where GreenButton comes in.  GreenButton has been simplifying HPC applications for years and has now extended this knowledge to the Hadoop stack, offering:


  • Easy and automated provisioning of on-demand or permanent clusters
  • Easy job submission, monitoring and gathering of outputs via Mission Control
  • Governance tools to identify and track resource usage
  • Tools to manage and monitor your cluster(s)
  • High performance data synchronization with CloudSync



The GreenButton solution simplifies Hadoop provisioning in the cloud of your choice, for example HP’s OpenStack, Azure or Amazon.


As a customer you can choose between on-demand and permanent clusters.  On-demand clusters are provisioned by GreenButton when a job is submitted, and removed when a job completes.  All job outputs are persisted in Swift storage and can be accessed at a later time.


Permanent clusters can be provisioned and deleted via the GreenButton API, as required.  Jobs can then be submitted to specific clusters, or to any available cluster.  A permanent cluster can also be dynamically scaled up and down to accommodate larger jobs as needed.


Job Submission & Monitoring


GreenButton has a mature RESTful API[1] that allows you to easily submit jobs and their required assets.  Job progress can be monitored and job outputs can be downloaded via the API or Mission Control.  Mission Control can be used to get an overview of the cluster CPU, memory and IO during the lifetime of a job and its tasks.




Governance with Mission Control


GreenButton’s Mission Control provides governance controls to manage costs across cloud deployments with spending limits, and allows you to break down costs so usage can be charged back to departments and projects in your organization.
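To give a feel for what a chargeback report with spending limits involves, here is a small Python sketch.  It is not how Mission Control is implemented; the charges, limits and function name are hypothetical and only illustrate the idea.

```python
from decimal import Decimal


def cost_breakdown(charges, limits):
    """Total cost per department and project, flagging departments over limit.

    charges: iterable of (department, project, cost) tuples.
    limits:  dict mapping a department to its spending limit (optional per dept).
    """
    report = {}
    for dept, project, cost in charges:
        entry = report.setdefault(
            dept, {"total": Decimal("0"), "projects": {}, "over_limit": False}
        )
        entry["total"] += cost
        entry["projects"][project] = entry["projects"].get(project, Decimal("0")) + cost
    for dept, entry in report.items():
        limit = limits.get(dept)
        # A department without a configured limit is never flagged.
        entry["over_limit"] = limit is not None and entry["total"] > limit
    return report


# Hypothetical charges and limits, in an arbitrary currency.
charges = [
    ("marketing", "campaign-a", Decimal("60.00")),
    ("marketing", "campaign-b", Decimal("55.50")),
    ("finance", "forecast", Decimal("20.00")),
]
limits = {"marketing": Decimal("100.00"), "finance": Decimal("50.00")}

for dept, entry in cost_breakdown(charges, limits).items():
    print(dept, entry["total"], "OVER LIMIT" if entry["over_limit"] else "ok")
```

`Decimal` is used instead of floats so that currency totals do not pick up rounding errors.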






Cluster Management & Monitoring


Each Hadoop cluster includes a dedicated Ambari[2] deployment for monitoring the cluster’s health and resources, starting or stopping services, and making changes to the Hadoop configuration.  Ambari includes a Web UI which visualizes various real-time metrics including memory, CPU, IO, JVM, Map & Reduce slots, and many more.




The Hadoop User Experience - Hue


Each cluster is deployed with a dedicated instance of Hue[3], an easy-to-use web application for the Hadoop ecosystem including HDFS, MapReduce, Oozie, Pig, Hive, and HBase.  Hue provides a full-featured user interface that makes it easy to get started with Hadoop, including drag & drop creation of Oozie workflows and one-click submission.




The Hue website offers good tutorials covering the different components and latest Hue features.


Data Synchronization with CloudSync


One of the hurdles with running ‘Big Data’ workloads in the cloud is getting that Big Data to the cloud so that it can be processed.  GreenButton has developed its CloudSync product for just this problem, allowing customers to easily synchronize data between local storage and the cloud, or between cloud services.  It even supports ETL from custom data sources.  CloudSync makes use of the GridFTP protocol to facilitate large, parallel data transfers over UDP.
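The general idea behind parallel transfers is to split a large object into byte ranges and move those ranges concurrently.  The sketch below shows only that idea, using an in-memory copy as a stand-in for the network.  It does not implement GridFTP, does not use UDP, and is not CloudSync code; the function names and chunk size are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(total_size, chunk_size):
    """Split [0, total_size) into (offset, length) pairs of at most chunk_size."""
    return [
        (offset, min(chunk_size, total_size - offset))
        for offset in range(0, total_size, chunk_size)
    ]


def parallel_copy(source, chunk_size, workers=4):
    """Copy bytes chunk by chunk using a thread pool.

    Each worker writes to its own non-overlapping slice of the destination,
    so chunks can complete in any order.
    """
    dest = bytearray(len(source))

    def copy_chunk(byte_range):
        offset, length = byte_range
        dest[offset:offset + length] = source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(copy_chunk, chunk_ranges(len(source), chunk_size)))
    return bytes(dest)


data = bytes(range(256)) * 10
print(chunk_ranges(10, 4))
print(parallel_copy(data, chunk_size=100) == data)
```

In a real transfer each chunk would travel over its own stream, and the receiver would reassemble the chunks by offset in the same way.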




You can read more about GreenButton’s CloudSync product here:


More Information and Feedback


If you would like to try out these services or like more information on them, feel free to contact us.  Additionally, if you have any feedback or suggestions we’re always happy to hear them.


About GreenButton™ Limited


GreenButton™ is an award-winning global software company specializing in on-demand cloud computing. GreenButton delivers a turnkey solution for cloud-enablement, synchronizing data and bursting apps to the cloud, enabling enterprises and independent software vendors (ISVs) to move to the cloud and access cloud resources. GreenButton provides a multi-purpose cloud platform for development and delivery of software and services. GreenButton's Cloud Fabric empowers users across all industries, including digital media, engineering, oil and gas, financial and biotech, to leverage supercomputing power. With GreenButton's Mission Control dashboard, cloud-based applications across multiple private and public cloud platforms can be easily managed from one centralized and user-friendly interface with rich usage reporting and governance controls.


GreenButton is Microsoft Corp's 2011 Windows Azure ISV Partner of the year and the company's offices are located in New Zealand and the US. For more information, please or follow GreenButton at


GreenButton is a service mark and trademark of GreenButton Limited. All other product names, service marks, and trademarks mentioned herein are trademarks of their respective owners.


[1] Details of the GreenButton Job API are available here


[2] Further details on Ambari can be found on their page


[3] Further information on Hue can be found here

Senior Manager, Cloud Online Marketing
About the Author


I manage the HPE Helion social media and website teams promoting the enterprise cloud solutions at HPE for hybrid, public, and private clouds. I was previously at Dell promoting their Cloud solutions and was the open source community manager for OpenStack and at Rackspace and Citrix Systems. While at Citrix Systems, I founded the Citrix Developer Network, developed global alliance and licensing programs, and even once added audio to the DOS ICA client with assembler. Follow me at @SpectorID
