Top 10 most desirable capabilities of a modern, public cloud-based big data analytics platform
By Gopal Panchavati, Principal Cloud Architect, Hewlett Packard Enterprise
Organizations are leveraging insights from their data in a variety of ways ranging from fraud detection, to customer loyalty improvement, to disease prediction and prevention, and a host of other industry-specific use cases. The public cloud can accelerate the implementation of a big data analytics (BDA) platform, which is essential to harness value from the data.
This article explores the top 10 desired capabilities of a public cloud-based BDA platform and the considerations to keep in mind during its design and implementation. (Learn how HPE cloud consulting can help you move to, innovate on, and run your cloud environments.)
1. A secure cloud foundation
Though not a core capability of the BDA platform, a secure cloud foundation is essential to sustain its growth. It is very easy to spin up different components of a BDA platform in the public cloud with the swipe of a credit card. However, doing it right requires careful study and incorporation of industry best practices to ensure all guard rails are in place, especially those related to:
- Identity and access management
- Naming and tagging standards
- Account/subscription hierarchy
- Logging and monitoring
- Cloud security controls
- Infrastructure and network design
- Provisioning and management processes and tools.
Adherence to industry best practices ensures a secure and scalable foundation upon which the BDA platform and the big data analytics program it supports can expand and thrive.
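As one illustration of a guard rail, a naming and tagging standard can be checked automatically before a resource is provisioned. The sketch below is a minimal example under stated assumptions: the required tag keys, the allowed environments, and the naming pattern are all hypothetical and would come from your own standard, not from any CSP.

```python
import re

# Hypothetical tagging standard: replace with your organization's own rules.
REQUIRED_TAGS = {"owner", "environment", "cost-center", "data-classification"}
ALLOWED_ENVIRONMENTS = {"dev", "test", "prod"}
# Hypothetical naming standard: <team>-<environment>-<purpose>[-<detail>...]
NAME_PATTERN = re.compile(r"^[a-z]{2,8}-(dev|test|prod)-[a-z0-9]+(-[a-z0-9]+)*$")


def check_resource(name, tags):
    """Return a list of human-readable violations for one resource."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append(f"name '{name}' does not match the naming standard")
    missing = sorted(REQUIRED_TAGS - set(tags))
    if missing:
        problems.append("missing tags: " + ", ".join(missing))
    env = tags.get("environment")
    if env is not None and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"environment '{env}' is not allowed")
    return problems


good_tags = {
    "owner": "analytics-team",
    "environment": "prod",
    "cost-center": "cc-1234",
    "data-classification": "internal",
}
good_result = check_resource("bda-prod-datalake-raw", good_tags)
bad_result = check_resource("MyBucket", {"owner": "someone"})
```

A check like this could run in a provisioning pipeline so that non-compliant resources are rejected before they are created.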
2. Highly available and scalable storage
A public cloud-based BDA platform can cater to hybrid big data workloads spanning the edge, on-prem, and the public cloud, so highly available and scalable storage is an essential capability. The storage could be any combination of a data lake to store raw data, an MPP (massively parallel processing) data warehouse to store readily consumable aggregated data, and a data fabric which persists data across the hybrid cloud. (For more on data fabrics, see this Gartner report: Data Fabrics Modernize Your Data Integration. Requires registration to download.)
3. Highly elastic and scalable compute
On-prem big data systems are hard to maintain and scale, and they are capital-intensive. Cloud service providers (CSPs) offer highly elastic and scalable big data compute as a service, but may fall short in some desired capabilities. To assess the portability and cloud suitability of big data workloads, take an inventory of all desired big data processing capabilities and compare it, feature by feature, with the equivalent CSP and marketplace offerings.
A container management platform spanning the hybrid cloud, complemented by a data fabric, can help fill any capability gaps in the CSP's offerings. It will facilitate containerization, portability, and optimal distribution of big data workloads across the hybrid cloud and help leverage existing on-prem investments.
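The capability inventory and feature comparison described in this section amount to a simple gap analysis. The sketch below shows one way to express it; the capability names and the two offerings are hypothetical placeholders, not real product features.

```python
def capability_gaps(required, offered):
    """For each offering, list the required capabilities it does not cover."""
    return {
        name: sorted(set(required) - set(capabilities))
        for name, capabilities in offered.items()
    }


# Hypothetical inventory of desired big data processing capabilities.
required = ["batch-processing", "stream-processing", "gpu-training", "notebook-workbench"]

# Hypothetical feature lists for a CSP-native service and a marketplace product.
offered = {
    "csp-native": ["batch-processing", "stream-processing", "notebook-workbench"],
    "marketplace": ["batch-processing", "gpu-training"],
}

gaps = capability_gaps(required, offered)
```

Any capability that appears in every offering's gap list is a candidate for the containerized, hybrid approach described above.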
4. Big data handling and support for data science operations
A BDA platform should be able to ingest and handle any type of data, big or small, structured or unstructured, binary or text, file-based or RDBMS format, coming in at any speed and volume. It should support real-time and batch data processing capabilities, and all AI/ML operations including modelling, training, and publishing. Being able to rapidly spin up and tear down the compute clusters required for such big data operations could result in significant cost savings for organizations leveraging the public cloud.
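To make the cost argument concrete, the following sketch compares an always-on cluster with one that is spun up only while jobs run. The node count, hourly rate, and daily job duration are hypothetical figures chosen for easy arithmetic; actual CSP pricing varies.

```python
def monthly_cost(nodes, hourly_rate_per_node, hours_per_day, days=30):
    """Compute cost for a cluster that runs a fixed number of hours per day."""
    return nodes * hourly_rate_per_node * hours_per_day * days


# Hypothetical figures: 10 nodes at 0.50 per node-hour.
always_on = monthly_cost(nodes=10, hourly_rate_per_node=0.50, hours_per_day=24)
ephemeral = monthly_cost(nodes=10, hourly_rate_per_node=0.50, hours_per_day=4)

savings_fraction = 1 - ephemeral / always_on
print(f"always-on: {always_on:.2f}, ephemeral: {ephemeral:.2f}, "
      f"savings: {savings_fraction:.0%}")
```

With these assumed numbers, running the cluster 4 hours a day instead of 24 reduces the compute bill by roughly five sixths.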
5. Self-service
A BDA platform should provide self-service support to personas of all types, from a business analyst who needs to run simple queries to a data scientist who needs to access disparate data sources from a personal workbench.
A data mesh that spans data silos in a federated environment via a robust data virtualization capability and/or a data fabric, complemented by a data visualization capability accessed through a tool of the user's choice, is critical to a successful self-service analytics capability within a big data analytics program.
6. Data distribution
Organizations are interested in monetizing their data via an efficient data distribution capability. A CSP-offered or custom API management solution with tight security controls serves this need. The solution needs to be scalable and elastic, and it must protect against DDoS attacks and other security threats. It also needs mitigation plans to ensure business continuity. A scalable API infrastructure is desirable even if the services are for internal consumption.
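One control that API management solutions commonly apply against abusive traffic is per-client rate limiting. The article does not prescribe a specific technique; the sketch below uses a token bucket purely as an illustration, with the clock passed in explicitly so the behavior is easy to follow. The capacity and refill rate are hypothetical.

```python
class TokenBucket:
    """Token-bucket rate limiter: each request spends one token."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start with a full bucket
        self.last = now

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = max(0.0, now - self.last)
        # Refill for the elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical limit: bursts of 2 requests, refilled at 1 request per second.
bucket = TokenBucket(capacity=2, refill_per_second=1.0)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
```

In practice a limiter like this would be kept per API key or client, and a managed API gateway would normally provide the equivalent as configuration rather than custom code.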
7. Data security
All data stored in a BDA platform located in a public cloud should be protected at multiple levels: at rest, in transit, in use, and via tight access controls.
Map in detail all endpoints which the data traverses, so that every data hop is identified and protected. If the traffic ends in a load balancer, the standard practice is to terminate encryption at the load balancer. For sensitive data, however, it is recommended to extend encryption beyond the load balancer.
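The endpoint mapping can be captured as a simple list of hops, which makes unprotected segments easy to spot. The sketch below is illustrative only; the endpoint names and the path are hypothetical, and it models the case where encryption is terminated at the load balancer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    source: str
    destination: str
    encrypted: bool  # True if traffic on this hop is encrypted in transit


def unprotected_hops(path):
    """Return every hop on the path where data travels unencrypted."""
    return [hop for hop in path if not hop.encrypted]


# Hypothetical data path for one request.
path = [
    Hop("client", "load-balancer", True),
    Hop("load-balancer", "api-service", False),  # encryption terminated at the load balancer
    Hop("api-service", "data-lake", True),
]

exposed = unprotected_hops(path)
for hop in exposed:
    print(f"unencrypted hop: {hop.source} -> {hop.destination}")
```

For sensitive data, the hop flagged here is the one to re-encrypt between the load balancer and the backend.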
8. Data discovery
Siloed organization structure creates inherent barriers which restrict the free flow and exchange of data. It reduces the visibility of data assets within the organization, and ultimately manifests in problems such as delays in procuring data, lack of authoritative data sources, ownership tussles over master data, multiple versions of datasets, duplication of work, and finally lack of trust in data sources within the organization.
A data discovery capability, such as a data catalog service which provides a searchable, security-trimmed list of the enterprise data assets, can help reduce the effect of silos, or even achieve their complete elimination. The tool should have access approval workflow and sliding expiry access capabilities for effective governance.
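Sliding expiry access means a grant stays valid as long as it keeps being used and lapses after a period of inactivity. The sketch below shows the idea with a hypothetical `AccessGrant` class; it is not the API of any particular catalog product, and the 30-day window is an assumed value.

```python
from datetime import datetime, timedelta


class AccessGrant:
    """Access to a dataset that expires after `window` of inactivity."""

    def __init__(self, user, dataset, window, now):
        self.user = user
        self.dataset = dataset
        self.window = window
        self.expires_at = now + window

    def use(self, now):
        """Return True and slide the expiry forward if the grant is still valid."""
        if now >= self.expires_at:
            return False  # lapsed: the user must go through the approval workflow again
        self.expires_at = now + self.window
        return True


start = datetime(2024, 1, 1)
grant = AccessGrant("analyst", "sales-raw", timedelta(days=30), now=start)

used_day_20 = grant.use(start + timedelta(days=20))  # valid, expiry moves to day 50
used_day_45 = grant.use(start + timedelta(days=45))  # valid, expiry moves to day 75
used_day_80 = grant.use(start + timedelta(days=80))  # past day 75, so the grant has lapsed
```

Unused grants therefore clean themselves up, which keeps the access list aligned with who actually needs the data.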
9. Automation
Leveraging automation to provision and manage the operations of a BDA platform is essential to its smooth and secure functioning.
Automation via CSP policies or custom code helps keep the platform secure with the latest updates and patches and reduces proliferation of zombie assets (data or compute). In addition to providing security, automation cuts costs, ensures business continuity preparedness, and above all ensures repeatability, reliability and trust in the BDA program.
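As a small example of the custom-code route, the sketch below flags zombie assets, meaning data or compute that has not been used for a set period. The inventory records, the `last_used` field, and the 30-day threshold are hypothetical; a real job would read this information from the CSP's inventory or monitoring service.

```python
from datetime import datetime, timedelta


def find_zombies(assets, now, max_idle=timedelta(days=30)):
    """Return the ids of assets that have been idle for longer than `max_idle`."""
    return [asset["id"] for asset in assets if now - asset["last_used"] > max_idle]


# Hypothetical inventory snapshot.
now = datetime(2024, 6, 1)
assets = [
    {"id": "cluster-etl", "kind": "compute", "last_used": datetime(2024, 5, 30)},
    {"id": "cluster-poc", "kind": "compute", "last_used": datetime(2024, 2, 10)},
    {"id": "bucket-tmp-exports", "kind": "data", "last_used": datetime(2024, 4, 1)},
]

zombies = find_zombies(assets, now)
print("candidates for review or cleanup:", zombies)
```

Run on a schedule, a report like this gives owners a chance to confirm before anything is shut down or deleted.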
10. Data governance
Data governance is about the processes and controls to manage the availability, usability, security, and integrity of data. The CSPs provide native policies and other cloud native tools to facilitate governance. A public cloud-based BDA platform should fully leverage such native services to enforce regulatory compliance and internal data standards and policies, and the related processes and controls, via automation.
In addition, several industry-standard data governance tools exist to help with compliance checks, data quality, metadata management, master data management, and data lineage, among other data governance aspects.
HPE Cloud Services: helping you build it right
The public cloud can be leveraged to get a jump start on any new big data analytics program, or to extend the capabilities of an existing program. It is easy to build a public cloud-based BDA platform, but doing it right requires careful planning and giving due consideration to foundational as well as all operational capabilities to support and sustain the big data analytics program. An assessment of the current capabilities and the gaps against future requirements would help you understand where the focus needs to be in the platform design.
If you are considering leveraging public cloud or a hybrid cloud for your analytic needs, big data analytics services from HPE can help. We can work with you to turn your data into vital insights and transform your business from edge to cloud.
Learn more about data analytics solutions from HPE.
For more information, connect with Gopal Panchavati on LinkedIn.
Gopal Panchavati is a Principal Cloud Architect at HPE with over 25 years of experience developing strategy and delivering business solutions based on sound enterprise architecture principles. Gopal has a solid background in architecting and implementing transactional and analytical systems in both on-prem and public cloud. He is well versed in public cloud security controls, all aspects of migration to public cloud, and the challenges faced in public cloud. Gopal is passionate about leveraging public cloud for big data and AI/ML solutions.