Managing data in a hybrid and ever more connected world…
HPE Ezmeral Data Fabric in a multi-site deployment: on premises and in the cloud
Managing data is not a new challenge for enterprises. However, over the last decade, it has become more complicated. Data no longer resides in a single place but is distributed over multiple sites in different locations. Globally distributed teams need to share information and make it available – quickly, securely, and reliably. It’s becoming more common for data processing to take place outside the traditional data center environment, such as at the edge or in the cloud.
Modern workloads demand that data be handled wherever it resides: from edge to core and edge to cloud, in any direction, at any speed, and in any format. Data volume and the number of locations are growing rapidly. A robust data pipeline is the foundation and a prerequisite for modern workloads and use cases.
This blog describes how you can build that foundation with the HPE Ezmeral Data Fabric and create a global namespace. Global namespace refers to a collection of features that make it possible for applications to access data via the same pathname, from anywhere and using any access method (API). By providing a global namespace, HPE Ezmeral Data Fabric provides a unified view and access to data spanning from on premises to the cloud, without having to be aware of the physical location of the files. The combination of global namespace and data fabric’s efficient mirroring capability lets you run the right application, at the right time, on the right data.
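To make the global namespace concrete: the data fabric exposes every trusted cluster under a single `/mapr/<cluster-name>/` prefix, so the same relative path works whether the data lives on premises or in the cloud. A minimal sketch, using the cluster names from this walkthrough (the file path itself is a hypothetical example):

```shell
# Cluster names from this deployment; the relative path is illustrative.
ONPREM="my.datafabric.lab"
CLOUD="my.datafabric.cloud"
REL="projects/sensors/readings.csv"

# The same data is addressable on either cluster under one namespace.
onprem_path="/mapr/${ONPREM}/${REL}"
cloud_path="/mapr/${CLOUD}/${REL}"
echo "$onprem_path"
echo "$cloud_path"
```

An application only needs the `/mapr/<cluster>/...` pathname; it does not need to know which site physically holds the file.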
The world is hybrid with multiple edges, clouds, and data centers
Cloud is an indispensable component in this pipeline architecture, as resources can conveniently be ramped up quickly to handle new workloads or be used as an efficient data archive. Customers are using a mix of on-premises infrastructure and the cloud to leverage the benefits of both worlds. However, the risk in a hybrid world is the creation of data silos instead of a continuously connected data pipeline. With the advent of container technology, applications can be spun up quickly and then shifted from a data center to the cloud or the other way around. But what about the data? The data store underneath should have the same flexibility.
For these workloads, the pipes along the edge, core, and cloud must be connected. Data must be reliably handled and accessible – whether it’s streams, tables, or millions of files.
How to build a multi-site data fabric
HPE Ezmeral Data Fabric gives you the foundation to build these pipelines from a data store perspective. The HPE Ezmeral Data Fabric is a distributed and scale-out data platform. Its filesystem was designed to handle billions of files, streams, and tables in a unified data platform. You don’t need to stitch together several technologies; you can just use a single data platform. The HPE Ezmeral Data Fabric can be deployed in a hybrid environment—on premises or in the cloud. This exact combination is needed for pipeline workloads.
To demonstrate how you can connect an HPE Ezmeral Data Fabric deployment in the data center to an HPE Ezmeral Data Fabric deployment in the cloud, here is a technical description of the process as it was carried out with an HPE lab environment using an on-premises HPE cluster and a Microsoft Azure environment as the cloud cluster.
In this scenario, the latest release of HPE Ezmeral Data Fabric, version 6.2, was installed on five HPE Apollo 4200 Gen10 systems to create the fabric cluster “my.datafabric.lab” in secure mode.
To build the second cluster in a Microsoft Azure resource group, a dedicated HPE Ezmeral vnet was created in a private IP range that did not overlap with the on-premises network configuration. Five virtual machines (VMs) with a CentOS 8.1 image were deployed as HPE Ezmeral Data Fabric nodes, using the Standard_E8as_v4 size as the minimum configuration. To fulfill the minimum installation requirements, the OS disk size was increased to 300 GB for each VM, and data disks were added for the data fabric to use as storage. After the VM deployment, the standard HPE Ezmeral Data Fabric installer script was run on the first node. The installer deployed the HPE Ezmeral Data Fabric on the remaining nodes in secure mode and created the cluster “my.datafabric.cloud”.
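The Azure side of such a deployment can be sketched with the Azure CLI. This is an illustrative sketch, not the exact commands used in the lab: the resource group, VM name, location, image URN, and data disk sizes are assumptions, while the VM size and 300 GB OS disk follow the description above.

```shell
# Hypothetical resource names; size and OS disk values follow the text above.
az group create --name ezmeral-rg --location southcentralus

# One of five data fabric nodes; repeat (or loop) for df-node-2..5.
az vm create \
  --resource-group ezmeral-rg \
  --name df-node-1 \
  --image OpenLogic:CentOS:8_1:latest \
  --size Standard_E8as_v4 \
  --os-disk-size-gb 300 \
  --data-disk-sizes-gb 512 512 512    # raw disks handed to the data fabric
```

The data disks are left unformatted; the data fabric installer claims them as raw storage devices.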
For this multi-site, hybrid deployment, a site-to-site virtual private network (VPN) connection between the Houston data center and Azure was needed, and all nodes had to be accessible by their fully qualified domain names (FQDNs). Next, local and virtual network gateways were created on the Azure side, and a connection between the Azure gateway and the Houston VPN device was configured and established.
To complete the hybrid deployment, a secure trust relationship between both clusters was required. Setting up cross-cluster security can be done by running the configure-crosscluster utility script. The script configured the clusters for remote access, then automatically exchanged keys and certificates.
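That trust step can be sketched as a single invocation of the data fabric’s `configure-crosscluster.sh` utility, run on one node of the local cluster. A hedged sketch, assuming the default `mapr` service user on both sides; the remote hostname is a placeholder for any node of the cloud cluster:

```shell
# Run on one node of my.datafabric.lab; prompts for credentials, then
# exchanges keys and certificates with the remote cluster.
/opt/mapr/server/configure-crosscluster.sh create all \
  -localcrossclusteruser mapr \
  -remotecrossclusteruser mapr \
  -remoteip node1.datafabric.cloud    # placeholder: any remote cluster node
```

After the script completes, each cluster’s configuration lists the other, which is what enables the remote commands and mirroring described next.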
Hybrid data fabric and the global namespace
Once cross-cluster trust has been established, it is possible to run commands remotely, create replicas and mirror copies of volumes, and access data in both directions.
Inside the HPE Ezmeral Data Fabric GUI, you can see the different cluster instances in the menu bar. This option allows the user to manage and configure each deployment individually under a single management interface. Inside the management interface, the user can define volumes and mirrors for each site. The HPE Ezmeral Data Fabric also provides a CLI and a REST API for cluster federation, which let the user control and manage all the deployments from a single site, as long as cluster trust is established.
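The volume and mirror definitions can also be scripted through the `maprcli` interface mentioned above. A sketch, assuming a source volume named `sensordata` exists on the cloud cluster (the volume and path names are hypothetical):

```shell
# On my.datafabric.lab: define a mirror volume whose source is a volume
# on the remote (cloud) cluster, addressed as <volume>@<cluster>.
maprcli volume create -name sensordata-mirror \
  -path /mirrors/sensordata \
  -type mirror \
  -source sensordata@my.datafabric.cloud

# Trigger a mirror run to pull the latest snapshot of the source data.
maprcli volume mirror start -name sensordata-mirror
```

Mirror runs are incremental, so repeated runs transfer only the changes since the last synchronization.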
The HPE Ezmeral Data Fabric has POSIX and global namespace capabilities, which make it possible to mount the entire HPE Ezmeral Data Fabric environment into the local filesystem directory to provide a holistic view of the data.
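As a sketch of that mount (shown here over the data fabric’s NFS export; the FUSE-based POSIX client behaves similarly, and the node name is a placeholder):

```shell
# Mount the data fabric's global namespace from any on-prem cluster node.
sudo mkdir -p /mapr
sudo mount -t nfs -o nolock node1.datafabric.lab:/mapr /mapr

# With cross-cluster trust in place, both clusters appear side by side:
ls /mapr    # my.datafabric.lab  my.datafabric.cloud
```

From here, standard filesystem tools (`cp`, `rsync`, editors) operate on cloud-resident data through ordinary local paths.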
Putting workloads on the data fabric
With a hybrid cloud HPE Ezmeral Data Fabric deployment such as the one just described, a user can work on the data and edit files directly, even if the files are in a different physical location.
Because the HPE Ezmeral Data Fabric has a global namespace, it also provides the ability to lift and shift applications. For one use case, it’s possible to start a containerized compute job in the cloud instance and later decide if you would like to shift it over to an on-premises environment in the datacenter. The HPE Ezmeral Data Fabric lets you manage and access the data where it’s needed.
Another use case where the data fabric provides an advantage is handling an edge data acquisition workload. In decentralized locations (such as edge or cloud), sensor data is processed and analyzed. By using the HPE Ezmeral Data Fabric capability to define a mirror volume, you can replicate the collected data from those decentralized locations into the data center, where it can be aggregated and analyzed by the method of choice, including machine learning or interactive queries.
An additional benefit of the HPE Ezmeral Data Fabric for containerized workloads is that the integrated container storage interface (CSI) allows the HPE Ezmeral Data Fabric to be persistent storage for a Kubernetes environment.
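For the Kubernetes case, that CSI integration is typically wired up through a StorageClass that references the data fabric’s CSI provisioner. A hedged sketch of such a StorageClass; the provisioner name follows the driver’s documentation, while the host names, cluster name, and parameter values are placeholders for this environment:

```yaml
# Illustrative StorageClass for dynamic provisioning on the data fabric.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ezmeral-df-sc
provisioner: com.mapr.csi-kdf              # HPE Ezmeral Data Fabric CSI driver
parameters:
  restServers: "node1.datafabric.lab:8443" # placeholder REST endpoint
  cldbHosts: "node1.datafabric.lab"        # placeholder CLDB host
  cluster: "my.datafabric.lab"
  securityType: "secure"
  namePrefix: "csi-pv"                     # prefix for dynamically created volumes
```

A PersistentVolumeClaim referencing this StorageClass would then be backed by a data fabric volume, giving pods persistent storage in the same global namespace.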
HPE Ezmeral Data Fabric’s volumes feature helps support multi-tenancy, enabling users to implement various workloads on the data platform without interfering with each other.
Solid data foundation for a hybrid world
The HPE Ezmeral Data Fabric capabilities create a distributed fabric, wherever the data is. Multiple use cases can be deployed on top of it and run along the pipeline in multiple locations. This capability enables flexibility for applications and workloads to run where they fit best.
With cluster trust established, you can leverage the global namespace capabilities. The HPE Ezmeral Data Fabric includes multiple APIs that allow different applications and workloads to access the data, including HDFS, NFS, POSIX, CSI, and S3. This ability to access data in any way you like and from any place you need avoids unnecessary copies or data movement. Avoiding unnecessary copies lessens the tendency to split your system into separated data ponds.
The use of mirroring and replication drives business continuity. At the volume level, you can define replication to a different geo-distributed cluster site. Data is available and accessible in different locations with a common security and governance model. This combination of capabilities contributes significantly to team efficiency: teams can learn globally, yet act locally.
You can find more information in the technical whitepaper “HPE Ezmeral Data Fabric Multi-Site On-Premises and Cloud Integration”.
About the author:
Denise Ochoa is a Solutions Engineer on the HPE Storage Solutions team. She is passionate about technology, and her current focus is on Big Data and analytics.