
Does your bioscience data strategy fit the way HPC is changing?

As your workloads and datasets continue to grow, public cloud is less able to meet the need. But it’s hard to give up some of those convenient features. With HPE GreenLake for HPC you don’t have to. Learn why.  


Today’s bioscience startups often begin life in the public cloud, where they can accumulate large amounts of data from day one and add capacity as they grow. In the public cloud, HPC is delivered as a service—there’s no need to spend time managing and maintaining a data center. Many established bioscience teams are also looking to shift from traditional on-premises HPC systems to a more flexible, utility-like HPC service. Public cloud seems to offer all this.

However, as we’ll discuss in this post, HPC workloads often become less suitable for public cloud as they grow. And many types of HPC workloads aren’t a good fit for public cloud in the first place.

So, if you’re an organization planning a long-term data strategy and you want it to deliver value both today and three years from now, what’s the answer? Can you have the attractive features of public cloud (on-demand scaling, consumption-based billing, managed services) in a dedicated solution that really fits your needs?

We developed HPE GreenLake for HPC with this set of needs in mind. It’s a portfolio of hybrid cloud HPC solutions, powered by industry-leading AMD EPYC processors, that delivers the control of on-premises IT along with the agility and elasticity of the cloud. Let’s look more closely at why bioscience HPC workloads demand this kind of solution.

How bioscience HPC workloads are changing

Today’s bioscience leaders use a convergence of HPC, data science, and high-throughput computing to accelerate time to insight in at least five areas.

  • Computational chemistry and structural biology use simulation to speed research into new drugs. This approach played an important role in the development of COVID-19 vaccines.
  • Genomics involves the analysis of genes and DNA to better understand diseases and create personalized treatments.
  • Medical image analysis is being used to create predictive models that can automate detection of symptoms and speed patient diagnostics.
  • Modeling and simulation in precision medicine is being used to model patient outcomes based on specific circumstances and precise individual factors.
  • Knowledge graphs enable researchers to build semantic networks from large datasets, revealing relationships between seemingly unrelated data points. Knowledge graphs can reveal personal characteristics that make patients more or less likely to develop a health condition (see the sketch just after this list).
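To make the knowledge-graph idea concrete, here is a minimal sketch in Python using the networkx library. The patients, gene variants, and conditions are hypothetical and purely illustrative, not part of any HPE solution or real dataset; the point is simply how a semantic network can surface indirect links between data points.

import networkx as nx

# Build a small semantic network. Nodes are entities; edges assert
# relationships extracted from (hypothetical) datasets.
g = nx.Graph()
g.add_edge("patient:A", "variant:BRCA1", relation="carries")
g.add_edge("patient:B", "variant:BRCA1", relation="carries")
g.add_edge("patient:B", "condition:breast_cancer", relation="diagnosed_with")
g.add_edge("variant:BRCA1", "condition:breast_cancer", relation="associated_with")

# Indirect paths reveal links between seemingly unrelated nodes,
# e.g. patient A -> shared variant -> condition.
for path in nx.all_simple_paths(g, "patient:A", "condition:breast_cancer", cutoff=3):
    print(" -> ".join(path))

Production knowledge graphs span millions of nodes and edges and typically run on dedicated graph engines, which is exactly where HPC-scale compute and storage come into play.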

What is common among these use cases is growth. Datasets are growing exponentially in size. The fidelity of experiments and parameter search is growing. And there is a growing need to incorporate external collaborators and public datasets. All this must take place in a compliant and secure global environment, with fault-tolerant architecture and scalable and dynamic resourcing.

HPC systems are growing in size to keep pace with the data volumes produced by scientific instruments, which generate exponential demand for computing (and storage). The key instrument classes driving demand include next-generation sequencing (NGS), flow cytometry, live-cell imaging, light-sheet microscopy, and cryo-electron microscopy. That data has to be moved from the instruments to appropriate storage locations, where it can be managed and analyzed.

For operational production workloads with deterministic performance, manageability, reliability and impact protection, teams need dedicated HPC infrastructure (and a staging data storage system) within an isolated environment. This is not easily available in a public cloud model.

When workloads can't go to public cloud

As HPC workloads and datasets grow into this novel set of requirements, teams often find that public cloud becomes less and less able to meet them. Factors include:

  • Latency sensitivity – Equipment and processes are growing more sensitive to compute latency. Examples include interactive workflows, such as AR/VR and visualization, and complex workloads that span a variety of hosts and storage systems.
  • Local data processing – Bioscience teams want to apply transcoding, filtering, caching, and alerting at the edge. They also need to bring compute to large and localized data sets, which can’t be moved due to size or regulatory constraints.
  • Data residency – Regulations may dictate that the data and infrastructure reside in specific locations. Contracts may specify where applications and services must be deployed. Some enterprises also need to comply with strict Infosec practices.
  • Value and cost – These factors include timeliness of results, the costs of large-scale compute and data services (e.g., transfer, high-throughput storage), portability, and the expense of moving applications and their licenses.

There are other considerations too, such as availability of scientific applications, requirements to run applications on specialized hardware (e.g., accelerators), or being bound to your enterprise’s large-scale, internal HPC infrastructure.

Workloads that are likely to stay on premises due to these factors include lab data quality control, 3D data modeling, medical imaging and the ML/AI projects built around it, RNA sequencing in genomics, and drug sequencing.

But all of these requirements can be met by HPE GreenLake for HPC, which also provides advantages that bioscience organizations look for in public cloud platforms.

Bringing the cloud to you

HPE GreenLake for HPC solutions [1], powered by industry-leading AMD EPYC processors, bring the cloud experience directly to your applications and data, wherever they are. That includes edge locations, such as hospitals and labs where bioscience data such as medical images are created and stored, as well as colocation facilities and data centers.

With HPE GreenLake for HPC, you get a pay-per-use, scalable, point-and-click self-service experience that is managed for you. It offers all of the utility-like flexibility of public cloud that bioscience teams look for, while solving those long-term challenges:

  • You stay in control of your data because you can locate it and move it wherever you want, with no extra charges
  • You can scale on demand to meet variable capacity requirements
  • The long-term economics are better because HPE GreenLake for HPC provides workload-optimized solutions with no data movement or egress charges, ever. AMD EPYC processors offer an attractive price-performance ratio, thanks to their high core density and innovative architecture. And because your solution is deployed where you need it, you can keep using your existing investments alongside it.

Not every HPC workload can go to the public cloud. But with HPE GreenLake for HPC, you can have the most valuable attributes of public cloud, plus a performance edge. You get the world-leading performance of AMD EPYC processors, which deliver more cores, high memory bandwidth, and vast I/O, enabling accelerated performance for the most demanding workloads. AMD EPYC processors hold more than 170 world records [2] across multiple platforms. And the latency of a public cloud connection is removed entirely.

Learn more about HPE GreenLake for HPC solutions at hpe.com/greenlake/hpc.


Max Alt
Hewlett Packard Enterprise

twitter.com/hpe_hpc
linkedin.com/showcase/hpe-ai/
hpe.com/info/hpc


1. https://www.hpe.com/greenlake/hpc

2. https://www.amd.com/en/processors/epyc-world-records

About the Author

Max_Alt

Max has a unique background with almost 30 years of experience in software performance technologies and high-performance computing. He is both an entrepreneur and a large-scale enterprise leader. He founded several tech start-ups in the Bay Area and spent 18 years at Intel in various engineering and leadership roles, including developing next-generation supercomputing technologies. Max's strongest areas of expertise lie in computer and server architectures, cloud technologies, operating systems, compilers, and software engineering. Prior to joining HPE, Max was SVP of AI & HPC Technology at Core Scientific, which in 2020 acquired Atrio, the company Max founded and led as CEO.