Advancing Life & Work

Introducing PharML.Bind: a powerful tool to advance drug discovery


The modern process of drug discovery is both extremely time-consuming, taking years to move from validating targets to clinical trials, and expensive, costing hundreds of millions of dollars. As a society, we have never felt more urgency to speed up drug development than during the current COVID-19 pandemic. Together with our partners at the Medical University of South Carolina, we are offering a new open source tool, PharML.Bind, that researchers can use in this global fight.

There is nothing simple about drug discovery. Scientists must search for chemical compounds that create a therapeutic effect on human proteins or biological pathways, and those compounds must also be safe, effective, and able to meet clinical and commercial needs. That ongoing search is time-intensive.

Commonly, early-stage drug discovery involves physically screening compounds with assays that test them against specific proteins or pathways. An assay is an analysis done to determine the presence of a substance and the amount of that substance. Extremely efficient labs can screen up to 10,000 compounds per day, and a rare few can conduct up to 100,000 assays per day.[1] That sounds like a large number until you realize that the number of possible compounds could range from 200,000 to more than 10^16.[2] The number of targets is also large: the number of proteins in the human body is estimated to be between 80,000 and 400,000.[3] And while a top lab might process 10,000 to 100,000 assays a day, the reality is that setting up a single assay can take days or even weeks, and typically only 200 assays can be processed in a single day.
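To make the scale concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above and reads the upper estimate of the compound space as 10^16, which is an assumption about the intended number. The function and variable names are illustrative only.

```python
# Screening rates quoted in the text (assays per day).
RATES_PER_DAY = {
    "typical lab": 200,
    "extremely efficient lab": 10_000,
    "rare top lab": 100_000,
}

# Compound-space estimates quoted in the text.
# The upper bound is read as 10^16 (an assumption).
LIBRARY_SIZES = {"low estimate": 200_000, "high estimate": 10**16}


def days_to_screen(n_compounds: int, assays_per_day: int) -> float:
    """Days needed to assay every compound once against a single target."""
    return n_compounds / assays_per_day


for lab, rate in RATES_PER_DAY.items():
    for label, size in LIBRARY_SIZES.items():
        days = days_to_screen(size, rate)
        print(f"{lab:>24} | {label:>13}: {days:,.0f} days")
```

At the typical rate of 200 assays per day, even the low estimate of 200,000 compounds works out to 1,000 days of screening for a single target, before accounting for assay setup time.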

But a newer technique, called virtual high-throughput screening, adds high-performance computing to the process of predicting the affinity between drug compounds and proteins, dramatically improving its efficiency. The approach uses artificial intelligence to accelerate the matching of compounds and targets. While virtual screening doesn’t attempt to replace the physical processes of drug discovery, it can jump-start them by narrowing the search for possible drugs for a given target.

In an effort to put this powerful tool in the hands of researchers, HPE partnered with Dr. Yuri Peterson from the Medical University of South Carolina on PharML.Bind, a revolutionary approach to predicting the affinity of compounds for protein structures using deep learning and powerful GPU-accelerated systems, now available on GitHub.


PharML.Bind effectively changes the types of problems solvable in drug discovery. With PharML.Bind, researchers can generate highly physical predictions for real protein interactions with every existing FDA-approved compound, orders of magnitude faster than with traditional methods. PharML.Bind is active site-agnostic, meaning that it can test compounds against an entire protein structure without any knowledge of where or how compounds should bind. Likewise, the framework enables researchers to investigate the inverse problem in a way that has not previously been possible: quickly identifying the proteins a specific compound might bind to, which helps anticipate unintended drug side effects.
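The forward and inverse problems can be pictured as two ways of reading the same table of predicted affinities. The Python sketch below is an illustration only: the protein and compound names and the scores are made-up placeholders, and the two helper functions are hypothetical rather than part of PharML.Bind.

```python
from typing import Dict, List, Tuple

# Hypothetical predicted-affinity table; names and values are placeholders,
# not output from PharML.Bind.
Scores = Dict[Tuple[str, str], float]  # (protein, compound) -> score

scores: Scores = {
    ("protein_X", "compound_A"): 0.91,
    ("protein_X", "compound_B"): 0.12,
    ("protein_Y", "compound_A"): 0.67,
    ("protein_Y", "compound_B"): 0.05,
    ("protein_Z", "compound_A"): 0.08,
    ("protein_Z", "compound_B"): 0.74,
}


def rank_compounds(table: Scores, protein: str) -> List[Tuple[str, float]]:
    """Forward problem: which compounds are predicted to bind this protein?"""
    hits = [(c, s) for (p, c), s in table.items() if p == protein]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def rank_proteins(table: Scores, compound: str) -> List[Tuple[str, float]]:
    """Inverse problem: which proteins might this compound bind to?"""
    hits = [(p, s) for (p, c), s in table.items() if c == compound]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


print(rank_compounds(scores, "protein_X"))
print(rank_proteins(scores, "compound_A"))
```

In this toy table, the inverse query for compound_A surfaces protein_Y as a second likely partner, which is the kind of signal that could point to an off-target effect worth checking in the lab.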

For several years now, researchers at Cray (and now HPE) have worked closely with Dr. Peterson, exploring the application of deep learning and supercomputing resources to the problem of protein-drug interactions. The framework we’ve published takes a novel approach, using new pre-trained deep Graph Neural Networks (GNNs). Neural networks are a set of algorithms, loosely modeled after the human brain, that are designed to recognize patterns.
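For readers curious about what "graph" means here, the sketch below shows the generic idea behind graph neural networks: a molecule is treated as a graph of atoms (nodes) and bonds (edges), and each atom repeatedly updates its features using those of its bonded neighbors. This is a textbook-style simplification with no learned weights, not the architecture used in PharML.Bind. The three-atom example and the two-element feature encoding are assumptions chosen for illustration.

```python
from typing import Dict, List

Adjacency = Dict[int, List[int]]    # atom index -> indices of bonded atoms
Features = Dict[int, List[float]]   # atom index -> feature vector


def message_passing_step(adj: Adjacency, feats: Features) -> Features:
    """One round of sum aggregation: each atom adds its neighbors' features
    to its own. A real GNN would also apply learned weights and a
    nonlinearity at this point."""
    updated: Features = {}
    for node, neighbors in adj.items():
        agg = list(feats[node])
        for nb in neighbors:
            agg = [a + b for a, b in zip(agg, feats[nb])]
        updated[node] = agg
    return updated


def readout(feats: Features) -> List[float]:
    """Sum all atom vectors into a single vector describing the molecule."""
    dim = len(next(iter(feats.values())))
    return [sum(vec[i] for vec in feats.values()) for i in range(dim)]


# Heavy atoms of ethanol as a chain: C0 - C1 - O2.
adj = {0: [1], 1: [0, 2], 2: [1]}
# One-hot features over [carbon, oxygen].
feats = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}

h1 = message_passing_step(adj, feats)
print(h1)           # per-atom features after one round
print(readout(h1))  # molecule-level summary vector
```

After one round, the middle carbon's vector reflects both a carbon and an oxygen neighbor, so each atom's representation begins to encode its chemical surroundings.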

In benchmark tests, PharML.Bind running on a single GPU-accelerated system can generate affinity predictions against a well-known protein[4] for over 300,000 compounds in under 25 minutes. Even more impressive, PharML.Bind deployed on a high-performance compute cluster generates the same predictions for 300,000 compounds in less than three minutes. This speed is remarkable when compared with today’s top labs, which can conduct at most 100,000 assays per day.
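A quick calculation puts these benchmark figures on the same per-day footing as the lab numbers. The sketch below uses the quoted values as round numbers (300,000 compounds, 25 minutes, 3 minutes, 100,000 assays per day), so the results are approximate lower bounds. It compares predictions with physical assays purely on throughput, and the names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60
TOP_LAB_ASSAYS_PER_DAY = 100_000  # figure quoted in the text


def per_day(n_compounds: int, minutes: float) -> float:
    """Extrapolate a benchmark run to a sustained per-day prediction rate."""
    return n_compounds / minutes * MINUTES_PER_DAY


single_system = per_day(300_000, 25)  # single GPU-accelerated system
cluster = per_day(300_000, 3)         # high-performance compute cluster

for name, rate in [("single system", single_system), ("cluster", cluster)]:
    ratio = rate / TOP_LAB_ASSAYS_PER_DAY
    print(f"{name}: {rate:,.0f} predictions/day, about {ratio:,.0f}x a top lab")
```

On these numbers, a single system sustains roughly 17 million predictions per day and the cluster roughly 144 million, compared with 100,000 physical assays per day at the very best labs.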

PharML.Bind on GitHub includes a codebase (for training, inference, data pre-processing, and visualization) and an ensemble of pre-trained Molecular-Highway Graph Neural Networks (MH-GNNs).

At HPE, our purpose is to advance the way people live and work. I am extremely proud of the ways we are responding to COVID-19 and harnessing the power of supercomputing to accelerate the development of treatments and vaccines.

For a deeper examination, please consult this PharML paper:

Mark Potter
Chief Technology Officer, HPE and Director, Hewlett Packard Labs





[4] The spike (S) glycoprotein (6VSB), which is the primary mechanism by which the COVID-19 virus binds to the surfaces of cells within human organs.

About the Author


Mark Potter is the Chief Technology Officer for Hewlett Packard Enterprise and the Director of Hewlett Packard Labs, the company’s advanced research organization.