The coronavirus behind the COVID-19 pandemic is wreaking havoc on our lives. This efficient infection-causing machine, SARS-CoV-2, encodes only 29 proteins, with a genome 1/200,000 the size of a human being’s. It has evolved remarkably well to trick human cells, propagating quickly and causing innumerable deaths and illnesses across the globe.
In the last few months, scientists of all kinds have galvanized themselves to learn about the virus, studying its structure and building therapeutics and vaccines with the help of supercomputers.
This article offers a glimpse of how supercomputers are being used to study COVID-19. A supercomputer is an array of computers acting as one collective machine, capable of processing enormous amounts of data and increasingly used to run artificial intelligence (AI) and machine learning (ML) workloads.
Supercomputing for Coronavirus spike protein study
From the start of this pandemic, researchers have used supercomputing to study one particular coronavirus protein: the ‘S’ or ‘spike’ protein, which lets the virus enter human cells and infect the body.
As a result, finding ways to attack or neutralize the spike protein is the cornerstone of much of the research on COVID-19 vaccines and therapeutics. But that research is complicated by the extraordinarily computationally intensive tasks of simulating the spike protein’s various forms and binding vast numbers of molecules to those forms.
So, researchers across the globe are working on the computational study of this protein. Recently, a duo of researchers from the University of California, Berkeley and Istanbul Technical University announced that they are using supercomputing at the Texas Advanced Computing Center (TACC) to elucidate the most minute movements of this spike protein.
The duo is Ahmet Yildiz, Associate Professor of Physics and Molecular Cell Biology at the University of California, Berkeley, and his collaborator Mert Gur at Istanbul Technical University. They are combining supercomputer-powered molecular dynamics simulations with single-molecule experiments to uncover the secrets of the coronavirus.
Yildiz and Gur are studying the virus’s spike (S) protein, the part of the virus that binds to human cells, and the process by which the virus begins inserting its RNA into the cell. The initial findings of this research, published in the Journal of Chemical Physics, were validated using lab experiments.
The researchers’ initial goal is to use molecular dynamics simulations to identify the processes that occur when the virus binds to the host cell. According to Yildiz, there are three critical phases that allow the spike protein to break into the cell so the virus can begin replicating.
First, the spike protein needs to transform from a closed configuration to an open one. Second, the spike protein binds to its receptor on the outside of our cells. This binding triggers a conformational change within the spike protein and allows another human protein to cleave the spike. Finally, the newly exposed surface of the spike interacts with the host cell membrane and enables the viral RNA to enter and hijack the cell.
In early February, electron microscope images revealed the structure of the spike protein. But these snapshots captured only the stable configurations the protein adopts, not the transitional, in-between steps. Because the researchers did not know the timing of the events that take the protein from one stable conformation to the next, the intermediate conformations remained unknown too. This is exactly where computer modelling came in.
The microscope images provided a useful starting point for building models of every atom in the protein and its environment (water, ions, and the cell’s receptors). From there, Yildiz and Gur used supercomputers to set the protein in motion and watch what happened.
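In molecular dynamics, “setting the protein in motion” means repeatedly computing the force on every atom and advancing positions and velocities by a tiny time step. The article does not say which integrator or software Yildiz and Gur used. The sketch below assumes the widely used velocity Verlet scheme and swaps the million-atom protein for a single hypothetical particle on a harmonic spring (k = m = 1, arbitrary units), so its output can be checked against the exact solution x(t) = cos(t).

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance one particle with the velocity Verlet integrator.

    Returns the list of positions visited and the final velocity.
    """
    trajectory = [x]
    a = force(x) / mass
    for _ in range(n_steps):
        # Move using the current velocity and acceleration.
        x = x + v * dt + 0.5 * a * dt * dt
        # Recompute the force at the new position.
        a_new = force(x) / mass
        # Update the velocity with the average of old and new accelerations.
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        trajectory.append(x)
    return trajectory, v


# Hypothetical toy system: one particle on a harmonic "bond" (k = m = 1).
K = 1.0
MASS = 1.0


def spring_force(x):
    return -K * x


def total_energy(x, v):
    # Kinetic plus potential energy; should stay nearly constant.
    return 0.5 * MASS * v * v + 0.5 * K * x * x


dt, n_steps = 0.01, 1000
trajectory, v_final = velocity_verlet(1.0, 0.0, spring_force, MASS, dt, n_steps)
exact = math.cos(n_steps * dt)  # analytic position at t = 10
print(f"simulated x = {trajectory[-1]:.5f}, exact x = {exact:.5f}")
print(f"energy drift = {total_energy(trajectory[-1], v_final) - 0.5:.2e}")
```

A real simulation does the same loop for every atom at once, with forces from a molecular force field instead of a single spring. That per-step cost, multiplied by hundreds of millions of steps, is what makes supercomputers necessary.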
They showed that the S protein passes through an intermediate state before it can dock to the receptor protein on the host cell membrane, and that this intermediate state could be a useful drug target for preventing the S protein from initiating viral infection.
Simulating this behaviour at the level of the atom or individual amino acid is incredibly computationally intensive. Yildiz and Gur were granted time on the Stampede2 supercomputer at TACC, at the time the second-fastest supercomputer at a U.S. university and the 19th-fastest overall, through the COVID-19 HPC Consortium. Simulating one microsecond of the virus and its interactions with human cells, roughly one million atoms in total, takes weeks on a supercomputer and would take years without one.
Yildiz and Gur’s team, along with approximately 40 other research groups studying COVID-19, has been given priority access to TACC systems.
So Gur and his collaborators have churned through calculations, re-enacting the atomic motions of the spike protein as it approaches, binds to, and interacts with angiotensin-converting enzyme 2 (ACE2) receptors, proteins that line the surface of many cell types.
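The “weeks” figure becomes clearer with some back-of-the-envelope arithmetic. The article gives only the simulated time (one microsecond) and the system size (about one million atoms). The 2-femtosecond time step is assumed here as a common choice in biomolecular MD, and the throughput of 50 simulated nanoseconds per day is a hypothetical figure chosen purely for illustration.

```python
# Simulated time from the article: 1 microsecond, expressed in femtoseconds.
TARGET_FS = 1_000_000_000           # 1 microsecond = 10**9 fs
TIMESTEP_FS = 2                     # assumed typical MD time step

n_steps = TARGET_FS // TIMESTEP_FS  # integration steps needed
n_atoms = 1_000_000                 # approximate system size from the article
atom_updates = n_steps * n_atoms    # every atom is advanced once per step

# Hypothetical throughput for a million-atom system on a large supercomputer.
THROUGHPUT_NS_PER_DAY = 50
target_ns = TARGET_FS / 1_000_000   # 1 microsecond = 1,000 ns
days = target_ns / THROUGHPUT_NS_PER_DAY

print(f"{n_steps:,} time steps")
print(f"{atom_updates:.1e} atom updates")
print(f"about {days:.0f} days (~{days / 7:.1f} weeks) at {THROUGHPUT_NS_PER_DAY} ns/day")
```

Changing the assumed throughput shows how sensitive the wall-clock estimate is. At a hypothetical 5 ns/day, the same microsecond would take about 200 days.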
The researchers’ work went beyond finding these intermediate states. The duo and their team have worked to identify the individual amino acids that stabilize each state, aiming to find ways to introduce roadblocks into the spike protein’s physical changes.
“If we can determine the important linkages at the single amino acid level – which interactions stabilize and are critical for these conformations – it may be possible to target those states with small molecules,” Yildiz said. “It’s a computationally demanding process, but the predictive power of this approach is very powerful.”
DeepMind’s supercomputing research in SARS-CoV-2 protein study
In early March, DeepMind, the artificial intelligence (AI) developer and wholly owned subsidiary of Alphabet, announced that it had used its AlphaFold deep learning system to predict the structures of proteins associated with COVID-19. Since then, the company has been working to help scientists understand how to fight the SARS-CoV-2 coronavirus.
The company used the latest version of its AlphaFold system to generate, and then release, structure predictions for several under-studied proteins associated with SARS-CoV-2, the virus that causes COVID-19.
DeepMind believes that these structure predictions may contribute to the scientific community’s interrogation of how the virus functions and could serve as a hypothesis-generation platform for future experimental work on therapeutics.
Knowing a protein’s structure provides an important resource for understanding how it functions, but experiments to determine the structure can take months or longer, and some prove to be intractable. For this reason, researchers have been developing computational methods to predict protein structure from the amino acid sequence. In cases where the structure of a similar protein has already been experimentally determined, algorithms based on ‘template modelling’ are able to provide accurate predictions of the protein structure. AlphaFold, by contrast, focuses on predicting protein structure accurately when no structures of similar proteins are available, a task called ‘free modelling’.
Later, DeepMind researchers confirmed that their system had provided an accurate prediction for the experimentally determined SARS-CoV-2 spike protein structure shared in the Protein Data Bank. DeepMind also shared the results with several colleagues at the Francis Crick Institute in the UK, including structural biologists and virologists, who encouraged the company’s researchers to release the structures to the general scientific community.
In August, DeepMind submitted its predictions to the Francis Crick Institute for proteins whose structures are not readily determined. These proteins include the membrane protein, protein 3a, nsp2, nsp4, nsp6, and the papain-like C-terminal domain. These structures could contain docking sites for new drugs or therapeutics and were intended to help future drug development aimed at containing COVID-19.
“Our models include per-residue confidence scores to help indicate which parts of the structure are more likely to be correct. We have only provided predictions for proteins which lack suitable templates or are otherwise difficult for template modelling. While these understudied proteins are not the main focus of current therapeutic efforts, they may add to researchers’ understanding of SARS-CoV-2,” reads the company’s note on DeepMind’s webpage.
Supercomputing and AI to develop COVID-19 treatments and vaccine models
A team of scientists from Vaxine Pty Ltd. (Adelaide, Australia), a vaccine biotech company, used computer modelling of the coronavirus spike protein to rapidly design a synthetic COVID-19 vaccine. The team has used cloud-based supercomputing and artificial intelligence (AI) to develop COVID-19 treatments and vaccine models.
The Vaxine team says it was able to design, manufacture, and advance its Covax-19 vaccine into human trials in under five months, a process that would normally take up to 15 years. The team is also using similar techniques for other projects, including a new treatment for respiratory complications of COVID-19, a preventive nasal spray, and a rapid response test to predict how severely the disease will progress.
Using the genetic sequence of SARS-CoV-2, the team built three-dimensional molecular structures of key viral proteins, which were then used to screen existing drugs and natural remedies for potential activity against the SARS-CoV-2 protease. The team used high-performance cloud computing services provided by Oracle under a research grant to Flinders University (Adelaide, South Australia), which enabled it to rapidly screen for potential drugs against COVID-19.
The Vaxine research group released a list of up to 80 new potential candidate drugs against SARS-CoV-2. The possible therapies were identified using the cloud-based supercomputer programs Vaxine uses in its vaccine research modelling, and the list was released so that other researchers could investigate them further.
Flinders University Professor Nikolai Petrovsky, Research Director of Vaxine, says that the company’s unique ability to run computer simulations of the virus before it was even fully characterized dramatically sped up the design of the Covax-19 vaccine. The vaccine, based on a synthetic spike protein, was manufactured in insect cell cultures and then combined with Vaxine’s Advax adjuvant, which is used to turbocharge the vaccine and make it more effective.
IBM’s supercomputers for large calculations
Data models powered by supercomputing can help track infections arising from social gatherings. Supercomputing brings not only compute power but also large-scale data storage to pandemic research, and fast data exchange is valuable during calculations.
Companies such as IBM are actively involved in using supercomputing to fight the pandemic. In collaboration with the White House Office of Science and Technology Policy, the U.S. Department of Energy, and many others, IBM helped launch the COVID-19 High Performance Computing Consortium, which brings together an unprecedented amount of computing power (16 systems with more than 330 petaflops, 775,000 CPU cores, and 34,000 GPUs, and counting) to help researchers everywhere better understand COVID-19, its treatments, and potential cures.
High-performance computing systems allow researchers to run very large numbers of calculations in epidemiology, bioinformatics, and molecular modelling. These experiments would take years to complete if worked by hand, or months if handled on slower, traditional computing platforms.
By pooling supercomputing capacity across a consortium of partners, including IBM, Lawrence Livermore National Laboratory (LLNL), Argonne National Laboratory (ANL), Oak Ridge National Laboratory (ORNL), Sandia National Laboratories (SNL), Los Alamos National Laboratory (LANL), the National Science Foundation (NSF), NASA, the Massachusetts Institute of Technology (MIT), Rensselaer Polytechnic Institute (RPI), and multiple leading technology companies, the consortium can offer extraordinary supercomputing power to scientists, medical researchers, and government agencies as they respond to and mitigate this global emergency.
As a powerful example of the potential, IBM’s Summit, said at the time to be the most powerful supercomputer on the planet, enabled researchers at Oak Ridge National Laboratory and the University of Tennessee to screen 8,000 compounds for those most likely to bind to the coronavirus’s main ‘spike’ protein and render it unable to infect host cells. They recommended 77 promising small-molecule drug compounds that could now be experimentally tested. This is the power of accelerating discovery through computation.
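A virtual screen like the one run on Summit typically ends with a ranking step. A docking program scores how well each compound fits a target site, and the best scorers are shortlisted for lab testing. The article does not describe the ORNL scoring method, so the sketch below assumes a docking score in kcal/mol where more negative means stronger predicted binding. The compound IDs, the scores, and the `shortlist` helper are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DockingResult:
    compound_id: str
    score_kcal_per_mol: float  # assumed: more negative = stronger predicted binding


def shortlist(results, top_n, cutoff=None):
    """Rank compounds best-first and keep at most top_n of them.

    If cutoff is given, compounds scoring worse (higher) than it are dropped.
    """
    ranked = sorted(results, key=lambda r: r.score_kcal_per_mol)
    if cutoff is not None:
        ranked = [r for r in ranked if r.score_kcal_per_mol <= cutoff]
    return ranked[:top_n]


# Hypothetical scores for four made-up compounds.
results = [
    DockingResult("cmpd-A", -6.2),
    DockingResult("cmpd-B", -9.1),
    DockingResult("cmpd-C", -7.8),
    DockingResult("cmpd-D", -5.0),
]
hits = shortlist(results, top_n=2, cutoff=-7.0)
print([h.compound_id for h in hits])  # ['cmpd-B', 'cmpd-C']
```

In a real screen, the input would be thousands of scored compounds (8,000 in the Summit study). `top_n` would reflect how many candidates a lab can afford to test experimentally.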
In 2021, ORNL plans to introduce an exascale supercomputer called Frontier, which is expected to be about 10 times faster than Summit. That will usher in the exascale era in computing, opening up even more possibilities for doing these calculations quickly and in greater detail. The additional speed will help with identifying candidate compounds against COVID-19, because it will let researchers run calculations on more chemicals, and more accurately.
Supercomputing as Pandemic Teaching Tool
Christopher D. Carothers is a Professor in the Computer Science Department at Rensselaer Polytechnic Institute. His research interests are in massively parallel systems, focusing on modelling and simulation systems of all sorts.
Carothers teaches a course called Parallel Programming and Computing, in which he shows students how to build models of how COVID-19 spreads using the Artificial Intelligence Multiprocessing Optimized System (AIMOS) supercomputer. AIMOS has 252 compute nodes and a peak processing speed of 1,048.6 teraflops.
A final student project on COVID-19, under his guidance, ties together elements the students learned in the course: GPU programming (both single- and multi-GPU), compute nodes with multiple GPUs, and a parallel file system.
The goal of the project is to use COVID-19 as a driver for understanding parallel computing and how fast the simulation could run. Carothers says the project team took publicly available data on how the virus was spreading, adapted it for their application, and obtained some interesting, good performance results.
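The article does not say which spread model the students used. As an illustration only, the sketch below assumes the classic SIR (susceptible-infected-recovered) compartmental model, integrated with simple forward-Euler steps. The population, the transmission rate `beta`, and the recovery rate `gamma` are hypothetical values, not fitted to the public data Carothers mentions. A GPU version of the kind the course targets would typically run many regions or parameter sets in parallel, which this serial sketch omits.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with forward-Euler steps of dt days.

    Returns a list of (time, susceptible, infected, recovered) tuples.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / population * dt  # S -> I
        new_recoveries = gamma * i * dt                   # I -> R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


# Hypothetical parameters: basic reproduction number beta / gamma = 3.
history = simulate_sir(population=1_000_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=160)
peak_day, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"peak of {peak_infected:,.0f} infections around day {peak_day:.0f}")
```

Each compartment update depends only on the current state, which is why many independent runs (different regions or parameters) map naturally onto parallel hardware.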
Another project involves partners such as IBM, Harvard University and RPI using supercomputing for triaging of medical symptoms and helping medical professionals decide on treatments for patients, such as whether to prescribe a ventilator.
A supercomputer-dependent future of medicine
Going forward, scientists will continue to rely on computational models, just as they already do to forecast influenza each year. Supercomputers will be needed to run the statistics required to track the virus, including its structure, its mutations and how quickly they arise, and, most importantly, to design and manufacture an appropriate vaccine.
Scientists say that there will be a lot of focus on speculatively looking at the evolution of COVID-19, and how the therapies that people are working on now would need to change to accommodate that. The data from millions of trials will help medical researchers plan for the COVID-19 pandemic as well as future pandemics.
Another relevant application of supercomputers is their ability to shrink the vaccine development cycle. Vaccines usually take around 10 years to develop, and some experts believe supercomputing could cut the time from design through trials to administration to just a few months. Supercomputing could also help manage the supply chain for drugs and vaccines in future pandemics.
In the future, supercomputers combined with advanced AI and ML may help anticipate, predict, and even prevent the next pandemic, saving millions of lives, as AI, ML, and natural language processing (NLP) techniques continue to mature.