AlphaFold – The 50-year-old problem of protein folding solved with AI

In recent years, AI has begun to demonstrate its potential in solving very difficult problems in many different fields, from self-driving cars to chess to predicting the 3D structure of proteins. But what are proteins and why is their structure important? Let’s start from the beginning.

Why is protein folding so complicated?

In daily life, protein is something we need in our diet and use to build muscle, but it is so much more than that. Proteins keep our cells and bodies functioning: they are the machinery that lets us move, they allow us to digest food, they detect light in our eyes, and as antibodies they fight diseases [1]. Proteins can be seen as nanomachines that life on Earth invented, and what these nanomachines do depends on their structure and shape. This is why it is so important to understand how proteins function and what their shape is. Knowing the shape of a protein would, for example, make it possible to understand the mechanisms of a disease such as COVID-19, and it would speed up the development of a medicine to fight the disease.

Proteins consist of a string of amino acids that folds up into a 3D structure, and each string folds up into a different protein. In principle, it should be possible to predict the shape of a protein from just the string of amino acids, but this is an infamously difficult problem called “The Folding Problem” that researchers have spent more than 50 years trying to solve [2,3]. The problem has three main aspects: how do interatomic forces give rise to structure, what is the folding mechanism, and can we predict the native shape of a protein from its amino acid sequence? The Levinthal paradox (named after Cyrus Levinthal) points out that the number of possible conformations of a protein grows exponentially with the number of amino acids. If a protein had to try these conformations at random, finding the native fold would take an astronomically long time. Yet proteins usually need only milliseconds to seconds to fold properly [4].
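To get a feeling for the numbers behind the Levinthal paradox, here is a small back-of-the-envelope sketch in Python. The values used are illustrative assumptions and not taken from the references: 3 possible conformations per amino acid and 10^13 conformations tried per second. The function name `levinthal_estimate` is made up for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_estimate(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Estimate how long a purely random search for the native fold would take.

    Both default values are illustrative assumptions, not measured quantities.
    Returns (number of conformations, search time in years).
    """
    # Each residue independently picks one of `states_per_residue` conformations,
    # so the total number of conformations grows exponentially with chain length.
    conformations = states_per_residue ** n_residues
    seconds = conformations / samples_per_second
    return conformations, seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    for length in (10, 50, 100):
        count, years = levinthal_estimate(length)
        print(f"{length:>3} residues: {count:.2e} conformations, {years:.2e} years")
```

Under these assumptions, every extra amino acid multiplies the search space by 3, which is why even a modest chain of 100 amino acids could never find its fold by trial and error alone.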

AlphaFold starts computing

To track the advancement of protein folding models, the Critical Assessment of Protein Structure Prediction (CASP) competition was launched [5]. In a nutshell, the competition is about creating a model that can predict the structure of a protein. The competitors’ predictions are compared to experimental results and scored according to how close they are to the experiment.
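As a rough idea of what "how close" can mean, the sketch below compares predicted and experimental atom positions in two ways: the root-mean-square deviation (RMSD) and a simplified score in the spirit of GDT_TS, the main metric used in CASP. This is a simplification under stated assumptions: the two structures are assumed to be already superimposed and to list the same atoms in the same order, whereas the real GDT_TS searches over many superpositions. The function names and the toy coordinates are made up for illustration.

```python
import math


def distances(predicted, experimental):
    """Per-atom distances (in angstroms) between two already superimposed structures."""
    if len(predicted) != len(experimental):
        raise ValueError("both structures must contain the same number of atoms")
    return [math.dist(p, e) for p, e in zip(predicted, experimental)]


def rmsd(predicted, experimental):
    """Root-mean-square deviation: lower means closer to the experiment."""
    d = distances(predicted, experimental)
    return math.sqrt(sum(x * x for x in d) / len(d))


def gdt_like_score(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS-style score from 0 to 100: higher means closer.

    For each cutoff, take the fraction of atoms predicted within that distance
    of the experimental position, then average the fractions over all cutoffs.
    """
    d = distances(predicted, experimental)
    fractions = [sum(x <= c for x in d) / len(d) for c in cutoffs]
    return 100 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Toy coordinates, invented for this example.
    experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    predicted = [(0.0, 0.0, 0.5), (1.0, 0.0, 1.5), (2.0, 0.0, 3.0), (3.0, 0.0, 9.0)]
    print("RMSD:", rmsd(predicted, experimental))
    print("GDT-like score:", gdt_like_score(predicted, experimental))
```

A score built from several distance cutoffs is less sensitive to a single badly placed atom than RMSD, where one large error can dominate the result.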

A major leap came at the 13th CASP competition in 2018, when the DeepMind team entered its AI-based model AlphaFold. This entry uses, among other things, deep neural networks for pattern analysis [6]. And not only that: in the following competition in 2020, the second version of AlphaFold (AlphaFold 2) followed with even better performance [3]. AlphaFold 2 outperformed all previous models by leaps and bounds and is remarkably accurate, with an average error of around the width of an atom [3].

In the figure below you can see the astonishing accuracy of AlphaFold.

Experimental results compared to AlphaFold’s predictions [3]

AlphaFold at home

But this knowledge is not locked in a cage or behind the paywalls of academia! DeepMind gave a gift to humanity when they used AlphaFold to predict the structures of over 200 million proteins and made them free for anyone to access online in the AlphaFold Protein Structure Database. For reference, the number of structures that have been experimentally elucidated and are freely accessible in the Protein Data Bank is “just” ~200,000 at the time of writing this article. The AlphaFold database is as free to use for the curious as it is for scientists. Beyond the database, it’s also possible to run AlphaFold at home! Well, sort of. AlphaFold requires a lot of computing power and memory, so it cannot be run on a home computer or a laptop. However, a simplified version called AlphaFold-Colab has been created that can be used from a typical laptop through a web browser, and it’s free for anyone to use. More info about how to use AlphaFold-Colab can be found at the bottom of the page.

The present and the future

AlphaFold could be useful in combating future pandemics: in 2020, DeepMind used it to predict several structures of the SARS-CoV-2 virus. These structures were previously unknown, but the predictions were later validated experimentally with high accuracy [3]. AlphaFold 2 has also been used by a research group that had been trying for 10 years to decipher the structure of a membrane protein that confers antibiotic resistance. One researcher in the group mentioned that “proteins in the membrane are notoriously, extremely difficult to crystallise”. The task was so difficult that 10 years of work had not produced the structure, but AlphaFold gave them one after 30 minutes [7]. With this information, they can continue their study and figure out how to fight antibiotic resistance. This year, the developers of AlphaFold were awarded a 2023 Breakthrough Prize [8], and in our personal opinion[1,2] AlphaFold may be on the level of a Nobel Prize with regard to the impact it may have.

The true impact of AlphaFold will be revealed in time. By understanding how proteins fold, it may be possible to design new proteins that could degrade waste and plastic and clean up oil spills [1]. Designed proteins could possibly replace chemical catalysts with biofriendly enzymes in industry, and they could give higher yields, since enzymes are chiral and can produce chiral products, while conventional chemical reactions may produce racemic mixtures. AlphaFold will likely be of great use in the development of medicines, making it more effective and hence less costly [1]. For rare genetic diseases that cause a protein to misfold, AlphaFold could be very helpful in understanding the disease. AlphaFold may give insight into how our bodies work and shape our view of how life on Earth functions.

All in all, AlphaFold will probably accelerate research considerably, and it is not unreasonable to think that in the end it will save lives, thanks to DeepMind.

How to use AlphaFold-Colab

Now to the fun part. If you follow the steps you can use AlphaFold on your own!

  1. Open this link
  2. Copy an amino acid sequence into the page. You can use the sequence for green fluorescent protein (GFP):
    MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSY
    GVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDF
    KEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVL
    LPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
    (More sequences can be found in Protein Data Bank (PDB))
  3. Click “Runtime” > “Run all” > “Run anyway”
  4. Let it run. It may take anywhere from a few minutes up to an hour. Very large proteins may crash the program, but GFP should work.
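Before pasting a sequence, it can help to make sure it is one unbroken string of valid one-letter amino acid codes. The following is a minimal sketch of such a check, using the GFP sequence from step 2. The helper name `clean_sequence` is made up for this example, and whether the notebook itself tolerates line breaks or spaces in the input is not something this sketch relies on.

```python
# One-letter codes of the 20 standard amino acids.
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(pasted_text):
    """Join a pasted, line-wrapped sequence into one string and validate it."""
    # str.split() with no arguments splits on any whitespace, including newlines.
    sequence = "".join(pasted_text.split()).upper()
    invalid = sorted(set(sequence) - VALID_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"unexpected characters in sequence: {invalid}")
    return sequence


# The GFP sequence exactly as it is wrapped in step 2 above.
GFP_PASTED = """
    MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSY
    GVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDF
    KEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVL
    LPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
"""

if __name__ == "__main__":
    gfp = clean_sequence(GFP_PASTED)
    print(len(gfp), "amino acids")
    print(gfp)
```

The printed single-line sequence can then be copied into the sequence field of the notebook.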

If you can’t find the sequence of the protein you are interested in, follow these steps:

  1. Search for a protein on the PDB webpage.
  2. Open a page for a protein.
  3. Click ‘Download Files’ > FASTA sequence.
  4. Open the file as a text file and copy the sequence of capital letters (amino acid sequence) into AlphaFold-Colab.
  5. An alternative to the Protein Data Bank is UniProt, where the procedure is quite similar.
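Step 4 can also be done with a few lines of Python instead of by hand. In a FASTA file, every line starting with ">" is a header describing a sequence, and the lines that follow it contain the sequence itself, possibly wrapped over several lines. The sketch below reads such text and returns each sequence as one string. The function name `parse_fasta` and the example record (header and sequence) are hypothetical and only serve as illustration.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence lines may be wrapped
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical FASTA content, invented for this example.
EXAMPLE_FASTA = """>example_1|Chain A|hypothetical protein
MSKGEELFTG
VVPILVELDG
>example_2|Chain B|another hypothetical protein
ACDEFGHIK
"""

if __name__ == "__main__":
    for name, sequence in parse_fasta(EXAMPLE_FASTA).items():
        print(name)
        print(sequence)
```

A file can contain more than one record, for example one per chain, so pick the sequence you want to fold.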

References

[1] AlphaFold: Using AI for scientific discovery. January 15, 2020. Accessed at 22:16 CET, 17. October 2022. https://www.deepmind.com/blog/alphafold-using-ai-for-scientific-discovery-2020
[2] Dill, K. A., Ozkan, S. B., Shell, M. S., & Weikl, T. R. (2008). The protein folding problem. Annual Review of Biophysics, 37, 289. https://doi.org/10.1146/annurev.biophys.37.092707.153558
[3] AlphaFold: a solution to a 50-year-old grand challenge in biology. November 30, 2020. Accessed at 22:14 CET, 17. October 2022. https://www.deepmind.com/blog/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology
[4] Zwanzig, R., Szabo, A., & Bagchi, B. (1992). Levinthal's paradox. Proceedings of the National Academy of Sciences, 89(1), 20-22. https://doi.org/10.1073/pnas.89.1.20
[5] Protein Structure Prediction Center. Accessed at 13:00 CET, 17. October 2022. https://predictioncenter.org/
[6] AlQuraishi, M. (2019). AlphaFold at CASP13. Bioinformatics, 35(22), 4862-4865. https://doi.org/10.1093/bioinformatics/btz422
[7] How Marcelo and Megan solved a ten-year problem in minutes – Unfolded. Accessed at 00:08 CET, 18. October 2022. https://www.youtube.com/watch?v=uLDud7pNiNQ
[8] Winners of 2023 Breakthrough Prize announced. Accessed at 00:28 CET, 17. October 2022. https://philanthropynewsdigest.org/news/winners-of-2023-breakthrough-prize-announced
