A true “aha” conversation! Learn how deep learning techniques from natural language processing (NLP) are applied to drug discovery, specifically to protein-protein interactions. Includes a quick-and-dirty primer on just enough biology to understand the training data A-Alpha Bio uses for its ML models.
If you are interested in novel applications of ML, subscribe to our podcast at:
Show Notes:
0:37 - The basics of synthetic biology for machine learning practitioners
0:50 - What are proteins and why do they matter?
1:50 - A protein is a string of amino acids drawn from an alphabet of 20… which means it starts looking like a Natural Language Processing problem.
2:35 - DeepMind’s AlphaFold and Meta FAIR’s ESMFold: taking as input a string of amino acids, and then predicting the 3D structure of proteins.
6:23 - Where AlphaFold got its training data: the Protein Data Bank.
8:07 - A-Alpha Bio’s product: AlphaSeq.
10:45 - The source of the name “A-Alpha Bio”: yeast genders.
11:36 - Applications of synthetic biology: pharmaceuticals, agriculture.
15:00 - Applying ML to predict protein-protein interactions.
20:30 - !!! The actual ML techniques applied: treating proteins as strings and applying NLP architectures: RNNs, LSTMs, attention, and Transformers.
22:50 - The discrete optimization problem of then generating proteins.
28:30 - The insights behind why applying ML would work.
31:20 - The rise of deep learning in the field of computational biology.
32:50 - Ryan’s journey into machine learning and data science.
35:20 - Advice for deep learning people interested in applying ML to biology.
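For readers who want to see what “treating proteins as strings” (1:50 and 20:30) looks like in practice, here is a minimal sketch of turning an amino acid sequence into integer tokens, the same first step an NLP model applies to text. This is an illustration only, not A-Alpha Bio’s pipeline: the names `AMINO_ACIDS`, `PAD_ID`, and `encode`, the padding scheme, and the example sequence are all assumptions made up for this sketch.

```python
# Hypothetical sketch: tokenize a protein the way an NLP model tokenizes text.
# The 20 standard amino acids, written with their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Reserve id 0 for padding (an assumption of this sketch);
# amino acids get ids 1 through 20.
PAD_ID = 0
TOKEN_TO_ID = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence: str, max_len: int) -> list[int]:
    """Map a protein string to a fixed-length list of integer token ids.

    Sequences longer than max_len are truncated; shorter ones are padded.
    Letters outside the 20 standard amino acids raise a KeyError.
    """
    ids = [TOKEN_TO_ID[aa] for aa in sequence.upper()[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # made-up sequence, for illustration only
    print(encode(example, max_len=12))
```

Token ids like these are what would typically be fed into an embedding layer in front of an RNN, LSTM, or Transformer.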
Additional papers covering the topic of ML in biology:
https://www.nature.com/articles/s41586-021-03819-2 - The AlphaFold paper.
https://pubmed.ncbi.nlm.nih.gov/35830864/ - A broad overview of deep learning in biology.
https://pubmed.ncbi.nlm.nih.gov/35862514/ - A paper out of the Baker lab in which the authors use deep learning to design proteins from scratch.
https://pubmed.ncbi.nlm.nih.gov/35099535/ - From Charlotte Deane’s lab with collaborators from Roche, this paper presents a deep learning approach to rapidly and accurately model the structure of antibody CDR3 loops. One of the papers mentioned in the review above.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9129155/ - This is recent work from A-Alpha; this paper doesn’t include any ML but does include some great examples of AlphaSeq data and how it can be applied.