Discovering Pharmaceuticals using Machine Learning, with Ryan Emerson of A-Alpha Bio

Yaoshiang Ho

A true “aha” conversation! Learn how deep learning techniques from natural language processing (NLP) are applied to drug discovery, specifically to protein-protein interactions. Includes a quick-and-dirty primer on just enough biology to understand the training data A-Alpha Bio uses for its ML models.

If you are interested in novel applications of ML, subscribe to our podcast at:

Show Notes:

0:37 - The basics of synthetic biology for machine learning practitioners

0:50 - What are proteins and why do they matter?

1:50 - A protein is a string of amino acids drawn from an alphabet of 20… which means it starts looking like a Natural Language Processing problem.

2:35 - DeepMind’s AlphaFold and Meta FAIR’s ESMFold: models that take a string of amino acids as input and predict the protein’s 3D structure.

6:23 - Where AlphaFold got its training data: the Protein Data Bank.

8:07 - A-Alpha Bio’s product: AlphaSeq.

10:45 - The source of the name “A-Alpha Bio”: yeast “genders” (the a and alpha mating types).

11:36 - Applications of synthetic biology: pharmaceuticals, agriculture.

15:00 - Applying ML to predict protein-protein interactions.

20:30 - !!! The actual ML techniques applied: treating proteins as strings and applying NLP architectures: RNNs, LSTMs, attention, and Transformers.

22:50 - Generating new proteins as a discrete optimization problem.

28:30 - The insights behind why applying ML would work.

31:20 - The rise of deep learning in the field of computational biology.

32:50 - Ryan’s journey into machine learning and data science.

35:20 - Advice for deep learning people interested in applying ML to biology.
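
For ML practitioners who want to see what "treating proteins as strings" can look like in practice, here is a minimal sketch of the usual first step: mapping each amino acid letter to an integer token, the same way an NLP pipeline maps words or characters to ids before feeding an RNN or Transformer. The names `AMINO_ACIDS`, `PAD_ID`, `TOKEN_TO_ID`, and `encode` are hypothetical and chosen for illustration; this is not A-Alpha Bio's code, and the episode does not describe their exact preprocessing.

```python
# Illustrative sketch only: helper names are assumptions, not from the episode.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids
PAD_ID = 0  # reserved for padding so sequences can be batched at a fixed length
TOKEN_TO_ID = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence, max_len):
    """Map a protein string to a fixed-length list of integer token ids."""
    ids = []
    for aa in sequence.upper()[:max_len]:
        if aa not in TOKEN_TO_ID:
            raise ValueError(f"unknown amino acid code: {aa!r}")
        ids.append(TOKEN_TO_ID[aa])
    # Right-pad with PAD_ID up to max_len.
    return ids + [PAD_ID] * (max_len - len(ids))


print(encode("ACDW", 6))  # prints [1, 2, 3, 19, 0, 0]
```

The resulting id lists can then go into an embedding layer exactly as text tokens would, which is why architectures built for language transfer so naturally to protein sequences.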

 

Additional papers covering the topic of ML in biology:

https://www.nature.com/articles/s41586-021-03819-2 - The AlphaFold paper.

https://pubmed.ncbi.nlm.nih.gov/35830864/ - A broad overview of deep learning in biology.

https://pubmed.ncbi.nlm.nih.gov/35862514/ - A paper out of the Baker lab in which the authors use deep learning to design proteins from scratch.

https://pubmed.ncbi.nlm.nih.gov/35099535/ - From Charlotte Deane’s lab with collaborators from Roche, this paper presents a deep learning approach to rapidly and accurately model the structure of antibody CDR3 loops. One of the papers mentioned in the review above.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9129155/ - Recent work from A-Alpha Bio. This paper doesn’t include any ML, but it does include some great examples of AlphaSeq data and how it can be applied.

