Discovering Pharmaceuticals using Machine Learning, with Ryan Emerson of A-Alpha Bio

Yaoshiang Ho

A true “aha” conversation! Learn how deep learning techniques from natural language processing (NLP) are applied to drug discovery, specifically to protein-protein interactions. Includes a quick and dirty primer on just enough biology to understand the training data A-Alpha Bio uses for its ML models.

Show Notes:

0:37 - The basics of synthetic biology for machine learning practitioners

0:50 - What are proteins and why do they matter?

1:50 - A protein is a string of amino acids drawn from an alphabet of 20… which means it starts looking like a Natural Language Processing problem.

2:35 - DeepMind’s AlphaFold and Meta FAIR’s ESMFold: taking as input a string of amino acids, and then predicting the 3D structure of proteins.

6:23 - Where AlphaFold got its training data: the Protein Data Bank.

8:07 - A-Alpha Bio’s product: AlphaSeq.

10:45 - The source of the name “A-Alpha Bio”: yeast “genders” (the a and alpha mating types).

11:36 - Applications of synthetic biology: pharmaceuticals, agriculture.

15:00 - Applying ML to predict protein-protein interactions.

20:30 - !!! The actual ML techniques applied: treating proteins as strings and applying NLP architectures: RNNs, LSTMs, Attention, and Transformers.

22:50 - Using discrete optimization to then generate proteins.

28:30 - The insights behind why applying ML would work.

31:20 - The rise of deep learning in the field of computational biology.

32:50 - Ryan’s journey into machine learning and data science.

35:20 - Advice for deep learning practitioners interested in applying ML to biology.
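
For readers who want to see the "protein as a string" idea from the 1:50 and 20:30 marks in code, here is a minimal sketch. It is not A-Alpha Bio's pipeline; the function names and the example peptide are made up for illustration. It only shows how a sequence written in the 20 standard one-letter amino acid codes can be turned into integer tokens and one-hot vectors, the same kind of input an RNN or Transformer would take for text.

```python
# Illustrative sketch only: hypothetical helper names, not code from the episode.

# One-letter codes for the 20 standard amino acids (the model's "vocabulary").
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(sequence: str) -> list[int]:
    """Map a protein sequence to integer token ids, one per residue."""
    sequence = sequence.upper()
    unknown = sorted(set(sequence) - set(AMINO_ACIDS))
    if unknown:
        raise ValueError(f"non-standard residue codes: {unknown}")
    return [TOKEN_ID[aa] for aa in sequence]


def one_hot(tokens: list[int]) -> list[list[int]]:
    """Turn token ids into one-hot rows of length 20."""
    return [[1 if j == t else 0 for j in range(len(AMINO_ACIDS))] for t in tokens]


peptide = "MKTAYIAK"  # made-up example sequence
tokens = tokenize(peptide)
encoded = one_hot(tokens)
print(tokens)
print(len(encoded), "residues x", len(encoded[0]), "amino acid classes")
```

In practice a learned embedding layer would usually replace the one-hot step, but the point is the same: once a protein is a sequence of tokens, standard sequence models apply.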


Additional papers covering the topic of ML in biology:
- The AlphaFold paper.
- A broad overview of deep learning in biology.
- A paper out of the Baker lab in which the authors use deep learning to design proteins from scratch.
- From Charlotte Deane’s lab with collaborators from Roche, this paper presents a deep learning approach to rapidly and accurately model the structure of antibody CDR3 loops. One of the papers mentioned in the review above.
- This is recent work from A-Alpha; this paper doesn’t include any ML but does include some great examples of AlphaSeq data and how it can be applied.

