Ciro Greco has built ML systems at many name-brand retailers. In this episode, he shares tips on getting value out of ML with NLP and information retrieval at “reasonable scale” companies. He returned to the concept of “reasonable scale” repeatedly, and he clearly has a nuanced understanding of how that segment differs from the hyper scale tech giants. He also applies advanced ideas from NLP, such as embeddings, to e-commerce personalization.
If you are interested in novel applications of ML, subscribe to our podcast at:
1:36: Key differences in applying ML at “reasonable scale” companies like major retailers where you can’t just “big-data” your way out of problems, compared to the hyper scale tech giants.
3:22: The basics of personalization: suggestions, search, recommendations, and categories.
4:38: A non-obvious challenge: how to personalize for non-logged-in users without a profile who visit infrequently.
9:00: Different incentives for reasonable scale vs hyper scale companies.
9:44: Getting your data right: data ingestion, data practices, organizing teams around data, transforming data, infrastructure for flexible data access, so that you can make developers productive when you have finite resources.
11:23: Learning from experience that data, specifically replayability and replicability, matters more than modeling.
12:58: Lessons from presenting at top tier conferences: so many published papers come from the hyper scale companies.
14:19: Taking session data and catalog data to create a “product to vector” embedding to personalize an experience.
19:20: Requirements on how to sell: the sales people must communicate to the “people who write the check” that data integration is a first class citizen, not a downstream task, to achieve ROI.
21:09: Dynamics of regulatory and privacy issues, and how to tackle them as an organization.
24:10: Ciro talks about his personal journey into ML, starting with a PhD in neuroscience and linguistics.
25:46: Early challenges in applying deep learning to NLP.
26:22: The “aha” moment that led to Ciro’s first startup delivering search products.
27:55: Changes in the role of a data scientist over the past decade: from PhDs who had to tackle problems with very little tooling to today, when so many tools are available, along with a shift towards understanding products and customers.
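
For readers curious about the 14:19 segment, here is a minimal sketch of the general idea of turning session data into product vectors and using them to personalize for an anonymous visitor (the 4:38 challenge). It is not the method described in the episode: it swaps in plain co-occurrence counts for a trained embedding model, and the session data, product IDs, and function names are all hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: each session is the ordered list of product IDs one visitor viewed.
SESSIONS = [
    ["shoe_a", "shoe_b", "sock_a"],
    ["shoe_a", "shoe_b"],
    ["shoe_b", "sock_a"],
    ["lamp_a", "lamp_b"],
    ["lamp_a", "lamp_b", "bulb_a"],
]


def build_vectors(sessions, window=2):
    """Map each product to a sparse vector of co-occurrence counts.

    Assumption: products viewed close together in a session are related,
    the same intuition word embeddings apply to neighboring words.
    """
    vectors = defaultdict(lambda: defaultdict(float))
    for session in sessions:
        for i, product in enumerate(session):
            start = max(0, i - window)
            end = min(len(session), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[product][session[j]] += 1.0
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = sqrt(sum(weight * weight for weight in u.values()))
    norm_v = sqrt(sum(weight * weight for weight in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def session_vector(vectors, viewed):
    """Average the vectors of the products seen so far in the current session."""
    known = [p for p in viewed if p in vectors]
    profile = defaultdict(float)
    for product in known:
        for key, weight in vectors[product].items():
            profile[key] += weight / len(known)
    return profile


def recommend(vectors, viewed, k=2):
    """Rank products not yet viewed by similarity to the session profile."""
    profile = session_vector(vectors, viewed)
    candidates = [p for p in vectors if p not in viewed]
    ranked = sorted(candidates, key=lambda p: cosine(profile, vectors[p]), reverse=True)
    return ranked[:k]


VECTORS = build_vectors(SESSIONS)
```

Calling `recommend(VECTORS, ["shoe_a"])` ranks the sock and the other shoe ahead of the lamps, using only what the visitor did in the current session and no stored profile. A production system would more likely train dense embeddings on far more sessions and enrich them with catalog data, which this sketch leaves out.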