Why Multi-Modality is the Future of Machine Learning w/ Letitia Parcalabescu (University of Heidelberg, AI Coffee Break)
Episode 12 • 10th November 2020 • Machine Learning Engineered • Charlie You

Letitia Parcalabescu is a PhD candidate at the University of Heidelberg focused on multi-modal machine learning, specifically with vision and language.

Learn more about Letitia:

Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here:

Follow Charlie on Twitter:

Take the Giving What We Can Pledge:

Subscribe to ML Engineered:

Comments? Questions? Submit them here:


01:30 Follow Charlie on Twitter

02:40 Letitia Parcalabescu

03:55 How she got started in CS and ML

07:20 What is multi-modal machine learning?

16:55 Most exciting use-cases for ML

20:45 The 5 stages of machine understanding

23:15 The future of multi-modal ML (GPT-50?)

27:00 The importance of communicating AI breakthroughs to the general public

37:40 Positive applications of the future “GPT-50”

43:35 Letitia’s CVPR paper on phrase grounding

53:15 ViLBERT: is attention all you need in multi-modal ML?

57:00 Preventing “modality dominance”

01:03:25 How she keeps up in such a fast-moving field

01:10:50 Why she started her AI Coffee Break YouTube Channel

01:18:10 Rapid fire questions


AI Coffee Break YouTube Channel

Exploring Phrase Grounding without Training

AI Coffee Break series on Multi-Modal learning

What does it take for an AI to understand language?

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations