Data Science Conversations - Damien Deighan and Philipp Diesinger EPISODE 2, 14th October 2020
Philipp Koehn (Part 2) - How Neural Networks have Transformed Machine Translation

This is Part 2 of our conversation with Professor Philipp Koehn of Johns Hopkins University. Professor Koehn is one of the world’s leading experts in machine translation and NLP.

In this episode we delve into commercial applications of machine translation, survey the open source tools available, and look at what to expect from the field in the future.

Episode Summary:

  • Typical datasets used for training models
  • The role of infrastructure and technology in Machine Translation
  • How academic research in Machine Translation has made its way into industry applications
  • Overview of the open source tools available for Machine Translation
  • The future of Machine Translation, and whether it can pass a Turing test

Philipp Koehn's latest book - Neural Machine Translation - Amazon link: 




Omniscien Technologies - leading enterprise provider of machine translation services:




Open Source tools:


- Fairseq https://fairseq.readthedocs.io/en/latest/

- Marian https://marian-nmt.github.io/

- OpenNMT https://opennmt.net/

- Sockeye https://awslabs.github.io/sockeye/


Translated texts (parallel data) for training:


- OPUS http://opus.nlpl.eu/

- ParaCrawl https://paracrawl.eu/


Two papers mentioned in connection with the excessive computing power used to train NLP models:


- GPT-3 https://arxiv.org/abs/2005.14165

- RoBERTa https://arxiv.org/abs/1907.11692