The Road to BERT
Description
In this course, we will dive into the world of Natural Language Processing. We will demonstrate how Deep Learning has re-shaped this area of Artificial Intelligence using concepts like word vectors and embeddings, strucutured deep learning, collaborative filtering, recurrent neural networks, sequence-to-sequence models and transformer networks. In our journey, we will be mostly concerned with how to represent the language tokens, being at the word or character level, and and how to represent their aggregation, like sentences or documents, in a semantically sound way. We start the journey by going through the traditional pipeline of text pre-processing and the different text features like binary and TF-IDF features with the Bag-of-Words model. Then we will dive into the concepts of word vectors and embeddings as a general deep learning concept, with detailed discussion of famous word embedding techniques like word2vec, GloVe, Fasttext and ELMo. This will enable us to divert into recommender systems, using collaborative filtering and twin-tower model as an example of the generic usage of embeddings beyond word representations. In the second part of the course, we will be concerned with sentence and sequence representations. We will tackle the core NLP of Langauge Modeling, at statistical and neural levels, using recurrent models, like LSTM and GRU. In the following part, we tackle sequence-to-sequence models, with the flagship NLP task of Machine Translation, which paves the way to talk about many other tasks under the same design seq2seq pattern, like Question-Answering and Chatbots. We present the core idea idea of Attention mechanisms with recurrent seq2seq, before we generalize it as a generic deep learning concept. This generalization leads to the to the state-of-the art Transformer Network, which revolutionized the world of NLP, using full attention mechanisms. In the final part of the course, we present the ImageNet moment of NLP, where Transfer Learning comes into play together with pre-trained Transfomer architectures like BERT, GPT 1-2-3, RoBERTa, ALBERT, XLTransformer and XLNet.
Who this course is for:
- Beginner level NLP engineers and data scientists