From Pixels to Semantics
Description
Welcome to our course, Deep Learning for Computer Vision: From Pixels to Semantics. In this course, we will cover three main parts. The first part covers the essentials of traditional computer vision pipeline, and how to deal with images in OpenCV and Pillow libraries, including the image pre-processing pipeline like: thresholding, denoising, blurring, filtering, edge detection, contours…etc. We will build simple apps like Car License Plate Detection (LPD) and activity recogntion. This will lead us to the revolution that deep learning brought to the game of computer vision, turning traditional filters into learnable parameters using Convolution Neural Networks. We will cover all the basics of ConvNets, including the details of the Vanilla architecture for image classification, hyper parameters like kernels, strides, maxpool and feature maps sizes calculations. Beyond the Vanilla architecture, we also cover the state-of-the art ConvNet meta-architectures and design patters, like skip-connnections, Inception, DenseNet…etc. In the second part, we will learn how to use ConvNets to solve practical problems in different situations, with small amount of data, how to use transfer learning and the different scenarios for that, and finally how to debug and visualize the leant kernels in ConvNets. In the last part, we will learn about different CV apps using ConvNets. We will learn about the Encoder-Decoder design pattern. We start by the task of semantic segmentation, where we will build a U-Net architecture from scratch for the Cambridge Video (CAMVID) dataset. Then we will learn about Object Detection, covering both 2-stage and one-shot architectures like SSD and YOLO. Next, we will learn how to deal with the video data using the Spatio-Temporal ConvNet architectures. Finally we will introduce 3D Deep Learning to extend ConvNets usage to deal with 3D data, like LiDAR data.
Who this course is for:
- Beginner level computer vision engineer