In-depth exploration of Spark Structured Streaming 3.0 using the Python API.
Get a high-level introduction to Apache Kafka along the way.
Understand the nuances of Stream Processing in Apache Spark.
Discover the features Spark provides out of the box for Stream Processing.
Many industries need to act on their data faster, and Stream Processing helps do exactly that. But it comes with its own set of concepts, challenges, and best practices.
Apache Spark has seen tremendous development in stream processing. The rich feature set of Spark Structured Streaming introduces a learning curve, and this course is aimed at presenting those concepts in a friendly and approachable manner. Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. You express your streaming computation the same way you would express a batch computation on static data, and the Spark SQL engine takes care of running it incrementally and continuously, updating the final result as streaming data continues to arrive. It allows data engineers and data scientists to process real-time data from various sources including (but not limited to) Kafka, Flume, and Amazon Kinesis.
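To give a flavour of that batch-like programming model, here is a minimal sketch in PySpark. It assumes a local Spark 3.x installation, a hypothetical Kafka broker at localhost:9092 with a topic named "events", and the spark-sql-kafka-0-10 package on the classpath; it is an illustration, not part of the course material.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = (SparkSession.builder
         .appName("structured-streaming-sketch")
         .getOrCreate())

# Read a stream from Kafka much as you would read a static DataFrame.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
          .option("subscribe", "events")                        # assumed topic
          .load())

# Express the computation as an ordinary batch-style aggregation;
# the Spark SQL engine runs it incrementally as new records arrive.
counts = (events
          .withColumn("value", col("value").cast("string"))
          .groupBy(window(col("timestamp"), "1 minute"), col("value"))
          .count())

# Continuously update the result and print it to the console.
query = (counts.writeStream
         .outputMode("update")
         .format("console")
         .start())

query.awaitTermination()
```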
This illustrative course will build your foundational knowledge. You will learn the differences between batch and stream processing, the programming model, the APIs, and the challenges specific to stream processing. We'll then move on to the core concepts of stream processing through a wide variety of examples and hands-on exercises, look at the inner workings, and finish with an end-to-end use case. All of this activity will take place in the cloud using Spark 3.0.