Taming Big Data with Apache Spark and Python – Hands On!
Apache Spark tutorial with 20+ hands-on examples of analyzing large data sets on your desktop or on Hadoop with Python!
What you’ll learn
- Use DataFrames and Structured Streaming in Spark 3
- Frame big data analysis problems as Spark problems
- Use Amazon’s Elastic MapReduce service to run your job on a cluster with Hadoop YARN
- Install and run Apache Spark on a desktop computer or on a cluster
- Use Spark’s Resilient Distributed Datasets (RDDs) to process and analyze large data sets across many CPUs
- Implement iterative algorithms such as breadth-first search using Spark
- Use the MLlib machine learning library to answer common data mining questions
- Understand how Spark SQL lets you work with structured data
- Understand how Spark Streaming lets you process continuous streams of data in real time
- Tune and troubleshoot large jobs running on a cluster
- Share information between nodes on a Spark cluster using broadcast variables and accumulators
- Understand how the GraphX library helps with network analysis problems
Requirements
- Access to a personal computer. This course uses Windows, but the sample code will work fine on Linux as well.
- Some prior programming or scripting experience. Python experience will help a lot, but you can pick it up as we go.
Who this course is for:
- People with some software development background who want to learn the hottest technology in big data analysis will want to check this out. This course focuses on Spark from a software development standpoint; we introduce some machine learning and data mining concepts along the way, but that’s not the focus. If you want to learn how to use Spark to carve up huge datasets and extract meaning from them, then this course is for you.
- If you’ve never written a computer program or a script before, this course isn’t for you – yet. I suggest starting with a Python course first if programming is new to you.
- If your software development job involves, or will involve, processing large amounts of data, you need to know about Spark.
- If you’re training for a new career in data science or big data, Spark is an important part of it.