Spark and Python for Big Data with PySpark
Learn how to use Spark with Python, including Spark Streaming, Machine Learning, Spark 2.0 DataFrames and more!
What you’ll learn
- Use Python and Spark together to analyze Big Data
- Learn how to use the new Spark 2.0 DataFrame Syntax
- Work on Consulting Projects that mimic real world situations!
- Classify Customer Churn with Logistic Regression
- Use Spark with Random Forests for Classification
- Learn how to use Spark’s Gradient Boosted Trees
- Use Spark’s MLlib to create Powerful Machine Learning Models
- Learn about the Databricks Platform!
- Get set up on Amazon Web Services EC2 for Big Data Analysis
- Learn how to use AWS Elastic MapReduce Service!
- Learn how to leverage the power of Linux with a Spark Environment!
- Create a Spam filter using Spark and Natural Language Processing!
- Use Spark Streaming to Analyze Tweets in Real Time!
Requirements
- General Programming Skills in any Language (Preferably Python)
- 20 GB of free space on your local computer (or, alternatively, a strong internet connection for AWS)
Who this course is for:
- Someone who knows Python and would like to learn how to use it for Big Data
- Someone who is very familiar with another programming language and needs to learn Spark