Apache Spark with Scala - Hands On with Big Data
Dive right in with 20+ hands-on examples of analyzing large data sets with Apache Spark, on your desktop or on Hadoop!
What you’ll learn
- Frame big data analysis problems as Apache Spark scripts (a minimal sketch appears after this list)
- Develop distributed code using the Scala programming language
- Optimize Spark jobs through partitioning, caching, and other techniques
- Build, deploy, and run Spark scripts on Hadoop clusters
- Process continual streams of data with Spark Streaming
- Transform structured data using SparkSQL, DataSets, and DataFrames (see the second sketch below)
- Traverse and analyze graph structures using GraphX
- Analyze massive data sets with Machine Learning on Spark
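
To give you a taste of the first two points, here is a minimal sketch (my own illustration, not taken from the course materials) of a self-contained Spark script in Scala: the classic word count, run locally on your desktop. The input file name book.txt is a hypothetical placeholder.

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // Run locally using every CPU core; on a real Hadoop cluster you would
    // drop the master setting and launch the job with spark-submit instead.
    val spark = SparkSession.builder
      .appName("WordCount")
      .master("local[*]")
      .getOrCreate()

    // "book.txt" is a hypothetical input file, standing in for any large text.
    val counts = spark.sparkContext.textFile("book.txt")
      .flatMap(_.split("\\W+"))            // split each line into words
      .filter(_.nonEmpty)                  // drop empty tokens from the split
      .map(word => (word.toLowerCase, 1))  // pair each word with a count of 1
      .reduceByKey(_ + _)                  // sum the counts per word, in parallel

    // Print the ten most frequent words.
    counts.sortBy(-_._2).take(10).foreach(println)

    spark.stop()
  }
}
```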
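And here is a similar sketch of the structured-data side: loading a DataFrame and querying it with plain SQL. Again, this is only an illustration under assumed inputs; the file people.csv and its columns are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object PeopleByAge {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("PeopleByAge")
      .master("local[*]")
      .getOrCreate()

    // "people.csv" is a hypothetical file with header columns such as name and age.
    val people = spark.read
      .option("header", "true")       // first row holds column names
      .option("inferSchema", "true")  // detect column types from the data
      .csv("people.csv")

    // Register the DataFrame as a temporary view and query it like a database table.
    people.createOrReplaceTempView("people")
    spark.sql("SELECT age, COUNT(*) AS total FROM people GROUP BY age ORDER BY age").show()

    spark.stop()
  }
}
```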
Requirements
- Some prior programming or scripting experience is required. A crash course in Scala is included, but you’ll need to know the fundamentals of programming to pick it up.
- You will need a desktop PC and an Internet connection. The course is created with Windows in mind, but users comfortable with macOS or Linux can use the same tools.
- The software needed for this course is freely available, and I’ll walk you through downloading and installing it.
Who this course is for:
- Software engineers who want to expand their skills into the world of big data processing on a cluster
- If you have no previous programming or scripting experience, you’ll want to take an introductory programming course first.