Contact Us
We would love to hear from you. Please contact us to pre-book or to request further information about our delivery options.

Duration: 2 days (online and onsite)

Price: Upon request
After completing this course, you will be able to:
- Write your own Python programs that can interact with Spark
- Implement data stream consumption using Apache Spark
- Recognize common Spark operations for processing data streams
- Integrate Spark Streaming with Amazon Web Services (AWS)
- Create a collaborative filtering model with Python and the MovieLens dataset
- Apply processed data streams to Spark machine learning APIs
Lesson 1: Introduction to Spark Distributed Processing
- Introduction to Spark and Resilient Distributed Datasets
- Operations Supported by the RDD API
- Self-Contained Python Spark Programs
- Introduction to SQL, Datasets, and DataFrames
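To give a flavour of Lesson 1, here is a minimal sketch of a self-contained Python Spark program covering basic RDD operations and a first DataFrame query (the app name and sample data are illustrative; a local Spark installation with pyspark available is assumed):

```python
# lesson1_sketch.py - a minimal self-contained Spark program (illustrative only)
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("Lesson1Sketch")
         .master("local[*]")          # run locally, using all available cores
         .getOrCreate())

# Create an RDD and apply common transformations/actions from the RDD API
rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5])
squares = rdd.map(lambda x: x * x)            # transformation (lazy)
total = squares.reduce(lambda a, b: a + b)    # action (triggers computation)
print("Sum of squares:", total)               # 55

# The same idea with a DataFrame, queried through Spark SQL
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
df.createOrReplaceTempView("letters")
spark.sql("SELECT letter FROM letters WHERE id > 1").show()

spark.stop()
```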
Lesson 2: Introduction to Spark Streaming
- Streaming Architectures
- Introduction to Discretized Streams
- Windowing Operations
- Introduction to Structured Streaming
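A typical introductory exercise of the kind Lesson 2 covers is a windowed word count with Structured Streaming over a socket source; the sketch below assumes a local text server on port 9999 (host, port, and window sizes are placeholders, not the course's exact lab code):

```python
# Windowed word count with Structured Streaming (sketch; host/port are placeholders)
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split, window, current_timestamp

spark = SparkSession.builder.appName("StreamingSketch").getOrCreate()

# Read lines from a TCP socket as an unbounded table
lines = (spark.readStream.format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Split each line into words and stamp each word with an arrival time
words = lines.select(
    explode(split(lines.value, " ")).alias("word"),
    current_timestamp().alias("ts"))

# Count words over sliding 1-minute windows, updated every 30 seconds
counts = words.groupBy(window("ts", "1 minute", "30 seconds"), "word").count()

query = (counts.writeStream
         .outputMode("complete")   # emit the full updated result each trigger
         .format("console")
         .start())
query.awaitTermination()
```

To try it locally, run `nc -lk 9999` in another terminal and type lines of text; the windowed counts appear on the console.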
Lesson 3: Spark Streaming Integration with AWS
- Spark Integration with AWS Services
- Integrating AWS Kinesis and Python
- AWS S3 Basic Functionality
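Lesson 3's AWS exercises are typically driven from Python via the boto3 SDK; the sketch below shows the kind of basic S3 and Kinesis calls involved (the bucket and stream names are placeholders, and configured AWS credentials are assumed):

```python
# Basic S3 and Kinesis access with boto3 (sketch; names are placeholders,
# and AWS credentials must already be configured, e.g. via `aws configure`)
import json
import boto3

s3 = boto3.client("s3")
kinesis = boto3.client("kinesis")

# S3 basics: upload a local file, then list the bucket's contents
s3.upload_file("ratings.csv", "my-example-bucket", "data/ratings.csv")
for obj in s3.list_objects_v2(Bucket="my-example-bucket").get("Contents", []):
    print(obj["Key"])

# Kinesis basics: push a JSON record onto a stream for Spark to consume
kinesis.put_record(
    StreamName="my-example-stream",
    Data=json.dumps({"user": 1, "movie": 42, "rating": 4.5}).encode("utf-8"),
    PartitionKey="user-1")
```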
Lesson 4: Spark Streaming, ML, and Windowing Operations
- Spark Integration with Machine Learning
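The course objectives mention building a collaborative filtering model on the MovieLens dataset; with Spark's ML APIs this is typically done with the ALS recommender, sketched below (the ratings file path and hyperparameters are illustrative):

```python
# Collaborative filtering with ALS on MovieLens-style ratings
# (sketch; file path and hyperparameters are illustrative)
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator

spark = SparkSession.builder.appName("ALSSketch").getOrCreate()

# MovieLens ratings file: userId,movieId,rating,timestamp
ratings = (spark.read.csv("ratings.csv", header=True, inferSchema=True)
           .select("userId", "movieId", "rating"))
train, test = ratings.randomSplit([0.8, 0.2], seed=42)

als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
          maxIter=10, regParam=0.1,
          coldStartStrategy="drop")   # drop NaN predictions for unseen users
model = als.fit(train)

# Evaluate on the held-out split
predictions = model.transform(test)
rmse = RegressionEvaluator(metricName="rmse", labelCol="rating",
                           predictionCol="prediction").evaluate(predictions)
print("Test RMSE:", rmse)

# Top-5 movie recommendations for every user
model.recommendForAllUsers(5).show(truncate=False)
```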
Big Data Processing with Apache Spark is for you if you are a software engineer, architect, or IT professional who wants to explore distributed systems and big data analytics. You don't need any prior knowledge of Spark, but prior experience working with Python is recommended.
Hardware:
For an optimal experience with the hands-on labs and other practical activities, we recommend the following hardware configuration:
- Processor: Intel Core i5 or equivalent
- Memory: 4 GB RAM
- Storage: 35 GB available space
Software:
- OS: Windows 7 SP1 (64-bit), Windows 8.1 (64-bit), or Windows 10 (64-bit)
- PostgreSQL 9.0 or above
- Python 3.0 or above
- Spark 2.3.0
- Amazon Web Services (AWS) account