Building Batch Data Pipelines on Google Cloud

This course is part of multiple programs.

Taught in English

Instructor: Google Cloud Training

45,084 already enrolled

Included with Coursera Plus

Course

Gain insight into a topic and learn the fundamentals

4.5 (1,672 reviews)

85%

Intermediate level

Some related experience required

17 hours (approximately)

Flexible schedule

Learn at your own pace

What you'll learn

Review different methods of data loading (EL, ELT, and ETL) and when to use each
Run Hadoop on Dataproc, leverage Cloud Storage, and optimize Dataproc jobs
Build your data processing pipelines using Dataflow
Manage data pipelines with Data Fusion and Cloud Composer

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

4 quizzes

This course is available as part of multiple programs. When you enroll in this course, you'll also be asked to select a specific program.

There are 6 modules in this course

Data pipelines typically fall under one of three paradigms: Extract and Load (EL); Extract, Load, and Transform (ELT); or Extract, Transform, and Load (ETL). This course describes which paradigm to use, and when, for batch data. It also covers several Google Cloud technologies for data transformation, including BigQuery, executing Spark on Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Dataflow. Learners get hands-on experience building data pipeline components on Google Cloud using Qwiklabs.
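
As a rough illustration of how the three paradigms differ, the sketch below runs the same tiny batch through EL, ETL, and ELT. It is not taken from the course materials: an in-memory SQLite database stands in for a data warehouse such as BigQuery, and the sample rows, table names, and helper functions (`extract`, `clean`, `load`, `run_el`, `run_etl`, `run_elt`) are all hypothetical.

```python
import sqlite3

# Hypothetical raw records standing in for an extracted batch file.
RAW_ROWS = [("alice", " 10 "), ("bob", "n/a"), ("carol", "7")]


def extract():
    """Extract: return the raw batch unchanged."""
    return list(RAW_ROWS)


def clean(rows):
    """Transform: trim whitespace and drop rows whose amount is not a number."""
    cleaned = []
    for name, amount in rows:
        amount = amount.strip()
        if amount.isdigit():
            cleaned.append((name, int(amount)))
    return cleaned


def load(conn, table, rows):
    """Load: write rows into a table of the stand-in warehouse.

    Table names here are trusted constants, so string formatting is acceptable.
    """
    conn.execute(f"CREATE TABLE {table} (name TEXT, amount)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


def run_el(conn):
    # EL: the data is assumed usable as-is, so it is loaded without changes.
    load(conn, "el_sales", extract())


def run_etl(conn):
    # ETL: transform in the pipeline, before the data reaches the warehouse.
    load(conn, "etl_sales", clean(extract()))


def run_elt(conn):
    # ELT: load the raw data first, then transform inside the warehouse in SQL.
    load(conn, "raw_sales", extract())
    conn.execute(
        "CREATE TABLE elt_sales AS "
        "SELECT name, CAST(TRIM(amount) AS INTEGER) AS amount "
        "FROM raw_sales "
        "WHERE TRIM(amount) <> '' AND TRIM(amount) NOT GLOB '*[^0-9]*'"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_el(conn)
    run_etl(conn)
    run_elt(conn)
    for table in ("el_sales", "etl_sales", "elt_sales"):
        print(table, conn.execute(f"SELECT * FROM {table}").fetchall())
```

In this sketch ETL and ELT end with the same cleaned rows; the difference is where the cleaning runs (in pipeline code versus in the warehouse's SQL engine), which is the kind of trade-off the course description refers to.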

Introduction

In this module, we introduce the course and agenda.

What's included

1 video

Introduction to Building Batch Data Pipelines

This module reviews different methods of data loading (EL, ELT, and ETL) and when to use each.

What's included

6 videos, 1 quiz

Executing Spark on Dataproc

This module shows how to run Hadoop on Dataproc, how to leverage Cloud Storage, and how to optimize your Dataproc jobs.

What's included

11 videos, 1 quiz, 1 app item

11 videos (total 48 minutes)

Module introduction (0 minutes)
The Hadoop ecosystem (4 minutes)
Running Hadoop on Dataproc (11 minutes)
Cloud Storage instead of HDFS (6 minutes)
Optimizing Dataproc (2 minutes)
Optimizing Dataproc Storage (9 minutes)
Optimizing Dataproc Templates and Autoscaling (4 minutes)
Optimizing Dataproc Monitoring (3 minutes)
Lab Intro: Running Apache Spark jobs on Dataproc (0 minutes)
Getting Started with Google Cloud and Qwiklabs (4 minutes)
Summary (0 minutes)

1 quiz (total 6 minutes)

Executing Spark on Dataproc (6 minutes)

1 app item (total 90 minutes)

Running Apache Spark jobs on Dataproc (90 minutes)

Serverless Data Processing with Dataflow

This module covers using Dataflow to build your data processing pipelines.

What's included

13 videos, 1 reading, 1 quiz, 6 app items

13 videos (total 34 minutes)

Module introduction (0 minutes)
Introduction to Dataflow (5 minutes)
Why customers value Dataflow (2 minutes)
Building Dataflow Pipelines in code (3 minutes)
Key considerations with designing pipelines (2 minutes)
Transforming data with PTransforms (3 minutes)
Lab Intro: Building a Simple Dataflow Pipeline (0 minutes)
Aggregate with GroupByKey and Combine (5 minutes)
Lab Intro: MapReduce in Beam (0 minutes)
Side Inputs and Windows of data (4 minutes)
Lab Intro: Serverless Data Analysis with Dataflow: Side Inputs (0 minutes)
Creating and re-using Pipeline Templates (3 minutes)
Summary (2 minutes)

1 reading (total 1 minute)

Completing Labs in this course (1 minute)

1 quiz (total 4 minutes)

Serverless Data Processing with Dataflow (4 minutes)

6 app items (total 540 minutes)

A Simple Dataflow Pipeline (Python) (90 minutes)
Serverless Data Analysis with Dataflow: A Simple Dataflow Pipeline (Java) (90 minutes)
MapReduce in Beam (Python) (90 minutes)
Serverless Data Analysis with Beam: MapReduce in Beam (Java) (90 minutes)
Serverless Data Analysis with Dataflow: Side Inputs (Python) (90 minutes)
Serverless Data Analysis with Dataflow: Side Inputs (Java) (90 minutes)
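
The aggregation topic this module lists, GroupByKey and Combine, can be sketched in plain Python. This is an illustration of the general idea only and is not drawn from the course labs: it does not use the Apache Beam SDK, and the names `group_by_key` and `combine_per_key` are hypothetical stand-ins rather than Beam APIs.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect every value that shares a key: (k, v) pairs -> {k: [v, ...]}."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def combine_per_key(pairs, combine_fn):
    """Fold the values of each key with an associative, commutative function.

    Only one running value per key is kept, so no per-key list is built.
    """
    combined = {}
    for key, value in pairs:
        if key in combined:
            combined[key] = combine_fn(combined[key], value)
        else:
            combined[key] = value
    return combined


if __name__ == "__main__":
    # Hypothetical (word, count) pairs, as in a word-count style job.
    pairs = [("a", 1), ("b", 2), ("a", 3)]
    print(group_by_key(pairs))
    print(combine_per_key(pairs, lambda x, y: x + y))
```

Grouping keeps every value for a key, while combining keeps a single running result. Because the combining function is associative and commutative, partial results can in principle be computed on separate workers and merged afterwards.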

Manage Data Pipelines with Cloud Data Fusion and Cloud Composer

This module shows how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

What's included

13 videos, 1 quiz, 2 app items

13 videos (total 29 minutes)

Module introduction (0 minutes)
Introduction to Cloud Data Fusion (3 minutes)
Components of Cloud Data Fusion (1 minute)
Cloud Data Fusion UI (1 minute)
Build a pipeline (4 minutes)
Explore data using wrangler (1 minute)
Lab Intro: Building and executing a pipeline graph in Cloud Data Fusion (0 minutes)
Orchestrate work between Google Cloud services with Cloud Composer (1 minute)
Apache Airflow Environment (1 minute)
DAGs and Operators (4 minutes)
Workflow scheduling (5 minutes)
Monitoring and Logging (3 minutes)
Lab Intro: An Introduction to Cloud Composer (0 minutes)

1 quiz (total 2 minutes)

Manage Data Pipelines with Cloud Data Fusion and Cloud Composer (2 minutes)

2 app items (total 240 minutes)

Building and Executing a Pipeline Graph with Data Fusion (150 minutes)
Lab: An Introduction to Cloud Composer (90 minutes)

Course Summary

What's included

1 video

Instructor

Instructor ratings

4.6 (244 ratings)

Google Cloud Training

Google Cloud

1,312 courses, 2,519,743 learners

Offered by

Google Cloud

Recommended if you're interested in Cloud Computing

Building Resilient Streaming Analytics Systems on Google Cloud (Course, Google Cloud)
Smart Analytics, Machine Learning, and AI on Google Cloud (Course, Google Cloud)
Preparing for your Professional Data Engineer Journey (Course, Google Cloud)
Cost Optimization and Data Tiering with BigLake and Cloud Storage (Project, Google Cloud)

Learner reviews

4.5 (1,672 reviews)

5 stars: 65.96%
4 stars: 25.53%
3 stars: 6.16%
2 stars: 1.49%
1 star: 0.83%

Frequently asked questions

Can I preview a course before enrolling?

Yes, you can preview the first video and view the syllabus before you enroll. You must purchase the course to access content not included in the preview.

When will I have access to the lectures and assignments?

If you decide to enroll in the course before the session start date, you will have access to all of the lecture videos and readings for the course. You’ll be able to submit assignments once the session starts.

What will I get when I enroll?

Once you enroll and your session begins, you will have access to all videos and other resources, including reading items and the course discussion forum. You’ll be able to view and submit practice assessments, and complete required graded assignments to earn a grade and a Course Certificate.
