Automating Data Pipelines ★★★ Expert Level
This two-day course teaches a way of thinking and best practices for automating data pipelines from scratch. It starts with pipeline design, continues with implementing orchestration and making scripts automatable, and concludes with the components you need to monitor and manage an automated data pipeline, including options for implementing them.
Currently there are no scheduled dates for this course. To be notified about upcoming dates, please choose "Reserve a seat".
*If you are a group of 5 or more, we are happy to arrange a training date that suits you best. To request one, please choose the "Reserve a seat" option.
About the course
Automating your data pipeline helps ensure its speed and quality. This two-day module teaches you a data pipeline design approach that you can apply to any work that involves processing data. We cover everything from the basics of creating automatable data pipelines to orchestrating scripts and monitoring an automated data pipeline. You will learn how to manage data pipelines, including the why, when, and how of making code idempotent and improving scripts, together with implementing basic logging and validation checks. The course follows a five-step structured approach that combines theory with case studies, so participants gain both a theoretical understanding and the experience and confidence to apply their knowledge within the business. After completing this course, participants will be able to design and develop an automated data pipeline with appropriate quality control measures to monitor its performance over time.
Why this is for you
Do you still manually run scripts or perform checks in a periodic data process? Do you feel like you could improve the speed of your data pipeline by redesigning it? Do you run into bugs with your data pipeline? This course will teach you an improved way of thinking when it comes to designing, developing, and monitoring data pipelines and give you hands-on experience with widely used tools to achieve this. Reduce the time spent on doing things manually that could be automated, and learn how to manage them well.
For whom
This course is designed for AI Engineers, Data Engineers, and Data Scientists who have experience with manipulating data and programming and are looking to automate the flow of their data from source to models and applications. Before signing up for this course, you must have completed both the Data Models and Manipulation (4204) and Programming Meta-Skills (4205) badges. Expert-level programming skills in SQL and Python are also required, as both languages are used at an advanced level in the cases during this course.
What you’ll learn
- Designing an E2E automated data pipeline for an E2E AI solution
- Defining and orchestrating components to create an automated data pipeline
- Ways of changing and improving scripts to make them automatable
- Managing quality of automated data pipelines
- Implementing quality control measures in your data pipelines
- Design data pipelines – Based on design and data flow requirements for an E2E AI solution
- Orchestrate scripts – Automate data pipelines and define the requirements for each component
- Write automatable code – Change and improve scripts to make them automatable
- Manage quality of automated data pipelines – Define components to monitor data pipeline quality and plan how to act on irregularities