Distributed Data Systems
★★★★ Master Level
Distributed data systems can achieve remarkably high performance and are key for organizations to deal with the ever-growing data volumes. If these systems are correctly configured, they can compute results faster than ever.

Language: English
Duration: 2 days
Time: 9:00-17:00
Certification: Yes
Lunch: Included
Recommended level: Master
Upcoming courses
Currently there are no scheduled dates for this course. To be notified about upcoming dates, please choose "Reserve a seat".
*If you are a group of 5 or more, we are happy to accommodate a date for the training that suits you best. If so, please choose the "Reserve a seat" option.
About the course
The amount of data generated globally grows at an exponential rate, doubling every 2 years. Accordingly, the data volumes that organizations must process have also grown rapidly. Whether the task is identifying fraudulent activity in billions of transactional records or analyzing the flow of millions of customers through your website to increase conversions, traditional data systems cannot handle data volumes of this size. Distributed data systems can achieve remarkably high performance and are key for organizations to deal with the ever-growing data volumes. If these systems are correctly configured, they can compute results faster than ever.
For whom
This course is designed specifically for Data Scientists and Data Engineers. Many of the skills covered build on knowledge from the web scraping pre-work accompanying this badge and from the Data Models and Manipulation (4204) badge. Participants need experience interacting with APIs and expert programming skills in SQL and Python to keep up with this course.
What you’ll learn
Principles of distributed data systems
- Able to explain how distributed storage and distributed processing can strengthen each other
- Able to explain the concepts of partitioning and multi-node processing
- Able to identify which distributed data system to use for your use case
- Able to identify if a distributed system scales in an optimal way
- Able to design queries for optimal parallelization of processing jobs
- Able to configure distributed data systems for optimized performance versus costs
- Able to implement a distributed data system using Apache Spark
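To give a flavor of the partitioning and multi-node processing concepts listed above, here is a minimal sketch in plain Python. It is not course material and does not use Apache Spark: the function names, the sample records, and the use of threads as stand-ins for worker nodes are all assumptions made for illustration. Records are hash-partitioned by key, each partition is aggregated independently, and the partial results are merged.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition_of(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a deterministic hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(records, num_partitions):
    """Split (key, amount) records so equal keys land in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, amount in records:
        partitions[partition_of(key, num_partitions)].append((key, amount))
    return partitions


def sum_partition(partition):
    """Aggregate one partition on its own; no other partition is needed."""
    totals = Counter()
    for key, amount in partition:
        totals[key] += amount
    return totals


def distributed_sum(records, num_partitions=4):
    """Sum amounts per key; threads stand in for worker nodes (assumption)."""
    partitions = partition_records(records, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(sum_partition, partitions))
    # Merge step: keys never span partitions, so merging is a plain union.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)


if __name__ == "__main__":
    # Hypothetical transaction records: (customer, amount)
    sample = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 1)]
    print(distributed_sum(sample))
```

Because all records with the same key share a partition, each worker can finish its aggregation without talking to the others. That independence is what lets this kind of job spread across multiple nodes.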