Data Engineering Course Outline

Uploaded by

Blannon Ngoge

Week 1-2: Introduction to Data Engineering

What is Data Engineering?
  Overview of roles and responsibilities
  Key components of data infrastructure
Data Engineer vs. Data Scientist
  Understanding the collaboration between data engineers and data scientists
Data Pipelines
  Overview of data pipeline design and automation
  Key concepts: extraction, transformation, and loading (ETL)
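
As a rough illustration of the three ETL stages named above, the following is a minimal sketch using only the Python standard library. The sample data, the `orders` table, and all function names are hypothetical and chosen for illustration; a real pipeline would read from files, APIs, or databases rather than an inline string.

```python
import csv
import io
import sqlite3

# Hypothetical raw input, inlined so the sketch is self-contained.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and convert field types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].title(), float(row["amount"]))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Chain the three stages and return (row count, total amount)."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()


if __name__ == "__main__":
    print(run_pipeline(RAW_CSV, sqlite3.connect(":memory:")))
```

Keeping each stage as a separate function is one common way to make the stages individually testable; it is a design choice for this sketch, not a requirement of ETL.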

Week 3-4: Prerequisites and Core Skills

Educational Background and Career Paths
  Degrees in computer science, mathematics, or related fields
  Non-traditional paths (self-taught, bootcamps, etc.)
Foundational Skills
  Programming Basics (Python):
    Syntax, control flow, data structures, and functions
  Computer Science Fundamentals:
    Algorithms, memory management, and data complexity
  SQL for Data Engineering:
    Database querying, filtering, and data manipulation using SQL
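
To make the SQL topics above concrete, here is a small sketch of querying, filtering, and manipulating data from Python with the standard-library `sqlite3` module. The `employees` table, its columns, and the helper function names are assumptions invented for this example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up employees table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees "
        "(id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
    )
    conn.executemany(
        "INSERT INTO employees (name, dept, salary) VALUES (?, ?, ?)",
        [("Ada", "data", 120.0), ("Grace", "data", 100.0), ("Linus", "infra", 90.0)],
    )
    return conn


def high_earners(conn, min_salary):
    """Querying and filtering: a parameterized WHERE clause with ORDER BY."""
    cur = conn.execute(
        "SELECT name FROM employees WHERE salary >= ? ORDER BY name",
        (min_salary,),
    )
    return [row[0] for row in cur]


def give_raise(conn, dept, pct):
    """Manipulation: UPDATE the salaries of one department by a percentage."""
    conn.execute(
        "UPDATE employees SET salary = salary * (1 + ? / 100.0) WHERE dept = ?",
        (pct, dept),
    )


def avg_salary_by_dept(conn):
    """Aggregation: average salary per department via GROUP BY."""
    return dict(conn.execute("SELECT dept, AVG(salary) FROM employees GROUP BY dept"))
```

The `?` placeholders let the database driver bind values safely instead of building SQL strings by hand.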

Week 5-6: Databases and Storage Solutions

Relational Databases (MySQL, PostgreSQL)
  Schema design, normalization, and querying techniques
NoSQL Databases (MongoDB, Cassandra)
  Types of NoSQL databases and their use cases
Data Warehousing
  Introduction to cloud data warehousing solutions (e.g., Amazon Redshift, Google BigQuery)
  ETL processes and data modeling

Week 7-8: Data Processing Techniques

ETL Processes
  Extraction, transformation, and loading methods
  Tools for data integration
Batch and Streaming Processing
  Differences between batch and real-time data processing
  Use cases for each approach
Hands-on Projects
  Building small-scale ETL pipelines using popular tools
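
The difference between batch and streaming processing listed above can be sketched with a toy example: a batch job sees the whole dataset at once, while a streaming job updates its result as each event arrives. The function names and sample readings are hypothetical, and real streaming systems add concerns (windowing, late data, delivery guarantees) that this sketch leaves out.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Batch: the full dataset is available, so the result is computed once."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Streaming: emit an updated running average after every incoming event."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


if __name__ == "__main__":
    data = [2.0, 4.0, 6.0]
    print(batch_average(data))                  # one result for the whole batch
    for avg in streaming_average(iter(data)):   # one result per event
        print(avg)
```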

Week 9-10: Cloud Computing for Data Engineers

Introduction to Cloud Platforms (AWS, GCP)
  Understanding cloud services (compute, storage, databases)
  Hands-on with cloud setup for data engineering tasks
Cloud-Based Data Storage Solutions
  Comparison of storage options (S3, GCS, etc.)
  Implementing cloud-based pipelines

Week 11-12: Big Data Technologies

Hadoop Ecosystem
  Distributed data storage and processing (HDFS, MapReduce, YARN)
Apache Spark
  Introduction to Spark for large-scale data processing
  In-memory vs. disk-based processing
Hands-on Projects
  Data processing using Hadoop and Spark

Week 13-16: Building Data Pipelines

Data Pipeline Design
  Extracting data from various sources (APIs, web scraping, databases)
  Transforming data into usable formats
Pipeline Orchestration Tools (Apache Airflow, Luigi, Prefect)
  Building and automating workflows
Hands-on Practice
  Developing, testing, and deploying data pipelines
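
The core idea behind the orchestration tools named above is running tasks in dependency order. The sketch below illustrates that idea with Python's standard-library `graphlib` module; it does not use or imitate the Airflow, Luigi, or Prefect APIs, and the `run_workflow` helper and task names are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run every task only after the tasks it depends on have run.

    tasks: maps a task name to a zero-argument callable.
    dependencies: maps a task name to the set of task names it depends on.
    Returns the order in which the tasks were executed.
    """
    executed = []
    # static_order() yields each node after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        executed.append(name)
    return executed


if __name__ == "__main__":
    log = []
    tasks = {
        "extract": lambda: log.append("extracted"),
        "transform": lambda: log.append("transformed"),
        "load": lambda: log.append("loaded"),
    }
    dependencies = {
        "extract": set(),
        "transform": {"extract"},
        "load": {"transform"},
    }
    print(run_workflow(tasks, dependencies))
    print(log)
```

Production orchestrators layer scheduling, retries, and monitoring on top of this kind of dependency graph.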

Week 17-18: Advanced Data Engineering Skills

Machine Learning Integration
  Building data pipelines for ML models (preprocessing, feature engineering)
Distributed Systems and DevOps
  Fault tolerance, scalability, and CI/CD for data pipelines
Data Security and Governance
  Access control, encryption, and compliance in data engineering

Week 19-20: Final Projects and Real-World Applications

Beginner Projects
  Building a simple web scraper and basic data cleaning
Intermediate Projects
  Cloud-based data warehouse setup or recommendation engine
Advanced Projects
  Machine learning pipelines and real-time analytics dashboards
