Azure Projects Repository

This collection showcases end-to-end data engineering solutions and advanced analytics implementations. Below, you’ll find highlights of my work with Azure services.

Featured Projects

1. Azure End-to-End Data Engineering Project

An end-to-end solution leveraging the Medallion Architecture to ingest, process, and analyze data using Azure Data Factory (ADF) and Azure Databricks.

Key Features:

Staged data transformations across Bronze, Silver, and Gold layers.
Automated pipelines for data ingestion, transformation, and load.
Integration with Power BI for seamless analytics.
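
To illustrate the Bronze, Silver, and Gold staging described above, here is a minimal plain-Python sketch. It is not the project's Databricks code: the records, column names, and the to_silver/to_gold helpers are hypothetical and only show what each layer is responsible for.

```python
from collections import defaultdict

# Hypothetical raw records, kept exactly as ingested (Bronze layer).
bronze = [
    {"order_id": "1", "region": " east ", "amount": "120.50"},
    {"order_id": "2", "region": "WEST", "amount": "80"},
    {"order_id": "2", "region": "WEST", "amount": "80"},  # duplicate row
    {"order_id": "3", "region": "east", "amount": None},  # missing amount
]


def to_silver(rows):
    """Clean Bronze rows: drop bad or duplicate rows, cast types, normalize text."""
    seen, silver = set(), []
    for row in rows:
        if row["amount"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return silver


def to_gold(rows):
    """Aggregate Silver rows into a reporting-ready total per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

In the real project these steps would run as Spark transformations over tables in the data lake; the sketch only mirrors the division of work between layers.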

2. Azure Serverless Logical Data Warehouse

A demonstration of serverless analytics with Azure Synapse Analytics, showcasing advanced SQL features and seamless integration with Power BI for insights.

Key Features:

Implemented CETAS (CREATE EXTERNAL TABLE AS SELECT) and an incremental load design.
Utilized Change Data Capture (CDC) for real-time updates.
Demonstrated SQL performance monitoring and query optimization.
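
The incremental load idea can be sketched independently of Synapse SQL. The example below assumes a high-watermark design (only rows modified after the last recorded timestamp are loaded); the README does not state which incremental strategy the project uses, and the incremental_load function and row fields are hypothetical.

```python
from datetime import datetime


def incremental_load(source_rows, target, watermark):
    """Upsert rows modified after `watermark` into `target`; return the new watermark."""
    new_rows = [r for r in source_rows if r["modified_at"] > watermark]
    for row in new_rows:
        target[row["id"]] = row  # insert or overwrite by key
    # If nothing changed, the watermark stays where it was.
    return max((r["modified_at"] for r in new_rows), default=watermark)


source = [
    {"id": 1, "value": "a", "modified_at": datetime(2024, 1, 1)},
    {"id": 2, "value": "b", "modified_at": datetime(2024, 1, 2)},
]
target = {}

# First run: everything is newer than the initial watermark.
watermark = incremental_load(source, target, datetime.min)

# Later, row 1 changes and row 3 arrives; only these two are picked up.
source[0] = {"id": 1, "value": "a2", "modified_at": datetime(2024, 1, 3)}
source.append({"id": 3, "value": "c", "modified_at": datetime(2024, 1, 4)})
watermark = incremental_load(source, target, watermark)
```

Persisting the returned watermark between runs is what keeps each load limited to new or changed rows.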

3. Azure Data Factory Pipelines

Multiple pipelines developed to demonstrate the following functions:

Data Ingestion Pipeline: Automated ingestion of structured and unstructured data into Azure Data Lake.
Transformations Pipeline: ETL workflows built for scalable data processing.
Orchestration Pipeline: Dependencies managed using pipeline chaining, conditional execution, and alerts for monitoring.
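
As a rough analogy for pipeline chaining, conditional execution, and alerting, the sketch below runs tasks in dependency order and skips anything downstream of a failure. It uses Python's standard graphlib module rather than ADF itself; the run_pipelines function, task names, and alert list are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipelines(dependencies, tasks):
    """Run tasks in dependency order; skip a task unless all upstream tasks succeeded."""
    status, alerts = {}, []
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, ())
        if any(status[u] != "succeeded" for u in upstream):
            status[name] = "skipped"  # conditional execution
            continue
        try:
            tasks[name]()
            status[name] = "succeeded"
        except Exception:
            status[name] = "failed"
            alerts.append(name)  # stand-in for a monitoring alert
    return status, alerts


def failing_transform():
    raise RuntimeError("simulated transformation failure")


# Each key depends on the tasks in its set.
dependencies = {"transform": {"ingest"}, "load": {"transform"}}
tasks = {
    "ingest": lambda: None,
    "transform": failing_transform,
    "load": lambda: None,
}

status, alerts = run_pipelines(dependencies, tasks)
```

In ADF the equivalent behavior comes from activity dependency conditions (for example, run on success or on failure) rather than from custom code.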

4. Azure Databricks

ETL

Scalable enterprise data platform built with Azure Databricks and Azure Data Factory
Automated, end-to-end ETL for car sales data, incrementally loading from the GitHub API and Azure SQL Database into ADLS Gen2 using parameterized ADF pipelines
Data processed through the Medallion architecture (Bronze, Silver, Gold layers) orchestrated by Databricks Workflows
Implements Change Data Capture (CDC) for fact tables and Slowly Changing Dimensions (SCD Type 1) for dimension tables
Enforces data governance and security with Unity Catalog
Delivers a star schema modeled in Delta tables for efficient analytics and BI use
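
For readers unfamiliar with SCD Type 1, the sketch below shows its defining behavior: changed attributes overwrite the existing dimension row and no history is kept. It is a plain-Python illustration, not the project's Delta MERGE logic; the scd_type1_merge function and the dealer columns are hypothetical.

```python
def scd_type1_merge(dimension, changes, key):
    """Overwrite matching rows in place and insert new ones; no history is kept."""
    merged = {row[key]: dict(row) for row in dimension}
    for change in changes:
        # Existing attributes are replaced by the incoming values.
        merged[change[key]] = {**merged.get(change[key], {}), **change}
    return list(merged.values())


dim_dealer = [
    {"dealer_id": "D1", "city": "Pune"},
    {"dealer_id": "D2", "city": "Delhi"},
]
changes = [
    {"dealer_id": "D2", "city": "Mumbai"},   # update to an existing dealer
    {"dealer_id": "D3", "city": "Chennai"},  # brand-new dealer
]

result = scd_type1_merge(dim_dealer, changes, "dealer_id")
```

After the merge, dealer D2 carries only the new city; the previous value is gone, which is the trade-off that distinguishes Type 1 from Type 2.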

DLT

A step-by-step data processing pipeline built with Delta Live Tables (DLT), showcasing:

Incremental Loading: Streaming Tables automatically process only new data on each pipeline run.
Schema Evolution: Adding/modifying columns or renaming tables is handled automatically by DLT.
Auto Loader Integration: Integrated Auto Loader (spark.readStream.format("cloudFiles")) to ingest files from a landing volume. Configured with options for schema hints, schema location, file format, and path glob filter. DLT manages the Auto Loader checkpoint location automatically.
Append Flow: Used @dlt.append_flow to combine streaming data from multiple sources into a union Streaming Table.
Passing Parameters (Dynamic Tables): Pipeline configurations can be accessed within the DLT notebook using spark.conf.get. Example: dynamically creating separate Gold Materialized Views filtered by order status.
Change Data Capture (CDC) with apply_changes: Used dlt.apply_changes() for SCD Type 1 and Type 2. Tracked historical changes and handled deletes/truncates. Updated downstream logic to read from the SCD Type 2 table and filter for active records.
Data Quality with Expectations: Defined rules using @dlt.expect and @dlt.expect_all. Actions: warn (the default, @dlt.expect), drop (@dlt.expect_or_drop), and fail (@dlt.expect_or_fail). Data quality metrics are shown in the UI and event logs.
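
The three expectation actions can be imitated in plain Python to show how they differ. This is not the DLT API: the apply_expectations function, rule names, and sample rows are hypothetical, and the sketch only mirrors the warn, drop, and fail semantics.

```python
def apply_expectations(rows, expectations):
    """Apply (name, predicate, action) rules; action is 'warn', 'drop', or 'fail'."""
    kept = []
    metrics = {name: 0 for name, _, _ in expectations}  # violations per rule
    for row in rows:
        keep = True
        for name, predicate, action in expectations:
            if predicate(row):
                continue
            metrics[name] += 1
            if action == "fail":
                raise ValueError(f"expectation {name!r} failed for {row}")
            if action == "drop":
                keep = False
            # 'warn' only records the violation and keeps the row.
        if keep:
            kept.append(row)
    return kept, metrics


orders = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": None, "amount": 5.0},  # violates valid_order_id
    {"order_id": 3, "amount": -2.0},    # violates non_negative_amount
]
expectations = [
    ("valid_order_id", lambda r: r["order_id"] is not None, "drop"),
    ("non_negative_amount", lambda r: r["amount"] >= 0, "warn"),
]

kept, metrics = apply_expectations(orders, expectations)
```

Here the row without an order_id is dropped, while the negative amount is only counted; switching either rule to "fail" would stop processing at the first violation.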
