End-to-end data engineering pipeline for German electricity market data: ingest from the public SMARD API, land raw JSON in Amazon S3, then transform with PySpark and Delta Lake on Databricks using a Bronze → Silver → Gold layout. Job definitions live under jobs/.
```mermaid
flowchart LR
    API[SMARD API] -->|Python ingest| S3[(S3 raw-data)]
    S3 -->|JSON| Bronze[Bronze Delta]
    Bronze -->|PySpark| Silver[Silver Delta]
    Silver -->|Aggregates & features| Gold[Gold Delta]
```
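The ingest step is a plain Python client that pulls JSON from the SMARD API and lands it unchanged in S3. Below is a minimal sketch, assuming SMARD's public chart_data URL layout, the filter/region codes, and a placeholder bucket name; the real client lives in src/api/.

```python
import json

import boto3
import requests

# Assumed SMARD chart_data layout (filter 410 = total grid load, region DE,
# hourly resolution); verify the codes against the real client in src/api/.
BASE = "https://www.smard.de/app/chart_data"
FILTER, REGION, RESOLUTION = 410, "DE", "hour"


def fetch_latest_series() -> tuple[int, dict]:
    # The index endpoint lists the available series timestamps; take the newest.
    index = requests.get(f"{BASE}/{FILTER}/{REGION}/index_{RESOLUTION}.json", timeout=30)
    index.raise_for_status()
    ts = max(index.json()["timestamps"])
    data = requests.get(
        f"{BASE}/{FILTER}/{REGION}/{FILTER}_{REGION}_{RESOLUTION}_{ts}.json",
        timeout=30,
    )
    data.raise_for_status()
    return ts, data.json()


def land_in_s3(ts: int, payload: dict, bucket: str = "raw-data") -> None:
    # Land the raw, API-shaped JSON untouched; Bronze reads it from here.
    key = f"smard/{FILTER}/{REGION}/{RESOLUTION}/{ts}.json"
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=json.dumps(payload).encode("utf-8")
    )


if __name__ == "__main__":
    land_in_s3(*fetch_latest_series())
```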
| Layer | Purpose |
|---|---|
| Bronze | Raw API-shaped JSON from S3 → Delta tables |
| Silver | Flattened time series (timestamps, values), cleansed columns |
| Gold | Aggregates and features ready for business analysts and ML engineers |
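A rough sketch of what the three notebooks do. The catalog/schema names, the S3 prefix, and the shape of the SMARD payload (a `series` array of `[timestamp_ms, value]` pairs) are assumptions here; see databricks_notebooks/ for the actual logic. `spark` is the session Databricks provides.

```python
from pyspark.sql import functions as F

# Bronze: land the raw API-shaped JSON from S3 as-is in a Delta table.
raw = spark.read.json("s3://raw-data/smard/")  # assumed bucket/prefix
raw.write.format("delta").mode("append").saveAsTable("bronze.smard_raw")

# Silver: flatten the series ([timestamp_ms, value] pairs) into typed,
# cleansed rows.
silver = (
    spark.table("bronze.smard_raw")
    .select(F.explode("series").alias("point"))
    .select(
        F.timestamp_seconds(F.col("point")[0] / 1000).alias("ts"),  # epoch ms
        F.col("point")[1].cast("double").alias("value"),
    )
    .dropna(subset=["value"])
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.smard_load")

# Gold: analyst-ready daily aggregates.
gold = (
    silver.groupBy(F.to_date("ts").alias("date"))
    .agg(F.avg("value").alias("avg_mw"), F.max("value").alias("peak_mw"))
)
gold.write.format("delta").mode("overwrite").saveAsTable("gold.smard_load_daily")
```

The `overwrite` mode keeps this sketch idempotent; the real notebooks may well append or merge instead.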
```
.
├── src/api/                 # Python SMARD client (local + S3 ingest)
├── databricks_notebooks/    # Bronze / Silver / Gold notebooks (.py source)
├── jobs/                    # Databricks Job JSON (see jobs/README.md)
├── req.txt
└── README.md
```
```bash
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r req.txt
```

For S3 upload (smard_clientS3.py), configure AWS credentials (e.g. ~/.aws/credentials locally, or an IAM instance profile on Databricks).
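To fail fast on missing or misconfigured credentials, you can check that boto3's default chain resolves something before running the client. A minimal sketch using standard STS/S3 calls; the bucket name is a placeholder.

```python
import boto3

# boto3 resolves credentials via its default chain: environment variables,
# ~/.aws/credentials, then the instance/role profile on Databricks.
sts = boto3.client("sts")
print(sts.get_caller_identity()["Arn"])  # raises if nothing is configured

# Optional: confirm the target bucket is reachable (placeholder name).
boto3.client("s3").head_bucket(Bucket="raw-data")
```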
After you clone this repo with Databricks Repos, update the notebook_path values in jobs/*.json to match your workspace. See jobs/README.md for the placeholders (YOUR_DATABRICKS_USER, repo folder name).
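A throwaway helper along these lines can do the substitution in one pass. This is hypothetical: the placeholder strings and values below are assumptions, so check jobs/README.md for the real ones.

```python
import json
from pathlib import Path

# Hypothetical one-off patcher: swap the placeholders documented in
# jobs/README.md for your workspace user and repo folder name.
REPLACEMENTS = {
    "YOUR_DATABRICKS_USER": "me@example.com",  # your workspace user
    "REPO_FOLDER_NAME": "smard-pipeline",      # assumed placeholder spelling
}

for job_file in Path("jobs").glob("*.json"):
    text = job_file.read_text()
    for placeholder, value in REPLACEMENTS.items():
        text = text.replace(placeholder, value)
    json.loads(text)  # sanity check: the file is still valid JSON
    job_file.write_text(text)
    print(f"patched {job_file}")
```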
Cross-job “run next job” tasks were removed from the JSON exports because job IDs are workspace-specific. Chain bronze → silver → gold in the Jobs UI, or re-add Run Job tasks after import.
Python · requests · boto3 · Databricks · PySpark · Delta Lake · Unity Catalog