Data quality checks in your dbt flow.
This repository helps you understand the different data quality checks available in dbt.
To get a free environment for practicing dbt tests, clone the repository, create a Supabase account, write the .env file, and run the setup commands below.
I'm Bruno Gonzalez from 🇺🇾, working as a senior data engineer, and writing about data quality and data engineering.
Postgres database: Supabase. Create a free account and copy its connection credentials into the .env file.
Create a .env file with the following structure:
POSTGRES_HOST=<postgres_host>
POSTGRES_USER=<postgres_user>
POSTGRES_PASSWORD=<postgres_password>
POSTGRES_DATABASE=<postgres_database>
Commands to set up the environment:
conda create -n dbtdq python=3.9
conda activate dbtdq
pip install -r requirements.txt
export $(cat .env | xargs)
dbt deps
dbt seed
dbt run
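If dbt cannot connect, one thing to rule out first is a variable that never reached the shell. The following is a minimal sketch of such a check, assuming only the four variable names from the .env structure above. The helper `missing_env_vars` is hypothetical and not part of this repository.

```python
import os

# Variable names taken from the .env structure described above.
REQUIRED_VARS = (
    "POSTGRES_HOST",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DATABASE",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    `environ` defaults to the current process environment; passing a
    plain dict makes the function easy to try out in isolation.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Report what the current shell session exposes.
missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All Postgres variables are set.")
```

Running this in the same shell session as the export command shows whether the values in .env were actually exported.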
Modified from dbt-labs jaffle_shop.
Changes:
- seeds/raw_customers.csv
  - Added customer 101 without first_name.
  - Added customer 102 without last_name.
  - Added customer 103 with a different last_name pattern.
  - Added customer 104 with inconsistent case in first_name.
- seeds/raw_orders.csv
  - Duplicated order with id = 98.
  - Added order 100 with an order_date in the future.
  - Added order 101 with an inexistent user_id.
  - Added order 102 with a wrong status.
  - Added order 103 without issues.
- seeds/raw_payments.csv
  - Added payment 114 for order 100 with a wrong payment_method.
  - Added payment 115 for order 103 with a huge amount (outlier).
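To make the intent of these changes concrete, here is a rough Python sketch of the logic behind the kinds of checks that would catch several of them. It is an illustration only and not how this repository or dbt implements its tests. The function names are hypothetical, the rows are made-up stand-ins rather than the real seed contents, and the accepted status values are assumed from the upstream jaffle_shop project. Pattern, case, and outlier checks are left out.

```python
from collections import Counter
from datetime import date


def duplicated_values(rows, column):
    """Values of `column` that occur more than once (uniqueness check)."""
    counts = Counter(row[column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def rows_with_null(rows, column):
    """Rows where `column` is None or an empty string (not-null check)."""
    return [row for row in rows if row.get(column) in (None, "")]


def orphaned_rows(children, fk, parents, pk):
    """Child rows whose `fk` matches no parent `pk` (relationship check)."""
    known = {row[pk] for row in parents}
    return [row for row in children if row[fk] not in known]


def rows_outside(rows, column, accepted):
    """Rows whose `column` value is not in `accepted` (accepted-values check)."""
    return [row for row in rows if row[column] not in accepted]


def rows_after(rows, column, cutoff):
    """Rows whose date in `column` is later than `cutoff` (future-date check)."""
    return [row for row in rows if row[column] > cutoff]


# Hypothetical rows that mimic the seeded problems; NOT the actual seed files.
customers = [
    {"id": 1, "first_name": "Ana"},
    {"id": 101, "first_name": None},
]
orders = [
    {"id": 98, "user_id": 1, "order_date": date(2018, 4, 1), "status": "completed"},
    {"id": 98, "user_id": 1, "order_date": date(2018, 4, 1), "status": "completed"},
    {"id": 100, "user_id": 1, "order_date": date(2999, 1, 1), "status": "placed"},
    {"id": 101, "user_id": 9999, "order_date": date(2018, 4, 2), "status": "placed"},
    {"id": 102, "user_id": 1, "order_date": date(2018, 4, 3), "status": "unknown"},
]
# Assumption: status values as used in upstream jaffle_shop.
ACCEPTED_STATUSES = {"placed", "shipped", "completed", "return_pending", "returned"}

print("duplicate order ids:", duplicated_values(orders, "id"))
print("customers missing first_name:", rows_with_null(customers, "first_name"))
print("orders with unknown user:", orphaned_rows(orders, "user_id", customers, "id"))
print("orders with bad status:", rows_outside(orders, "status", ACCEPTED_STATUSES))
print("orders dated in the future:", rows_after(orders, "order_date", date.today()))
```

Each function returns the offending values or rows, similar in spirit to how a dbt test query returns the failing records and passes when it returns none.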