Data Lineage Visualizer

Interactive data lineage visualization for Microsoft SQL Server family databases

⚠️ Proof of Concept: This tool was developed and tested with Azure Synapse Analytics dedicated SQL pools and Azure SQL Database. It currently supports only the Microsoft SQL Server family (SQL Server, Azure SQL, Synapse Analytics, Fabric). See the disclaimers below.

Analyze dependencies between tables, views, and stored procedures with an interactive graph interface.

📺 Watch the Demo Video

▶️ Watch Video Demo • Quick Start • Features • Documentation • Live Demo • Disclaimers

Interactive Graph Visualization

Check out the live demo: https://datalineage.chwagner.eu/

Why Data Lineage Visualizer?

✅ YAML-Based Parser - Pure regex extraction with metadata catalog validation
⚡ 5-Minute Setup - One command installation
🔧 Business-Maintainable - YAML rule engine, no Python required for rule changes
🔌 Flexible - Parquet upload OR direct database connection
📊 Interactive - Trace mode, schema filtering, full-text search
🧪 Extensible - MIT licensed, YAML-based dialect system for easy adaptation

Quick Start

Option 1: Docker (Recommended - One Command)

docker run -d -p 8000:8000 -v data-lineage-config:/app/config --name data-lineage chwagneraltyca/data-lineage-visualizer:latest

Access: http://localhost:8000
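
The container can take a moment to start accepting connections. As a minimal sketch (not part of the project), the helper below polls the access URL until it answers; the function name `wait_until_ready` and its timeout defaults are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until it returns a non-error HTTP status or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if 200 <= response.status < 400:
                    return True
        except (urllib.error.URLError, OSError):
            # Not reachable yet (connection refused, HTTP error, timeout).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until_ready("http://localhost:8000")` after `docker run` returns `True` once the app responds, which can be handy in scripted setups.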

Option 2: Local Installation

# Install and run (Production mode - optimized for performance)
git clone https://github.com/your-org/data_lineage.git
cd data_lineage
pip install -r requirements.txt
./start-app.sh

Access:

Frontend: http://localhost:3000
API Docs: http://localhost:8000/docs

Startup Modes:

./start-app.sh - Production mode (default)
./start-app.sh dev - Development mode with HMR (slower initial load, ~2 min, due to React Flow dev mode)
./start-app.sh --rebuild - Force rebuild production bundle

Next Steps: Upload Parquet files or configure database connection

Setup Guides:

QUICKSTART.md - Detailed installation and setup
.devcontainer/README.md - VSCode development environment
.docker/README.md - Docker deployment guide

Features

Core Capabilities

Feature	Description
Interactive Graph	Pan, zoom, explore with React Flow
Trace Mode	Analyze upstream/downstream dependencies (BFS traversal)
SQL Viewer	Monaco Editor with syntax highlighting
Smart Filtering	Schema, type, pattern-based, and focus filtering
Search	Full-text search across all definitions
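
To illustrate what a BFS-based trace does, here is a minimal sketch over a hand-written edge list. The object names, the `EDGES` data, and the `trace` function are hypothetical and only show the traversal idea; they are not the tool's actual implementation.

```python
from collections import deque

# Hypothetical lineage edges: (source object, object that depends on it).
EDGES = [
    ("dbo.Orders", "dbo.vw_Orders"),
    ("dbo.Customers", "dbo.vw_Orders"),
    ("dbo.vw_Orders", "dbo.usp_LoadSales"),
    ("dbo.usp_LoadSales", "dbo.FactSales"),
]


def build_adjacency(edges, reverse=False):
    """Map each node to its neighbours; reverse the edges for upstream tracing."""
    adjacency = {}
    for source, target in edges:
        if reverse:
            source, target = target, source
        adjacency.setdefault(source, []).append(target)
    return adjacency


def trace(start, edges, direction="downstream", max_depth=None):
    """Breadth-first search from `start`; returns {object: distance in hops}."""
    adjacency = build_adjacency(edges, reverse=(direction == "upstream"))
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if max_depth is not None and depths[node] >= max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency.get(node, []):
            if neighbour not in depths:
                depths[neighbour] = depths[node] + 1
                queue.append(neighbour)
    del depths[start]  # report only the objects reached, not the start itself
    return depths
```

Because BFS visits nodes level by level, the recorded depth is the shortest hop count from the starting object, which is what a depth-limited trace needs.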

Data Sources

Method	Use Case
Parquet Upload	Manual metadata extraction (default)
Database Direct	Refresh from SQL Server/Azure SQL/Synapse/Fabric
JSON Export	Share and version control lineage data

Supported Databases

Implemented and Tested:

✅ Azure Synapse Analytics (dedicated SQL pools) - Tested
✅ Azure SQL Database - Tested with database direct import
✅ SQL Server - Uses same T-SQL connector as Synapse/Azure SQL
✅ Microsoft Fabric - Uses T-SQL dialect, same connector as SQL Server/Synapse

Note: Only the Microsoft SQL Server family is currently supported. The YAML-based architecture allows for extension to other SQL dialects, but these are not yet implemented.

Architecture

System Flow:

Input - Parquet files (manual upload) or Database Direct (SQL Server/Azure SQL/Synapse/Fabric) or JSON import
Storage - FastAPI + DuckDB analytics workspace
Processing - YAML Rule Engine applies dialect-specific regex patterns
Extraction - Regex-based dependency extraction (FROM/JOIN, INSERT/UPDATE/MERGE, EXEC, SELECT INTO)
Validation - Metadata catalog validates extracted dependencies (removes false positives)
Output - JSON format with validated lineage data
Visualization - React + React Flow interactive graph

Parser: Pure YAML regex patterns extract dependencies, validated against metadata catalog
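
The extract-then-validate idea can be sketched in a few lines. The patterns, the `extract_dependencies` function, and the sample catalog below are simplified assumptions for illustration; the project's real patterns are defined in its YAML rules and cover more cases.

```python
import re

# Illustrative two-part name matcher (schema.object, optionally bracketed).
OBJECT = r"(\[?\w+\]?\.\[?\w+\]?)"
SOURCE_PATTERN = re.compile(rf"\b(?:FROM|JOIN)\s+{OBJECT}", re.IGNORECASE)
TARGET_PATTERN = re.compile(
    rf"\b(?:INSERT\s+INTO|UPDATE|MERGE\s+INTO)\s+{OBJECT}", re.IGNORECASE
)


def normalize(name):
    """Strip brackets and lowercase so names compare consistently."""
    return name.replace("[", "").replace("]", "").lower()


def extract_dependencies(sql, catalog):
    """Return (sources, targets), keeping only names present in the catalog."""
    known = {normalize(name) for name in catalog}
    sources = {normalize(m) for m in SOURCE_PATTERN.findall(sql)}
    targets = {normalize(m) for m in TARGET_PATTERN.findall(sql)}
    # Catalog validation: regex hits that are not real objects are dropped.
    return sorted(sources & known), sorted(targets & known)


SQL = """
-- migrated FROM legacy.OrdersOld in 2023
INSERT INTO dbo.FactSales (CustomerId, Amount)
SELECT c.CustomerId, o.Amount
FROM dbo.Orders o
JOIN [dbo].[Customers] c ON c.CustomerId = o.CustomerId
"""
CATALOG = ["dbo.Orders", "dbo.Customers", "dbo.FactSales"]
```

Here the regex also matches `legacy.OrdersOld` inside the comment, but since that name is not in the catalog it is discarded as a false positive.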

Details: See docs/ARCHITECTURE.md

Documentation

Document	Audience	Purpose
QUICKSTART.md	Users	5-minute deployment guide
CONFIGURATION.md	Users/DBAs	Environment variables, database setup
DATA_SPECIFICATIONS.md	Developers/DBAs	Data contracts, interface specifications, API endpoints
ARCHITECTURE.md	Developers	System design, parser internals, rule engine
DEVELOPMENT.md	Contributors	Development environment setup and configuration

Disclaimers

⚠️ Proof of Concept Status

This tool was developed as a proof of concept using Claude Code and tested specifically with:

Azure Synapse Analytics dedicated SQL pools (tested)
Azure SQL Database (tested with database direct import feature)

Production Status:

✅ Parser extensively tested with real-world stored procedures
✅ Core functionality validated in Azure environment
✅ Supports Microsoft SQL Server family only (SQL Server, Azure SQL, Synapse, Fabric)

🔧 Maintenance & Development

This repository is published as-is with the following expectations:

Active Support (Initial Period):

Bug fixes for critical issues
Documentation improvements
Security patches if needed

Long-Term:

No active feature development planned
Community contributions welcome via pull requests
Issues will be reviewed but fixes not guaranteed
Consider this a reference implementation

📋 No Warranties

This software is provided "as is" under the MIT License:

No guarantees of fitness for any particular purpose
No liability for data loss or system issues
No SLA or support commitments
Test thoroughly in your environment before production use

🎯 Intended Use

Best suited for:

Understanding SQL object dependencies in Microsoft SQL Server family environments
Learning how to build lineage visualization tools
Reference implementation for YAML-based SQL parsing

Not recommended for:

Mission-critical production lineage without thorough testing
Databases outside the Microsoft SQL Server family (not currently supported)
Environments requiring guaranteed support or updates

🔐 Security Considerations

Never commit credentials to version control
Use Azure Key Vault or similar for production secrets
This tool connects directly to your database - restrict access appropriately
Uploaded Parquet files may contain sensitive metadata - handle accordingly
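
One common way to keep credentials out of version control is to read them from environment variables at startup. The sketch below assumes hypothetical variable names (`LINEAGE_DB_*`); the names the project actually expects are described in CONFIGURATION.md.

```python
import os

# Hypothetical variable names, used here only for illustration.
REQUIRED = [
    "LINEAGE_DB_SERVER",
    "LINEAGE_DB_NAME",
    "LINEAGE_DB_USER",
    "LINEAGE_DB_PASSWORD",
]


def load_db_settings(env=os.environ):
    """Read connection settings from the environment; fail loudly if any is missing."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED}


def redact(settings):
    """Return a copy that is safe to log (password masked)."""
    return {k: ("***" if "PASSWORD" in k else v) for k, v in settings.items()}
```

Logging only the output of `redact(...)` avoids leaking the password into application logs.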

📚 Extensibility

The YAML-based architecture was designed for adaptability:

Customize parsing rules without Python code changes
Add new extraction patterns via YAML rules
Add support for new SQL dialects with additional development effort (see ARCHITECTURE.md for the dialect extension guide)

See engine/rules/ for YAML rule examples.
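
The benefit of keeping rules as data can be shown with a small sketch. The rule fields (`name`, `role`, `pattern`) and the `apply_rules` function are hypothetical stand-ins for rules parsed from YAML; the actual schema is whatever the files in engine/rules/ define.

```python
import re

# Stand-in for rules loaded from YAML; the field names are assumptions.
RULES = [
    {"name": "select_into", "role": "target",
     "pattern": r"\bINTO\s+(\w+\.\w+)\s+FROM\b"},
    {"name": "exec_procedure", "role": "call",
     "pattern": r"\bEXEC(?:UTE)?\s+(\w+\.\w+)"},
]


def apply_rules(sql, rules):
    """Run every rule over `sql` and group the matched names by rule role."""
    found = {}
    for rule in rules:
        regex = re.compile(rule["pattern"], re.IGNORECASE)
        for match in regex.findall(sql):
            found.setdefault(rule["role"], set()).add(match.lower())
    return found


# Adding a pattern means adding data; apply_rules itself stays unchanged.
EXTENDED_RULES = RULES + [
    {"name": "truncate_table", "role": "target",
     "pattern": r"\bTRUNCATE\s+TABLE\s+(\w+\.\w+)"},
]

SQL = "TRUNCATE TABLE stg.Audit; EXEC dbo.usp_Refresh; SELECT * INTO stg.Sales FROM dbo.Orders;"
```

With `RULES`, only `stg.sales` is reported as a target; with `EXTENDED_RULES`, `stg.audit` is picked up as well, without touching the engine function.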

⚠️ Not Supported

SQL Parsing:

Cross-database lineage: Parser only tracks dependencies within a single database
Dynamic SQL: Cannot parse dynamically constructed SQL statements (e.g., EXEC(@sql), sp_executesql)
Linked server queries: Remote object references not tracked
Column-level tracing: Tool supports only object-level tracing
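
Since dynamic SQL cannot be parsed, it can help to know which objects use it, so that gaps in their lineage are expected rather than surprising. The following is a rough sketch, not a feature of the tool; the `find_dynamic_sql` function and the sample definitions are hypothetical.

```python
import re

# Matches EXEC(@variable) / EXECUTE (@variable) and any use of sp_executesql.
DYNAMIC_SQL = re.compile(
    r"\bEXEC(?:UTE)?\s*\(\s*@\w+|\bsp_executesql\b", re.IGNORECASE
)


def find_dynamic_sql(definitions):
    """Return the names of objects whose definition appears to run dynamic SQL."""
    return sorted(
        name for name, sql in definitions.items() if DYNAMIC_SQL.search(sql)
    )


DEFINITIONS = {
    "dbo.usp_Static": "INSERT INTO dbo.T SELECT * FROM dbo.S",
    "dbo.usp_Dyn": "DECLARE @sql nvarchar(max) = N'SELECT 1'; EXEC(@sql)",
    "dbo.usp_Dyn2": "EXEC sp_executesql @stmt",
}
```

Objects returned by `find_dynamic_sql(DEFINITIONS)` are candidates for manual review, because dependencies hidden inside the constructed strings will not appear in the graph.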

Dialect Support:

Supports only the Microsoft SQL Server family (SQL Server, Azure SQL, Synapse Analytics, Fabric)
Other SQL dialects could be added with additional development effort (YAML rules + dialect implementation)
ANSI SQL patterns in engine/rules/defaults/ provide foundation for new dialects

License

MIT License - Free to use, modify, and distribute (even commercially).

Simple terms: Do whatever you want with this code, but I'm not responsible if something breaks.

See LICENSE for the official text.

Contributing

Community contributions are welcome! See DEVELOPMENT.md for environment setup.

Please note: While contributions are welcome, active maintenance and review may be limited. Consider this when planning contributions.

Support

Demo: Try the live demo at https://datalineage.chwagner.eu/
Documentation: docs/ - Comprehensive guides and specifications
Quick Help: QUICKSTART.md - 5-minute deployment guide

Acknowledgments

Developed using Claude Code (Anthropic)
Tested with Adventure Works sample database (Microsoft)
Built on FastAPI, React, DuckDB, React Flow, and Graphology

Built with: FastAPI • React • DuckDB • React Flow • Graphology
Status: Proof of Concept - Production tested with Azure Synapse/SQL
Author: Christian Wagner
License: MIT
