Interactive data lineage visualization for Microsoft SQL Server family databases
⚠️ Proof of Concept: This tool was developed and tested with Azure Synapse Analytics dedicated SQL pools and Azure SQL Database. It currently supports only the Microsoft SQL Server family (SQL Server, Azure SQL, Synapse Analytics, Fabric). See disclaimers below.
Analyze dependencies between tables, views, and stored procedures with an interactive graph interface.
- ✅ YAML-Based Parser - Pure regex extraction with metadata catalog validation
- ⚡ 5-Minute Setup - One command installation
- 🔧 Business-Maintainable - YAML rule engine, no Python required for rule changes
- 🔌 Flexible - Parquet upload OR direct database connection
- 📊 Interactive - Trace mode, schema filtering, full-text search
- 🧪 Extensible - MIT licensed, YAML-based dialect system for easy adaptation
docker run -d -p 8000:8000 -v data-lineage-config:/app/config --name data-lineage chwagneraltyca/data-lineage-visualizer:latest
Access: http://localhost:8000
# Install and run (Production mode - optimized for performance)
git clone https://github.com/your-org/data_lineage.git
cd data_lineage
pip install -r requirements.txt
./start-app.sh
Access:
- Frontend: http://localhost:3000
- API Docs: http://localhost:8000/docs
Startup Modes:
- ./start-app.sh - Production mode (default)
- ./start-app.sh dev - Development mode with HMR (slower initial load, ~2 min, due to React Flow dev mode)
- ./start-app.sh --rebuild - Force a rebuild of the production bundle
Next Steps: Upload Parquet files or configure database connection
Setup Guides:
- QUICKSTART.md - Detailed installation and setup
- .devcontainer/README.md - VSCode development environment
- .docker/README.md - Docker deployment guide
| Feature | Description |
|---|---|
| Interactive Graph | Pan, zoom, explore with React Flow |
| Trace Mode | Analyze upstream/downstream dependencies (BFS traversal) |
| SQL Viewer | Monaco Editor with syntax highlighting |
| Smart Filtering | Schema, type, pattern-based, and focus filtering |
| Search | Full-text search across all definitions |
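
Trace Mode's upstream/downstream analysis can be pictured as a breadth-first search over lineage edges. The following is a minimal sketch under stated assumptions, not the tool's actual implementation: the `trace` function, the `(source, target)` edge format, and the object names are all hypothetical.

```python
from collections import deque


def trace(edges, start, direction="downstream", max_depth=None):
    """Return {object: hops from start} for everything reachable from start.

    edges is an iterable of (source, target) pairs, meaning data flows
    from source into target. This edge format is an assumption for the sketch.
    """
    # Build the adjacency map in the direction being traced.
    adjacency = {}
    for source, target in edges:
        if direction == "downstream":
            adjacency.setdefault(source, set()).add(target)
        else:
            adjacency.setdefault(target, set()).add(source)

    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Do not expand nodes that already sit at the depth limit.
        if max_depth is not None and depths[node] >= max_depth:
            continue
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in depths:  # first visit is the shortest path in BFS
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)

    del depths[start]
    return depths


# Hypothetical lineage edges for illustration only.
EDGES = [
    ("dbo.Orders", "dbo.vw_Orders"),
    ("dbo.vw_Orders", "dbo.usp_LoadSales"),
    ("dbo.usp_LoadSales", "dbo.FactSales"),
    ("dbo.Customers", "dbo.usp_LoadSales"),
]
```

Calling `trace(EDGES, "dbo.Orders")` walks downstream from the table, while `trace(EDGES, "dbo.FactSales", "upstream", max_depth=2)` limits the upstream walk to two hops.
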
| Method | Use Case |
|---|---|
| Parquet Upload | Manual metadata extraction (default) |
| Database Direct | Refresh from SQL Server/Azure SQL/Synapse/Fabric |
| JSON Export | Share and version control lineage data |
Implemented and Tested:
- ✅ Azure Synapse Analytics (dedicated SQL pools) - Tested
- ✅ Azure SQL Database - Tested with database direct import
- ✅ SQL Server - Uses the same T-SQL connector as Synapse/Azure SQL
- ✅ Microsoft Fabric - Uses the T-SQL dialect and the same connector as SQL Server/Synapse
Note: Only the Microsoft SQL Server family is currently supported. The YAML-based architecture allows for extension to other SQL dialects, but these are not yet implemented.
System Flow:
- Input - Parquet files (manual upload), Database Direct (SQL Server/Azure SQL/Synapse/Fabric), or JSON import
- Storage - FastAPI + DuckDB analytics workspace
- Processing - YAML Rule Engine applies dialect-specific regex patterns
- Extraction - Regex-based dependency extraction (FROM/JOIN, INSERT/UPDATE/MERGE, EXEC, SELECT INTO)
- Validation - Metadata catalog validates extracted dependencies (removes false positives)
- Output - JSON format with validated lineage data
- Visualization - React + React Flow interactive graph
Parser: Pure YAML regex patterns extract dependencies, validated against metadata catalog
Details: See docs/ARCHITECTURE.md
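
The extraction and validation steps in the flow above can be illustrated with a small sketch. It is not the project's parser: the rule list, the `NAME` pattern, and the function names are assumptions made for this example, the rules cover only a subset of the listed statement types, and the real rules live in YAML rather than in Python.

```python
import re

# Hypothetical pattern for an optionally bracketed, optionally
# schema-qualified object name such as dbo.Orders or [dbo].[Orders].
NAME = r"((?:\[?\w+\]?\.)?\[?\w+\]?)"

# Hypothetical rules: (dependency kind, regex capturing the object name).
RULES = [
    ("source", re.compile(r"\b(?:FROM|JOIN)\s+" + NAME, re.IGNORECASE)),
    ("target", re.compile(r"\b(?:INSERT\s+INTO|UPDATE|MERGE(?:\s+INTO)?)\s+" + NAME, re.IGNORECASE)),
    ("call", re.compile(r"\bEXEC(?:UTE)?\s+" + NAME, re.IGNORECASE)),
]


def normalize(name):
    """Strip brackets and lowercase so names compare consistently."""
    return name.replace("[", "").replace("]", "").lower()


def extract_dependencies(sql, catalog):
    """Apply every rule, then keep only names present in the catalog."""
    known = {normalize(name) for name in catalog}
    found = set()
    for kind, pattern in RULES:
        for match in pattern.finditer(sql):
            name = normalize(match.group(1))
            if name in known:  # validation step: drops CTE names, aliases, etc.
                found.add((kind, name))
    return found


# Illustrative input; the object names are made up.
SQL = """
CREATE PROCEDURE dbo.usp_LoadSales AS
BEGIN
    WITH recent AS (SELECT * FROM dbo.Orders WHERE OrderDate > '2024-01-01')
    INSERT INTO dbo.FactSales (OrderId, CustomerName)
    SELECT r.OrderId, c.Name
    FROM recent r
    JOIN [dbo].[Customers] c ON c.Id = r.CustomerId;
    EXEC dbo.usp_LogRun;
END
"""
CATALOG = ["dbo.Orders", "dbo.Customers", "dbo.FactSales", "dbo.usp_LogRun"]
```

In this example the regex also matches the CTE name `recent` after `FROM`, but the catalog check discards it because no such object exists. That is the kind of false positive the validation step is meant to remove.
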
| Document | Audience | Purpose |
|---|---|---|
| QUICKSTART.md | Users | 5-minute deployment guide |
| CONFIGURATION.md | Users/DBAs | Environment variables, database setup |
| DATA_SPECIFICATIONS.md | Developers/DBAs | Data contracts, interface specifications, API endpoints |
| ARCHITECTURE.md | Developers | System design, parser internals, rule engine |
| DEVELOPMENT.md | Contributors | Development environment setup and configuration |
This tool was developed as a proof of concept using Claude Code and tested specifically with:
- Azure Synapse Analytics dedicated SQL pools (tested)
- Azure SQL Database (tested with database direct import feature)
Production Status:
- ✅ Parser extensively tested with real-world stored procedures
- ✅ Core functionality validated in Azure environment
- ✅ Supports Microsoft SQL Server family only (SQL Server, Azure SQL, Synapse, Fabric)
This repository is published as-is with the following expectations:
Active Support (Initial Period):
- Bug fixes for critical issues
- Documentation improvements
- Security patches if needed
Long-Term:
- No active feature development planned
- Community contributions welcome via pull requests
- Issues will be reviewed but fixes not guaranteed
- Consider this a reference implementation
This software is provided "as is" under the MIT License:
- No guarantees of fitness for any particular purpose
- No liability for data loss or system issues
- No SLA or support commitments
- Test thoroughly in your environment before production use
Best suited for:
- Understanding SQL object dependencies in Microsoft SQL Server family environments
- Learning how to build lineage visualization tools
- Reference implementation for YAML-based SQL parsing
Not recommended for:
- Mission-critical production lineage without thorough testing
- Databases outside the Microsoft SQL Server family (not currently supported)
- Environments requiring guaranteed support or updates
- Never commit credentials to version control
- Use Azure Key Vault or similar for production secrets
- This tool connects directly to your database - restrict access appropriately
- Uploaded Parquet files may contain sensitive metadata - handle accordingly
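
One common way to keep credentials out of version control is to read them from environment variables at startup. The sketch below assumes hypothetical variable names (`LINEAGE_DB_*`); the names this tool actually reads are documented in CONFIGURATION.md and may differ.

```python
import os

# Hypothetical variable names, chosen for this example only.
REQUIRED = {
    "server": "LINEAGE_DB_SERVER",
    "database": "LINEAGE_DB_NAME",
    "user": "LINEAGE_DB_USER",
    "password": "LINEAGE_DB_PASSWORD",
}


def connection_settings(env=None):
    """Read database settings from the environment and fail fast if any are missing."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in REQUIRED.items()}


def redacted(settings):
    """Return a copy of the settings that is safe to write to logs."""
    return {**settings, "password": "***"}
```

In production, the same variables can be populated from Azure Key Vault or a similar secret store rather than from a file in the repository.
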
The YAML-based architecture was designed for adaptability:
- Customize parsing rules without Python code changes
- Add new extraction patterns via YAML rules
- Add support for new SQL dialects with additional development effort (see ARCHITECTURE.md for the dialect extension guide)
See engine/rules/ for YAML rule examples.
SQL Parsing:
- Cross-database lineage: Parser only tracks dependencies within a single database
- Dynamic SQL: Cannot parse dynamically constructed SQL statements (e.g., EXEC(@sql), sp_executesql)
- Linked server queries: Remote object references are not tracked
- Column-level tracing: The tool supports only object-level tracing
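
Because dynamic SQL is invisible to a regex parser, it can help to know which definitions use it and may therefore have incomplete lineage. The following is a rough audit sketch and not a feature of the tool; the pattern and function name are assumptions, and the check ignores comments and string literals.

```python
import re

# Matches EXEC(...) / EXECUTE(...) and any mention of sp_executesql.
DYNAMIC_SQL = re.compile(r"\bEXEC(?:UTE)?\s*\(|\bsp_executesql\b", re.IGNORECASE)


def uses_dynamic_sql(definition):
    """Return True if the object definition appears to run dynamic SQL."""
    return DYNAMIC_SQL.search(definition) is not None
```

A plain procedure call such as `EXEC dbo.usp_LogRun` is not flagged, while `EXEC(@sql)` and `EXEC sp_executesql @sql` are.
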
Dialect Support:
- Supports only the Microsoft SQL Server family (SQL Server, Azure SQL, Synapse Analytics, Fabric)
- Other SQL dialects could be added with additional development effort (YAML rules + dialect implementation)
- ANSI SQL patterns in engine/rules/defaults/ provide a foundation for new dialects
MIT License - Free to use, modify, and distribute (even commercially).
Simple terms: Do whatever you want with this code, but I'm not responsible if something breaks.
See LICENSE for the official text.
Community contributions are welcome! See DEVELOPMENT.md for environment setup.
Please note: While contributions are welcome, active maintenance and review may be limited. Consider this when planning contributions.
- Demo: Try the live demo at https://datalineage.chwagner.eu/
- Documentation: docs/ - Comprehensive guides and specifications
- Quick Help: QUICKSTART.md - 5-minute deployment guide
- Developed using Claude Code (Anthropic)
- Tested with Adventure Works sample database (Microsoft)
- Built on FastAPI, React, DuckDB, React Flow, and Graphology
Status: Proof of Concept - Production tested with Azure Synapse/SQL
Author: Christian Wagner
License: MIT

