This project analyzes the Linux kernel git history to rank universities by their patch contributions. It matches commit author email domains against a university domain list, aggregates statistics (patch count, lines changed, contributors), and publishes the results as a static website via GitHub Pages.
# Install PDM (Linux/macOS)
curl -sSL https://pdm-project.org/install-pdm.py | python3 -
# Install dependencies
pdm install
# Clone the Linux kernel repo (full history required)
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
# Generate results
pdm start

The output consists of result.json, result.js, and localized, paginated HTML detail pages in detail/<locale>/. Serve index.html with any web server to view the rankings, or open it directly as a local file after generation.
The website supports en, zh-CN, zh-TW, ja, and ko. Use the language selector in the page header, or open the page with ?lang=<locale> such as ?lang=zh-CN.
University names, author names, domains, emails, and commit summaries remain in their source language. The localized content covers the website UI and generated detail-page navigation.
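The locale selection described above can be sketched in Python. This is only an illustration of the lookup logic, not the site's actual code: the function name `resolve_locale`, the `SUPPORTED_LOCALES` constant, and the fallback to `en` for missing or unsupported values are all assumptions.

```python
from urllib.parse import parse_qs, urlparse

# Locales listed in this README; the constant name is hypothetical.
SUPPORTED_LOCALES = ("en", "zh-CN", "zh-TW", "ja", "ko")


def resolve_locale(url, default="en"):
    """Pick the UI locale from a ?lang=<locale> query parameter.

    Falling back to `default` for a missing or unsupported value is an
    assumption made for this sketch.
    """
    query = parse_qs(urlparse(url).query)
    requested = query.get("lang", [default])[0]
    return requested if requested in SUPPORTED_LOCALES else default
```

For example, `resolve_locale("index.html?lang=zh-CN")` selects `zh-CN`, while a URL without a `lang` parameter falls back to the default.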
- Python 3.9+
- PDM package manager
- A local clone of the Linux kernel repository (with full git history)
pdm install -G dev # Install with dev dependencies (pylint, pytest)
pdm lint # Run pylint (threshold 9.0)
pdm test # Run all tests

- Fetches the university domain list from Hipo/university-domains-list
- Iterates through all git commits, matching author email domains to universities (with parent domain fallback, e.g. cs.mit.edu matches mit.edu)
- Aggregates per-domain statistics and merges aliases for the same university
- Ranks universities by patch count (tiebreak: total lines changed)
- Writes result.json and generates localized paginated HTML detail pages
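The matching and ranking steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the `(author_email, lines_changed)` input shape, and the `domain_to_university` mapping are hypothetical. Aliases are merged here simply by mapping several domains to the same university name.

```python
from collections import defaultdict


def match_university(email, domain_to_university):
    """Return the university for an author email, or None if nothing matches."""
    if "@" not in email:
        return None
    domain = email.rsplit("@", 1)[1].lower()
    labels = domain.split(".")
    # Try the full domain first, then each shorter parent domain.
    # The bare top-level label (e.g. "edu") is never tried.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in domain_to_university:
            return domain_to_university[candidate]
    return None


def rank_universities(commits, domain_to_university):
    """Aggregate and rank; `commits` yields (author_email, lines_changed)."""
    stats = defaultdict(lambda: {"patches": 0, "lines": 0, "authors": set()})
    for email, lines_changed in commits:
        university = match_university(email, domain_to_university)
        if university is None:
            continue  # Not a known university domain.
        entry = stats[university]
        entry["patches"] += 1
        entry["lines"] += lines_changed
        entry["authors"].add(email.lower())
    # Rank by patch count; break ties by total lines changed (both descending).
    return sorted(
        stats.items(),
        key=lambda item: (item[1]["patches"], item[1]["lines"]),
        reverse=True,
    )
```

With a mapping such as `{"mit.edu": "MIT"}`, a commit from `someone@cs.mit.edu` is counted for MIT through the parent-domain fallback.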
- Pylint CI: Runs on every push/PR to master across Python 3.9-3.13
- Page Deployment: A daily scheduled workflow regenerates the data and deploys to GitHub Pages
See the docs/ folder for more details:
- Architecture - Project structure and data pipeline
- Data Format - Schema of result.json