
Commit 5752c8b: vgi code (initial commit, 0 parents)

File tree: 3,003 files changed, 1,948,422 insertions(+), 0 deletions(-)


.editorconfig

Lines changed: 21 additions & 0 deletions

```ini
# EditorConfig: https://EditorConfig.org

# top-most EditorConfig file
root = true

# Unix-style newlines with a newline ending every file
[*]
end_of_line = lf
insert_final_newline = true

# Matches multiple files with brace expansion notation
# Set default charset
[*.{py}]
charset = utf-8

# 4 space indentation
[*.py]
indent_style = space
indent_size = 4
# max_line_length = 120
trim_trailing_whitespace = true
```

.gitattributes

Lines changed: 2 additions & 0 deletions

```
# So that GitHub does not classify the project as a Jupyter Notebook project
*.ipynb linguist-documentation
```

.gitignore

Lines changed: 27 additions & 0 deletions

```
# Python
*__pycache__/
*.pyc
*.egg-info

# VS Code
.vscode/

# Jupyter
.ipynb_checkpoints/

# Out
trained_models/
imputation_stats/

# Slurm
slurm-*.out

# Ignore unused EMNIST data
data/EMNIST/processed/*_byclass.pt
data/EMNIST/processed/*_bymerge.pt
data/EMNIST/processed/*_digits.pt
data/EMNIST/processed/*_letters.pt
data/EMNIST/processed/*_mnist.pt

# PyTorch Lightning
**/lightning_logs/
```

.gitmodules

Lines changed: 6 additions & 0 deletions

```ini
[submodule "cdi/submodules/missing_data_provider"]
	path = cdi/submodules/missing_data_provider
	url = git@github.com:vsimkus/missing-data-provider.git
[submodule "cdi/submodules/torch_reparametrised_mixture_distribution"]
	path = cdi/submodules/torch_reparametrised_mixture_distribution
	url = git@github.com:vsimkus/torch-reparametrised-mixture-distribution.git
```
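Note that a plain `git clone` does not fetch these submodules; they can be pulled in with standard git, either by cloning with `git clone --recurse-submodules` or, after cloning, by running `git submodule update --init --recursive`.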

README.md

Lines changed: 108 additions & 0 deletions
# Variational Gibbs inference (VGI)

This repository contains the research code for

> Simkus, V., Rhodes, B., Gutmann, M. U., 2021. Variational Gibbs inference for statistical model estimation from incomplete data.

The code is shared for reproducibility purposes and is not intended for production use. It should also serve as a base for anyone wanting to use VGI for model estimation from incomplete data.

## Abstract

Statistical models are central to machine learning with broad applicability across a range of downstream tasks. The models are typically controlled by free parameters that are estimated from data by maximum-likelihood estimation. However, when faced with real-world datasets many of the models run into a critical issue: they are formulated in terms of fully-observed data, whereas in practice the datasets are plagued with missing data. The theory of statistical model estimation from incomplete data is conceptually similar to the estimation of latent-variable models, where powerful tools such as variational inference (VI) exist. However, in contrast to standard latent-variable models, parameter estimation with incomplete data often requires estimating exponentially-many conditional distributions of the missing variables, hence making standard VI methods intractable. We address this gap by introducing variational Gibbs inference (VGI), a new general-purpose method to estimate the parameters of statistical models from incomplete data.
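To make the idea above concrete, below is a deliberately simplified, self-contained sketch of Gibbs-style variational estimation on a toy full-covariance Gaussian. Everything in it (the toy model, the affine Gaussian conditionals `q_cond`, the single-term objective) is a hypothetical illustration written for this README; it is not the implementation or the exact objective used in this repository, for which see the demo notebook and `cdi/trainers`.

```python
# Conceptual sketch only, NOT this repository's code: learn one variational
# Gibbs conditional q_phi(x_j | x_-j) per dimension, resample missing
# entries from it one coordinate at a time, and take joint gradient steps
# in the model parameters theta and the variational parameters phi.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, D = 512, 3

# Toy "statistical model" p_theta(x): full-covariance Gaussian with a
# learnable mean and Cholesky factor (theta = {mu, L_raw}).
mu = torch.zeros(D, requires_grad=True)
L_raw = torch.eye(D).clone().requires_grad_(True)

def model_log_prob(x):
    # Keep the Cholesky diagonal positive via softplus.
    L = torch.tril(L_raw, diagonal=-1) + torch.diag(F.softplus(L_raw.diag()))
    return torch.distributions.MultivariateNormal(mu, scale_tril=L).log_prob(x)

# One Gaussian variational conditional per dimension, with mean and
# log-std affine in the remaining coordinates (phi = their weights).
cond_nets = torch.nn.ModuleList([torch.nn.Linear(D - 1, 2) for _ in range(D)])

def q_cond(j, x):
    rest = torch.cat([x[:, :j], x[:, j + 1:]], dim=1)
    m, log_s = cond_nets[j](rest).chunk(2, dim=1)
    return torch.distributions.Normal(m.squeeze(1), log_s.exp().squeeze(1))

# Synthetic incomplete data: mask[i, j] == True means x[i, j] is observed.
x_full = torch.distributions.MultivariateNormal(
    torch.tensor([1.0, -1.0, 0.0]), torch.eye(D)).sample((N,))
mask = torch.rand(N, D) < 0.5
imputed = torch.where(mask, x_full, torch.zeros_like(x_full))

opt = torch.optim.Adam([mu, L_raw] + list(cond_nets.parameters()), lr=1e-2)

for step in range(3000):
    j = step % D                       # cycle through the coordinates
    q = q_cond(j, imputed)
    x_j = q.rsample()                  # reparametrised sample for gradients
    x_new = imputed.clone()
    x_new[:, j] = torch.where(mask[:, j], imputed[:, j], x_j)
    # ELBO-like term on the rows where coordinate j is missing.
    miss = ~mask[:, j]
    loss = -(model_log_prob(x_new)[miss] - q.log_prob(x_j)[miss]).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    imputed = x_new.detach()           # persist imputations across sweeps

print("estimated mean:", mu.detach())  # should move toward [1, -1, 0]
```

Because the missing entries are resampled with reparametrised draws, a single gradient step improves both the model and the variational conditionals; the objective and update schedule used in the paper are more involved than this single-term bound.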
## VGI demo

We invite the readers of the paper to also see the Jupyter [notebook](./notebooks/VGI_demo.ipynb), where we demo VGI on two statistical models and animate the learning process.

Below is an animation from the notebook of a Gaussian Mixture Model fitted from incomplete data using the VGI algorithm (left), and of the variational Gibbs conditional approximations (right) throughout the iterations.

TODO
## Dependencies

Install the Python dependencies from conda and the project package with

```bash
conda env create -f environment.yml
conda activate cdi
python setup.py develop
```

If the dependencies in `environment.yml` change, update the dependencies with

```bash
conda env update --file environment.yml
```
## Summary of the repository structure

### Data

All the data used in the paper are stored in the [`data`](./data/) directory, and the corresponding data loaders can be found in the [`cdi/data`](./cdi/data/) directory.
### Method code

The main code for the various methods used in the paper can be found in the [`cdi/trainers`](./cdi/trainers/) directory.

* [`trainer_base.py`](./cdi/trainers/trainer_base.py) implements the main data loading and preprocessing code.
* [`variational_cdi.py`](./cdi/trainers/variational_cdi.py) implements the key code for variational Gibbs inference (VGI).
* [`mcimp.py`](./cdi/trainers/mcimp.py) implements the code for variational block-Gibbs inference (VBGI) used in the VAE experiments.
* The other scripts in [`cdi/trainers`](./cdi/trainers/) implement the reference code and the variational-conditional pre-training code.
### Statistical models

The code for the statistical and variational models is located in [`cdi/models`](./cdi/models/).
### Configurations

The [`experiment_configs`](./experiment_configs/) directory contains the configuration files for all experiments. The config files are in JSON format. They are passed to the main running script as a command-line argument, and the values in them can be overridden with additional command-line arguments (see the example under "Model fitting" below).
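As a purely hypothetical illustration of the override mechanism (the key names and values below are made up; the authoritative schemas are the JSON files in [`experiment_configs`](./experiment_configs/)), a dotted flag such as `--data.total_miss=0.33` can be read as a path into the nested config:

```python
# Hypothetical sketch, not a real config from this repository: if the JSON
# config contains a nested object like the dict below, the command-line
# flag --data.total_miss=0.33 addresses config["data"]["total_miss"] and
# overrides its value.
config = {
    "data": {
        "total_miss": 0.5,  # assumed here to be the total missingness fraction
    },
}
```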
### Run scripts

[`train.py`](./train.py) is the main script we use to run the experiments, and [`test.py`](./test.py) is the main script for producing the analysis results used in the text.
### Analysis code

The Jupyter notebooks in the [`notebooks`](./notebooks/) directory contain the code that was used to analyse the method and produce the figures in the text.
## Running the code

Before running any code you'll need to activate the `cdi` conda environment (and make sure you've installed the dependencies):

```bash
conda activate cdi
```
### Model fitting

To train a model, use the `train.py` script. For example, to fit a rational-quadratic spline flow on the MiniBooNE dataset with 50% missing data:

```bash
python train.py --config=experiment_configs/flows_uci/learning_experiments/3/rqcspline_miniboone_chrqsvar_cdi_uncondgauss.json
```
Any parameters set in the JSON config file can be overridden by passing additional command-line arguments, e.g.

```bash
python train.py --config=experiment_configs/flows_uci/learning_experiments/3/rqcspline_miniboone_chrqsvar_cdi_uncondgauss.json --data.total_miss=0.33
```
#### Optional variational model warm-up

Some VGI experiments use a variational-model "warm-up", which pre-trains the variational model as a probabilistic regressor on the observed data. The experiment configurations for these runs have `var_pretrained_model` set to the name of the pre-trained model. To run the corresponding pre-training, use, e.g.

```bash
python train.py --config=experiment_configs/flows_uci/learning_experiments/3/miniboone_chrqsvar_pretraining_uncondgauss.json
```
## Running model evaluation

For model evaluation, use [`test.py`](./test.py) with the corresponding test config, e.g.

```bash
python test.py --test_config=experiment_configs/flows_uci/eval_loglik/3/rqcspline_miniboone_chrqsvar_cdi_uncondgauss.json
```

This will store all the results in a file, which we then analyse in the provided notebook.

For the VAE evaluation, where variational-distribution fine-tuning is required for the test log-likelihood evaluation, use [`retrain_all_ckpts_on_test_and_run_test.py`](./retrain_all_ckpts_on_test_and_run_test.py).

cdi/__init__.py

Whitespace-only changes.

cdi/common/__init__.py

Whitespace-only changes.
