
Commit 5752c8b: vgi code (initial commit, 0 parents)

File tree: 3,003 files changed, 1,948,422 insertions(+), 0 deletions(-)


.editorconfig

Lines changed: 21 additions & 0 deletions

```ini
# EditorConfig: https://EditorConfig.org

# top-most EditorConfig file
root = true

# Unix-style newlines with a newline ending every file
[*]
end_of_line = lf
insert_final_newline = true

# Matches multiple files with brace expansion notation
# Set default charset
[*.{py}]
charset = utf-8

# 4 space indentation
[*.py]
indent_style = space
indent_size = 4
# max_line_length = 120
trim_trailing_whitespace = true
```

.gitattributes

Lines changed: 2 additions & 0 deletions

```
# So that GitHub does not classify the project as a Jupyter Notebook project
*.ipynb linguist-documentation
```

.gitignore

Lines changed: 27 additions & 0 deletions

```
# Python
*__pycache__/
*.pyc
*.egg-info

# VS Code
.vscode/

# Jupyter
.ipynb_checkpoints/

# Out
trained_models/
imputation_stats/

# Slurm
slurm-*.out

# Ignore unused EMNIST data
data/EMNIST/processed/*_byclass.pt
data/EMNIST/processed/*_bymerge.pt
data/EMNIST/processed/*_digits.pt
data/EMNIST/processed/*_letters.pt
data/EMNIST/processed/*_mnist.pt

# PyTorch Lightning
**/lightning_logs/
```

.gitmodules

Lines changed: 6 additions & 0 deletions

```ini
[submodule "cdi/submodules/missing_data_provider"]
	path = cdi/submodules/missing_data_provider
	url = git@github.com:vsimkus/missing-data-provider.git
[submodule "cdi/submodules/torch_reparametrised_mixture_distribution"]
	path = cdi/submodules/torch_reparametrised_mixture_distribution
	url = git@github.com:vsimkus/torch-reparametrised-mixture-distribution.git
```
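Note that a plain `git clone` does not fetch these submodules; they can be pulled in with standard git, either by cloning with `git clone --recurse-submodules` or, after cloning, by running `git submodule update --init --recursive`.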

README.md

Lines changed: 108 additions & 0 deletions
# Variational Gibbs inference (VGI)

This repository contains the research code for

> Simkus, V., Rhodes, B., Gutmann, M. U., 2021. Variational Gibbs inference for statistical model estimation from incomplete data.

The code is shared for reproducibility purposes and is not intended for production use. It should also serve as a base for anyone wanting to use VGI for model estimation from incomplete data.

## Abstract

Statistical models are central to machine learning with broad applicability across a range of downstream tasks. The models are typically controlled by free parameters that are estimated from data by maximum-likelihood estimation. However, when faced with real-world datasets many of the models run into a critical issue: they are formulated in terms of fully-observed data, whereas in practice the datasets are plagued with missing data. The theory of statistical model estimation from incomplete data is conceptually similar to the estimation of latent-variable models, where powerful tools such as variational inference (VI) exist. However, in contrast to standard latent-variable models, parameter estimation with incomplete data often requires estimating exponentially-many conditional distributions of the missing variables, hence making standard VI methods intractable. We address this gap by introducing variational Gibbs inference (VGI), a new general-purpose method to estimate the parameters of statistical models from incomplete data.
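To make the idea above concrete, below is a deliberately simplified, self-contained sketch of Gibbs-style variational estimation on a toy full-covariance Gaussian. Everything in it (the toy model, the affine Gaussian conditionals `q_cond`, the single-term objective) is a hypothetical illustration written for this README; it is not the implementation or the exact objective used in this repository, for which see the demo notebook and `cdi/trainers`.

```python
# Conceptual sketch only, NOT this repository's code: learn one variational
# Gibbs conditional q_phi(x_j | x_-j) per dimension, resample missing
# entries from it one coordinate at a time, and take joint gradient steps
# in the model parameters theta and the variational parameters phi.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, D = 512, 3

# Toy "statistical model" p_theta(x): full-covariance Gaussian with a
# learnable mean and Cholesky factor (theta = {mu, L_raw}).
mu = torch.zeros(D, requires_grad=True)
L_raw = torch.eye(D).clone().requires_grad_(True)

def model_log_prob(x):
    # Keep the Cholesky diagonal positive via softplus.
    L = torch.tril(L_raw, diagonal=-1) + torch.diag(F.softplus(L_raw.diag()))
    return torch.distributions.MultivariateNormal(mu, scale_tril=L).log_prob(x)

# One Gaussian variational conditional per dimension, with mean and
# log-std affine in the remaining coordinates (phi = their weights).
cond_nets = torch.nn.ModuleList([torch.nn.Linear(D - 1, 2) for _ in range(D)])

def q_cond(j, x):
    rest = torch.cat([x[:, :j], x[:, j + 1:]], dim=1)
    m, log_s = cond_nets[j](rest).chunk(2, dim=1)
    return torch.distributions.Normal(m.squeeze(1), log_s.exp().squeeze(1))

# Synthetic incomplete data: mask[i, j] == True means x[i, j] is observed.
x_full = torch.distributions.MultivariateNormal(
    torch.tensor([1.0, -1.0, 0.0]), torch.eye(D)).sample((N,))
mask = torch.rand(N, D) < 0.5
imputed = torch.where(mask, x_full, torch.zeros_like(x_full))

opt = torch.optim.Adam([mu, L_raw] + list(cond_nets.parameters()), lr=1e-2)

for step in range(3000):
    j = step % D                       # cycle through the coordinates
    q = q_cond(j, imputed)
    x_j = q.rsample()                  # reparametrised sample for gradients
    x_new = imputed.clone()
    x_new[:, j] = torch.where(mask[:, j], imputed[:, j], x_j)
    # ELBO-like term on the rows where coordinate j is missing.
    miss = ~mask[:, j]
    loss = -(model_log_prob(x_new)[miss] - q.log_prob(x_j)[miss]).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    imputed = x_new.detach()           # persist imputations across sweeps

print("estimated mean:", mu.detach())  # should move toward [1, -1, 0]
```

Because the missing entries are resampled with reparametrised draws, a single gradient step improves both the model and the variational conditionals; the objective and update schedule used in the paper are more involved than this single-term bound.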
## VGI demo

We invite the readers of the paper to also see the Jupyter [notebook](./notebooks/VGI_demo.ipynb), where we demo VGI on two statistical models and animate the learning process.

Below is an animation from the notebook of a Gaussian Mixture Model fitted from incomplete data using the VGI algorithm (left), and of the variational Gibbs conditional approximations (right) throughout the iterations.

TODO
## Dependencies

Install the Python dependencies from conda and the project package with

```bash
conda env create -f environment.yml
conda activate cdi
python setup.py develop
```

If the dependencies in `environment.yml` change, update the dependencies with

```bash
conda env update --file environment.yml
```
## Summary of the repository structure

### Data

All the data used in the paper are stored in the [`data`](./data/) directory, and the corresponding data loaders can be found in the [`cdi/data`](./cdi/data/) directory.
### Method code

The main code for the various methods used in the paper can be found in the [`cdi/trainers`](./cdi/trainers/) directory.

* [`trainer_base.py`](./cdi/trainers/trainer_base.py) implements the main data loading and preprocessing code.
* [`variational_cdi.py`](./cdi/trainers/variational_cdi.py) implements the key code for variational Gibbs inference (VGI).
* [`mcimp.py`](./cdi/trainers/mcimp.py) implements the code for variational block-Gibbs inference (VBGI) used in the VAE experiments.
* The other scripts in [`cdi/trainers`](./cdi/trainers/) implement the reference code and the variational-conditional pre-training code.
### Statistical models

The code for the statistical and variational models is located in [`cdi/models`](./cdi/models/).
### Configurations

The [`experiment_configs`](./experiment_configs/) directory contains the configuration files for all experiments. The config files are in JSON format. They are passed to the main running script as a command-line argument, and the values in them can be overridden with additional command-line arguments (see the example under "Model fitting" below).
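As a purely hypothetical illustration of the override mechanism (the key names and values below are made up; the authoritative schemas are the JSON files in [`experiment_configs`](./experiment_configs/)), a dotted flag such as `--data.total_miss=0.33` can be read as a path into the nested config:

```python
# Hypothetical sketch, not a real config from this repository: if the JSON
# config contains a nested object like the dict below, the command-line
# flag --data.total_miss=0.33 addresses config["data"]["total_miss"] and
# overrides its value.
config = {
    "data": {
        "total_miss": 0.5,  # assumed here to be the total missingness fraction
    },
}
```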
### Run scripts

[`train.py`](./train.py) is the main script we use to run the experiments, and [`test.py`](./test.py) is the main script for producing the analysis results used in the text.
### Analysis code

The Jupyter notebooks in the [`notebooks`](./notebooks/) directory contain the code that was used to analyse the method and produce the figures in the text.
## Running the code

Before running any code you'll need to activate the `cdi` conda environment (and make sure you've installed the dependencies):

```bash
conda activate cdi
```
### Model fitting

To train a model, use the `train.py` script. For example, to fit a rational-quadratic spline flow on the MiniBooNE dataset with 50% missing data:

```bash
python train.py --config=experiment_configs/flows_uci/learning_experiments/3/rqcspline_miniboone_chrqsvar_cdi_uncondgauss.json
```
Any parameters set in the JSON config file can be overridden by passing additional command-line arguments, e.g.

```bash
python train.py --config=experiment_configs/flows_uci/learning_experiments/3/rqcspline_miniboone_chrqsvar_cdi_uncondgauss.json --data.total_miss=0.33
```
#### Optional variational model warm-up

Some VGI experiments use a variational-model "warm-up", which pre-trains the variational model as a probabilistic regressor on the observed data. The experiment configurations for these runs have `var_pretrained_model` set to the name of the pre-trained model. To run the corresponding pre-training, use, e.g.

```bash
python train.py --config=experiment_configs/flows_uci/learning_experiments/3/miniboone_chrqsvar_pretraining_uncondgauss.json
```
## Running model evaluation

For model evaluation, use [`test.py`](./test.py) with the corresponding test config, e.g.

```bash
python test.py --test_config=experiment_configs/flows_uci/eval_loglik/3/rqcspline_miniboone_chrqsvar_cdi_uncondgauss.json
```

This will store all the results in a file, which we then analyse in the provided notebook.

For the VAE evaluation, where variational-distribution fine-tuning is required for the test log-likelihood evaluation, use [`retrain_all_ckpts_on_test_and_run_test.py`](./retrain_all_ckpts_on_test_and_run_test.py).

cdi/__init__.py

Whitespace-only changes.

cdi/common/__init__.py

Whitespace-only changes.
