
Refactor: add transforms and transformed emulator #474


Open · wants to merge 88 commits into main from 348-refactor-preprocessing

Conversation

sgreenbury (Collaborator) commented May 16, 2025

Closes #348 and provides an initial implementation of #376.

Summary

This PR includes:

  • An AutoEmulateTransform class subclassing torch.distributions.Transform that:
    • Enables sequentially composable transforms for inputs and outputs that flexibly handle both TensorLike and DistributionLike objects, with custom functionality for GaussianLike outputs.
    • Is subclassed to add methods for:
      • Fitting after initialization (and checking whether fitted)
      • Transforming multivariate normals via a basis matrix that can be specified for the transform. This relates to Transformation of uncertainty with dimensionality reduction #376, where the delta method is used: the Jacobian provides an approximate linear estimate around the mean when inverting. For the VAE this manifests in the expanded_basis_matrix override. (See the sketch after this list.)
      • An inverse sample method enabling empirical construction of the inverted mean and covariance matrix for a GaussianLike
  • A make_positive_definite matrix utility: it adds jitter to the diagonal (up to a number of retries) and, when that fails, clamps the eigenvalues to be positive (again up to a number of retries)
  • AutoEmulateTransforms implemented for:
    • PCATransform, StandardizeTransform (around mean and std_dev), and VAETransform (which copies the VAE implementation currently used as a transform in the non-experimental module)
  • A TransformedEmulator class (the refactored version of AutoEmulatePipeline) that:
    • Accepts a list of transforms for inputs and target_transforms for outputs.
    • Provides an API to choose whether a wrapped emulator model that returns a GaussianLike is inverted analytically / with the delta method or through a sampling approach
    • Falls back to a diagonal covariance matrix beyond a max_targets threshold to keep computation tractable
  • Integration with the compare loop is left for Add refactored transforms to compare loop #531; a .tune() method may need to be added directly to TransformedEmulator, giving it a slightly different API, so perhaps it shouldn't directly subclass Emulator
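To make the delta method and the make_positive_definite behaviour above concrete, here is a minimal, illustrative sketch. This is not the implementation in this PR: the function names, signatures, and retry strategy below are assumptions for illustration; only standard torch APIs are used.

import torch
from torch.distributions.transforms import ExpTransform

def make_positive_definite_sketch(cov, jitter=1e-6, max_retries=5):
    # Add growing jitter to the diagonal until a Cholesky factorization succeeds.
    eye = torch.eye(cov.shape[-1], dtype=cov.dtype)
    for i in range(max_retries):
        candidate = cov + (jitter * 10**i) * eye
        try:
            torch.linalg.cholesky(candidate)  # raises if not positive definite
            return candidate
        except RuntimeError:
            continue
    # Jitter failed: clamp eigenvalues to a small positive floor instead.
    eigvals, eigvecs = torch.linalg.eigh(cov)
    return eigvecs @ torch.diag(eigvals.clamp(min=jitter)) @ eigvecs.T

def delta_method_invert(transform, mean, cov):
    # Linearize the inverse transform around the mean: cov_x ≈ J cov_z J^T,
    # where J is the Jacobian of transform.inv evaluated at the mean.
    jac = torch.autograd.functional.jacobian(transform.inv, mean)
    inv_cov = jac @ cov @ jac.T
    return transform.inv(mean), make_positive_definite_sketch(inv_cov)

# Example with a simple invertible transform (exp <-> log):
t = ExpTransform()
mean = torch.tensor([0.5, 1.5])
cov = torch.tensor([[0.04, 0.01], [0.01, 0.09]])
inv_mean, inv_cov = delta_method_invert(t, mean, cov)

In the PR itself, the choice between analytic, delta-method, and sampling inversion is exposed on TransformedEmulator rather than as a free function, as described above.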

Questions

  • The separate .fit() method could be removed in favour of fitting on initialization. In that case we could pass functools.partial constructors to TransformedEmulator instead of fully initialized transforms, so that initialization can happen within TransformedEmulator's own init (sketched after this list).
  • Should the amount of relaxation for make_positive_definite be configurable in the emulator or TransformedEmulator API?
  • Should standardization as a transform be included only for TransformedEmulator, or should it also be available directly within emulators, replacing the preprocessor API?
  • Should we also subclass ComposeTransform as an AutoEmulateComposeTransform to simplify composing AutoEmulateTransforms?
  • Are there additional ways we could test the numerical accuracy of these approaches?
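As a sketch of the functools.partial idea in the first question (the import paths and the n_components argument are assumptions for illustration, not the PR's API):

from functools import partial

# Assumed import paths, based on the file layout in the coverage report below.
from autoemulate.experimental.transforms.pca import PCATransform
from autoemulate.experimental.transforms.standardize import StandardizeTransform

# Pass unfitted constructors rather than fitted instances, so that a
# TransformedEmulator could construct and fit them inside its own __init__:
x_transforms = [partial(StandardizeTransform), partial(PCATransform, n_components=8)]
y_transforms = [partial(StandardizeTransform)]

# Hypothetical use inside TransformedEmulator.__init__:
#   self.x_transforms = [make() for make in x_transforms]
#   for t in self.x_transforms:
#       t.fit(x)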

Remaining tasks:

codecov-commenter commented May 16, 2025

Codecov Report

Attention: Patch coverage is 91.40271% with 38 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@b4faf29). Learn more about missing BASE report.
Report is 100 commits behind head on main.

Files with missing lines                                  Patch %   Lines
...emulate/experimental/emulators/transformed/base.py     85.39%    13 Missing ⚠️
autoemulate/experimental/transforms/utils.py              67.85%    9 Missing ⚠️
autoemulate/experimental/transforms/vae.py                91.42%    6 Missing ⚠️
autoemulate/experimental/transforms/base.py               90.69%    4 Missing ⚠️
autoemulate/experimental/transforms/pca.py                90.62%    3 Missing ⚠️
autoemulate/experimental/transforms/standardize.py        90.00%    3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #474   +/-   ##
=======================================
  Coverage        ?   78.05%           
=======================================
  Files           ?      126           
  Lines           ?     9077           
  Branches        ?        0           
=======================================
  Hits            ?     7085           
  Misses          ?     1992           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.

github-actions bot (Contributor) commented May 16, 2025

Coverage report


File                                                      Lines missing (new statements)
  autoemulate/emulators
  gaussian_process.py
  autoemulate/experimental/emulators
  __init__.py
  base.py
  autoemulate/experimental/emulators/neural_processes
  conditional_neural_process.py
  autoemulate/experimental/emulators/transformed
  base.py 180-182, 351-352, 375-378, 383-388, 392-396
  autoemulate/experimental/transforms
  __init__.py
  base.py 35-36, 43-44
  pca.py 53-55
  standardize.py 45-47
  utils.py 60, 67-75
  vae.py 89, 117-121, 141-143
  tests
  test_compare.py
  tests/experimental
  conftest.py
  test_experimental_conditional_neural_process.py
  test_experimental_transformed.py
  tests/experimental/transforms
  test_transforms.py
Project Total  

This report was generated by python-coverage-comment-action


Co-authored-by: Radka Jersakova <[email protected]>
@sgreenbury force-pushed the 348-refactor-preprocessing branch from d16eda9 to 2902587 on June 24, 2025 at 10:58
sgreenbury (Collaborator, Author) commented:

@radka-j: thanks for the review comments! I've aimed to address these, and more generally:

  • Added docstrings for all underscore-prefixed methods in TransformedEmulator and AutoEmulateTransform
  • Renamed function arguments / attributes in AutoEmulateTransform and TransformedEmulator
  • Renamed methods in AutoEmulateTransform for clarity

@sgreenbury sgreenbury requested a review from radka-j June 24, 2025 15:18
Comment on lines +78 to +82
# TODO: PCA/VAE both require StandardizeTransform for numerical stability
# e.g. "ValueError: Input tensor y contains non-finite values"
# TODO: check error when no target transforms are provided
# None,
# [StandardizeTransform()],
sgreenbury (Collaborator, Author) commented:

I think this can probably be removed now, since it seems reasonable to expect these transforms to work in this context once StandardizeTransform is applied first.

Comment on lines +122 to +126
# TODO: PCA/VAE both require StandardizeTransform for numerical stability
# e.g. "ValueError: Input tensor y contains non-finite values"
# TODO: check error when no target transforms are provided
# None,
# [StandardizeTransform()],

assert isinstance(y_pred_cov, TensorLike)
assert isinstance(y_pred2_cov, TensorLike)
print(y_pred2_cov - y_pred_cov)
# TODO: consider if this is close enough for PCA case
sgreenbury (Collaborator, Author) commented:

This is a large value for atol, though it seems a reasonable discrepancy given the scaling with n_samples explored in this notebook.

Comment on lines +213 to +219
# TODO: these are not necessarily expected to be close since both approximate in
# different ways
# Most are within 50% error
assert torch.quantile(diff_abs.flatten(), 0.9).item() < 0.25
assert torch.quantile(diff_abs.flatten(), 0.95).item() < 0.5
# Some large max differences so will not assert on these
print("Max diff", diff_abs.abs().max())
sgreenbury (Collaborator, Author) commented:

Similar to the PCA case above, but to a greater extent: the tolerance used in the assert here would ideally be tighter and based on a specific expectation. It might be worth testing with an alternative dataset in case this relates to the quality of the VAE fit.

Development

Successfully merging this pull request may close these issues.

Refactor: preprocessing and dimensionality reduction