
Refactor: add transforms and transformed emulator #474


Open · wants to merge 88 commits into main from 348-refactor-preprocessing

Conversation

sgreenbury (Collaborator) commented May 16, 2025

Closes #348 and provides an initial implementation of #376.

Summary

This PR includes:

  • An AutoEmulateTransform class subclassing torch.distributions.Transform that:
    • Enables sequentially composable transforms for inputs and outputs that flexibly handle both TensorLike and DistributionLike objects, with custom functionality for GaussianLike outputs.
    • Is subclassed to add methods for:
      • Fitting after initialization (and checking whether fitted)
      • Transforming multivariate normals via a basis matrix that can be specified for the transform. This relates to Transformation of uncertainty with dimensionality reduction #376, where the delta method is used: the Jacobian provides an approximate linear estimate around the mean when inverting. For the VAE this manifests in the expanded_basis_matrix override. (See the sketch after this list.)
      • An inverse sample method enabling empirical construction of the inverted mean and covariance matrix for a GaussianLike
  • A make_positive_definite matrix utility: it adds jitter to the diagonal (up to a number of retries) and, when that fails, clamps the eigenvalues to be positive (again up to a number of retries)
  • AutoEmulateTransforms implemented for:
    • PCATransform, StandardizeTransform (around mean and std_dev), and VAETransform (which copies the VAE implementation currently used as a transform in the non-experimental module)
  • A TransformedEmulator class (the refactored version of AutoEmulatePipeline) that:
    • Accepts a list of transforms for inputs and target_transforms for outputs.
    • Provides an API to choose whether a wrapped emulator model that returns a GaussianLike is inverted analytically / with the delta method or through a sampling approach
    • Falls back to a diagonal covariance matrix beyond a max_targets threshold to keep computation tractable
  • Integration with the compare loop is left for Add refactored transforms to compare loop #531; a .tune() method may need to be added directly to TransformedEmulator, giving it a slightly different API, so perhaps it shouldn't directly subclass Emulator
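To make the delta method and the make_positive_definite behaviour above concrete, here is a minimal, illustrative sketch. This is not the implementation in this PR: the function names, signatures, and retry strategy below are assumptions for illustration; only standard torch APIs are used.

import torch
from torch.distributions.transforms import ExpTransform

def make_positive_definite_sketch(cov, jitter=1e-6, max_retries=5):
    # Add growing jitter to the diagonal until a Cholesky factorization succeeds.
    eye = torch.eye(cov.shape[-1], dtype=cov.dtype)
    for i in range(max_retries):
        candidate = cov + (jitter * 10**i) * eye
        try:
            torch.linalg.cholesky(candidate)  # raises if not positive definite
            return candidate
        except RuntimeError:
            continue
    # Jitter failed: clamp eigenvalues to a small positive floor instead.
    eigvals, eigvecs = torch.linalg.eigh(cov)
    return eigvecs @ torch.diag(eigvals.clamp(min=jitter)) @ eigvecs.T

def delta_method_invert(transform, mean, cov):
    # Linearize the inverse transform around the mean: cov_x ≈ J cov_z J^T,
    # where J is the Jacobian of transform.inv evaluated at the mean.
    jac = torch.autograd.functional.jacobian(transform.inv, mean)
    inv_cov = jac @ cov @ jac.T
    return transform.inv(mean), make_positive_definite_sketch(inv_cov)

# Example with a simple invertible transform (exp <-> log):
t = ExpTransform()
mean = torch.tensor([0.5, 1.5])
cov = torch.tensor([[0.04, 0.01], [0.01, 0.09]])
inv_mean, inv_cov = delta_method_invert(t, mean, cov)

In the PR itself, the choice between analytic, delta-method, and sampling inversion is exposed on TransformedEmulator rather than as a free function, as described above.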

Questions

  • The separate .fit() method could be removed in favour of fitting on initialization. In that case we could pass functools.partial constructors to TransformedEmulator instead of fully initialized transforms, so that initialization can happen within TransformedEmulator's own init (sketched after this list).
  • Should the amount of relaxation for make_positive_definite be configurable in the emulator or TransformedEmulator API?
  • Should standardization as a transform be included only for TransformedEmulator, or should it also be available directly within emulators, replacing the preprocessor API?
  • Should we also subclass ComposeTransform as an AutoEmulateComposeTransform to simplify composing AutoEmulateTransforms?
  • Are there additional ways we could test the numerical accuracy of these approaches?
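As a sketch of the functools.partial idea in the first question (the import paths and the n_components argument are assumptions for illustration, not the PR's API):

from functools import partial

# Assumed import paths, based on the file layout in the coverage report below.
from autoemulate.experimental.transforms.pca import PCATransform
from autoemulate.experimental.transforms.standardize import StandardizeTransform

# Pass unfitted constructors rather than fitted instances, so that a
# TransformedEmulator could construct and fit them inside its own __init__:
x_transforms = [partial(StandardizeTransform), partial(PCATransform, n_components=8)]
y_transforms = [partial(StandardizeTransform)]

# Hypothetical use inside TransformedEmulator.__init__:
#   self.x_transforms = [make() for make in x_transforms]
#   for t in self.x_transforms:
#       t.fit(x)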

Remaining tasks:

codecov-commenter commented May 16, 2025

Codecov Report

Attention: Patch coverage is 91.40271% with 38 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@b4faf29). Learn more about missing BASE report.
Report is 100 commits behind head on main.

Files with missing lines                                  Patch %   Lines
...emulate/experimental/emulators/transformed/base.py     85.39%    13 Missing ⚠️
autoemulate/experimental/transforms/utils.py              67.85%    9 Missing ⚠️
autoemulate/experimental/transforms/vae.py                91.42%    6 Missing ⚠️
autoemulate/experimental/transforms/base.py               90.69%    4 Missing ⚠️
autoemulate/experimental/transforms/pca.py                90.62%    3 Missing ⚠️
autoemulate/experimental/transforms/standardize.py        90.00%    3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #474   +/-   ##
=======================================
  Coverage        ?   78.05%           
=======================================
  Files           ?      126           
  Lines           ?     9077           
  Branches        ?        0           
=======================================
  Hits            ?     7085           
  Misses          ?     1992           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.

github-actions bot (Contributor) commented May 16, 2025

Coverage report


File                                                      Lines missing (new statements)
  autoemulate/emulators
  gaussian_process.py
  autoemulate/experimental/emulators
  __init__.py
  base.py
  autoemulate/experimental/emulators/neural_processes
  conditional_neural_process.py
  autoemulate/experimental/emulators/transformed
  base.py 180-182, 351-352, 375-378, 383-388, 392-396
  autoemulate/experimental/transforms
  __init__.py
  base.py 35-36, 43-44
  pca.py 53-55
  standardize.py 45-47
  utils.py 60, 67-75
  vae.py 89, 117-121, 141-143
  tests
  test_compare.py
  tests/experimental
  conftest.py
  test_experimental_conditional_neural_process.py
  test_experimental_transformed.py
  tests/experimental/transforms
  test_transforms.py
Project Total  

This report was generated by python-coverage-comment-action


Co-authored-by: Radka Jersakova <[email protected]>
@sgreenbury force-pushed the 348-refactor-preprocessing branch from d16eda9 to 2902587 on June 24, 2025 at 10:58
sgreenbury (Collaborator, Author) commented:

@radka-j: thanks for the review comments! I've aimed to address these, and more generally:

  • Added docstrings for all underscore-prefixed methods in TransformedEmulator and AutoEmulateTransform
  • Renamed function arguments / attributes in AutoEmulateTransform and TransformedEmulator
  • Renamed methods in AutoEmulateTransform for clarity

@sgreenbury sgreenbury requested a review from radka-j June 24, 2025 15:18
Comment on lines +78 to +82
# TODO: PCA/VAE both require StandardizeTransform for numerical stability
# e.g. "ValueError: Input tensor y contains non-finite values"
# TODO: check error when no target transforms are provided
# None,
# [StandardizeTransform()],
sgreenbury (Collaborator, Author) commented:

I think this can probably be removed now, since it seems reasonable to expect these transforms to work in this context once StandardizeTransform is applied first.

Comment on lines +122 to +126
# TODO: PCA/VAE both require StandardizeTransform for numerical stability
# e.g. "ValueError: Input tensor y contains non-finite values"
# TODO: check error when no target transforms are provided
# None,
# [StandardizeTransform()],

assert isinstance(y_pred_cov, TensorLike)
assert isinstance(y_pred2_cov, TensorLike)
print(y_pred2_cov - y_pred_cov)
# TODO: consider if this is close enough for PCA case
sgreenbury (Collaborator, Author) commented:

This is a large value for atol, though it seems a reasonable discrepancy given the scaling with n_samples explored in this notebook.

Comment on lines +213 to +219
# TODO: these are not necessarily expected to be close since both approximate in
# different ways
# Most are within 50% error
assert torch.quantile(diff_abs.flatten(), 0.9).item() < 0.25
assert torch.quantile(diff_abs.flatten(), 0.95).item() < 0.5
# Some large max differences so will not assert on these
print("Max diff", diff_abs.abs().max())
sgreenbury (Collaborator, Author) commented:

Similar to the PCA case above, but to a greater extent: the tolerance used in the assert here would ideally be tighter and based on a specific expectation. It might be worth testing with an alternative dataset in case this relates to the quality of the VAE fit.

Development

Successfully merging this pull request may close these issues.

Refactor: preprocessing and dimensionality reduction