Commit d9758fa

ShiftScaleTransformer Scaling Options "maxnorm" and "maxnormsym" (#74)
* option "maxnorm" for snapshot transformers including tests * small doc/test fixes, move byrow/scaling conflict check to __init__() * ShiftScaleTransformer(scaling='maxnormsym') * update pre docs page * bump version 0.5.10 -> 0.5.11, update changelog --------- Co-authored-by: Shane <[email protected]>
1 parent c2fa327 commit d9758fa
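
For context, a minimal usage sketch of the two new scaling options introduced by this commit. The class and keyword names come from the diff below; the data is synthetic and the printed checks only illustrate the intended properties.

```python
import numpy as np
import opinf

# Synthetic snapshot matrix: n = 100 state entries, k = 30 snapshots.
Q = np.random.random((100, 30))

# "maxnorm": divide by the largest snapshot (column) 2-norm.
st = opinf.pre.ShiftScaleTransformer(scaling="maxnorm")
Q_scaled = st.fit_transform(Q)
print(np.max(np.linalg.norm(Q_scaled, axis=0)))  # approximately 1.0

# "maxnormsym": subtract the scalar mean entry first, then scale the same way.
st_sym = opinf.pre.ShiftScaleTransformer(scaling="maxnormsym")
Q_sym = st_sym.fit_transform(Q)
print(np.mean(Q_sym))  # approximately 0.0
```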

7 files changed: +175 -22 lines changed


docs/source/api/pre.ipynb

Lines changed: 62 additions & 11 deletions
@@ -60,7 +60,7 @@
 "::::{admonition} Example Data\n",
 ":class: tip\n",
 "\n",
-"The examples on this page use data from the combustion problem described in {cite}`swischuk2020combustion`.\n",
+"The examples on this page use data downsampled from the combustion problem described in {cite}`swischuk2020combustion`.\n",
 "\n",
 ":::{dropdown} State Variables\n",
 "\n",
@@ -73,8 +73,8 @@
 "- Specific volume (inverse density) $\\xi = 1/\\rho$\n",
 "- Chemical species molar concentrations for CH$_{4}$, O$_{2}$, CO$_{2}$, and H$_{2}$O.\n",
 "\n",
-"The dimension of the spatial discretization in the full example in {cite}`swischuk2020combustion` is $38{,}523$ per variable, so $n = 9 \\times 38{,}523 = 346{,}707$.\n",
-"Here we have downsampled the state dimension to $535$ for each variable for demonstration purposes, i.e., $n = 9 \\times 535 = 4{,}815$.\n",
+"The dimension of the spatial discretization in the full example in {cite}`swischuk2020combustion` is $n_x = 38{,}523$ for each of the $n_q = 9$ variables, so the total state dimension is $n_q n_x = 9 \\times 38{,}523 = 346{,}707$.\n",
+"For demonstration purposes, we have downsampled the state dimension to $n_x' = 535$, hence $n = n_q n_x' = 9 \\times 535 = 4{,}815$ is the total state dimension of the example data.\n",
 ":::\n",
 "\n",
 "You can [download the data here](https://github.com/Willcox-Research-Group/rom-operator-inference-Python3/raw/data/pre_example.npy) to repeat the experiments.\n",
@@ -84,7 +84,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 1,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -139,6 +139,57 @@
 ":::"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"::::{admonition} Fit-and-Transform versus Transform\n",
+":class: important\n",
+"\n",
+"Pre-processing transformation classes are calibrated through user-provided hyperparameters in the constructor and/or training snapshots passed to ``fit()`` or ``fit_transform()``.\n",
+"The ``transform()`` method applies but *does not alter* the transformation.\n",
+"Some transformations are designed so that the transformed training data has certain properties, but those properties are not guaranteed to hold for transformed data that was not used for training.\n",
+"\n",
+":::{dropdown} Example\n",
+"\n",
+"Consider a set of training snapshots $\\{\\q_{j}\\}_{j=0}^{k-1}\\subset\\RR^n$.\n",
+"The {class}`ShiftScaleTransformer` can shift data by the mean training snapshot, meaning it can represent the transformation $\\mathcal{T}:\\RR^{n}\\to\\RR^{n}$ given by\n",
+"\n",
+"$$\n",
+"\\begin{aligned}\n",
+" \\mathcal{T}(\\q) = \\q - \\bar{\\q},\n",
+" \\qquad\n",
+" \\bar{\\q} = \\frac{1}{k}\\sum_{j=0}^{k-1}\\q_{j}.\n",
+"\\end{aligned}\n",
+"$$\n",
+"\n",
+"The key property of this transformation is that the transformed training snapshots have zero mean.\n",
+"That is,\n",
+"\n",
+"$$\n",
+"\\begin{aligned}\n",
+" \\frac{1}{k}\\sum_{j=0}^{k-1}\\mathcal{T}(\\q_j)\n",
+" = \\frac{1}{k}\\sum_{j=0}^{k-1}(\\q_j - \\bar{\\q})\n",
+" = \\frac{1}{k}\\sum_{j=0}^{k-1}\\q_j - \\frac{1}{k}\\sum_{j=0}^{k-1}\\bar{\\q}\n",
+" = \\bar{\\q} - \\frac{k}{k}\\bar{\\q}\n",
+" = \\0.\n",
+"\\end{aligned}\n",
+"$$\n",
+"\n",
+"However, for any other collection $\\{\\mathbf{x}_j\\}_{j=0}^{k'-1}\\subset\\RR^{n}$ of snapshots, the set of transformed snapshots $\\{\\mathcal{T}(\\mathbf{x}_j)\\}_{j=0}^{k'-1}$ is not guaranteed to have zero mean because $\\mathcal{T}$ shifts by the mean of the $\\q_j$'s, not the mean of the $\\mathbf{x}_j$'s.\n",
+"That is,\n",
+"\n",
+"$$\n",
+"\\begin{aligned}\n",
+" \\frac{1}{k'}\\sum_{j=0}^{k'-1}\\mathcal{T}(\\mathbf{x}_j)\n",
+" = \\frac{1}{k'}\\sum_{j=0}^{k'-1}(\\mathbf{x}_j - \\bar{\\q})\n",
+" \\neq \\0.\n",
+"\\end{aligned}\n",
+"$$\n",
+":::\n",
+"::::"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -214,7 +265,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The most common type of shift sets the reference snapshot to be the average of the training snapshots:\n",
+"One strategy that is often effective for Operator Inference is to set the reference snapshot to be the average of the training snapshots:\n",
 "\n",
 "$$\n",
 " \\bar{\\q}\n",
@@ -352,7 +403,7 @@
 "metadata": {},
 "source": [
 "Many engineering problems feature multiple variables with ranges across different scales.\n",
-"For such cases, it is often beneficial to scale the variables to similar ranges so that one variable does not overwhelm the other in the operator learning.\n",
+"For such cases, it is often beneficial to scale the variables to similar ranges so that one variable does not overwhelm the other during operator learning.\n",
 "In other words, training data should be nondimensionalized when possible.\n",
 "\n",
 "A scaling operation for a single variable is given by\n",
@@ -362,7 +413,7 @@
 "$$\n",
 "\n",
 "where $\\alpha \\neq 0$ and $\\q'$ is a training snapshot after shifting (when desired).\n",
-"The :class:`ScaleTransformer` class receives a scaler $\\alpha$ and implements this transformation."
+"The {class}`ScaleTransformer` class receives a scaler $\\alpha$ and implements this transformation."
 ]
 },
 {
@@ -668,7 +719,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 25,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -738,7 +789,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 26,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -806,7 +857,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 27,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -832,7 +883,7 @@
 ":class: note\n",
 "\n",
 "- In this example, the `state_dimension` could be set in the constructor because the `w` argument is a vector of length $n$. However, the `state_dimension` is not required to be set until [`fit_transform()`](TransformerTemplate.fit_transform).\n",
-"- Because the transformation is dictated by the choice of `\\w` and not calibrated from data, [`fit_transform()`](TransformerTemplate.fit_transform) simply calls [`transform()`](TransformerTemplate.transform).\n",
+"- Because the transformation is dictated by the choice of `w` and not calibrated from data, [`fit_transform()`](TransformerTemplate.fit_transform) simply calls [`transform()`](TransformerTemplate.transform).\n",
 "- When `locs` is provided in [`inverse_transform()`](TransformerTemplate.inverse_transform), it is assumed that the `states_transformed` are the elements of the state vector at the given locations. That is,`inverse_transform(transform(states)[locs], locs) == states[locs]`.\n",
 ":::"
 ]
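
The "Fit-and-Transform versus Transform" admonition added above makes a point that is easy to check numerically: ``transform()`` reuses the statistics learned by ``fit_transform()``, so properties guaranteed for the training data need not hold for other data. A small sketch with synthetic data (the array shapes here are arbitrary):

```python
import numpy as np
import opinf

Q_train = np.random.random((20, 100))       # training snapshots (n=20, k=100)
Q_other = np.random.random((20, 40)) + 3.0  # snapshots not used for training

st = opinf.pre.ShiftScaleTransformer(centering=True)
Y_train = st.fit_transform(Q_train)  # learns the mean training snapshot
Y_other = st.transform(Q_other)      # applies the same shift without refitting

print(np.allclose(np.mean(Y_train, axis=1), 0))  # True: training mean is zero
print(np.allclose(np.mean(Y_other, axis=1), 0))  # False: no such guarantee here
```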

docs/source/index.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 [![PyPI](https://img.shields.io/pypi/wheel/opinf)](https://pypi.org/project/opinf/)
 
 :::{attention}
-This documentation is for `opinf` version `0.5`, which introduced major changes from the previous version `0.4.5`.
+This documentation is for `opinf` version `0.5.x`, which introduced major changes from the previous version `0.4.5`.
 See updates and notes for old versions [here](./opinf/changelog.md).
 :::
 

docs/source/opinf/changelog.md

Lines changed: 5 additions & 0 deletions
@@ -5,6 +5,11 @@
 New versions may introduce substantial new features or API adjustments.
 :::
 
+## Version 0.5.11
+
+- New scaling options ``'maxnorm'`` and ``'maxnormsym'`` for ``pre.ShiftScaleTransformer`` so that training snapshots have norm at most 1. Contributed by [@nicolearetz](https://github.com/nicolearetz).
+- Small clarifications to ``pre.ShiftScaleTransformer`` and updates to the ``pre`` documentation.
+
 ## Version 0.5.10
 
 New POD basis solver option `basis.PODBasis(solver="method-of-snapshots")` (or `solver="eigh"`), which solves a symmetric eigenvalue problem instead of computing a (weighted) SVD. This method is more efficient than the SVD for snapshot matrices $\mathbf{Q}\in\mathbb{R}^{n\times k}$ where $n \gg k$ and is significantly more efficient than the SVD when a non-diagonal weight matrix is provided.
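
The 0.5.10 entry shown above for context describes the method-of-snapshots POD solver. A hedged sketch of how that option is selected; only the ``solver`` keyword is taken from the changelog text, and the snapshot data here is synthetic:

```python
import numpy as np
import opinf

Q = np.random.random((2000, 25))  # n >> k: the symmetric eigenproblem is cheaper
pod = opinf.basis.PODBasis(solver="method-of-snapshots")  # or solver="eigh"
pod.fit(Q)
print(pod)
```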

src/opinf/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 https://github.com/Willcox-Research-Group/rom-operator-inference-Python3
 """
 
-__version__ = "0.5.10"
+__version__ = "0.5.11"
 
 from . import (
     basis,

src/opinf/pre/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -1,5 +1,7 @@
 # pre/__init__.py
-"""Tools for preprocessing snapshot data prior to compression."""
+"""Tools for preprocessing snapshot data after (optional) lifting but prior to
+compression.
+"""
 
 from ._base import *
 from ._multi import *

src/opinf/pre/_shiftscale.py

Lines changed: 74 additions & 6 deletions
@@ -652,7 +652,8 @@ def load(cls, loadfile):
 
 
 class ShiftScaleTransformer(TransformerTemplate):
-    r"""Process snapshots by centering and/or scaling (in that order).
+    r"""Process snapshots by vector centering and/or affine scaling
+    (in that order).
 
 
@@ -664,9 +665,11 @@ class ShiftScaleTransformer(TransformerTemplate):
 
     where :math:`\Q\in\RR^{n \times k}` is the snapshot matrix to be
     transformed and :math:`\Q''\in\RR^{n \times k}` is the transformed snapshot
-    matrix.
+    matrix. Transformation parameters are learned from a training data set, not
+    provided explicitly by the user as in :class:`ShiftTransformer` or
+    :class:`ScaleTransformer`.
 
-    All transformations with this class are `affine` and hence can be written
+    All transformations with this class are *affine* and hence can be written
     componentwise as :math:`\Q_{i,j}'' = \alpha_{i,j} \Q_{i,j} + \beta_{i,j}`
     for some choice of :math:`\alpha_{i,j},\beta_{i,j}\in\RR`.
 
@@ -683,6 +686,14 @@ class ShiftScaleTransformer(TransformerTemplate):
         If given, scale (non-dimensionalize) the centered snapshot entries.
         Otherwise, :math:`\Q'' = \Q'` (default).
 
+        All scaling options multiply :math:`\Q'` by a constant; some of them
+        (the symmetric scalings: ``'standard'`` and those ending in ``'sym'``)
+        also shift the entries of :math:`\Q'` by a constant (the mean entry).
+        This is different from setting ``centering=True``, which shifts each
+        column of :math:`\Q` by a vector; however, when ``centering=True``,
+        symmetric scaling options are equivalent to their non-symmetric
+        counterparts because in that case the mean of :math:`\Q'` is zero.
+
         **Options:**
 
         .. dropdown:: ``'standard'``
@@ -757,9 +768,39 @@ class ShiftScaleTransformer(TransformerTemplate):
                     :math:`\max_j(\text{abs}(\Q_{i,j}'')) = 1` for each row index
                     :math:`i`
 
+        .. dropdown:: ``'maxnorm'``
+            Maximum Euclidean norm scaling to :math:`[0, 1]` without
+            scalar mean shift
+
+            .. list-table::
+
+                * - Formula
+                  - .. math:: \Q'' = \frac{1}{\max_j(\|\Q'_{:,j}\|_2)}\Q'
+                * - ``byrow=False``
+                  - :math:`\mean(\Q'')=\frac{\mean(\Q')}{\max_j(\|\Q'_{:,j}\|)}`
+                    and :math:`\max_j(\|\Q''_{:,j}\|) = 1`
+                * - ``byrow=True``
+                  - ``ValueError``: use ``'maxabs'`` instead
+
+        .. dropdown:: ``'maxnormsym'``
+            Maximum Euclidean norm scaling to :math:`[0, 1]` with scalar mean
+            shift
+
+            .. list-table::
+
+                * - Formula
+                  - .. math::
+                        \Q'' = \frac{\Q' - \text{mean}(\Q')}{
+                        \max_j(\|\Q'_{:,j} - \text{mean}(\Q')\|_2)}
+                * - ``byrow=False``
+                  - :math:`\mean(\Q'')=0` and :math:`\max_j(\|\Q''_{:,j}\|) = 1`
+                * - ``byrow=True``
+                  - ``ValueError``: use ``'maxabssym'`` instead
+
     byrow : bool
         If ``True``, scale each row of the snapshot matrix separately when a
-        scaling is specified. Otherwise, scale the entire matrix at once.
+        scaling is specified. Otherwise, scale the entire matrix at once
+        (default).
 
     verbose : bool
         If ``True``, print information upon learning a transformation.
@@ -785,6 +826,8 @@ class ShiftScaleTransformer(TransformerTemplate):
             "minmaxsym",
             "maxabs",
             "maxabssym",
+            "maxnorm",
+            "maxnormsym",
         )
     )
 
@@ -824,6 +867,10 @@ def __init__(
                 "scaling=None --> byrow=True will have no effect",
                 errors.OpInfWarning,
             )
+        if self.__byrow and self.__scaling in ("maxnorm", "maxnormsym"):
+            raise ValueError(
+                f"scaling '{self.__scaling}' is invalid when byrow=True"
+            )
 
         # Set other properties.
         self.verbose = verbose
@@ -1049,15 +1096,36 @@ def fit_transform(self, states, inplace: bool = False):
                    0 if axis is None else np.zeros(self.state_dimension)
                )
 
-            # maxabssym: Q' = (Q - mean(Q)) / max(abs(Q - mean(Q)))
+            # Symmetric MaxAbs: Q' = (Q - mean(Q)) / max(abs(Q - mean(Q)))
             elif self.scaling == "maxabssym":
                 mu = np.mean(Y, axis=axis)
                 Y -= mu if axis is None else mu.reshape((-1, 1))
                 self.scale_ = 1 / np.max(np.abs(Y), axis=axis)
                 self.shift_ = -mu * self.scale_
                 Y += mu if axis is None else mu.reshape((-1, 1))
 
-            else:  # pragma nocover
+            # MaxNorm: Q' = Q / max(norm(Q))
+            elif self.scaling == "maxnorm":
+                # scale such that the norm of each snapshot is <= 1
+                if self.byrow:  # pragma: nocover
+                    raise RuntimeError(
+                        f"invalid scaling '{self.scaling}' for byrow=True"
+                    )
+
+                self.scale_ = 1 / np.max(np.linalg.norm(Y, axis=0, ord=2))
+                self.shift_ = 0
+
+            # Symmetric MaxNorm: Q' = (Q - mean(Q)) / max(norm(Q - mean(Q)))
+            elif self.scaling == "maxnormsym":
+                if self.byrow:  # pragma: nocover
+                    raise RuntimeError(
+                        f"invalid scaling '{self.scaling}' for byrow=True"
+                    )
+                mu = np.mean(Y)
+                self.scale_ = 1 / np.max(np.linalg.norm(Y - mu, axis=0, ord=2))
+                self.shift_ = -mu * self.scale_
+
+            else:  # pragma: nocover
                 raise RuntimeError(f"invalid scaling '{self.scaling}'")
 
             # Apply the scaling.
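
As a sanity check on the new ``fit_transform()`` branches above, the ``'maxnormsym'`` arithmetic can be reproduced directly with NumPy. This is a standalone sketch with synthetic data (it does not use the class) of the ``scale_``/``shift_`` convention in which the transformed matrix is ``scale_ * Q + shift_``:

```python
import numpy as np

Q = np.random.random((6, 10))  # toy snapshot matrix; columns are snapshots

# 'maxnormsym': remove the scalar mean entry, then divide by the largest
# snapshot (column) 2-norm of the mean-shifted matrix.
mu = np.mean(Q)
scale = 1 / np.max(np.linalg.norm(Q - mu, axis=0, ord=2))
shift = -mu * scale
Q2 = scale * Q + shift  # same as (Q - mu) * scale

print(np.isclose(np.mean(Q2), 0))                         # True
print(np.isclose(np.max(np.linalg.norm(Q2, axis=0)), 1))  # True
```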

tests/pre/test_shiftscale.py

Lines changed: 29 additions & 2 deletions
@@ -7,7 +7,10 @@
 
 import opinf
 
-from .test_base import _TestTransformer
+try:
+    from .test_base import _TestTransformer
+except ImportError:
+    from test_base import _TestTransformer
 
 
 # Functions ===================================================================
@@ -158,7 +161,8 @@ def get_transformers(self, name=None):
                 verbose=False,
             )
             self.requires_training = True
-            if scaling is not None:
+            if scaling is not None and "maxnorm" not in scaling:
+                # "maxnorm" scaling is incompatible with byrow=True
                 yield self.Transformer(
                     centering=centering,
                     scaling=scaling,
@@ -361,6 +365,17 @@ def fit_transform_copy(st, A):
             assert np.isclose(np.mean(Y), 0)
             assert np.isclose(np.max(np.abs(Y)), 1)
 
+            # Test maximum norm scaling.
+            st = self.Transformer(centering=centering, scaling="maxnorm")
+            Y = fit_transform_copy(st, X)
+            assert np.isclose(np.max(np.linalg.norm(Y, axis=0)), 1)
+
+            # Test symmetric maximum norm scaling.
+            st = self.Transformer(centering=centering, scaling="maxnormsym")
+            Y = fit_transform_copy(st, X)
+            assert np.isclose(np.mean(Y), 0)
+            assert np.isclose(np.max(np.linalg.norm(Y, axis=0)), 1)
+
         # Test scaling by row (without and with centering).
         for centering in (False, True):
             # Test standard scaling.
@@ -414,6 +429,18 @@ def fit_transform_copy(st, A):
             assert np.allclose(np.mean(Y, axis=1), 0)
             assert np.allclose(np.max(np.abs(Y), axis=1), 1)
 
+            # Test norm scaling.
+            for s in "maxnorm", "maxnormsym":
+                with pytest.raises(ValueError) as ex:
+                    self.Transformer(
+                        centering=centering,
+                        scaling=s,
+                        byrow=True,
+                    )
+                assert ex.value.args[0] == (
+                    f"scaling '{s}' is invalid when byrow=True"
+                )
+
     def test_mains(self, n=11, k=21):
         """Test fit(), fit_transform(), transform(), transform_ddts(), and
         inverse_transform().
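
The new test above exercises the constructor check added in ``_shiftscale.py``. The corresponding user-facing behavior, shown here as a small hedged example, is simply:

```python
import opinf

# Norm-based scalings operate on whole snapshots, so byrow=True is rejected.
try:
    opinf.pre.ShiftScaleTransformer(scaling="maxnorm", byrow=True)
except ValueError as ex:
    print(ex)  # scaling 'maxnorm' is invalid when byrow=True
```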
