Commit 78eb54b

Replace some ReverseDiffs with Mooncake (#623)
* Use Mooncake for Optimization.jl example
* Use Mooncake in pPCA
* Update a link to AD docs page
* Fix bad shortcodes
1 parent b0afb98 commit 78eb54b

3 files changed: +20 additions, -15 deletions

tutorials/gaussian-processes-introduction/index.qmd

Lines changed: 1 addition & 1 deletion
@@ -131,7 +131,7 @@ please let us know!
 Moving on, we generate samples from the posterior using the default `NUTS` sampler.
 We'll make use of [ReverseDiff.jl](https://github.com/JuliaDiff/ReverseDiff.jl), as it has
 better performance than [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl/) on
-this example. See Turing.jl's docs on Automatic Differentiation for more info.
+this example. See the [automatic differentiation docs]({{< meta usage-automatic-differentiation >}}) for more info.


 ```{julia}
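
The hunk above only updates the link; the sampling code that follows it in the tutorial is unchanged. For orientation, here is a minimal, self-contained sketch of how an AD backend is passed to `NUTS` in Turing, as discussed in the prose. The `toy` model and its data are stand-ins, not the GP model from the tutorial (which does not appear in this diff).

```julia
using Turing
using ReverseDiff  # loading the backend package enables AutoReverseDiff()

# Stand-in model; the tutorial's GP model is defined earlier in the file
# and is not part of this diff.
@model function toy(y)
    m ~ Normal(0, 1)
    for i in eachindex(y)
        y[i] ~ Normal(m, 1)
    end
end

# The adtype keyword selects the AD backend NUTS uses for gradients.
chain = sample(toy([0.5, 1.2]), NUTS(; adtype=AutoReverseDiff()), 500)
```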

tutorials/probabilistic-pca/index.qmd

Lines changed: 15 additions & 10 deletions
@@ -90,7 +90,7 @@ First, we load the dependencies used.

 ```{julia}
 using Turing
-using ReverseDiff
+using Mooncake
 using LinearAlgebra, FillArrays

 # Packages for visualization
@@ -108,7 +108,7 @@ You can install them via `using Pkg; Pkg.add("package_name")`.
 ::: {.callout-caution}
 ## Package usages:
 We use `DataFrames` for instantiating matrices, `LinearAlgebra` and `FillArrays` to perform matrix operations;
-`Turing` for model specification and MCMC sampling, `ReverseDiff` for setting the automatic differentiation backend when sampling.
+`Turing` for model specification and MCMC sampling, `Mooncake` for automatic differentiation when sampling.
 `StatsPlots` for visualising the resutls. `, Measures` for setting plot margin units.
 As all examples involve sampling, for reproducibility we set a fixed seed using the `Random` standard library.
 :::
@@ -194,8 +194,9 @@ Specifically:

 Here we aim to perform MCMC sampling to infer the projection matrix $\mathbf{W}_{D \times k}$, the latent variable matrix $\mathbf{Z}_{k \times N}$, and the offsets $\boldsymbol{\mu}_{N \times 1}$.

-We run the inference using the NUTS sampler, of which the chain length is set to be 500, target accept ratio 0.65 and initial stepsize 0.1. By default, the NUTS sampler samples 1 chain.
-You are free to try [different samplers](https://turinglang.org/stable/docs/library/#samplers).
+We run the inference using the NUTS sampler.
+By default, `sample` samples a single chain (in this case with 500 samples).
+You can also use [different samplers]({{< meta usage-sampler-visualisation >}}) if you wish.

 ```{julia}
 #| output: false
@@ -205,17 +206,21 @@ setprogress!(false)
 ```{julia}
 k = 2 # k is the dimension of the projected space, i.e. the number of principal components/axes of choice
 ppca = pPCA(mat_exp', k) # instantiate the probabilistic model
-chain_ppca = sample(ppca, NUTS(;adtype=AutoReverseDiff()), 500);
+chain_ppca = sample(ppca, NUTS(; adtype=AutoMooncake(; config=nothing)), 500);
 ```

-The samples are saved in the Chains struct `chain_ppca`, whose shape can be checked:
+The samples are saved in `chain_ppca`, which is an `MCMCChains.Chains` object.
+We can check its shape:

 ```{julia}
 size(chain_ppca) # (no. of iterations, no. of vars, no. of chains) = (500, 159, 1)
 ```

-The Chains struct `chain_ppca` also contains the sampling info such as r-hat, ess, mean estimates, etc.
-You can print it to check these quantities.
+Sampling statistics such as R-hat, ESS, mean estimates, and so on can also be obtained from this:
+
+```{julia}
+describe(chain_ppca)
+```

 #### Step 5: posterior predictive checks

@@ -280,7 +285,7 @@ Another way to put it: 2 dimensions is enough to capture the main structure of t
 A direct question arises from above practice is: how many principal components do we want to keep, in order to sufficiently represent the latent structure in the data?
 This is a very central question for all latent factor models, i.e. how many dimensions are needed to represent that data in the latent space.
 In the case of PCA, there exist a lot of heuristics to make that choice.
-For example, We can tune the number of principal components using empirical methods such as cross-validation based some criteria such as MSE between the posterior predicted (e.g. mean predictions) data matrix and the original data matrix or the percentage of variation explained [^3].
+For example, We can tune the number of principal components using empirical methods such as cross-validation based on some criteria such as MSE between the posterior predicted (e.g. mean predictions) data matrix and the original data matrix or the percentage of variation explained [^3].

 For p-PCA, this can be done in an elegant and principled way, using a technique called *Automatic Relevance Determination* (ARD).
 ARD can help pick the correct number of principal directions by regularizing the solution space using a parameterized, data-dependent prior distribution that effectively prunes away redundant or superfluous features [^4].
@@ -315,7 +320,7 @@ We instantiate the model and ask Turing to sample from it using NUTS sampler. Th

 ```{julia}
 ppca_ARD = pPCA_ARD(mat_exp') # instantiate the probabilistic model
-chain_ppcaARD = sample(ppca_ARD, NUTS(;adtype=AutoReverseDiff()), 500) # sampling
+chain_ppcaARD = sample(ppca_ARD, NUTS(; adtype=AutoMooncake(; config=nothing)), 500) # sampling
 plot(group(chain_ppcaARD, :α); margin=6.0mm)
 ```
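
The pPCA hunks above rely on standard `MCMCChains` tooling: `describe` for summary statistics, `size` for the chain's shape, and `group` plus the StatsPlots recipe for plotting a family of parameters such as `α`. A hedged, self-contained sketch of those calls on a toy chain (the random draws and parameter names below are made up; the real chains come from the `sample` calls in the hunks):

```julia
using MCMCChains, StatsPlots

# Fake draws standing in for real sampler output:
# 500 iterations x 3 parameters x 1 chain.
draws = randn(500, 3, 1)
chn = Chains(draws, [Symbol("α[1]"), Symbol("α[2]"), Symbol("α[3]")])

describe(chn)   # summary statistics (mean, std, ESS, R-hat, ...) and quantiles
size(chn)       # (iterations, parameters, chains)

α_only = group(chn, :α)   # select every parameter named α[...]
plot(α_only)              # trace and density plots via the StatsPlots recipe
```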

usage/mode-estimation/index.qmd

Lines changed: 4 additions & 4 deletions
@@ -71,14 +71,14 @@ The above are just two examples, Optimization.jl supports [many more](https://do
 We can also help the optimisation by giving it a starting point we know is close to the final solution, or by specifying an automatic differentiation method

 ```{julia}
-using ADTypes: AutoReverseDiff
-import ReverseDiff
+import Mooncake
+
 maximum_likelihood(
-    model, NelderMead(); initial_params=[0.1, 2], adtype=AutoReverseDiff()
+    model, NelderMead(); initial_params=[0.1, 2], adtype=AutoMooncake(; config=nothing)
 )
 ```

-When providing values to arguments like `initial_params` the parameters are typically specified in the order in which they appear in the code of the model, so in this case first `` then `m`. More precisely it's the order returned by `Turing.Inference.getparams(model, Turing.VarInfo(model))`.
+When providing values to arguments like `initial_params` the parameters are typically specified in the order in which they appear in the code of the model, so in this case first `` then `m`. More precisely it's the order returned by `Turing.Inference.getparams(model, DynamicPPL.VarInfo(model))`.

 We can also do constrained optimisation, by providing either intervals within which the parameters must stay, or costraint functions that they need to respect. For instance, here's how one can find the MLE with the constraint that the variance must be less than 0.01 and the mean must be between -1 and 1.:
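
For context, a self-contained sketch of the kind of call the hunk above modifies, with Mooncake as the AD backend. The `gdemo` model and its data are placeholders that only mirror a variance-plus-mean structure like the one described in the surrounding prose (the tutorial's actual `model` is defined earlier in that file and is not part of this diff), and the default optimiser is assumed rather than the explicit `NelderMead()` shown above, to keep the sketch free of extra imports.

```julia
using Turing
import Mooncake

# Placeholder model with a variance s² and a mean m; not the tutorial's model.
@model function gdemo(x)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s²))
    end
end

model = gdemo([1.5, 2.0])

# initial_params follows the order the parameters appear in the model
# (here s², then m); adtype selects Mooncake for the gradients.
mle = maximum_likelihood(model; initial_params=[0.1, 2.0], adtype=AutoMooncake(; config=nothing))
mle.values  # optimised parameter values
```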
