
Commit 2788f7c

a number of minor tidy ups (#624)
* Fix broken links, tidy shortcodes
* Minor text fixes
* Text fixes
* Fix the fixes
* Bump versions, remove explicit Zygote dep
* remove Zygote, add DynamicPPL.DebugUtils in performance tips
* remove config=nothing from Mooncake
1 parent 78eb54b commit 2788f7c

13 files changed: +232 / -243 lines


Manifest.toml

Lines changed: 132 additions & 129 deletions
Large diffs are not rendered by default.

Project.toml

Lines changed: 0 additions & 1 deletion
@@ -54,7 +54,6 @@ StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
UnPack = "3a884ed6-31ef-47d7-9d2a-63182c4928ed"
-Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

[compat]
Turing = "0.39"

_quarto.yml

Lines changed: 18 additions & 18 deletions
@@ -169,23 +169,23 @@ include-in-header:
# Note that you don't need to prepend `../../` to the link, Quarto will figure
# it out automatically.

-get-started: tutorials/docs-00-getting-started
-tutorials-intro: tutorials/00-introduction
-gaussian-mixture-model: tutorials/01-gaussian-mixture-model
-logistic-regression: tutorials/02-logistic-regression
-bayesian-neural-network: tutorials/03-bayesian-neural-network
-hidden-markov-model: tutorials/04-hidden-markov-model
-linear-regression: tutorials/05-linear-regression
-infinite-mixture-model: tutorials/06-infinite-mixture-model
-poisson-regression: tutorials/07-poisson-regression
-multinomial-logistic-regression: tutorials/08-multinomial-logistic-regression
-variational-inference: tutorials/09-variational-inference
-bayesian-differential-equations: tutorials/10-bayesian-differential-equations
-probabilistic-pca: tutorials/11-probabilistic-pca
-gplvm: tutorials/12-gplvm
-seasonal-time-series: tutorials/13-seasonal-time-series
-using-turing-advanced: tutorials/docs-09-using-turing-advanced
-using-turing: tutorials/docs-12-using-turing-guide
+core-functionality: core-functionality
+get-started: getting-started
+
+tutorials-intro: tutorials/coin-flipping
+gaussian-mixture-model: tutorials/gaussian-mixture-models
+logistic-regression: tutorials/bayesian-logistic-regression
+bayesian-neural-network: tutorials/bayesian-neural-networks
+hidden-markov-model: tutorials/hidden-markov-models
+linear-regression: tutorials/bayesian-linear-regression
+infinite-mixture-model: tutorials/infinite-mixture-models
+poisson-regression: tutorials/bayesian-poisson-regression
+multinomial-logistic-regression: tutorials/multinomial-logistic-regression
+variational-inference: tutorials/variational-inference
+bayesian-differential-equations: tutorials/bayesian-differential-equations
+probabilistic-pca: tutorials/probabilistic-pca
+gplvm: tutorials/gaussian-process-latent-variable-models
+seasonal-time-series: tutorials/bayesian-time-series-analysis

usage-automatic-differentiation: usage/automatic-differentiation
usage-custom-distribution: usage/custom-distribution
@@ -204,7 +204,7 @@ dev-model-manual: developers/compiler/model-manual
contexts: developers/compiler/minituring-contexts
minituring: developers/compiler/minituring-compiler
using-turing-compiler: developers/compiler/design-overview
-using-turing-variational-inference: developers/inference/variational-inference
+dev-variational-inference: developers/inference/variational-inference
using-turing-implementing-samplers: developers/inference/implementing-samplers
dev-transforms-distributions: developers/transforms/distributions
dev-transforms-bijectors: developers/transforms/bijectors

getting-started/index.qmd

Lines changed: 1 addition & 1 deletion
@@ -92,5 +92,5 @@ The underlying theory of Bayesian machine learning is not explained in detail in
A thorough introduction to the field is [*Pattern Recognition and Machine Learning*](https://www.springer.com/us/book/9780387310732) (Bishop, 2006); an online version is available [here (PDF, 18.1 MB)](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf).
:::

-The next page on [Turing's core functionality]({{<meta using-turing>}}) explains the basic features of the Turing language.
+The next page on [Turing's core functionality]({{<meta core-functionality>}}) explains the basic features of the Turing language.
From there, you can either look at [worked examples of how different models are implemented in Turing]({{<meta tutorials-intro>}}), or [specific tips and tricks that can help you get the most out of Turing]({{<meta usage-performance-tips>}}).

tutorials/bayesian-differential-equations/index.qmd

Lines changed: 1 addition & 1 deletion
@@ -344,7 +344,7 @@ import Mooncake
import SciMLSensitivity

# Define the AD backend to use
-adtype = AutoMooncake(; config=nothing)
+adtype = AutoMooncake()

# Sample a single chain with 1000 samples using Mooncake
sample(model, NUTS(; adtype=adtype), 1000; progress=false)

tutorials/bayesian-neural-networks/index.qmd

Lines changed: 1 addition & 1 deletion
@@ -210,7 +210,7 @@ setprogress!(false)
```{julia}
# Perform inference.
n_iters = 2_000
-ch = sample(bayes_nn(reduce(hcat, xs), ts), NUTS(; adtype=AutoMooncake(; config=nothing)), n_iters);
+ch = sample(bayes_nn(reduce(hcat, xs), ts), NUTS(; adtype=AutoMooncake()), n_iters);
```

Now we extract the parameter samples from the sampled chain as `θ` (this is of size `5000 x 20` where `5000` is the number of iterations and `20` is the number of parameters).

tutorials/gaussian-mixture-models/index.qmd

Lines changed: 45 additions & 47 deletions
@@ -74,9 +74,9 @@ and then drawing the datum accordingly, i.e., in our example drawing
$$
x_i \sim \mathcal{N}([\mu_{z_i}, \mu_{z_i}]^\mathsf{T}, I) \qquad (i=1,\ldots,N).
$$
-For more details on Gaussian mixture models, we refer to Christopher M. Bishop, *Pattern Recognition and Machine Learning*, Section 9.
+For more details on Gaussian mixture models, refer to Chapter 9 of Christopher M. Bishop, *Pattern Recognition and Machine Learning*.

-We specify the model with Turing.
+We specify the model in Turing:

```{julia}
using Turing
@@ -130,10 +130,11 @@ burn = 10
chains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initial = burn);
```

-::: {.callout-warning collapse="true"}
+::: {.callout-warning}
## Sampling With Multiple Threads
-The `sample()` call above assumes that you have at least `nchains` threads available in your Julia instance. If you do not, the multiple chains
-will run sequentially, and you may notice a warning. For more information, see [the Turing documentation on sampling multiple chains.](https://turinglang.org/dev/docs/using-turing/guide/#sampling-multiple-chains)
+The `sample()` call above assumes that you have at least two threads available in your Julia instance.
+If you do not, the multiple chains will run sequentially, and you may notice a warning.
+For more information, see [the Turing documentation on sampling multiple chains.]({{<meta core-functionality>}}#sampling-multiple-chains)
:::

```{julia}
@@ -159,12 +160,14 @@ We consider the samples of the location parameters $\mu_1$ and $\mu_2$ for the t
plot(chains[["μ[1]", "μ[2]"]]; legend=true)
```

-It can happen that the modes of $\mu_1$ and $\mu_2$ switch between chains.
-For more information see the [Stan documentation](https://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html). This is because it's possible for either model parameter $\mu_k$ to be assigned to either of the corresponding true means, and this assignment need not be consistent between chains.
+From the plots above, we can see that the chains have converged to seemingly different values for the parameters $\mu_1$ and $\mu_2$.
+However, these actually represent the same solution: it does not matter whether we assign $\mu_1$ to the first cluster and $\mu_2$ to the second, or vice versa, since the resulting sum is the same.
+(In principle it is also possible for the parameters to swap places _within_ a single chain, although this does not happen in this example.)
+For more information see the [Stan documentation](https://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html), or Bishop's book, where the concept of _identifiability_ is discussed.

-That is, the posterior is fundamentally multimodal, and different chains can end up in different modes, complicating inference.
-One solution here is to enforce an ordering on our $\mu$ vector, requiring $\mu_k > \mu_{k-1}$ for all $k$.
-`Bijectors.jl` [provides](https://turinglang.org/Bijectors.jl/dev/transforms/#Bijectors.OrderedBijector) an easy transformation (`ordered()`) for this purpose:
+Having $\mu_1$ and $\mu_2$ swap can complicate the interpretation of the results, especially when different chains converge to different assignments.
+One solution here is to enforce an ordering on our $\mu$ vector, requiring $\mu_k \geq \mu_{k-1}$ for all $k$.
+`Bijectors.jl` [provides](https://turinglang.org/Bijectors.jl/stable/transforms/#Bijectors.OrderedBijector) a convenient function, `ordered()`, which can be applied to a (continuous multivariate) distribution to enforce this:

```{julia}
using Bijectors: ordered
@@ -194,15 +197,13 @@ end
model = gaussian_mixture_model_ordered(x);
```

-
-Now, re-running our model, we can see that the assigned means are consistent across chains:
+Now, re-running our model, we can see that the assigned means are consistent between chains:

```{julia}
#| output: false
chains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initial = burn);
```

-
```{julia}
#| echo: false
let
@@ -243,6 +244,7 @@ scatter!(x[1, :], x[2, :]; legend=false, title="Synthetic Dataset")
```

## Inferred Assignments
+
Finally, we can inspect the assignments of the data points inferred using Turing.
As we can see, the dataset is partitioned into two distinct groups.

@@ -259,23 +261,23 @@ scatter(


## Marginalizing Out The Assignments
-We can write out the marginal posterior of (continuous) $w, \mu$ by summing out the influence of our (discrete) assignments $z_i$ from
-our likelihood:
-$$
-p(y \mid w, \mu ) = \sum_{k=1}^K w_k p_k(y \mid \mu_k)
-$$
+
+We can write out the marginal posterior of (continuous) $w, \mu$ by summing out the influence of our (discrete) assignments $z_i$ from our likelihood:
+
+$$p(y \mid w, \mu ) = \sum_{k=1}^K w_k p_k(y \mid \mu_k)$$
+
In our case, this gives us:
-$$
-p(y \mid w, \mu) = \sum_{k=1}^K w_k \cdot \operatorname{MvNormal}(y \mid \mu_k, I)
-$$
+
+$$p(y \mid w, \mu) = \sum_{k=1}^K w_k \cdot \operatorname{MvNormal}(y \mid \mu_k, I)$$


### Marginalizing By Hand
-We could implement the above version of the Gaussian mixture model in Turing as follows:
+
+We could implement the above version of the Gaussian mixture model in Turing as follows.
+
First, Turing uses log-probabilities, so the likelihood above must be converted into log-space:
-$$
-\log \left( p(y \mid w, \mu) \right) = \text{logsumexp} \left[\log (w_k) + \log(\operatorname{MvNormal}(y \mid \mu_k, I)) \right]
-$$
+
+$$\log \left( p(y \mid w, \mu) \right) = \text{logsumexp} \left[\log (w_k) + \log(\operatorname{MvNormal}(y \mid \mu_k, I)) \right]$$

Where we sum the components with `logsumexp` from the [`LogExpFunctions.jl` package](https://juliastats.org/LogExpFunctions.jl/stable/).
The manually incremented likelihood can be added to the log-probability with `@addlogprob!`, giving us the following model:
@@ -300,27 +302,25 @@ using LogExpFunctions
end
```

-::: {.callout-warning collapse="false"}
+::: {.callout-warning}
## Manually Incrementing Probablity

-When possible, use of `@addlogprob!` should be avoided, as it exists outside the
-usual structure of a Turing model. In most cases, a custom distribution should be used instead.
+When possible, use of `@addlogprob!` should be avoided, as it exists outside the usual structure of a Turing model.
+In most cases, a custom distribution should be used instead.

-Here, the next section demonstrates the preferred method --- using the `MixtureModel` distribution we have seen already to
-perform the marginalization automatically.
+The next section demonstrates the preferred method: using the `MixtureModel` distribution we have seen already to perform the marginalization automatically.
:::

+### Marginalizing For Free With Distribution.jl's `MixtureModel` Implementation

-### Marginalizing For Free With Distribution.jl's MixtureModel Implementation
-
-We can use Turing's `~` syntax with anything that `Distributions.jl` provides `logpdf` and `rand` methods for. It turns out that the
-`MixtureModel` distribution it provides has, as its `logpdf` method, `logpdf(MixtureModel([Component_Distributions], weight_vector), Y)`, where `Y` can be either a single observation or vector of observations.
+We can use Turing's `~` syntax with anything that `Distributions.jl` provides `logpdf` and `rand` methods for.
+It turns out that the `MixtureModel` distribution it provides has, as its `logpdf` method, `logpdf(MixtureModel([Component_Distributions], weight_vector), Y)`, where `Y` can be either a single observation or vector of observations.

In fact, `Distributions.jl` provides [many convenient constructors](https://juliastats.org/Distributions.jl/stable/mixture/) for mixture models, allowing further simplification in common special cases.

For example, when mixtures distributions are of the same type, one can write: `~ MixtureModel(Normal, [(μ1, σ1), (μ2, σ2)], w)`, or when the weight vector is known to allocate probability equally, it can be ommited.

-The `logpdf` implementation for a `MixtureModel` distribution is exactly the marginalization defined above, and so our model becomes simply:
+The `logpdf` implementation for a `MixtureModel` distribution is exactly the marginalization defined above, and so our model can be simplified to:

```{julia}
#| output: false
@@ -334,15 +334,14 @@ end
model = gmm_marginalized(x);
```

-As we've summed out the discrete components, we can perform inference using `NUTS()` alone.
+As we have summed out the discrete components, we can perform inference using `NUTS()` alone.

```{julia}
#| output: false
sampler = NUTS()
chains = sample(model, sampler, MCMCThreads(), nsamples, nchains; discard_initial = burn);
```

-
```{julia}
#| echo: false
let
@@ -356,23 +355,22 @@ let
end
```

-`NUTS()` significantly outperforms our compositional Gibbs sampler, in large part because our model is now Rao-Blackwellized thanks to
-the marginalization of our assignment parameter.
+`NUTS()` significantly outperforms our compositional Gibbs sampler, in large part because our model is now Rao-Blackwellized thanks to the marginalization of our assignment parameter.

```{julia}
plot(chains[["μ[1]", "μ[2]"]], legend=true)
```

-## Inferred Assignments - Marginalized Model
-As we've summed over possible assignments, the associated parameter is no longer available in our chain.
-This is not a problem, however, as given any fixed sample $(\mu, w)$, the assignment probability — $p(z_i \mid y_i)$ — can be recovered using Bayes rule:
-$$
-p(z_i \mid y_i) = \frac{p(y_i \mid z_i) p(z_i)}{\sum_{k = 1}^K \left(p(y_i \mid z_i) p(z_i) \right)}
-$$
+## Inferred Assignments With The Marginalized Model
+
+As we have summed over possible assignments, the latent parameter representing the assignments is no longer available in our chain.
+This is not a problem, however, as given any fixed sample $(\mu, w)$, the assignment probability $p(z_i \mid y_i)$ can be recovered using Bayes's theorem:

-This quantity can be computed for every $p(z = z_i \mid y_i)$, resulting in a probability vector, which is then used to sample
-posterior predictive assignments from a categorial distribution.
+$$p(z_i \mid y_i) = \frac{p(y_i \mid z_i) p(z_i)}{\sum_{k = 1}^K \left(p(y_i \mid z_i) p(z_i) \right)}$$
+
+This quantity can be computed for every $p(z = z_i \mid y_i)$, resulting in a probability vector, which is then used to sample posterior predictive assignments from a categorical distribution.
For details on the mathematics here, see [the Stan documentation on latent discrete parameters](https://mc-stan.org/docs/stan-users-guide/latent-discrete.html).
+
```{julia}
#| output: false
function sample_class(xi, dists, w)
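
The prose in the hunks above leans on code that falls outside the changed lines: the `ordered()` prior, the `MixtureModel`-based marginalization, and the `sample_class` helper that the diff cuts off. Below is a minimal sketch (not the tutorial's exact code) of how those pieces fit together, assuming 2-dimensional observations stored column-wise in a matrix `x`; the names `gmm_marginalized_sketch` and `sample_class_sketch` are illustrative only.

```julia
# Sketch only: illustrative names, not the tutorial's exact code.
using Turing
using Bijectors: ordered
using LogExpFunctions: softmax
using LinearAlgebra: I

@model function gmm_marginalized_sketch(x, K=2)
    D, N = size(x)
    # Ordered prior on the component means breaks the label-switching symmetry
    # discussed above (μ[1] ≤ μ[2] ≤ …).
    μ ~ ordered(MvNormal(zeros(K), I))
    w ~ Dirichlet(K, 1.0)
    # MixtureModel's logpdf sums over components, i.e. it marginalizes out the
    # discrete assignments z_i, so NUTS can be used on its own.
    mix = MixtureModel([MvNormal(fill(μ[k], D), I) for k in 1:K], w)
    for i in 1:N
        x[:, i] ~ mix
    end
end

# Posterior-predictive assignment for one observation xi, via Bayes's theorem:
# p(z = k | xi) ∝ w[k] * pdf(dists[k], xi), computed in log space then normalized.
function sample_class_sketch(xi, dists, w)
    logp = [log(w[k]) + logpdf(dists[k], xi) for k in eachindex(dists)]
    return rand(Categorical(softmax(logp)))
end
```

A model like this can be passed straight to `sample(gmm_marginalized_sketch(x), NUTS(), 1_000)`, which is what the hunks above rely on, and `sample_class_sketch` mirrors the role of the truncated `sample_class` function the diff ends on.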

tutorials/multinomial-logistic-regression/index.qmd

Lines changed: 1 addition & 1 deletion
@@ -147,7 +147,7 @@ chain
::: {.callout-warning collapse="true"}
## Sampling With Multiple Threads
The `sample()` call above assumes that you have at least `nchains` threads available in your Julia instance. If you do not, the multiple chains
-will run sequentially, and you may notice a warning. For more information, see [the Turing documentation on sampling multiple chains.]({{<meta using-turing>}}#sampling-multiple-chains)
+will run sequentially, and you may notice a warning. For more information, see [the Turing documentation on sampling multiple chains.]({{<meta core-functionality>}}#sampling-multiple-chains)
:::

Since we ran multiple chains, we may as well do a spot check to make sure each chain converges around similar points.
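
A practical aside on the callout above (not part of the commit): `MCMCThreads()` can only spread chains over the threads the Julia session was started with, so a quick check before sampling might look like this.

```julia
# Number of threads available to MCMCThreads(); start Julia with e.g.
# `julia --threads 4` (or set JULIA_NUM_THREADS) to enable parallel chains.
Threads.nthreads()
```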

tutorials/probabilistic-pca/index.qmd

Lines changed: 2 additions & 2 deletions
@@ -206,7 +206,7 @@ setprogress!(false)
```{julia}
k = 2 # k is the dimension of the projected space, i.e. the number of principal components/axes of choice
ppca = pPCA(mat_exp', k) # instantiate the probabilistic model
-chain_ppca = sample(ppca, NUTS(; adtype=AutoMooncake(; config=nothing)), 500);
+chain_ppca = sample(ppca, NUTS(; adtype=AutoMooncake()), 500);
```

The samples are saved in `chain_ppca`, which is an `MCMCChains.Chains` object.
@@ -320,7 +320,7 @@ We instantiate the model and ask Turing to sample from it using NUTS sampler. Th

```{julia}
ppca_ARD = pPCA_ARD(mat_exp') # instantiate the probabilistic model
-chain_ppcaARD = sample(ppca_ARD, NUTS(; adtype=AutoMooncake(; config=nothing)), 500) # sampling
+chain_ppcaARD = sample(ppca_ARD, NUTS(; adtype=AutoMooncake()), 500) # sampling
plot(group(chain_ppcaARD, :α); margin=6.0mm)
```

tutorials/variational-inference/index.qmd

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ Pkg.instantiate();

This post will look at **variational inference (VI)**, an optimization approach to _approximate_ Bayesian inference, and how to use it in Turing.jl as an alternative to other approaches such as MCMC.
This post will focus on the usage of VI in Turing rather than the principles and theory underlying VI.
-If you are interested in understanding the mathematics you can checkout [our write-up]({{<meta using-turing-variational-inference>}}) or any other resource online (there are a lot of great ones).
+If you are interested in understanding the mathematics you can checkout [our write-up]({{<meta dev-variational-inference>}}) or any other resource online (there are a lot of great ones).

Let's start with a minimal example.
Consider a `Turing.Model`, which we denote as `model`.

usage/automatic-differentiation/index.qmd

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ import Mooncake
# Rest of your model here
end

-sample(f(), HMC(0.1, 5; adtype=AutoMooncake(; config=nothing)), 100)
+sample(f(), HMC(0.1, 5; adtype=AutoMooncake()), 100)
```

By default, if you do not specify a backend, Turing will default to [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl).
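
Because the hunk above elides the model body ("# Rest of your model here"), here is a self-contained sketch of the updated pattern. The toy model `f` is hypothetical, and the snippet assumes `AutoMooncake` is re-exported by Turing as in the code being edited.

```julia
using Turing
import Mooncake

@model function f()
    # Hypothetical toy model, included only to make the snippet runnable.
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    1.5 ~ Normal(m, sqrt(s))
end

# Zero-argument constructor, as used throughout this commit; it replaces the
# older `AutoMooncake(; config=nothing)` spelling.
sample(f(), HMC(0.1, 5; adtype=AutoMooncake()), 100)
```

The same `adtype` keyword appears with `NUTS` and with the mode-estimation calls changed elsewhere in this commit.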

usage/mode-estimation/index.qmd

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ We can also help the optimisation by giving it a starting point we know is close
import Mooncake

maximum_likelihood(
-    model, NelderMead(); initial_params=[0.1, 2], adtype=AutoMooncake(; config=nothing)
+    model, NelderMead(); initial_params=[0.1, 2], adtype=AutoMooncake()
)
```
