getting-started/index.qmd (1 addition, 1 deletion)
@@ -92,5 +92,5 @@ The underlying theory of Bayesian machine learning is not explained in detail in
A thorough introduction to the field is [*Pattern Recognition and Machine Learning*](https://www.springer.com/us/book/9780387310732) (Bishop, 2006); an online version is available [here (PDF, 18.1 MB)](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf).
:::

-The next page on [Turing's core functionality]({{< meta using-turing >}}) explains the basic features of the Turing language.
+The next page on [Turing's core functionality]({{< meta core-functionality >}}) explains the basic features of the Turing language.
From there, you can either look at [worked examples of how different models are implemented in Turing]({{< meta tutorials-intro >}}), or [specific tips and tricks that can help you get the most out of Turing]({{< meta usage-performance-tips >}}).
Now we extract the parameter samples from the sampled chain as `θ` (this is of size `5000 x 20` where `5000` is the number of iterations and `20` is the number of parameters).
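As a rough illustration of this extraction step (the surrounding code is not shown in this diff, so the chain object and parameter names below are assumptions):

```julia
using MCMCChains

# Assuming `chain` is the Chains object returned by `sample` above, with
# parameters named θ[1], ..., θ[20]: `group` collects them, and `Array` turns
# the result into a matrix with one row per iteration and one column per parameter.
θ = Array(MCMCChains.group(chain, :θ))
size(θ)   # (5000, 20) in the example described in the text
```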
-The `sample()` call above assumes that you have at least `nchains` threads available in your Julia instance. If you do not, the multiple chains
-will run sequentially, and you may notice a warning. For more information, see [the Turing documentation on sampling multiple chains.](https://turinglang.org/dev/docs/using-turing/guide/#sampling-multiple-chains)
+The `sample()` call above assumes that you have at least two threads available in your Julia instance.
+If you do not, the multiple chains will run sequentially, and you may notice a warning.
+For more information, see [the Turing documentation on sampling multiple chains.]({{< meta core-functionality >}}#sampling-multiple-chains)
:::
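For reference, a minimal sketch of a multi-chain call of this kind, using a toy model as a stand-in (the tutorial's own model and sampler settings are not shown in this hunk):

```julia
using Turing

# Toy model used only to illustrate the call shape.
@model function demo()
    x ~ Normal(0, 1)
end

Threads.nthreads()   # how many threads this Julia session was started with

# Two chains via MCMCThreads(); with fewer than two threads available, the
# chains run sequentially and a warning is shown.
chains = sample(demo(), NUTS(), MCMCThreads(), 1_000, 2)
```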
```{julia}
@@ -159,12 +160,14 @@ We consider the samples of the location parameters $\mu_1$ and $\mu_2$ for the t
plot(chains[["μ[1]", "μ[2]"]]; legend=true)
```

-It can happen that the modes of $\mu_1$ and $\mu_2$ switch between chains.
-For more information see the [Stan documentation](https://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html). This is because it's possible for either model parameter $\mu_k$ to be assigned to either of the corresponding true means, and this assignment need not be consistent between chains.
+From the plots above, we can see that the chains have converged to seemingly different values for the parameters $\mu_1$ and $\mu_2$.
+However, these actually represent the same solution: it does not matter whether we assign $\mu_1$ to the first cluster and $\mu_2$ to the second, or vice versa, since the resulting mixture distribution is the same.
+(In principle it is also possible for the parameters to swap places _within_ a single chain, although this does not happen in this example.)
+For more information see the [Stan documentation](https://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html), or Bishop's book, where the concept of _identifiability_ is discussed.
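To make this concrete, here is a small self-contained check, with illustrative component means rather than the tutorial's exact values, showing that swapping the means of an equal-weight mixture leaves the density unchanged:

```julia
using Distributions

# With equal weights, swapping the component means defines the same mixture,
# so the two labelings cannot be distinguished from the data alone.
m_a = MixtureModel([Normal(-3.5, 1.0), Normal(0.5, 1.0)], [0.5, 0.5])
m_b = MixtureModel([Normal(0.5, 1.0), Normal(-3.5, 1.0)], [0.5, 0.5])

logpdf(m_a, 1.0) ≈ logpdf(m_b, 1.0)   # true for any observation
```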
-That is, the posterior is fundamentally multimodal, and different chains can end up in different modes, complicating inference.
-One solution here is to enforce an ordering on our $\mu$ vector, requiring $\mu_k > \mu_{k-1}$ for all $k$.
-`Bijectors.jl` [provides](https://turinglang.org/Bijectors.jl/dev/transforms/#Bijectors.OrderedBijector) an easy transformation (`ordered()`) for this purpose:
+Having $\mu_1$ and $\mu_2$ swap can complicate the interpretation of the results, especially when different chains converge to different assignments.
+One solution here is to enforce an ordering on our $\mu$ vector, requiring $\mu_k \geq \mu_{k-1}$ for all $k$.
+`Bijectors.jl` [provides](https://turinglang.org/Bijectors.jl/stable/transforms/#Bijectors.OrderedBijector) a convenient function, `ordered()`, which can be applied to a (continuous multivariate) distribution to enforce this:

```{julia}
using Bijectors: ordered
@@ -194,15 +197,13 @@ end
model = gaussian_mixture_model_ordered(x);
```
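The ordered model definition itself is elided from this hunk; as a rough sketch of what `ordered()` does when wrapped around a prior (the two-component `MvNormal` below is an illustrative assumption, not the tutorial's exact code):

```julia
using Bijectors: ordered
using Distributions
using LinearAlgebra: I

# `ordered` wraps a continuous multivariate distribution so that every draw
# (and hence the prior it defines inside a model) is sorted in increasing order.
μ_prior = ordered(MvNormal(zeros(2), I))
rand(μ_prior)   # a 2-element vector with the first entry ≤ the second
```

Inside a model, one could then write something like `μ ~ ordered(MvNormal(zeros(K), I))` to rule out the swapped mode.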
-Now, re-running our model, we can see that the assigned means are consistent across chains:
+Now, re-running our model, we can see that the assigned means are consistent between chains:
Where we sum the components with `logsumexp` from the [`LogExpFunctions.jl` package](https://juliastats.org/LogExpFunctions.jl/stable/).
The manually incremented likelihood can be added to the log-probability with `@addlogprob!`, giving us the following model:
@@ -300,27 +302,25 @@ using LogExpFunctions
end
```
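The body of the manually marginalized model is largely elided from this hunk; the following is a rough reconstruction of the pattern being described, assuming one-dimensional data and two unit-variance components (the tutorial's actual model may differ in its details):

```julia
using Turing
using LogExpFunctions: logsumexp
using LinearAlgebra: I

@model function gmm_manual_marginalization(x)
    K = 2
    μ ~ MvNormal(zeros(K), I)
    w ~ Dirichlet(K, 1.0)
    for i in eachindex(x)
        # log p(x[i] | μ, w) = logsumexp over k of ( log w[k] + log N(x[i]; μ[k], 1) )
        lls = [log(w[k]) + logpdf(Normal(μ[k], 1), x[i]) for k in 1:K]
        @addlogprob! logsumexp(lls)
    end
end

model_sketch = gmm_manual_marginalization(randn(10))
```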
-::: {.callout-warning collapse="false"}
+::: {.callout-warning}

## Manually Incrementing Probability

-When possible, use of `@addlogprob!` should be avoided, as it exists outside the
-usual structure of a Turing model. In most cases, a custom distribution should be used instead.
+When possible, use of `@addlogprob!` should be avoided, as it exists outside the usual structure of a Turing model.
+In most cases, a custom distribution should be used instead.

-Here, the next section demonstrates the preferred method --- using the `MixtureModel` distribution we have seen already to
-perform the marginalization automatically.
+The next section demonstrates the preferred method: using the `MixtureModel` distribution we have seen already to perform the marginalization automatically.

:::
-### Marginalizing For Free With Distribution.jl's MixtureModel Implementation
+### Marginalizing For Free With Distributions.jl's `MixtureModel` Implementation

-We can use Turing's `~` syntax with anything that `Distributions.jl` provides `logpdf` and `rand` methods for. It turns out that the
-`MixtureModel` distribution it provides has, as its `logpdf` method, `logpdf(MixtureModel([Component_Distributions], weight_vector), Y)`, where `Y` can be either a single observation or vector of observations.
+We can use Turing's `~` syntax with anything that `Distributions.jl` provides `logpdf` and `rand` methods for.
+It turns out that the `MixtureModel` distribution it provides has, as its `logpdf` method, `logpdf(MixtureModel([Component_Distributions], weight_vector), Y)`, where `Y` can be either a single observation or vector of observations.
In fact, `Distributions.jl` provides [many convenient constructors](https://juliastats.org/Distributions.jl/stable/mixture/) for mixture models, allowing further simplification in common special cases.
For example, when mixture distributions are of the same type, one can write: `~ MixtureModel(Normal, [(μ1, σ1), (μ2, σ2)], w)`, or when the weight vector is known to allocate probability equally, it can be omitted.
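For concreteness, here are a couple of these constructors in action (the component parameters are illustrative only):

```julia
using Distributions

# Explicit components and weights.
m1 = MixtureModel([Normal(-3.5, 1.0), Normal(0.5, 1.0)], [0.5, 0.5])

# Same-type components via the shorthand constructor; omitting the weight
# vector allocates probability equally across components.
m2 = MixtureModel(Normal, [(-3.5, 1.0), (0.5, 1.0)])

logpdf(m1, 0.1)   # log-density of a single observation under the mixture
rand(m2, 5)       # five draws from the mixture
```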
-The `logpdf` implementation for a `MixtureModel` distribution is exactly the marginalization defined above, and so our model becomes simply:
+The `logpdf` implementation for a `MixtureModel` distribution is exactly the marginalization defined above, and so our model can be simplified to:
```{julia}
#| output: false
@@ -334,15 +334,14 @@ end
model = gmm_marginalized(x);
```
-As we've summed out the discrete components, we can perform inference using `NUTS()` alone.
+As we have summed out the discrete components, we can perform inference using `NUTS()` alone.

-`NUTS()` significantly outperforms our compositional Gibbs sampler, in large part because our model is now Rao-Blackwellized thanks to
-the marginalization of our assignment parameter.
+`NUTS()` significantly outperforms our compositional Gibbs sampler, in large part because our model is now Rao-Blackwellized thanks to the marginalization of our assignment parameter.
```{julia}
plot(chains[["μ[1]", "μ[2]"]], legend=true)
```
-## Inferred Assignments - Marginalized Model
-As we've summed over possible assignments, the associated parameter is no longer available in our chain.
-This is not a problem, however, as given any fixed sample $(\mu, w)$, the assignment probability — $p(z_i \mid y_i)$ — can be recovered using Bayes rule:
+## Inferred Assignments With The Marginalized Model
+
+As we have summed over possible assignments, the latent parameter representing the assignments is no longer available in our chain.
+This is not a problem, however, as given any fixed sample $(\mu, w)$, the assignment probability $p(z_i \mid y_i)$ can be recovered using Bayes' theorem:

-This quantity can be computed for every $p(z = z_i \mid y_i)$, resulting in a probability vector, which is then used to sample
-posterior predictive assignments from a categorial distribution.
+This quantity can be computed for every $p(z = z_i \mid y_i)$, resulting in a probability vector, which is then used to sample posterior predictive assignments from a categorical distribution.
For details on the mathematics here, see [the Stan documentation on latent discrete parameters](https://mc-stan.org/docs/stan-users-guide/latent-discrete.html).
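A minimal sketch of this recovery step for a single observation and a single posterior draw, assuming unit-variance Normal components (the names and values below are illustrative, not the tutorial's code):

```julia
using Distributions
using LogExpFunctions: logsumexp

μ = [-3.5, 0.5]   # one posterior draw of the component means (illustrative)
w = [0.5, 0.5]    # one posterior draw of the weights (illustrative)
y_i = -2.7        # a single observation

# Bayes' theorem: p(z_i = k | y_i) is proportional to w[k] * N(y_i; μ[k], 1),
# normalized over k.
logp = [log(w[k]) + logpdf(Normal(μ[k], 1), y_i) for k in eachindex(w)]
probs = exp.(logp .- logsumexp(logp))

# A posterior predictive assignment drawn from the resulting categorical distribution.
z_i = rand(Categorical(probs))
```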
tutorials/multinomial-logistic-regression/index.qmd (1 addition, 1 deletion)
@@ -147,7 +147,7 @@ chain
::: {.callout-warning collapse="true"}
## Sampling With Multiple Threads
The `sample()` call above assumes that you have at least `nchains` threads available in your Julia instance. If you do not, the multiple chains
-will run sequentially, and you may notice a warning. For more information, see [the Turing documentation on sampling multiple chains.]({{< meta using-turing >}}#sampling-multiple-chains)
+will run sequentially, and you may notice a warning. For more information, see [the Turing documentation on sampling multiple chains.]({{< meta core-functionality >}}#sampling-multiple-chains)
:::
Since we ran multiple chains, we may as well do a spot check to make sure each chain converges around similar points.
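One possible spot check, assuming `chain` is the multi-chain object sampled above (the tutorial may use a different diagnostic):

```julia
using MCMCChains, StatsPlots

plot(chain)        # overlaid trace and density plots, one colour per chain
summarize(chain)   # summary statistics pooled across chains
```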
tutorials/variational-inference/index.qmd (1 addition, 1 deletion)
@@ -14,7 +14,7 @@ Pkg.instantiate();
This post will look at **variational inference (VI)**, an optimization approach to _approximate_ Bayesian inference, and how to use it in Turing.jl as an alternative to other approaches such as MCMC.
This post will focus on the usage of VI in Turing rather than the principles and theory underlying VI.
-If you are interested in understanding the mathematics you can checkout [our write-up]({{< meta using-turing-variational-inference >}}) or any other resource online (there are a lot of great ones).
+If you are interested in understanding the mathematics you can check out [our write-up]({{< meta dev-variational-inference >}}) or any other resource online (there are a lot of great ones).
Let's start with a minimal example.
Consider a `Turing.Model`, which we denote as `model`.