Minor revision to the description of morphological models #16

Open
wants to merge 1 commit into
base: master
6 changes: 2 additions & 4 deletions doc/Command-Reference.md
@@ -2,7 +2,7 @@
layout: userdoc
title: "Command Reference"
author: Hector Banos, Diep Thi Hoang, Dominik Schrempf, Heiko Schmidt, Jana Trifinopoulos, Minh Bui, Thomas Wong, Nhan Ly-Trong, Hiroaki Sato
-date: 2025-05-30
+date: 2025-06-05
docid: 19
icon: book
doctype: manual
@@ -295,7 +295,7 @@ The following `MODEL`s are available:
| Protein | Mixture models: C10, ..., C60 (CAT model) ([Lartillot and Philippe, 2004]), EX2, EX3, EHO, UL2, UL3, EX_EHO, LG4M, LG4X, CF4. See [Protein models](Substitution-Models#protein-models) for more details. |
| Codon | MG, MGK, MG1KTS, MG1KTV, MG2K, GY, GY1KTS, GY1KTV, GY2K, ECMK07/KOSI07, ECMrest, ECMS05/SCHN05 and combined empirical-mechanistic models. See [Codon models](Substitution-Models#codon-models) for more details. |
| Binary | JC2, GTR2. See [Binary and morphological models](Substitution-Models#binary-and-morphological-models) for more details. |
-| Morphology | MK, (GTRX), ORDERED. WARNING: GTRX (which can also be invoked as GTR) can only be applied to data with non-arbitrary state labels (e.g., recoded amino acids [for practical application, see [Najle et al., 2023]; [xgrau/recoded-mixture-models]] and certain types of genomic information) and should not be used for general morphological characters (transformational morphological characters; for the term, see [Sereno, 2007]). See [Binary and morphological models](Substitution-Models#binary-and-morphological-models) for more details. |
+| Morphology | MK, (GTRX), ORDERED. WARNING: GTRX (which can also be invoked as GTR) should be only applied to data with non-arbitrary state labels and should not be used for general morphological characters (most transformational morphological characters; for the term, see [Sereno, 2007]). See [Binary and morphological models](Substitution-Models#binary-and-morphological-models) for more details. |

The following `FreqType`s are supported:

@@ -802,7 +802,5 @@ The first few lines of the output file example.phy.sitelh (printed by `-wslr` op
[Strimmer and von Haeseler, 1997]: http://www.pnas.org/content/94/13/6815.long
[Yang, 1994]: https://doi.org/10.1007/BF00160154
[Yang, 1995]: http://www.genetics.org/content/139/2/993.abstract
-[Najle et al., 2023]: https://doi.org/10.1016/j.cell.2023.08.027
-[xgrau/recoded-mixture-models]: https://github.com/xgrau/recoded-mixture-models
[Sereno, 2007]: https://doi.org/10.1111/j.1096-0031.2007.00161.x

8 changes: 4 additions & 4 deletions doc/Substitution-Models.md
@@ -2,7 +2,7 @@
layout: userdoc
title: "Substitution Models"
author: Hector Banos, Cuong Cao Dang, Heiko Schmidt, Jana Trifinopoulos, Minh Bui, Nhan Ly-Trong, Hiroaki Sato
-date: 2024-05-30
+date: 2025-06-05
docid: 10
icon: book
doctype: manual
@@ -370,15 +370,15 @@ The binary alignments should contain state `0` and `1`, whereas for morphologica

Except for `GTR2` that has unequal state frequencies, all other models have equal state frequencies. Users can change how state frequencies are modeled in morphological models by appending `+FQ`, `+F`, `+F{...}`, or `+FO`.

-> **WARNING**: Models with unequal rates and/or frequencies (e.g., `GTR2+FO`, `MK+FO`, `GTRX+FQ`, `GTRX+FO`) should not be applied to general morphological characters (transformational morphological characters; for the term, see [Sereno, 2007]) as their state labels are fundamentally arbitrary. These models are for data with non-arbitrary state labels (e.g., recoded amino acids [for practical application, see [Najle et al., 2023]; [xgrau/recoded-mixture-models]] and certain types of genomic information). For morphological data, it is the common practice to apply the `MK+FQ+ASC` model (or for ordered [additive] characters `ORDERED+FQ+ASC`) (for `+ASC`, see below) with or without rate heterogeneity across characters parameters.
+> **WARNING**: Models with unequal rates and/or frequencies (e.g., `GTR2+FO`, `MK+FO`, `GTRX+FQ`, `GTRX+FO`) should not be applied to general morphological characters (most transformational morphological characters; for the term, see [Sereno, 2007]) as their state labels are fundamentally arbitrary. These models are for data with non-arbitrary state labels (e.g., recoded amino acids [for practical application, see [Najle et al., 2023]; [xgrau/recoded-mixture-models]] and certain types of genomic information). For morphological data, it is the common practice to apply the `MK+FQ+ASC` model (or for ordered [additive] characters `ORDERED+FQ+ASC`) (for `+ASC`, see below) with or without rate heterogeneity across characters parameters.

> **WARNING**: If you use `GTRX` for your multistate data, because of its sometimes very great number of free parameters, please make sure your data are sufficiently large and always test for model fit.


-> **TIP**: Recent studies have indicated that applying a single morphological model to morphological data with heterogeneity of state space among characters may not be appropriate ([Khakurel et al., 2024]; [Mulvey et al., 2025]; [Huang, 2025 preprint]), and users may need to partition data by the number of states in each character before analyzing them in IQ-TREE. For information on how to analyze partitioned morphological data in IQ-TREE and some caveats about it, please refer to [davidcerny/GEOS26100-Fall2022], https://davidcerny.github.io/post/teaching_revbayes/, [Černý & Simonoff (2023)], and [ej91016/MorphoParse].
+> **TIP**: Recent studies have indicated that applying a single morphological model to morphological data with heterogeneity of state space among characters may not be appropriate ([Khakurel et al., 2024]; [Mulvey et al., 2025]; [Huang, 2025 preprint]), and users may need to partition data by the number of states in each character before analyzing them in IQ-TREE. For information on how to analyze partitioned morphological data in IQ-TREE and some caveats about it, please refer to [davidcerny/GEOS26100-Fall2022], <https://davidcerny.github.io/post/teaching_revbayes/>, [Černý & Simonoff (2023)], and [ej91016/MorphoParse].
{: .tip}

-> **TIP**: For binary morphological characters where `0`s represent ancestral conditions and `1`s represent derived conditions, mainly neomorphic (`absent`/`present`) morphological characters (for the term, see [Sereno, 2007]), allowing asymmetrical frequencies in models would make sense (see e.g. [Pyron, 2017]; [Sun et al., 2018]; https://ms609.github.io/hyoliths/bayesian.html). This can be achieved in IQ-TREE, for example, by using the `GTR2` model.
+> **TIP**: For binary morphological characters where `0`s represent ancestral conditions and `1`s represent derived conditions, mainly neomorphic (`absent`/`present`) morphological characters (for the term, see [Sereno, 2007]), allowing asymmetrical frequencies in models would make sense (see e.g. [Pyron, 2017]; [Sun et al., 2018]; <https://ms609.github.io/hyoliths/bayesian.html>). This can be achieved in IQ-TREE, for example, by using the `GTR2` model.
{: .tip}

>**TIP**: If morphological alignments do not contain constant sites (typically the case), then [an ascertainment bias correction model (`+ASC`)](#ascertainment-bias-correction) should be applied to correct the branch lengths for the absence of constant sites.
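
Reviewer note: the TIP in this hunk about partitioning data by the number of states in each character could be illustrated with a small helper. The sketch below is not part of IQ-TREE or of this pull request. The function name `partition_by_state_count`, the toy matrix, and the choice to treat `?` and `-` as missing data are all assumptions made for illustration. It counts only the states actually observed in each column, which may undercount the true state space of a character, and it does not handle polymorphic codings.

```python
from collections import defaultdict

# Assumption: '?' and '-' mark missing or inapplicable data and are not states.
MISSING = {"?", "-"}


def partition_by_state_count(matrix):
    """Group 1-based character indices by the number of distinct observed states.

    `matrix` maps taxon names to equal-length strings, one symbol per character.
    """
    n_chars = len(next(iter(matrix.values())))
    groups = defaultdict(list)
    for i in range(n_chars):
        # Collect the states observed in column i, ignoring missing data.
        states = {seq[i] for seq in matrix.values()} - MISSING
        groups[len(states)].append(i + 1)
    return dict(groups)


# Hypothetical toy matrix: characters 1-2 are binary, characters 3-4 have three states.
matrix = {
    "taxonA": "0102",
    "taxonB": "1110",
    "taxonC": "0?21",
    "taxonD": "1001",
}

groups = partition_by_state_count(matrix)
print(groups)
```

Each resulting group of character indices could then be written out as one partition, following the partitioned-analysis resources cited in the TIP.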