Graphical Tutorial

A Walkthrough

Although it has a graphical interface, this version of AB12PHYLO is usually started from the command line by invoking ab12phylo. On its first run, it downloads a test dataset, tries to create a desktop entry on common Linux distributions, and checks the system for three important non-Python tools: RAxML-NG, IQ-Tree 2 and BLAST+. If these are not installed or are outdated, AB12PHYLO can download the latest static binaries from GitHub or the NCBI.

ab12phylo separates the phylogenetic pipeline into eight sequential steps, each on its own page. In the screenshot below, these are listed in the left-hand column, starting with read data. Go to the previous or next page by clicking Back or Next on the lower right, or press ALT+Left / ALT+Right. Keep in mind that the pipeline is sequential: if a tree has already been constructed, for example, changing the input dataset requires the intermediate steps to be run again.
The menu on the upper right offers the usual options to create a new project, open one, save it, or save it under a different name. You can also import the results of an ab12phylo-cmd analysis to make use of the easier tree modifications in the graphical version. Menu items are bound to common hotkeys such as CTRL+N, CTRL+O, CTRL+S, CTRL+SHIFT+S and CTRL+I; CTRL+H opens the help and CTRL+Q quits.

1 Define a Dataset

Many elements of the interface have tooltips to help you. The one shown below, for example, explains that selecting trace data by file extension and parent directory works recursively.

Wellsplates

If you have wellsplate mappings, they must be the same for all sequenced loci and must be in .csv format, like this example from wellsplates/box_2.csv:

CS322,CS313,CS084,...
CS079,CS327,CS042,...
...
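To make the layout concrete, here is a minimal sketch that turns such a file into a lookup from well position to sample ID. It assumes that each CSV row is one plate row (A, B, C, ...) and each column one plate column (1, 2, 3, ...); the function name `read_wellsplate` and the "A1"-style position labels are hypothetical and not taken from ab12phylo itself.

```python
import csv
import io
import string


def read_wellsplate(handle):
    """Map well positions such as 'A1' to sample IDs (hypothetical helper).

    Assumes one CSV row per plate row (A, B, C, ...) and one CSV column
    per plate column (1, 2, 3, ...).
    """
    mapping = {}
    for row_idx, row in enumerate(csv.reader(handle)):
        row_letter = string.ascii_uppercase[row_idx]
        for col_idx, sample_id in enumerate(row, start=1):
            sample_id = sample_id.strip()
            if sample_id:  # skip empty wells
                mapping[f"{row_letter}{col_idx}"] = sample_id
    return mapping


example = io.StringIO("CS322,CS313,CS084\nCS079,CS327,CS042\n")
print(read_wellsplate(example))
# {'A1': 'CS322', 'A2': 'CS313', 'A3': 'CS084', 'B1': 'CS079', 'B2': 'CS327', 'B3': 'CS042'}
```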

Reference taxa

In contrast to ab12phylo-cmd, the graphical ab12phylo accepts multiple files with reference data per gene. A complete set of sequences for a reference taxon is built from the set of per-gene sequences by matching descriptions as described here.
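The idea of assembling per-taxon sets from per-gene files can be sketched as follows. This is only an illustration under simplifying assumptions: descriptions are matched by exact string equality, and a taxon is kept only if it has a sequence for every gene. The actual matching rules in ab12phylo may be more lenient, and `build_reference_sets` is a hypothetical name.

```python
def build_reference_sets(per_gene):
    """Combine per-gene reference sequences into complete per-taxon sets.

    per_gene maps gene name -> {description: sequence}. Exact description
    matching is an assumption of this sketch, not ab12phylo's documented rule.
    """
    genes = list(per_gene)
    if not genes:
        return {}
    # keep only descriptions that occur for every gene
    shared = set(per_gene[genes[0]])
    for gene in genes[1:]:
        shared &= set(per_gene[gene])
    return {desc: {gene: per_gene[gene][desc] for gene in genes}
            for desc in sorted(shared)}


refs = {
    "ITS": {"taxon A": "ACGT", "taxon B": "AGGT"},
    "tef1": {"taxon A": "TTGA"},
}
print(build_reference_sets(refs))
# {'taxon A': {'ITS': 'ACGT', 'tef1': 'TTGA'}}
```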

2 Filename Parsing

All the columns in the table on the right must be extracted from the filename with regular expressions and their capturing groups. If you are using a single RegEx, the number of capturing groups must match the expected number of values to extract.
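As an illustration of a single expression whose capturing groups line up with the values to extract, here is a sketch using Python's `re` module. The filename scheme `<gene>_<sample ID>_<F|R>.ab1` is purely hypothetical; adapt the pattern to your own files.

```python
import re

# Hypothetical naming scheme: <gene>_<sample ID>_<F|R>.ab1
pattern = re.compile(r"^(\w+?)_([A-Z]{2}\d{3})_([FR])\.ab1$")
print(pattern.groups)  # 3: one capturing group per value to extract


def parse_filename(name):
    """Return (gene, sample ID, direction), or None if the name does not match."""
    match = pattern.match(name)
    return match.groups() if match else None


print(parse_filename("ITS_CS322_F.ab1"))  # ('ITS', 'CS322', 'F')
print(parse_filename("notes.txt"))        # None
```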

For help, Try online redirects you to regex101.com with your current filenames and expressions already filled in.

If your data include reverse reads, the expression for reverse reads must match only those: if forward and reverse reads are labeled F and R, write an expression that matches only the reads labeled R.

As shown below, named capturing groups work as well. Also check whether any unexpected genes appear beside the table.
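The pitfall with reverse-read expressions is easy to demonstrate. In this sketch, with hypothetical filenames, a bare `R` also matches the gene name RPB2, while an expression anchored to the direction label matches only genuine reverse reads. The last lines show that named groups behave just like numbered ones.

```python
import re

files = ["ITS_CS322_F.ab1", "ITS_CS322_R.ab1", "RPB2_CS079_F.ab1"]

# Too loose: a bare "R" also matches the R in the gene name RPB2
naive = [f for f in files if re.search(r"R", f)]
# Anchored to the direction label, so only true reverse reads match
reverse = [f for f in files if re.search(r"_R\.ab1$", f)]
print(naive)    # ['ITS_CS322_R.ab1', 'RPB2_CS079_F.ab1']
print(reverse)  # ['ITS_CS322_R.ab1']

# Named capturing groups extract values the same way as numbered ones
named = re.compile(r"^(?P<gene>[^_]+)_(?P<sample>[^_]+)_[FR]\.ab1$")
print(named.match("RPB2_CS079_F.ab1").groupdict())
# {'gene': 'RPB2', 'sample': 'CS079'}
```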

3 ABI trace quality control

When you proceed to this page, input data is read into memory. ab12phylo will warn you about missing data and highlight any problems in the column of IDs on the left, with an explanation of the text markup in the tooltip.

Find the details on ABI trace trimming here, but the effect should already be apparent by comparing the screenshots above and below.

You can zoom in or out by holding CTRL and scrolling, or by pressing CTRL+ / CTRL-. After zooming, clicking the plot still selects the correct sample, but its ID will probably no longer appear right beside the line.
You can change the colors at Menu > Preferences > Color Scheme.
Clicking Refresh or pressing F5 will generally re-plot a visualization, and Reset will re-include samples you manually excluded before.
The preview can be limited to a single gene by selecting it from the drop-down menu visible below, but equal trimming will always be applied to all genes.

4 Build single-gene MSAs

As this stage may already take a while, this is a good point to save your project (CTRL+S).
Multiple Sequence Alignments (MSAs) can be constructed locally or via an EMBL-EBI online service. You can customise the command with the help of the in-line documentation for the respective tool, shown in the window on the right.

5 Trim MSAs

The Gblocks 0.91b presets defined for ab12phylo-cmd are also available in graphical ab12phylo: Skip, relaxed, balanced or strict trimming, as well as the Gblocks default. Each single-gene MSA is trimmed separately, but the preview shows the concatenated multi-gene MSA.

6 BLAST

ab12phylo has an interface to the public NCBI BLAST API and, if FTP is enabled on the system, can download pre-compiled databases via the update_blastdb.pl script shipped with BLAST+. If BLAST+ is used this way, it runs asynchronously from the rest of the pipeline, meaning you can proceed while BLAST is running.

However, because most databases are very large and remote API calls are de-prioritized very quickly, a manual web BLAST is usually much faster:

Download the results as an .XML ...

... and import them into ab12phylo. Make sure to select the correct gene first!
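To show what such an import involves, here is a minimal sketch that reads the best-scoring hit per query from BLAST XML output with the standard library. It assumes the classic BLAST XML layout (`Iteration`, `Iteration_query-def`, `Hit`, `Hit_def`, `Hsp_bit-score`) and ranks hits by their highest HSP bit score; it is not ab12phylo's own importer, and the sample document is a toy. For a downloaded file, `ET.parse(path).getroot()` would replace `ET.fromstring`.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>CS322</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>taxon A</Hit_def>
          <Hit_hsps><Hsp><Hsp_bit-score>512.3</Hsp_bit-score></Hsp></Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>taxon B</Hit_def>
          <Hit_hsps><Hsp><Hsp_bit-score>498.0</Hsp_bit-score></Hsp></Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def best_hits(xml_text):
    """Return {query definition: (hit definition, best HSP bit score)}."""
    root = ET.fromstring(xml_text)
    result = {}
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        best = None
        for hit in iteration.iter("Hit"):
            scores = [float(s.text) for s in hit.iter("Hsp_bit-score")]
            # keep the hit whose best HSP scores highest
            if scores and (best is None or max(scores) > best[1]):
                best = (hit.findtext("Hit_def"), max(scores))
        if best is not None:
            result[query] = best
    return result


print(best_hits(SAMPLE))  # {'CS322': ('taxon A', 512.3)}
```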

7 ML tree re-construction

ab12phylo infers a maximum-likelihood phylogenetic tree via RAxML-NG or IQ-Tree 2; only the latter is available on Windows. Depending on the ML tool selected, the windows on the right show the commands that will be run, with the in-line help below. If you modify these commands, make sure they remain valid and that their number stays the same.

You can limit the number of CPUs/hardware threads used by raxml-ng or iqtree2, as shown below, where only five of the eight available cores are under high load, as intended.
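For reference, the thread limit corresponds to `--threads` in raxml-ng and `-T` in iqtree2. The sketch below builds such commands with a fixed GTR+G model purely for illustration; these are not the exact commands ab12phylo generates, and `ml_command` is a hypothetical helper.

```python
import os


def ml_command(tool, msa, threads):
    """Return an argument list for raxml-ng or iqtree2 using `threads` threads."""
    # Never ask for more threads than the machine reports, and at least one
    threads = max(1, min(threads, os.cpu_count() or 1))
    if tool == "raxml-ng":
        return ["raxml-ng", "--search", "--msa", msa, "--model", "GTR+G",
                "--threads", str(threads)]
    if tool == "iqtree2":
        return ["iqtree2", "-s", msa, "-m", "GTR+G", "-T", str(threads)]
    raise ValueError(f"unsupported tool: {tool}")


print(ml_command("iqtree2", "msa.fasta", 5))
```

The resulting list can be passed straight to `subprocess.run(cmd, check=True)`; the clamp to `os.cpu_count()` is a design choice of the sketch so that a typo cannot oversubscribe the machine.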

For datasets too large for your desktop computer or laptop to handle comfortably, or simply to shorten runtimes, ab12phylo can export a .ZIP containing the MSA, the selected tool and a shell script for running the computationally intensive tree inference on a Linux cluster.
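A cluster bundle of this kind can be sketched with Python's `zipfile` module. The file names `msa.fasta` and `run.sh` are hypothetical, and unlike ab12phylo's export this sketch leaves out the tool binary and the multi-step inference; it only shows how an MSA and an executable run script end up in one archive.

```python
import io
import shlex
import zipfile


def export_cluster_zip(msa_text, command):
    """Bundle an MSA and a shell script running `command` into ZIP bytes."""
    script = "#!/bin/sh\nset -e\n" + shlex.join(command) + "\n"
    run_info = zipfile.ZipInfo("run.sh")
    run_info.external_attr = 0o755 << 16  # Unix permissions: executable script
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("msa.fasta", msa_text)
        archive.writestr(run_info, script)
    return buffer.getvalue()


data = export_cluster_zip(">CS322\nACGT\n",
                          ["iqtree2", "-s", "msa.fasta", "-T", "16"])
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    print(archive.namelist())  # ['msa.fasta', 'run.sh']
```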

IQ-Tree 2 can run 'ultrafast' bootstrapping and also find the best-fitting evolutionary model. Finding the best model with iqtree2 is a separate run within ab12phylo, but will be part of the usual 4-step tree inference task if you export a .ZIP.
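The two IQ-Tree 2 runs can be sketched as a ModelFinder-only run (`-m MF`) followed by a tree search with the selected model and ultrafast bootstrapping (`-B 1000`). The prefix `ab12` is hypothetical, and reading the chosen model from a report line of the form "Best-fit model according to BIC: ..." is an assumption about IQ-Tree's output; ab12phylo's own commands may differ.

```python
import re


def iqtree_commands(msa, threads=4, prefix="ab12"):
    """Build a ModelFinder-only run and a follow-up tree search with UFBoot."""
    model_run = ["iqtree2", "-s", msa, "-m", "MF", "-T", str(threads),
                 "--prefix", f"{prefix}_model"]

    def tree_run(model):
        # -B requests ultrafast bootstrap replicates (IQ-Tree 2 syntax)
        return ["iqtree2", "-s", msa, "-m", model, "-B", "1000",
                "-T", str(threads), "--prefix", prefix]

    return model_run, tree_run


def best_model(report_text):
    """Pull the selected model from report text (assumed line format)."""
    match = re.search(r"Best-fit model according to \w+: (\S+)", report_text)
    return match.group(1) if match else None


report = "Best-fit model according to BIC: TIM2+F+I+G4\n"
model_run, tree_run = iqtree_commands("msa.fasta")
print(tree_run(best_model(report)))
```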

8 Tree plotting and population genetics

This last page shows the resulting tree.
ab12phylo allows you to select taxa by matching query keywords against the sample IDs or species annotations. Selections can be extended to subtrees by clicking on subtree or with Select clade in the right-click context menu.

The context menu has a number of other tree modifications:

If you click calculate or Calculate diversity, ab12phylo computes basic population genetics statistics for the selected taxa. As these statistics cannot handle gap (-) or unknown (N) characters, ab12phylo has two numerical thresholds that regulate whether -/N at an MSA site are replaced with the most frequent nucleotide at that position, or the site is discarded entirely. For the selected taxa, all positions except the acceptable ones are then shown as transparent, as can be seen in the top right of the screenshot below. Taxa with identical sequences are then highlighted in the same color, as indicated by the brown, gray and pink shading below.
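The replace-or-discard logic can be sketched as follows. The exact meaning of ab12phylo's two thresholds is not spelled out here, so this sketch assumes one threshold per character type: a site is dropped if its share of `-` exceeds `max_gap` or its share of `N` exceeds `max_unknown` (or it has no A/C/G/T at all); otherwise every `-`/`N` is replaced with the most frequent nucleotide at that site. Both parameter names are hypothetical.

```python
from collections import Counter


def filter_sites(seqs, max_gap=0.2, max_unknown=0.2):
    """Replace or drop -/N sites in equal-length aligned sequences.

    Hypothetical rule: discard a site if its share of '-' exceeds max_gap
    or its share of 'N' exceeds max_unknown; otherwise replace each -/N
    with the site's most frequent nucleotide (ties go to A, C, G, T order).
    """
    n = len(seqs)
    columns = []
    for column in zip(*seqs):
        counts = Counter(c.upper() for c in column)
        bases = Counter({b: counts[b] for b in "ACGT" if counts[b]})
        if (counts["-"] / n > max_gap or counts["N"] / n > max_unknown
                or not bases):
            continue  # discard the whole site
        consensus = bases.most_common(1)[0][0]
        columns.append([consensus if c.upper() in ("-", "N") else c.upper()
                        for c in column])
    # transpose the kept columns back into one string per sequence
    return ["".join(chars) for chars in zip(*columns)] if columns else [""] * n


print(filter_sites(["ACGTA", "ACG-A", "TNGTA", "AC--A"], 0.25, 0.25))
# ['ACGA', 'ACGA', 'TCGA', 'ACGA']
```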

Each highlighting color then represents a haplotype, and ab12phylo can collapse these haplotypes into single nodes. Manual collapsing and renaming of taxa by editing the ID column in the middle are also possible.
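Grouping identical sequences into haplotypes is straightforward once the MSA has been cleaned of -/N. The sketch below does that and, as one example of a basic statistic, computes nucleotide diversity π (average pairwise differences per site). Which statistics ab12phylo reports exactly is not listed on this page, so π is an illustrative choice, and the sample IDs and sequences are toy data.

```python
from collections import defaultdict
from itertools import combinations


def haplotypes(named_seqs):
    """Group sample IDs by identical sequence; each group is one haplotype."""
    groups = defaultdict(list)
    for sample_id, seq in named_seqs.items():
        groups[seq].append(sample_id)
    return list(groups.values())


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi)."""
    if len(seqs) < 2 or not seqs[0]:
        return 0.0
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * len(seqs[0]))


samples = {"CS322": "ACGA", "CS079": "ACGA", "CS313": "TCGA", "CS084": "ACGA"}
print(haplotypes(samples))  # [['CS322', 'CS079', 'CS084'], ['CS313']]
print(nucleotide_diversity(list(samples.values())))  # 0.125
```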

There are various plot settings in the fold-out menu on the bottom right, including picking specific genes for species annotation instead of the best-scoring BLAST hit, or setting the threshold at which bootstrapping support value blobs flip from red for low to blue for acceptable support.

Like ab12phylo-cmd, the graphical ab12phylo can save plots in several file formats (.PNG, .SVG, .PDF), and by default writes an msa_annotated.fasta, as well as a modified_tree.nwk if applicable.
