---
title: "Tree Annotation"
author: "Guangchuang Yu and Tommy Tsan-Yuk Lam\\

        School of Public Health, The University of Hong Kong"
date: "`r Sys.Date()`"
bibliography: ggtree.bib
biblio-style: apalike
output:
  prettydoc::html_pretty:
    toc: true
    theme: cayman
    highlight: github
  pdf_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{04 Tree Annotation}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

```{r style, echo=FALSE, results="asis", message=FALSE}
knitr::opts_chunk$set(tidy = FALSE,
		   message = FALSE)
```


```{r echo=FALSE, results="hide", message=FALSE}
library("ape")
library("ggplot2")
library("cowplot")
library("treeio")
library("ggtree")

CRANpkg <- function (pkg) {
    cran <- "https://CRAN.R-project.org/package"
    fmt <- "[%s](%s=%s)"
    sprintf(fmt, pkg, cran, pkg)
}

Biocpkg <- function (pkg) {
    sprintf("[%s](http://bioconductor.org/packages/%s)", pkg, pkg)
}

inset <- ggtree::inset
```



# Annotate clades

`r Biocpkg("ggtree")` [@yu_ggtree:_2017] implements the _`geom_cladelabel`_ layer, which annotates a selected clade with a bar spanning the clade and a corresponding label.

The _`geom_cladelabel`_ layer accepts the number of a selected internal node. To find internal node numbers, please refer to the [Tree Manipulation](treeManipulation.html#internal-node-number) vignette.


```{r}
set.seed(2015-12-21)
tree <- rtree(30)
p <- ggtree(tree) + xlim(NA, 6)

p + geom_cladelabel(node=45, label="test label") +
    geom_cladelabel(node=34, label="another clade")
```
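
The node numbers passed to `geom_cladelabel` have to be looked up first. The following is a minimal sketch of one way to do so, assuming a small hand-written tree (`demo_tree` and `clade_node` are names made up for this illustration): `getMRCA` from `r CRANpkg("ape")` returns the internal node that is the most recent common ancestor of a set of tips, and `geom_text2` prints the internal node numbers on the tree for a visual check.

```{r fig.width=5, fig.height=4, fig.align="center", warning=FALSE}
library("ape")
library("ggtree")

## a small fixed tree (hypothetical), so the example uses no random numbers
demo_tree <- read.tree(text = "(((A:1,B:1):1,C:2):1,(D:1,E:1):2);")

## getMRCA() returns the number of the internal node that is the
## most recent common ancestor of the given tips
clade_node <- getMRCA(demo_tree, c("A", "C"))

## print the internal node numbers, then label the clade that was found
ggtree(demo_tree) + xlim(NA, 5) +
    geom_tiplab() +
    geom_text2(aes(subset = !isTip, label = node), hjust = -.3) +
    geom_cladelabel(node = clade_node, label = "clade of A and C", offset = .5)
```

Looking the node up from tip labels avoids hard-coding a number that changes whenever the tree is re-read or re-simulated.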

Users can set the parameter, `align = TRUE`, to align the clade label, and use the parameter, `offset`, to adjust the position.

```{r}
p + geom_cladelabel(node=45, label="test label", align=TRUE, offset=.5) +
    geom_cladelabel(node=34, label="another clade", align=TRUE, offset=.5)
```

Users can change the color of the clade label via the parameter `color`.

```{r}
p + geom_cladelabel(node=45, label="test label", align=TRUE, color='red') +
    geom_cladelabel(node=34, label="another clade", align=TRUE, color='blue')
```

Users can change the `angle` of the clade label text, and the position of the text relative to the bar via the parameter `offset.text`.

```{r}
p + geom_cladelabel(node=45, label="test label", align=TRUE, angle=270, hjust='center', offset.text=.5) +
    geom_cladelabel(node=34, label="another clade", align=TRUE, angle=45)
```

The size of the bar and text can be changed via the parameters `barsize` and `fontsize` respectively.

```{r}
p + geom_cladelabel(node=45, label="test label", align=TRUE, angle=270, hjust='center', offset.text=.5, barsize=1.5) +
    geom_cladelabel(node=34, label="another clade", align=TRUE, angle=45, fontsize=8)
```

Users can also draw the label with `geom_label` by setting `geom='label'`.

```{r}
p + geom_cladelabel(node=34, label="another clade", align=TRUE, geom='label', fill='lightblue')
```

## Annotate clades for unrooted tree

`r Biocpkg("ggtree")` provides `geom_cladelabel2` for labeling clades in unrooted
layout trees.


```{r fig.width=7, fig.height=7, fig.align='center', warning=FALSE, message=FALSE}
pg <- ggtree(tree, layout="daylight")
pg + geom_cladelabel2(node=45, label="test label", angle=10) +
    geom_cladelabel2(node=34, label="another clade", angle=305)
```

# Labelling associated taxa (Monophyletic, Polyphyletic or Paraphyletic)

`geom_cladelabel` is designed for labelling a monophyletic group (clade), but some related taxa do not form a clade. `ggtree` provides `geom_strip` to add a strip/bar that indicates the association, with an optional label (see [the issue](https://github.com/GuangchuangYu/ggtree/issues/52)).

```{r fig.width=5, fig.height=5, fig.align="center", warning=FALSE}
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
ggtree(tree) + geom_tiplab() + 
  geom_strip(5, 7, barsize=2, color='red') + 
  geom_strip(6, 12, barsize=2, color='blue')
```
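
The optional label mentioned above can be sketched as follows. This is an illustration under assumptions rather than a recommended recipe: the tree is a hand-written toy (`strip_tree` and `p_strip` are made-up names), and the `label`, `offset` and `offset.text` arguments are assumed to behave as they do in `geom_cladelabel`.

```{r fig.width=5, fig.height=4, fig.align="center", warning=FALSE}
library("ape")
library("ggtree")

## toy tree (hypothetical); tips are numbered 1 to 5 in the order A to E
strip_tree <- read.tree(text = "(((A:1,B:1):1,C:2):1,(D:1,E:1):2);")

## C (tip 3) and D (tip 4) share no exclusive common ancestor,
## so they cannot be labelled as a clade
p_strip <- ggtree(strip_tree) + xlim(NA, 5) + geom_tiplab() +
    geom_strip(3, 4, label = "C + D", barsize = 2, color = "red",
               offset = .3, offset.text = .1)
p_strip
```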


# Highlight clades

`ggtree` implements the _`geom_hilight`_ layer, which accepts an internal node number and adds a rectangle layer to highlight the selected clade.

```{r fig.width=5, fig.height=5, fig.align="center", warning=FALSE}
ggtree(tree) + geom_hilight(node=21, fill="steelblue", alpha=.6) +
    geom_hilight(node=17, fill="darkgreen", alpha=.6)
```


```{r fig.width=5, fig.height=5, fig.align="center", warning=FALSE}
ggtree(tree, layout="circular") + geom_hilight(node=21, fill="steelblue", alpha=.6) +
    geom_hilight(node=23, fill="darkgreen", alpha=.6)
```

Another way to highlight selected clades is setting the clades with different colors and/or line types as demonstrated in [Tree Manipulation](treeManipulation.html#groupclade) vignette.

## Highlight balances

In addition to _`geom_hilight`_, `ggtree` also implements _`geom_balance`_
which is designed to highlight neighboring subclades of a given internal node.

```{r fig.width=4, fig.height=5, fig.align='center', warning=FALSE}
ggtree(tree) +
  geom_balance(node=16, fill='steelblue', color='white', alpha=0.6, extend=1) +
  geom_balance(node=19, fill='darkgreen', color='white', alpha=0.6, extend=1)
```

## Highlight clades for unrooted tree

`r Biocpkg("ggtree")` provides `geom_hilight_encircle` to highlight
clades in unrooted layout trees.


```{r fig.width=5, fig.height=5, fig.align='center', warning=FALSE, message=FALSE}
pg + geom_hilight_encircle(node=45) + geom_hilight_encircle(node=34, fill='darkgreen')
```


# Taxa connection

Some evolutionary events (e.g. reassortment, horizontal gene transfer) cannot be modeled by a simple tree. `ggtree` provides the `geom_taxalink` layer, which draws straight or curved lines between any two nodes in the tree, allowing it to represent such events by connecting taxa.

```{r fig.width=5, fig.height=5, fig.align="center", warning=FALSE}
ggtree(tree) + geom_tiplab() + geom_taxalink('A', 'E') + 
  geom_taxalink('F', 'K', color='red', arrow=grid::arrow(length=grid::unit(0.02, "npc")))
```


# Tree annotation with output from evolution software

The `r Biocpkg("treeio")` package implements several parser functions to parse
output from commonly used software in evolutionary biology.

Here, we use [BEAST](http://beast2.org/) [@bouckaert_beast_2014] output as an
example. For details, please refer to the
[Importer](https://bioconductor.org/packages/devel/bioc/vignettes/treeio/inst/doc/Importer.html) vignette.


```{r warning=FALSE, fig.width=5, fig.height=5, fig.align='center'}
file <- system.file("extdata/BEAST", "beast_mcc.tree", package="treeio")
beast <- read.beast(file)
ggtree(beast, aes(color=rate))  +
    geom_range(range='length_0.95_HPD', color='red', alpha=.6, size=2) +
    geom_nodelab(aes(x=branch, label=round(posterior, 2)), vjust=-.5, size=3) +
    scale_color_continuous(low="darkgreen", high="red") +
    theme(legend.position=c(.1, .8))
```
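
The variables mapped in the plot above (`rate`, `posterior`, `length_0.95_HPD`) come from the parsed BEAST file. As a small sketch, the available variable names can be listed with `get.fields` from `r Biocpkg("treeio")` before deciding what to map; `beast_demo` is a name introduced only for this illustration.

```{r warning=FALSE}
library("treeio")

## same example file as above, read into a separate object
beast_demo_file <- system.file("extdata/BEAST", "beast_mcc.tree", package="treeio")
beast_demo <- read.beast(beast_demo_file)

## names of the annotation variables parsed from the BEAST output;
## any of them can be used inside aes()
get.fields(beast_demo)
```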


# Tree annotation with user specified annotation

Integrating user data to annotate a phylogenetic tree can be done at different
levels. The `r Biocpkg("treeio")` package implements `full_join` methods to
[combine tree data with a phylogenetic tree object](https://bioconductor.org/packages/devel/bioc/vignettes/treeio/inst/doc/Importer.html).
The `r CRANpkg("tidytree")` package supports [linking tree data to a phylogeny
using tidyverse verbs](https://cran.r-project.org/web/packages/tidytree/vignettes/tidytree.html).
`r Biocpkg("ggtree")` supports mapping external data to a phylogeny for
visualization and annotation on the fly.


## The `%<+%` operator

Suppose we have the following data associated with the tree and would like to attach them to the tree.

```{r}
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
p <- ggtree(tree)

dd <- data.frame(taxa = LETTERS[1:13],
                 place = c(rep("GZ", 5), rep("HK", 3), rep("CZ", 4), NA),
                 value = round(abs(rnorm(13, mean=70, sd=10)), digits=1))
## you don't need to order the data
## data was reshuffled just for demonstration
dd <- dd[sample(1:13, 13), ]
row.names(dd) <- NULL
```

```{r eval=FALSE}
print(dd)
```

```{r echo=FALSE, results='asis'}
knitr::kable(dd)
```

We can imagine that the _place_ column stores the location where we isolated the
species and the _value_ column stores numerical values (*e.g.* bootstrap values).

We have demonstrated using the operator, `%<%`, to update a tree view with a new
tree. Here, we introduce another operator, `%<+%`, that attaches annotation
data to a tree view. The only requirement is that the first column of the input
data matches the node/tip labels of the tree.

After attaching the annotation data to the tree by `%<+%`, all the columns in
the data are visible to `r Biocpkg("ggtree")`. As an example, here we attach the
above annotation data to the tree view, `p`, and add a layer that shows the
tip labels and colors them by the isolation site stored in the _place_ column.

```{r fig.width=6, fig.height=5, warning=FALSE, fig.align="center"}
p <- p %<+% dd + geom_tiplab(aes(color=place)) +
       geom_tippoint(aes(size=value, shape=place, color=place), alpha=0.25)
p + theme(legend.position="right")
```

Once the data is attached, it stays attached, so we can add other layers to display this information easily.

```{r fig.width=6, fig.height=5, warning=FALSE, fig.align="center"}
p + geom_text(aes(color=place, label=place), hjust=1, vjust=-0.4, size=3) +
    geom_text(aes(color=place, label=value), hjust=1, vjust=1.4, size=3)
```
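
To see what `%<+%` does to a tree view, it can help to inspect the plot data directly. The sketch below uses a hand-written four-tip tree and a made-up annotation table (`toy_tree`, `toy_info` and `p_toy` are hypothetical names); it assumes, as described above, that rows are matched to tips through the first column.

```{r warning=FALSE}
library("ape")
library("ggtree")

## toy tree and matching annotation table (both hypothetical)
toy_tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")
toy_info <- data.frame(taxa  = c("D", "B", "A", "C"),  # row order does not matter
                       place = c("HK", "GZ", "GZ", "HK"),
                       value = c(61.2, 70.5, 58.9, 83.1),
                       stringsAsFactors = FALSE)

p_toy <- ggtree(toy_tree) %<+% toy_info

## the attached columns sit next to the tree layout columns;
## internal nodes have no row in toy_info and therefore get NA
as.data.frame(p_toy$data)[, c("label", "isTip", "place", "value")]
```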


# Visualize tree with associated matrix

<!--
At first we implemented `gplot` function to visualize tree with heatmap but it has [an issue](https://github.com/GuangchuangYu/ggtree/issues/3) that it can't always guarantee the heatmap aligning to the tree properly, since the line up is between two figures and it's currently not supported internally by ggplot2. I have implemented another function `gheatmap` that can do the line up properly by creating a new layer above the tree.
-->

The `gheatmap` function is designed to visualize a phylogenetic tree together with a heatmap of an associated matrix.

In the following example, we visualize a tree of H3 influenza viruses with their associated genotypes.

```{r fig.width=8, fig.height=6, fig.align="center", warning=FALSE, message=FALSE}
beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
beast_tree <- read.beast(beast_file)

genotype_file <- system.file("examples/Genotype.txt", package="ggtree")
genotype <- read.table(genotype_file, sep="\t", stringsAsFactors=FALSE)
colnames(genotype) <- sub("\\.$", "", colnames(genotype))
p <- ggtree(beast_tree, mrsd="2013-01-01") + geom_treescale(x=2008, y=1, offset=2)
p <- p + geom_tiplab(size=2)
gheatmap(p, genotype, offset=5, width=0.5, font.size=3, colnames_angle=-45, hjust=0) +
    scale_fill_manual(breaks=c("HuH3N2", "pdm", "trig"), values=c("steelblue", "firebrick", "darkgreen"))
```

The _width_ parameter controls the width of the heatmap. The _offset_ parameter controls the distance between the tree and the heatmap, for instance to allocate space for tip labels.
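
The effect of these two parameters is easier to try out on a small input. The following sketch uses a hand-written tree and a made-up genotype table (`hm_tree`, `hm_data` and `hm_plot` are hypothetical names); it relies on the row names of the table matching the tip labels of the tree.

```{r fig.width=5, fig.height=4, fig.align="center", warning=FALSE, message=FALSE}
library("ape")
library("ggplot2")
library("ggtree")

hm_tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")

## hypothetical genotype table; row names must match the tip labels
hm_data <- data.frame(gene1 = c("x", "y", "x", "y"),
                      gene2 = c("y", "y", "x", "x"),
                      row.names = c("A", "B", "C", "D"),
                      stringsAsFactors = FALSE)

hm_plot <- gheatmap(ggtree(hm_tree) + geom_tiplab(), hm_data,
                    offset = 0.3,   # gap between tree and heatmap, for tip labels
                    width = 0.4) +  # heatmap width relative to the tree width
    scale_fill_manual(values = c(x = "steelblue", y = "firebrick"))
hm_plot
```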


For a time-scaled tree, as in this example, it is common to display the `x` axis with `theme_tree2`. However, the heatmap is just another layer and changes the `x` axis. To overcome this issue, we implemented `scale_x_ggtree`, which sets the `x` axis more reasonably.

<!-- User can also use `gplot` and tweak the positions of two plot to align properly. -->



```{r fig.width=8, fig.height=6, fig.align="center", warning=FALSE}
p <- ggtree(beast_tree, mrsd="2013-01-01") + geom_tiplab(size=2, align=TRUE, linesize=.5) + theme_tree2()
pp <- (p + scale_y_continuous(expand=c(0, 0.3))) %>%
    gheatmap(genotype, offset=8, width=0.6, colnames=FALSE) %>%
        scale_x_ggtree()
pp + theme(legend.position="right")
```


# Visualize tree with multiple sequence alignment

With the `msaplot` function, users can visualize a multiple sequence alignment together with a phylogenetic tree, as demonstrated below:

```{r fig.width=8, fig.height=6, fig.align='center', warning=FALSE}
fasta <- system.file("examples/FluA_H3_AA.fas", package="ggtree")
msaplot(ggtree(beast_tree), fasta)
```

A specific slice of the alignment can also be displayed by specifying the _window_ parameter.

```{r fig.width=7, fig.height=7, fig.align='center', warning=FALSE}
msaplot(ggtree(beast_tree), fasta, window=c(150, 200)) + coord_polar(theta='y')
```


# Plot tree with associated data

To associate a phylogenetic tree with different types of plots produced from user data, `ggtree` provides the `facet_plot` function, which accepts an input `data.frame` and a `geom` function to draw the input data. The data are displayed in an additional panel of the plot.

```{r warning=F, fig.width=10, fig.height=6}
tr <- rtree(30)

d1 <- data.frame(id=tr$tip.label, val=rnorm(30, sd=3))
p <- ggtree(tr)

p2 <- facet_plot(p, panel="dot", data=d1, geom=geom_point, aes(x=val), color='firebrick')
d2 <- data.frame(id=tr$tip.label, value=abs(rnorm(30, mean=100, sd=50)))

facet_plot(p2, panel='bar', data=d2, geom=geom_segment, aes(x=0, xend=value, y=y, yend=y), size=3, color='steelblue') + theme_tree2()
```


# Plot tree with images and subplots

Please refer to the following vignettes:

+ [Annotating phylogenetic tree with images](https://guangchuangyu.github.io/software/ggtree/vignettes/ggtree-ggimage.html)
+ [Annotate a phylogenetic tree with insets](https://guangchuangyu.github.io/software/ggtree/vignettes/ggtree-inset.html)



# References