---
title: 'CEV User Guide'
author: 'Dan Knight, Helena Winata'
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    number_sections: false
vignette: >
  %\VignetteIndexEntry{User Guide}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, echo=F, message=F}
library(CancerEvolutionVisualization);

knitr::opts_chunk$set(
    warning = FALSE,
    message = FALSE,
    fig.align = 'center',
    fig.height = 3.5,
    collapse = TRUE
    );

# rmarkdown::render('UserGuide.Rmd');
```

# Introduction

CancerEvolutionVisualization (CEV) creates customizable, publication-quality plots for representing tumour evolution data. This guide focuses on phylogenetic tree visualization using CEV.

For simple plots, this package handles most settings right out of the box. However, more complex plots may require some trial and error to achieve the right arrangement of nodes and branches. This guide shows best practices for creating plots, as well as examples of common use cases and tips for refining plot settings.

## Installation

### CRAN (recommended)

To install the latest version from CRAN, run the following command in R:

```{r eval=FALSE}
install.packages('CancerEvolutionVisualization', dependencies = TRUE);
library(CancerEvolutionVisualization);
```

### GitHub

To install the main branch version from GitHub, run the following command in R:

```{r eval=FALSE}
devtools::install_github('uclahs-cds/public-R-CancerEvolutionVisualization', ref = 'main');
library(CancerEvolutionVisualization);
```

To install from a specific branch, replace `main` with the branch name.

# Basic Phylogenetic Tree Visualization

## Input Phylogenetic Data

There are many methods for determining subpopulations within genomic data, and you should be free to use whatever method you prefer for a given dataset. This package only handles visualization, not analysis. Therefore, data must be prepared and formatted before being passed to any CEV functions.
The input for phylogenetic tree visualization is a data frame where each row defines a parent-child relationship between two subclones. To load the data required for this user guide, run the following code:

```{r load-data}
load('data/simple.example.Rda');
load('data/complex.example.Rda');
```

`simple.example` contains a simple tree with 4 nodes, while `complex.example` contains a more complex tree with 25 nodes. Both objects contain a `tree` data frame, and `simple.example` also contains a `text` data frame. The `tree` component contains the tree data, while the `text` component contains the text annotations.

### Simple Example

The `simple.example` tree data frame contains information on the tree structure as well as aesthetic node-by-node customization settings (colours, edge type, etc.). The `text` data frame contains text annotations for each node.

```{r simple-example, echo=F}
knitr::kable(simple.example$tree, caption = 'Simple Example Tree');
knitr::kable(
    simple.example$text,
    caption = 'Simple Example Text',
    format = 'html',
    table.attr = 'style="width: 40%;"'
    );
```

## Ex. 1.1: Minimal Tree

The simplest input format is a single column containing the parent node of each individual node. By default, the row index is assigned as `node.id`. Each node is restricted to one parent. The root node does not have a parent, so a value of `NA` is used.

To plot the tree, we can use the `SRCGrob` function. This function returns a `grob` object that can be passed to `grid.draw` to render the plot. Alternatively, the wrapper function `create.phylogenetic.tree` automatically renders the plot or saves it to a TIFF, PNG, PDF or SVG file.

```{r}
parent.only <- data.frame(simple.example$tree[, 'parent', drop = FALSE]);
parent.only.tree <- SRCGrob(parent.only);
grid.draw(parent.only.tree);
```

## Ex. 1.2: Using `node.id`, `parent` and `label` columns

With the minimal input, the tree is rendered with numeric node labels corresponding to the row index (the default `node.id`). A `node.id` column can be included in the input data frame if the IDs reported in the `parent` column do not correspond to row indices.

```{r}
node.id <- data.frame(
    node.id = as.character(c(2, 5, 6, 1)),
    parent = as.character(c(NA, 2, 2, 5))
    );

node.id.tree <- create.phylogenetic.tree(node.id);
```

By default, the `node.id` is used to label the nodes. To customize node labels, a `label` column can be included in the input data frame to override the `node.id` values.

```{r}
node.id$label <- c('A', 'B', 'C', 'D');
node.id.tree <- create.phylogenetic.tree(node.id);
```

## Ex. 1.3: Branch Lengths

It's common to associate branch lengths with the values of a particular variable (for example, PGA or SNVs). Up to two branch lengths can be specified. Including a `length.1` and/or `length.2` column in the tree data frame enables this branch scaling behaviour and automatically adds a corresponding y-axis.

Specifying multiple length columns results in multiple (distinctly coloured) parallel lines. For each branch, the next node is placed at the end of the longest line.

```{r}
branch.lengths <- simple.example$tree[, c('parent', 'length.1', 'length.2')];
branch.lengths.tree <- create.phylogenetic.tree(branch.lengths);
```

## Ex. 1.4: Branch Scaling

Branches are scaled automatically, but users can further scale each branch with the `scale1` and `scale2` parameters. These values scale each branch proportionally, so `scale1 = 1.5` would make the first set of branch lengths 50% longer.

```{r}
scaled.tree <- create.phylogenetic.tree(
    branch.lengths,
    scale1 = 1.5,
    scale2 = 0.5
    );
```

## Ex. 1.5: Y-Axis Labels

The y-axes are generated automatically, and lengths of different sizes are scaled to fit the plot.
The y-axis labels can be customized by specifying the `yaxis1.label` and `yaxis2.label` parameters.

```{r, fig.width=10}
yaxis.tree <- create.phylogenetic.tree(
    tree = branch.lengths,
    yaxis1.label = 'PGA (%)',
    yaxis2.label = 'Number of SNVs'
    );
```

## Ex. 1.6: Axis Tick Placement

The default axis tick positions can be overridden with the `yat` parameter. This expects a list of vectors, each giving the tick positions for one y-axis.

```{r}
yaxis1.ticks <- c(10, 20, 30, 35, 40);
yaxis2.ticks <- c(100, 250, 400);

yat.tree <- create.phylogenetic.tree(
    branch.lengths,
    yat = list(
        yaxis1.ticks,
        yaxis2.ticks
        )
    );
```

## Ex. 1.7: Scale Bars

Alternatively, the y-axis can be replaced with a scale bar. Setting `scale.bar = TRUE` removes the y-axis and adds a scale bar at the top of the plot. To further customize the scale bar position and size, users can use the following parameters:

- `scale.bar.coords` specifies the relative x and y coordinates of the scale bar. Both values should range from 0 to 1.
- `scale.size.{1,2}` specifies the size of the scale bar if the default is unsatisfactory.
- `scale.padding` specifies the padding between the scale bars if multiple scale bars are present.

```{r}
scalebar.tree <- create.phylogenetic.tree(
    tree = branch.lengths,
    yaxis1.label = 'PGA (%)',
    yaxis2.label = 'Number of SNVs',
    scale.bar = TRUE,
    scale.bar.coords = c(0, 0.6),
    scale.size.2 = 1000,
    scale.padding = 4
    );
```

## Ex. 1.8: Visualizing Cellular Prevalence

A `CP` column containing the cellular prevalence or cancer cell fraction (CCF) of each subclone can be added to the input tree data frame. These values typically range between 0 and 1, and the summed values of a node's children must not exceed the value of their parent node. Whether you are using 'CCF', 'CP' or any other metric, make sure the x-axis label matches the metric used.
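
The parent/child constraint on these values can be checked before plotting. The following is a minimal sketch in base R and is not part of CEV: the helper name `check.cp` and the toy data frame are hypothetical, and the sketch assumes that node IDs default to the row index when no `node.id` column is present.

```{r}
# Hypothetical helper (not a CEV function): return the IDs of parent nodes
# whose children have a combined CP larger than the parent's own CP.
check.cp <- function(tree) {
    # Assumption: node IDs default to the row index when node.id is absent
    if ('node.id' %in% names(tree)) {
        node.id <- as.character(tree$node.id);
    } else {
        node.id <- as.character(seq_len(nrow(tree)));
        }

    parent <- as.character(tree$parent);
    has.parent <- !is.na(parent);

    # Sum the CP of the children of each parent
    child.sums <- tapply(tree$CP[has.parent], parent[has.parent], sum);

    # Look up each parent's own CP by node ID
    parent.cp <- tree$CP[match(names(child.sums), node.id)];

    # Small tolerance to absorb floating-point rounding
    names(child.sums)[child.sums > parent.cp + 1e-8];
    }

# Toy four-node tree: node 1 is the root, nodes 2 and 3 are its children
toy.tree <- data.frame(
    parent = c(NA, 1, 1, 2),
    CP = c(1.0, 0.6, 0.3, 0.4)
    );

check.cp(toy.tree);

# Raising the CP of node 3 makes the children of node 1 sum to 1.1
toy.tree$CP[3] <- 0.5;
check.cp(toy.tree);
```

The first call returns an empty vector because every set of children fits within its parent. The second call flags node `1`, whose children now sum to more than its own value.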
```{r, fig.height=4}
CP <- simple.example$tree[, c('parent', 'length.1', 'length.2', 'CP')];

CP.default.tree <- create.phylogenetic.tree(
    CP,
    xaxis.label = 'CCF'
    );
```

To control the overall scale of the polygons, users can modify the `polygon.scale` parameter. The `polygon.colour.scheme` parameter can be used to specify a colour palette for the polygons. When a single colour is provided, a light-to-dark gradient is generated from that colour. If multiple colours are provided, the gradient transitions between them. An optional `polygon.col` column can be included in the tree input to override the polygon colour scheme.

```{r, include=F}
steelblue.CP.tree <- create.phylogenetic.tree(
    CP,
    polygon.colour.scheme = 'steelblue',
    horizontal.padding = -0.9,
    main = 'Single Color Gradient'
    );

steelblue.purple.CP.tree <- create.phylogenetic.tree(
    CP,
    polygon.colour.scheme = c('steelblue', 'purple'),
    horizontal.padding = -0.9,
    main = 'Multicolor Gradient'
    );

override.CP <- simple.example$tree[, c('parent', 'length.1', 'length.2', 'CP', 'polygon.col')];

override.CP.tree <- create.phylogenetic.tree(
    override.CP,
    main = 'Manual Override',
    horizontal.padding = -0.9
    );
```

```{r, echo=F, fig.width=12}
BoutrosLab.plotting.general::create.multipanelplot(
    plot.objects = list(
        steelblue.CP.tree,
        steelblue.purple.CP.tree,
        override.CP.tree
        ),
    layout.height = 1,
    layout.width = 3,
    x.spacing = 10,
    resolution = 300
    );
```

Polygon transparency can be specified in the tree input data frame using the `polygon.alpha` column.

# Customizing Node Arrangement {#node-spacing}

CEV provides several methods for refining the spacing and arrangement of a tree's nodes. This is especially useful in complex trees, which often require more attention to avoid visual problems such as node collisions and uneven branch/level spacing.

Consider the following example tree, which has many of these issues.
```{r complex-tree, echo=F, fig.height=3.5}
complex.tree.input <- complex.example$tree;
complex.tree <- create.phylogenetic.tree(complex.tree.input);
```

## Ex. 2.1: Node Spread

An optional `spread` column can be included in the input tree data frame. Spread acts _relatively_, as a multiplier on the initially calculated angle.

- A `spread` value of 1 or `NA` will leave the spacing _unchanged_.
- A `spread` value greater than 1 will _increase_ the space between nodes. For example, a `spread` value of 1.25 will spread the nodes 25% more.
- A `spread` value less than 1 will _decrease_ the space between nodes. For example, a `spread` of 0.85 will spread the nodes 15% less.

To create more space for the numerous nodes in the lower levels of our example tree, we can increase the spread of the nodes at the top level. To create even more room, we can _decrease_ the spread of some lower level nodes where appropriate.

```{r node-spread}
spread.tree.input <- complex.tree.input;
spread.tree.input$spread <- 1;
spread.tree.input$spread[2:5] <- c(2, 2, 3, 2);
spread.tree.input$spread[c(6:7, 17:18, 24:25)] <- 0.5;
spread.tree.input$spread[c(8:10, 18:21)] <- 0.75;
spread.tree.input$spread[c(11:16)] <- 1.75;

spread.tree <- create.phylogenetic.tree(spread.tree.input);
```

## Ex. 2.2: Node Angles

Alternatively, an `angle` column can be specified to manually set the angle of each node. Angles are specified in degrees, where 0 points directly away from the parent edge. Angles can be provided in radians when `use.radians = TRUE`. When `angle` and `spread` are both specified, `angle` takes precedence.

```{r node-angle}
angle.input <- complex.tree.input;
angle.input$angle <- NA;
angle.input$angle[2:5] <- c(-80, -20, 30, 85);

angle.tree <- create.phylogenetic.tree(angle.input);
```

# Phylogenetic Tree Modes

CEV currently supports two modes for visualizing phylogenetic trees: radial and dendrogram.
The default mode is radial, but users can switch to dendrogram mode by setting the `mode` column to `dendrogram`.

## Ex. 3.1: Radial Mode

This mode spreads nodes out radially from the root node. Examples of plotting and customizing radial trees are shown in the previous [sections](#node-spacing).

## Ex. 3.2: Dendrogram Mode

This mode is useful for trees with many nodes, as it avoids node collisions and can be easier to read.

```{r, fig.width = 13}
dendrogram.input <- complex.tree.input;
dendrogram.input$mode <- 'dendrogram';

dendrogram.tree <- create.phylogenetic.tree(dendrogram.input);
```

# Customizing Phylogenetic Tree Aesthetics

CEV gives the user control over numerous visual aspects of the tree. By specifying optional columns and values in the tree input `data.frame`, the user has individual control of the colour, width, and line type of each node, label border, and edge.

## Supported Aesthetic Input Columns