--- title: "User Guide" author: - name: 'Dan Knight' email: danknight@mednet.ucla.edu date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{User Guide} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r, echo=F, message=F} library(CancerEvolutionVisualization); ``` # Introduction CancerEvolutionVisualization (CEV) creates publication quality phylogenetic tree plots. For simple plots, this package will handle most settings right out of the box. However, more complex plots may require some trial and error to achieve the right arrangement of nodes and branches. This guide will show best practices for creating plots, as well as examples of common use cases and tips for refining plot settings. # Input Data There are many methods for determining subpopulations within genomic data, and you should be free to use whatever method you prefer for a given dataset. This package only handles visualization - not analysis. Therefore, data must be prepared and formatted before being passed to any CEV functions. ## Tree Dataframe This is the primary source of data for a plot. It defines the tree's structure of parent and child nodes. It also provides information about the number of mutations at each node. ### Ex. 1.1: Parent Data ```{r echo=F} load('data/simple.example.Rda'); load('data/complex.example.Rda'); ``` The simplest input format is a column containing the parent node of each individual node. (A node will only have one parent.) The root node will not have a parent, so a value of `NA` is used. ```{r echo=F} parent.only <- data.frame(simple.example$tree[, 'parent', drop = FALSE]); knitr::kable( parent.only, col.names = c(colnames(simple.example$tree)[1]), row.names = TRUE ); ``` ```{r, fig.show='hide'} parent.only.tree <- SRCGrob(parent.only); ``` ```{r echo=F} grid.draw(parent.only.tree); ``` ### Ex. 1.2: Branch Lengths It's common to associate branch lengths with a the values of a particular variable (for example, PGA or SNVs). Up to two branch lengths can be specified. Including a `length1` and/or `length2` column will enable this branch scaling behaviour, and automatically add y-axis ticks and labels. Multiple length values can be used together. All columns whose names contain `length` will be used. (For example, `length1` and `snv.length` are both valid.) Multiple columns will result in multiple (distinctly coloured) parallel lines. Any branch length conflicts will be resolved automatically. For each branch, the next node will be placed at the end of the longest line. ```{r echo=F} branch.lengths <- simple.example$tree[, 1:3]; knitr::kable( branch.lengths, row.names = TRUE ); ``` ```{r, fig.show='hide'} branch.lengths.tree <- SRCGrob(branch.lengths); ``` ```{r, fig.dim=c(5, 3.5)} grid.draw(branch.lengths.tree); ``` ### Ex. 1.3: Complex Trees CEV provides several methods for refining the spacing and arrangement of a tree's nodes. This is especially useful in complex trees, which often require more attention to avoid visual problems such as node collisions and uneven branch/level spacing. Here, we see a tree with many issues. Consider this example tree. ```{r complex-tree, echo=F, fig.height=3.5} complex.tree.input <- complex.example$tree; grid.draw(SRCGrob(complex.tree.input)); ``` #### Node Spread An optional `spread` column can be included in the input tree data.frame. Spread operates _relatively_ as a percentage of the initial angle calculation. - A `spread` value of 1 or `NA` will leave the spacing unchanged _unchanged_. - A `spread` value greater than 1 will _increase_ the space between nodes. For example, a `spread` value of 1.25 will spread the nodes 25% more. - A `spread` value less than 1 will _decrease_ the space between nodes. For example, a `spread` of 0.85 will spread the nodes 15% less. To create more space for the numerous nodes in the lower levels of our example tree, we can increase the spread of the nodes at the top level. To create even more room, we can _decrease_ the spread of some lower level nodes where appropriate. ```{r node-spread, echo=F} spread.tree.input <- complex.tree.input; spread.tree.input$spread <- NA; spread.tree.input$spread[2:5] <- c(1.25, 1.5, 2, 1.5); spread.tree.input$spread[c(6:7, 17:18, 19:20, 24:25)] <- 0.5; spread.tree.input$spread[c(8:10)] <- 0.75; knitr::kable(spread.tree.input[!is.na(spread.tree.input$spread) | is.na(spread.tree.input$parent), ]); ``` ```{r, echo=F, fig.height=3.5} grid.draw(SRCGrob(spread.tree.input)); ``` ### Ex. 1.4: Styling the Tree CEV gives the user control over numerous visual aspects of the tree. By specifying optional columns and values in the tree input data.frame, the user has individual control of the colour, width, and line type of each node, label border, and edge. #### Optional Style Columns | Style | Column | | --- | --- | | Node Colour | `node.col` | | Node Label Colour | `node.label.col` | | Node Border Colour | `border.col` | | Node Border Width | `border.width` | | Node Border Line Type | `border.type` | | | | | Edge Colour | `edge.col.1`, `edge.col.2` | | Edge Width | `edge.width.1`, `edge.width.2` | | Edge Line Type | `edge.type.1`, `edge.type.2` | Default values replace missing columns and `NA` values, allowing node-by-node, and edge-by-edge control as needed. For sparsely defined values (for example, only specifying a single edge), it can be convenient to initialize a column with `NA`s, then manually assign specific nodes as needed. #### Line Types Valid values for line type columns are based on lattice's values (with some additions and differences). | Line Type | | --- | | `NA` | | `'none'` | | `'solid'` | | `'dashed'` | | `'dotted'` | | `'dotdash'` | | `'longdash'` | | `'twodash'` | #### Styled Tree ```{r echo=F} node.style <- simple.example$tree[, c( 'parent', 'length1', 'length2', 'node.col', 'node.label.col', 'border.col', 'border.width', 'border.type', 'edge.col.1', 'edge.type.1', 'edge.col.2', 'edge.width.2' )]; knitr::kable( node.style[, !(colnames(node.style) %in% c('parent', 'length1', 'length2'))], row.names = TRUE ); ``` ```{r, fig.show='hide'} node.style.tree <- SRCGrob(node.style); ``` ```{r, fig.dim=c(5, 3.5), echo=F} grid.draw(node.style.tree); ``` ### Ex. 1.5: Showing Cellular Prevalence A `cellular.prevalence` column can also be added. These values must range between 0 and 1, and the sum of all child nodes must not be larger than their parent node's value. ```{r echo=F} CP <- simple.example$tree[, c('parent', 'length1', 'length2', 'CP')]; knitr::kable( CP, row.names = TRUE ); ``` ```{r, fig.show='hide'} CP.tree <- SRCGrob(CP); ``` ```{r, echo=F} grid.draw(CP.tree); ``` ### Ex. 1.6: Simplifying the Tree Complex trees may benefit from simpler visual styles. For example, there may not be room to render the node ellipses. CEV provides node-by-node control with the `draw.node` column. ```{r echo=F} nodeless <- data.frame( parent = c( NA, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5 ), draw.node = TRUE, spread = NA ); nodeless$spread[6:nrow(nodeless)] <- 0.6; nodeless$draw.node[c(2, 6:nrow(nodeless))] <- FALSE; knitr::kable( nodeless[, c('parent', 'draw.node')], row.names = TRUE ); ``` ```{r, fig.show='hide'} nodeless.tree <- SRCGrob(nodeless); ``` ```{r, echo=F} grid.draw(nodeless.tree); ``` ## Text Dataframe This secondary dataframe can be used to specify additional text corresponding to each node. ### Ex. 6: Node Text Each row must include a node ID for the text. Text will be stacked next to the specified node. ```{r echo=F} simple.text.data <- simple.example$text[, 1:2]; knitr::kable( simple.text.data, col.names = colnames(simple.example$text)[1:2] ); ``` ```{r, fig.show='hide'} simple.text.tree <- SRCGrob(parent.only, simple.text.data); ``` ```{r} grid.draw(simple.text.tree); ``` ### Ex. 7: Specifying Colour and Style - An optional `col` column can be included to specify the colour of each text. - A `fontface` column can be included to bold, italicize, etc. These values correspond to the standard R `fontface` values. - `NA` values in each column will default to `black` and `plain` respectively. ```{r echo=F} knitr::kable( simple.example$text ); ``` ```{r, fig.show='hide'} full.text.tree <- SRCGrob(parent.only, simple.example$text); ``` ```{r} grid.draw(full.text.tree); ``` # Plot Parameters The default settings should produce a reasonable baseline plot, but many users will want more control over their plot. This section will highlight some of the most common parameters in `SRCGRob`. ## Plot Size ### Ex. 8: Plot Width with Horizontal Padding Some plots require more or less horizontal padding between the x-axes and the tree itself. The `horizontal.padding` parameter scales the default padding proportionally. For example, `horizontal.padding = -0.2` would reduce the padding by 20%. ```{r, fig.show='hide'} padding.tree <- SRCGrob( branch.lengths, horizontal.padding = -0.8 ); ``` ```{r, fig.dim=c(3.5, 3.5)} grid.draw(padding.tree); ``` ### Ex. 9: Branch Scaling Branches are scaled automatically, but users can further scale each branch with the `scale1` and `scale2` parameters. These values scale each branch proportionally, so `scale1 = 1.1` would make the first set of branch lengths 10% longer. ```{r, fig.show='hide'} scaled.tree <- SRCGrob( branch.lengths, scale1 = 1.5, scale2 = 0.5 ); ``` ```{r, fig.height=4} grid.draw(scaled.tree); ``` ### Ex. 10: Plot Title The main title of the plot is referred to as `main` in plot parameters. `main` sets the title text, `main.cex` sets the font size, and `main.y` is used to move the main title up if more space is required for the plot. ```{r, fig.show='hide'} title.tree <- SRCGrob( parent.only, main = 'Example Plot' ); ``` ```{r, echo=F} title.tree$vp$y <- unit(0.8, 'npc'); ``` ```{r, fig.height=3.5} grid.draw(title.tree); ``` ## X-Axes A y-axis will be added automatically for each branch length column (the left-sided axis corresponding to the first branch length column, and the right with the second length column). ### Ex. 11: Y-Axis Ticks are placed automatically based on the plot size and the branch lengths. ### Ex. 12: Axis Title Axis titles are specified with the `yaxis1.label` and `yaxis2.label` parameters. ```{r, fig.show='hide'} axis.title.tree <- SRCGrob( parent.only, yaxis1.label = 'SNVs', horizontal.padding = -0.6 ); ``` ```{r, echo=F} axis.title.tree$vp$x <- unit(0.75, 'npc'); ``` ```{r} grid.draw(axis.title.tree); ``` ### Ex. 13: Axis Tick Placement The default axis tick positions can be overridden with the `yat` parameter. This expects a list of vectors, each corresponding to the ticks on an x-axis. ```{r, fig.show='hide'} xaxis1.ticks <- c(10, 20, 30, 35, 40); xaxis2.ticks <- c(100, 250, 400); yat.tree <- SRCGrob( branch.lengths, yat = list( xaxis1.ticks, xaxis2.ticks ), horizontal.padding = -0.4 ); ``` ```{r, fig.width=5} grid.draw(yat.tree); ``` ### Ex. 14: Normal ```{r, fig.show='hide'} normal.tree <- SRCGrob( parent.only, add.normal = TRUE ); ``` ```{r} grid.draw(normal.tree); ```