User Guide

Introduction

CancerEvolutionVisualization (CEV) creates publication-quality phylogenetic tree plots. For simple plots, the package handles most settings out of the box. More complex plots may require some trial and error to achieve the right arrangement of nodes and branches.

This guide will show best practices for creating plots, as well as examples of common use cases and tips for refining plot settings.

Input Data

There are many methods for determining subpopulations within genomic data, and you are free to use whichever method you prefer for a given dataset. This package handles only visualization, not analysis. Data must therefore be prepared and formatted before being passed to any CEV functions.

Tree Dataframe

This is the primary source of data for a plot. It defines the tree’s structure of parent and child nodes. It also provides information about the number of mutations at each node.

Ex. 1.1: Parent Data

The simplest input format is a column containing the parent node of each individual node. (A node will only have one parent.) The root node will not have a parent, so a value of NA is used.

parent
1 NA
2 1
3 1
4 2
parent.only.tree <- SRCGrob(parent.only);
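The printed data frame above can be built directly in base R. The following is a minimal sketch, assuming the input is an ordinary data.frame whose row names serve as node IDs (as the printout suggests). The sanity checks are illustrative and are not part of CEV.

```r
# Each row is a node; the row name (1 to 4) is taken to be the node ID.
# The root (node 1) has no parent, so its parent is NA.
parent.only <- data.frame(
    parent = c(NA, 1, 1, 2)
    );

# Illustrative checks before plotting: one root, and every
# non-root parent refers to an existing node ID.
node.ids <- as.numeric(rownames(parent.only));
root.count <- sum(is.na(parent.only$parent));
parents.exist <- all(na.omit(parent.only$parent) %in% node.ids);

stopifnot(root.count == 1, parents.exist);
```

Once the checks pass, parent.only can be handed to SRCGrob as in the call above.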

Ex. 1.2: Branch Lengths

It’s common to associate branch lengths with the values of a particular variable (for example, PGA or SNVs). Up to two branch lengths can be specified. Including a length1 and/or length2 column enables this branch scaling behaviour and automatically adds y-axis ticks and labels.

Multiple length values can be used together. Any column whose name contains length is recognized (for example, length1 and snv.length are both valid), although only the first two such columns are currently used. Multiple columns result in multiple (distinctly coloured) parallel lines. Any branch length conflicts are resolved automatically: for each branch, the next node is placed at the end of the longest line.

parent length1 length2
1 NA 12 850
2 1 10 1000
3 1 15 1100
4 2 10 760
branch.lengths.tree <- SRCGrob(branch.lengths);
## Only the first 2 "length" columns will be used. More branch lengths will be supported in a future version.
## Warning in add.axes(clone.out, yaxis.position, scale1 = scale1, scale2 =
## scale2, : Missing second y-axis label
grid.draw(branch.lengths.tree);
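As a sketch of how the input for this example might be assembled, the block below rebuilds the printed data frame and lists the columns whose names contain "length". The name matching is done here with plain grep for illustration; how CEV matches column names internally is an assumption.

```r
branch.lengths <- data.frame(
    parent = c(NA, 1, 1, 2),
    length1 = c(12, 10, 15, 10),
    length2 = c(850, 1000, 1100, 760)
    );

# Columns whose names contain "length" are candidates for branch lengths.
length.cols <- grep(
    'length',
    colnames(branch.lengths),
    fixed = TRUE,
    value = TRUE
    );

# The console message above states that only the first two are used,
# so keep just those to make the behaviour explicit.
used.length.cols <- head(length.cols, 2);
print(used.length.cols);
```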

Ex. 1.3: Complex Trees

CEV provides several methods for refining the spacing and arrangement of a tree’s nodes. This is especially useful in complex trees, which often require more attention to avoid visual problems such as node collisions and uneven branch/level spacing. The example tree in this section has many such issues.

Node Spread

An optional spread column can be included in the input tree data.frame. Spread is relative: each value acts as a multiplier on the initially calculated angle.

  • A spread value of 1 or NA will leave the spacing unchanged.
  • A spread value greater than 1 will increase the space between nodes. For example, a spread value of 1.25 will spread the nodes 25% more.
  • A spread value less than 1 will decrease the space between nodes. For example, a spread of 0.85 will spread the nodes 15% less.

To create more space for the numerous nodes in the lower levels of our example tree, we can increase the spread of the nodes at the top level. To create even more room, we can decrease the spread of some lower level nodes where appropriate.

parent spread
1 NA NA
2 1 1.25
3 1 1.50
4 1 2.00
5 1 1.50
6 2 0.50
7 2 0.50
8 3 0.75
9 3 0.75
10 3 0.75
17 5 0.50
18 5 0.50
19 6 0.50
20 6 0.50
24 9 0.50
25 9 0.50
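The spread rules above amount to simple multiplication. The sketch below applies them to a made-up starting angle; initial.angle is a hypothetical value chosen for illustration, since the real angle is calculated inside the package.

```r
# Hypothetical starting angle between sibling nodes (in degrees).
initial.angle <- 40;

spread <- c(NA, 1, 1.25, 0.85);

# NA behaves like 1 (no change), per the rules above.
effective.spread <- ifelse(is.na(spread), 1, spread);
adjusted.angle <- initial.angle * effective.spread;

print(adjusted.angle);
```

A spread of 1.25 widens the hypothetical angle by 25%, and 0.85 narrows it by 15%.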

Ex. 1.4: Styling the Tree

CEV gives the user control over numerous visual aspects of the tree. By specifying optional columns and values in the tree input data.frame, the user can individually control the colour, width, and line type of each node, node label, border, and edge.

Optional Style Columns

Style                   Column
Node Colour             node.col
Node Label Colour       node.label.col
Node Border Colour      border.col
Node Border Width       border.width
Node Border Line Type   border.type
Edge Colour             edge.col.1, edge.col.2
Edge Width              edge.width.1, edge.width.2
Edge Line Type          edge.type.1, edge.type.2

Default values replace missing columns and NA values, allowing node-by-node and edge-by-edge control as needed. For sparsely defined values (for example, styling only a single edge), it can be convenient to initialize a column with NAs, then assign values to specific nodes as needed.
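The NA-initialization pattern described above might look like the following sketch. The small tree is made up for illustration, and it is an assumption that each row's edge style applies to the edge joining that node to its parent.

```r
node.style <- data.frame(
    parent = c(NA, 1, 1, 2)
    );

# Start with every value unset, so the defaults apply everywhere.
node.style$edge.col.1 <- NA_character_;
node.style$edge.type.1 <- NA_character_;

# Override only the row for node 4.
node.style$edge.col.1[4] <- 'lightblue';
node.style$edge.type.1[4] <- 'dotted';

print(node.style);
```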

Line Types

Valid values for line type columns are based on lattice’s values (with some additions and differences).

Line Type
NA
'none'
'solid'
'dashed'
'dotted'
'dotdash'
'longdash'
'twodash'

Styled Tree

  node.col node.label.col border.col border.width border.type edge.col.1 edge.type.1 edge.col.2 edge.width.2
1    white          black      black           NA          NA         NA          NA     green4            2
2    blue2          white      white            2      dotted      blue2      dashed     green4            2
3       NA             NA         NA           NA          NA         NA          NA     green4            2
4       NA      lightblue  lightblue            3      dotted  lightblue      dotted     green4            2
node.style.tree <- SRCGrob(node.style);
## Only the first 2 "length" columns will be used. More branch lengths will be supported in a future version.
## Warning in add.axes(clone.out, yaxis.position, scale1 = scale1, scale2 =
## scale2, : Missing second y-axis label

Ex. 1.5: Showing Cellular Prevalence

A cellular prevalence column (named CP in the example below) can also be added. Values must lie between 0 and 1, and the values of a node’s children must not sum to more than the value of their parent.

parent length1 length2 CP
1 NA 12 850 1.00
2 1 10 1000 0.40
3 1 15 1100 0.23
4 2 10 760 0.31
CP.tree <- SRCGrob(CP);
## Only the first 2 "length" columns will be used. More branch lengths will be supported in a future version.
## Warning in add.axes(clone.out, yaxis.position, scale1 = scale1, scale2 =
## scale2, : Missing second y-axis label
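The two constraints on cellular prevalence can be checked before plotting. The helper below, check.cp, is hypothetical and not part of CEV. It is a minimal sketch that assumes row names are node IDs and that the column is named CP, as in the printed example.

```r
CP <- data.frame(
    parent = c(NA, 1, 1, 2),
    length1 = c(12, 10, 15, 10),
    length2 = c(850, 1000, 1100, 760),
    CP = c(1.00, 0.40, 0.23, 0.31)
    );

# Hypothetical helper: TRUE when every value is in [0, 1] and no node's
# children sum to more than the node's own value.
check.cp <- function(tree) {
    in.range <- all(tree$CP >= 0 & tree$CP <= 1);

    has.parent <- !is.na(tree$parent);
    child.sums <- tapply(tree$CP[has.parent], tree$parent[has.parent], sum);
    parent.cp <- tree$CP[match(names(child.sums), rownames(tree))];

    # A small tolerance guards against floating-point rounding.
    children.fit <- all(child.sums <= parent.cp + 1e-9);

    return(in.range && children.fit);
    }

print(check.cp(CP));
```

In the example data, the children of node 1 sum to 0.63 and the only child of node 2 has a value of 0.31, so both constraints hold.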

Ex. 1.6: Simplifying the Tree

Complex trees may benefit from simpler visual styles. For example, there may not be room to render the node ellipses. CEV provides node-by-node control with the draw.node column.

parent draw.node
1 NA TRUE
2 1 FALSE
3 2 TRUE
4 2 TRUE
5 2 TRUE
6 3 FALSE
7 3 FALSE
8 3 FALSE
9 4 FALSE
10 4 FALSE
11 4 FALSE
12 4 FALSE
13 4 FALSE
14 5 FALSE
15 5 FALSE
16 5 FALSE
17 5 FALSE
nodeless.tree <- SRCGrob(nodeless);
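For larger trees, the draw.node column can be derived rather than typed by hand. The sketch below uses one possible rule, chosen purely for illustration: draw an ellipse only for nodes that have at least one child. The example above uses a different, hand-picked selection, and the tree here is made up.

```r
nodeless <- data.frame(
    parent = c(NA, 1, 2, 2, 2, 3, 3, 3)
    );

node.ids <- as.numeric(rownames(nodeless));

# Illustrative rule: a node is drawn only if some other node
# names it as a parent (that is, it is not a leaf).
nodeless$draw.node <- node.ids %in% nodeless$parent;

print(nodeless);
```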

Text Dataframe

This secondary dataframe can be used to specify additional text corresponding to each node.

Ex. 6: Node Text

Each row must include a node ID for the text. Text will be stacked next to the specified node.

name node
GENE1 2
GENE2 2
GENE3 2
GENE4 3
GENE5 3
simple.text.tree <- SRCGrob(parent.only, simple.text.data);
grid.draw(simple.text.tree);
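A sketch of how the text data frame printed above could be built in base R. Several rows may refer to the same node, which is what produces the stacked labels. The per-node count is only an illustrative check.

```r
simple.text.data <- data.frame(
    name = c('GENE1', 'GENE2', 'GENE3', 'GENE4', 'GENE5'),
    node = c(2, 2, 2, 3, 3),
    stringsAsFactors = FALSE
    );

# Count how many labels will be stacked next to each node.
labels.per.node <- table(simple.text.data$node);
print(labels.per.node);
```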

Ex. 7: Specifying Colour and Style

  • An optional col column can be included to specify the colour of each text.
  • A fontface column can be included to make text bold, italic, and so on. These values correspond to the standard R fontface values.
  • NA values in each column will default to black and plain respectively.
name node col fontface
GENE1 2 red plain
GENE2 2 black plain
GENE3 2 blue NA
GENE4 3 NA italic
GENE5 3 red plain
full.text.tree <- SRCGrob(parent.only, simple.example$text);
grid.draw(full.text.tree);
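The styled text data can be built the same way. The sketch below also spells out what the documented defaults imply for the NA entries. The resolved.* vectors are for inspection only, on the assumption that CEV applies the same substitution internally.

```r
full.text.data <- data.frame(
    name = c('GENE1', 'GENE2', 'GENE3', 'GENE4', 'GENE5'),
    node = c(2, 2, 2, 3, 3),
    col = c('red', 'black', 'blue', NA, 'red'),
    fontface = c('plain', 'plain', NA, 'italic', 'plain'),
    stringsAsFactors = FALSE
    );

# NA colours fall back to black, NA font faces to plain.
resolved.col <- ifelse(
    is.na(full.text.data$col),
    'black',
    full.text.data$col
    );
resolved.fontface <- ifelse(
    is.na(full.text.data$fontface),
    'plain',
    full.text.data$fontface
    );

print(data.frame(name = full.text.data$name, resolved.col, resolved.fontface));
```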

Plot Parameters

The default settings should produce a reasonable baseline plot, but many users will want more control over their plot. This section highlights some of the most common parameters in SRCGrob.

Plot Size

Ex. 8: Plot Width with Horizontal Padding

Some plots require more or less horizontal padding between the y-axes and the tree itself. The horizontal.padding parameter scales the default padding proportionally. For example, horizontal.padding = -0.2 would reduce the padding by 20%.

padding.tree <- SRCGrob(
    branch.lengths,
    horizontal.padding = -0.8
    );
## Only the first 2 "length" columns will be used. More branch lengths will be supported in a future version.
## Warning in add.axes(clone.out, yaxis.position, scale1 = scale1, scale2 =
## scale2, : Missing second y-axis label
grid.draw(padding.tree);
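The proportional scaling can be read as simple arithmetic. In the sketch below, default.padding is a hypothetical value in arbitrary units (the real default is set inside the package), and it is an assumption that positive values increase the padding in the same proportional way.

```r
# Hypothetical default padding, in arbitrary units.
default.padding <- 1;

horizontal.padding <- c(-0.8, -0.2, 0, 0.5);

# A value of -0.2 removes 20% of the default; 0 leaves it unchanged.
effective.padding <- default.padding * (1 + horizontal.padding);

print(effective.padding);
```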

Ex. 9: Branch Scaling

Branches are scaled automatically, but users can further scale each branch with the scale1 and scale2 parameters. These values scale each branch proportionally, so scale1 = 1.1 would make the first set of branch lengths 10% longer.

scaled.tree <- SRCGrob(
    branch.lengths,
    scale1 = 1.5,
    scale2 = 0.5
    );
## Only the first 2 "length" columns will be used. More branch lengths will be supported in a future version.
## Warning in add.axes(clone.out, yaxis.position, scale1 = scale1, scale2 =
## scale2, : Missing second y-axis label
grid.draw(scaled.tree);

Ex. 10: Plot Title

The main title of the plot is referred to as main in plot parameters. main sets the title text, main.cex sets the font size, and main.y is used to move the main title up if more space is required for the plot.

title.tree <- SRCGrob(
    parent.only,
    main = 'Example Plot'
    );
grid.draw(title.tree);

Y-Axes

A y-axis will be added automatically for each branch length column (the left-sided axis corresponding to the first branch length column, and the right with the second length column).

Ex. 11: Y-Axis

Ticks are placed automatically based on the plot size and the branch lengths.

Ex. 12: Axis Title

Axis titles are specified with the yaxis1.label and yaxis2.label parameters.

axis.title.tree <- SRCGrob(
    parent.only,
    yaxis1.label = 'SNVs',
    horizontal.padding = -0.6
    );
grid.draw(axis.title.tree);

Ex. 13: Axis Tick Placement

The default axis tick positions can be overridden with the yat parameter. This expects a list of vectors, each corresponding to the ticks on one y-axis.

yaxis1.ticks <- c(10, 20, 30, 35, 40);
yaxis2.ticks <- c(100, 250, 400);

yat.tree <- SRCGrob(
    branch.lengths,
    yat = list(
        yaxis1.ticks,
        yaxis2.ticks
        ),
        ),
    horizontal.padding = -0.4
    );
## Only the first 2 "length" columns will be used. More branch lengths will be supported in a future version.
## Warning in add.axes(clone.out, yaxis.position, scale1 = scale1, scale2 =
## scale2, : Missing second y-axis label
grid.draw(yat.tree);
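When choosing tick positions by hand, it can help to know how far the tree extends. The sketch below sums branch lengths from the root to each node and builds evenly spaced ticks up to the deepest node. The helper path.length is hypothetical, and it is an assumption that the axis range follows this cumulative length and that row position equals node ID.

```r
branch.lengths <- data.frame(
    parent = c(NA, 1, 1, 2),
    length1 = c(12, 10, 15, 10),
    length2 = c(850, 1000, 1100, 760)
    );

# Hypothetical helper: total length from the root down to each node.
path.length <- function(tree, column) {
    totals <- numeric(nrow(tree));

    for (i in seq_len(nrow(tree))) {
        node <- i;

        # Walk up the tree until the root's NA parent is reached.
        while (!is.na(node)) {
            totals[i] <- totals[i] + tree[node, column];
            node <- tree$parent[node];
            }
        }

    return(totals);
    }

depth1 <- path.length(branch.lengths, 'length1');
depth2 <- path.length(branch.lengths, 'length2');

# Evenly spaced ticks up to the deepest node on each axis.
custom.yat <- list(
    seq(0, max(depth1), by = 10),
    seq(0, max(depth2), by = 500)
    );

print(custom.yat);
```

The resulting list has the same shape as the one passed to yat above: one numeric vector per y-axis.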

Ex. 14: Normal

normal.tree <- SRCGrob(
    parent.only,
    add.normal = TRUE
    );
grid.draw(normal.tree);