CancerEvolutionVisualization (CEV) creates publication-quality phylogenetic tree plots. For simple plots, the default settings work right out of the box. However, more complex plots may require some trial and error to achieve the right arrangement of nodes and branches.
This guide will show best practices for creating plots, as well as examples of common use cases and tips for refining plot settings.
There are many methods for determining subpopulations within genomic data, and you should be free to use whatever method you prefer for a given dataset. This package handles only visualization, not analysis. Therefore, data must be prepared and formatted before being passed to any CEV functions.
The tree data.frame is the primary source of data for a plot. It defines the tree's structure of parent and child nodes, and it provides information about the number of mutations at each node.
The simplest input format is a single `parent` column giving the parent
of each node (a node has exactly one parent). The root node has no
parent, so its `parent` value is `NA`.
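A minimal sketch of this input in R. It assumes, as the tables in this guide suggest, that each row is a node and that the row names act as node IDs. The two `stopifnot` checks are a hypothetical sanity check added for illustration; they are not part of CEV.

```r
# Each row is one node; the row names (1 to 4) act as node IDs.
tree <- data.frame(
    parent = c(NA, 1, 1, 2),
    row.names = 1:4
    );

# Sanity checks (not part of CEV): exactly one root, and every
# non-NA parent refers to an existing node.
stopifnot(sum(is.na(tree$parent)) == 1);
stopifnot(all(na.omit(tree$parent) %in% rownames(tree)));

print(tree);
```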
node | parent |
---|---|
1 | NA |
2 | 1 |
3 | 1 |
4 | 2 |
It's common to associate branch lengths with the values of a
particular variable (for example, PGA or SNVs). Up to two branch lengths
can be specified. Including a `length1` and/or `length2` column enables
this branch scaling behaviour and automatically adds y-axis ticks and
labels.
All columns whose names contain `length` are treated as branch length
columns, so `length1` and `snv.length` are both valid names. Only the
first two such columns are currently used; support for more branch
lengths is planned for a future version.
Two length columns produce two distinctly coloured parallel lines on
each branch. Conflicting branch lengths are resolved automatically: for
each branch, the next node is placed at the end of the longer line.
node | parent | length1 | length2 |
---|---|---|---|
1 | NA | 12 | 850 |
2 | 1 | 10 | 1000 |
3 | 1 | 15 | 1100 |
4 | 2 | 10 | 760 |
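The table above can be built as a data.frame in R. The `grep` call is only an illustration of the naming rule (columns whose names contain `length`); CEV's own column matching may be implemented differently.

```r
# Tree with two branch length columns (values from the table above)
branch.lengths <- data.frame(
    parent = c(NA, 1, 1, 2),
    length1 = c(12, 10, 15, 10),
    length2 = c(850, 1000, 1100, 760)
    );

# Illustration of the naming rule: columns whose names contain
# "length" are the branch length columns.
length.columns <- grep('length', colnames(branch.lengths), value = TRUE);
print(length.columns);
```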
CEV provides several methods for refining the spacing and arrangement of a tree’s nodes. This is especially useful in complex trees, which often require more attention to avoid visual problems such as node collisions and uneven branch/level spacing. Here, we see a tree with many issues.
Consider this example tree.
An optional `spread` column can be included in the input tree
data.frame. Spread is relative: it scales the initial angle calculation
proportionally.

- A `spread` value of 1 or `NA` leaves the spacing unchanged.
- A `spread` value greater than 1 increases the space between nodes. For example, a `spread` of 1.25 spreads the nodes 25% more.
- A `spread` value less than 1 decreases the space between nodes. For example, a `spread` of 0.85 spreads the nodes 15% less.

To create more space for the numerous nodes in the lower levels of our example tree, we can increase the spread of the nodes at the top level. To create even more room, we can decrease the spread of some lower-level nodes where appropriate.
node | parent | spread |
---|---|---|
1 | NA | NA |
2 | 1 | 1.25 |
3 | 1 | 1.50 |
4 | 1 | 2.00 |
5 | 1 | 1.50 |
6 | 2 | 0.50 |
7 | 2 | 0.50 |
8 | 3 | 0.75 |
9 | 3 | 0.75 |
10 | 3 | 0.75 |
17 | 5 | 0.50 |
18 | 5 | 0.50 |
19 | 6 | 0.50 |
20 | 6 | 0.50 |
24 | 9 | 0.50 |
25 | 9 | 0.50 |
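A minimal sketch of setting these values in R. It covers only nodes 1 to 10, because the table above does not list every node's parent; the pattern is to start from `NA` (spacing unchanged) and then adjust individual nodes.

```r
# First ten nodes of the example tree, with parents from the table above
tree <- data.frame(
    parent = c(NA, 1, 1, 1, 1, 2, 2, 3, 3, 3)
    );

# NA leaves a node's spacing unchanged, so start with NA everywhere
tree$spread <- NA_real_;

# Widen the top level, then tighten some lower-level nodes
tree$spread[2:5] <- c(1.25, 1.50, 2.00, 1.50);
tree$spread[6:7] <- 0.50;
tree$spread[8:10] <- 0.75;

print(tree);
```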
CEV gives the user control over numerous visual aspects of the tree. By specifying optional columns and values in the tree input data.frame, the user can individually control the colour of each node and node label, and the colour, width, and line type of each node border and edge.
Style | Column |
---|---|
Node Colour | node.col |
Node Label Colour | node.label.col |
Node Border Colour | border.col |
Node Border Width | border.width |
Node Border Line Type | border.type |
Edge Colour | edge.col.1, edge.col.2 |
Edge Width | edge.width.1, edge.width.2 |
Edge Line Type | edge.type.1, edge.type.2 |
Missing columns and `NA` values are replaced with default values,
allowing node-by-node and edge-by-edge control as needed. For sparsely
defined values (for example, styling only a single edge), it can be
convenient to initialize a column with `NA`s and then assign values to
specific nodes as needed.
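A minimal sketch of that pattern in R, assuming each row's edge columns style the branch leading into that row's node (as the style table below suggests). Using `NA_character_` keeps the columns character-typed from the start.

```r
tree <- data.frame(
    parent = c(NA, 1, 1, 2)
    );

# NA means "use the default", so initialize the columns with NAs
tree$edge.col.1 <- NA_character_;
tree$edge.type.1 <- NA_character_;

# Style a single edge: the one on node 2's row
tree$edge.col.1[2] <- 'blue2';
tree$edge.type.1[2] <- 'dashed';

print(tree);
```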
Valid values for line type columns are based on lattice’s values (with some additions and differences).
Line Type |
---|
NA |
'none' |
'solid' |
'dashed' |
'dotted' |
'dotdash' |
'longdash' |
'twodash' |
node | node.col | node.label.col | border.col | border.width | border.type | edge.col.1 | edge.type.1 | edge.col.2 | edge.width.2 |
---|---|---|---|---|---|---|---|---|---|
1 | white | black | black | NA | NA | NA | NA | green4 | 2 |
2 | blue2 | white | white | 2 | dotted | blue2 | dashed | green4 | 2 |
3 | NA | NA | NA | NA | NA | NA | NA | green4 | 2 |
4 | NA | lightblue | lightblue | 3 | dotted | lightblue | dotted | green4 | 2 |
A `cellular.prevalence` column (shown as `CP` in the table below) can
also be added. These values must range between 0 and 1, and the values
of a node's children must not sum to more than that node's own value.
node | parent | length1 | length2 | CP |
---|---|---|---|---|
1 | NA | 12 | 850 | 1.00 |
2 | 1 | 10 | 1000 | 0.40 |
3 | 1 | 15 | 1100 | 0.23 |
4 | 2 | 10 | 760 | 0.31 |
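The two rules above can be checked before plotting. `check.cp` below is a hypothetical helper, not part of CEV; it assumes row names are node IDs, treats `NA` prevalence as invalid, and takes the column name as an argument so it works whether the column is named `CP` or `cellular.prevalence`.

```r
# Hypothetical helper, not part of CEV: checks the cellular prevalence
# rules for a tree whose row names are node IDs. NA values are invalid here.
check.cp <- function(tree, cp.column = 'CP') {
    cp <- tree[[cp.column]];
    node.ids <- rownames(tree);
    parent.ids <- as.character(tree$parent);

    # Rule 1: every value lies between 0 and 1
    if (anyNA(cp) || any(cp < 0 | cp > 1)) {
        return(FALSE);
        }

    # Rule 2: the children of each node sum to at most the node's value
    # (a small tolerance absorbs floating-point rounding)
    children.ok <- vapply(
        seq_along(node.ids),
        function(i) {
            is.child <- !is.na(parent.ids) & parent.ids == node.ids[i];
            sum(cp[is.child]) <= cp[i] + 1e-9;
            },
        logical(1)
        );

    all(children.ok);
    }

cp.tree <- data.frame(
    parent = c(NA, 1, 1, 2),
    CP = c(1.00, 0.40, 0.23, 0.31)
    );
print(check.cp(cp.tree));
```

With the table's values, node 1's children sum to 0.63 and node 2's only child has 0.31, so both rules hold.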
Complex trees may benefit from simpler visual styles. For example,
there may not be room to render the node ellipses. CEV provides
node-by-node control with the draw.node
column.
node | parent | draw.node |
---|---|---|
1 | NA | TRUE |
2 | 1 | FALSE |
3 | 2 | TRUE |
4 | 2 | TRUE |
5 | 2 | TRUE |
6 | 3 | FALSE |
7 | 3 | FALSE |
8 | 3 | FALSE |
9 | 4 | FALSE |
10 | 4 | FALSE |
11 | 4 | FALSE |
12 | 4 | FALSE |
13 | 4 | FALSE |
14 | 5 | FALSE |
15 | 5 | FALSE |
16 | 5 | FALSE |
17 | 5 | FALSE |
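For larger trees, `draw.node` can be computed rather than typed. The sketch below is one hypothetical approach: it hides every leaf node (a node that is never another node's parent) and then, to reproduce the table above, also hides node 2. It assumes row names are node IDs.

```r
# Parents of nodes 1 to 17, from the table above
tree <- data.frame(
    parent = c(NA, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5)
    );

# A leaf is a node that never appears as another node's parent
is.leaf <- !(rownames(tree) %in% tree$parent);

# Draw ellipses for internal nodes only...
tree$draw.node <- !is.leaf;
# ...and, as in the table, hide node 2 as well
tree$draw.node[2] <- FALSE;

print(tree);
```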
A secondary data.frame can be used to add text (for example, gene names) to nodes.
Each row must include the ID of the node the text belongs to. Multiple texts for the same node are stacked next to it.
name | node |
---|---|
GENE1 | 2 |
GENE2 | 2 |
GENE3 | 2 |
GENE4 | 3 |
GENE5 | 3 |
Optional columns control the appearance of each text:

- A `col` column specifies the colour of each text.
- A `fontface` column makes text bold, italic, etc. These values correspond to the standard R `fontface` values (`plain`, `bold`, `italic`, `bold.italic`).
- `NA` values in these columns default to `black` and `plain`, respectively.

name | node | col | fontface |
---|---|---|---|
GENE1 | 2 | red | plain |
GENE2 | 2 | black | plain |
GENE3 | 2 | blue | NA |
GENE4 | 3 | NA | italic |
GENE5 | 3 | red | plain |
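A minimal sketch of building this text data.frame in R, using the values from the table above. How the data.frame is passed to the plotting function is not covered in this section, so the sketch stops at building it and grouping the texts by node.

```r
# One row per text; 'node' is the ID of the node the text belongs to.
# NA in col or fontface falls back to black / plain.
node.text <- data.frame(
    name = c('GENE1', 'GENE2', 'GENE3', 'GENE4', 'GENE5'),
    node = c(2, 2, 2, 3, 3),
    col = c('red', 'black', 'blue', NA, 'red'),
    fontface = c('plain', 'plain', NA, 'italic', 'plain'),
    stringsAsFactors = FALSE
    );

# Texts sharing a node are stacked next to it; list them per node
print(split(node.text$name, node.text$node));
```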
The default settings should produce a reasonable baseline plot, but
many users will want more control over their plot. This section
highlights some of the most common parameters of `SRCGrob`.
Some plots require more or less horizontal padding between the y-axes
and the tree itself. The `horizontal.padding` parameter scales the
default padding proportionally. For example,
`horizontal.padding = -0.2` reduces the padding by 20%.
Branches are scaled automatically, but users can scale them further
with the `scale1` and `scale2` parameters, which apply to the first and
second sets of branch lengths. These values scale the branches
proportionally, so `scale1 = 1.1` makes the first set of branch lengths
10% longer.
The main title of the plot is referred to as `main` in the plot
parameters. `main` sets the title text, `main.cex` sets its font size,
and `main.y` moves the title up if the plot needs more space.
A y-axis is added automatically for each branch length column: the left axis corresponds to the first branch length column, and the right axis to the second.
Ticks are placed automatically based on the plot size and the branch lengths.
Axis titles are specified with the `yaxis1.label` and `yaxis2.label`
parameters.
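The sketch below collects the parameters described in this section (`main`, `yaxis1.label`, `yaxis2.label`, `horizontal.padding`, `scale1`) in one call. The title and axis label strings are placeholders, and the plot is only built when the package is installed; parameter values follow the examples given in the text.

```r
# Tree with two branch length columns (values from the branch length example)
tree <- data.frame(
    parent = c(NA, 1, 1, 2),
    length1 = c(12, 10, 15, 10),
    length2 = c(850, 1000, 1100, 760)
    );

# Plot-level settings from this section; the strings are placeholders
plot.settings <- list(
    main = 'Example tree',
    yaxis1.label = 'PGA',
    yaxis2.label = 'SNVs',
    horizontal.padding = -0.2,  # 20% less horizontal padding
    scale1 = 1.1                # first set of branches 10% longer
    );

# Only build the plot if the package is installed
if (requireNamespace('CancerEvolutionVisualization', quietly = TRUE)) {
    tree.grob <- do.call(
        CancerEvolutionVisualization::SRCGrob,
        c(list(tree), plot.settings)
        );
    }
```

Keeping the settings in a list makes it easy to reuse the same styling across several trees.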
The default axis tick positions can be overridden with the `yat`
parameter. This expects a list of vectors, each giving the tick
positions for one y-axis, in the same order as the branch length
columns.
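```r
library(CancerEvolutionVisualization);

# branch.lengths: a tree data.frame with length1 and length2 columns
# Tick positions for the left (first) and right (second) y-axes
yaxis1.ticks <- c(10, 20, 30, 35, 40);
yaxis2.ticks <- c(100, 250, 400);

yat.tree <- SRCGrob(
    branch.lengths,
    yat = list(
        yaxis1.ticks,
        yaxis2.ticks
        ),
    horizontal.padding = -0.4
    );
```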