Extending ggplot2 for fun and profit

Thomas Lin Pedersen

October 11, 2016

Introduction

Overview

  • About me
  • Recent changes to ggplot2
    • Grammar
    • Extensions!
  • ggraph
    • A grammar for relational data
      • Layouts
      • Nodes
      • Edges

About me

  • Computational Biologist/Bioinformatician
  • Just started at Gubra
    • We are looking for data scientists!

About me

  • R developer
    • 6 packages on CRAN
    • 5 packages on Bioconductor
    • 5-10 packages in the works on GitHub
  • Focus on bringing researchers closer to their data
    • Frameworks, GUIs and API design
    • Visualization

About me

  • Blog: Data Imaginist
  • Twitter: @thomasp85
  • GitHub: thomasp85

Do reach out with ideas and collaborations!

(I’m busy but very interested in making cool stuff!)

ggplot2

The ggplot2 way

Declarative visualization based on a graphic grammar

  • Geometric representation
  • Data subsetting
  • Statistical transformations
  • Positional adjustments
  • Scales (legends)
  • Coordinate systems
  • Plot styling

The ggplot2 way

#       Plot data
#          |
ggplot(diamonds) + 
# representation      Mapping to scales           Setting to constants
#     |                       |                           |
  geom_point(aes(x = carat, y = price, colour = cut), alpha = 0.3) +
#                                       Statistical transformation
#                                                   |
  geom_density_2d(aes(x = carat, y = price), stat = 'density2d') + 
#             Scale specification
#                     |
  scale_colour_brewer(type = 'qual') +
#  Data subsetting/splitting
#           |
  facet_wrap(~color) +
#   Styling
#     |
  theme_bw()

Extending ggplot2

Opinionated at a cost

Extension support
Before v2        After v2
Sort of doable   Built in

In v2.1

  • geoms
  • stats
  • coords
  • scales

In v2.2

  • facets
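
As an illustration of what "built in" means in practice, here is a minimal sketch of a custom stat. It is adapted from the convex hull example in ggplot2's own extension vignette, not from this talk, and the names `StatChull` and `stat_chull` are purely illustrative. The idea is that an extension is a ggproto object plus a thin constructor that wraps `layer()`.

```r
library(ggplot2)

# A ggproto object describing the statistical transformation:
# keep only the points that lie on the convex hull of each group.
StatChull <- ggproto("StatChull", Stat,
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  },
  required_aes = c("x", "y")
)

# A constructor following the usual stat_*() pattern (illustrative name).
stat_chull <- function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# The new stat composes with the rest of the grammar like any other layer.
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_chull(fill = NA, colour = "black")
```

Geoms, coords, scales and facets are extended along the same lines, by subclassing the corresponding ggproto object.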

Extending ggplot2

A whole ecosystem has emerged - keep track of it on https://www.ggplot2-exts.org

Some highlights:

Relational data in ggplot2

ggraph

What is relational data?

  • Data points and their connections
  • Networks/graph
  • Trees and hierarchies

Everything can be relational!
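
To make "everything can be relational" concrete, here is a small sketch that is not part of the original talk: it turns an ordinary data frame into a graph by treating strong correlations between columns as edges. The `mtcars` data and the 0.8 threshold are arbitrary choices for illustration, and only igraph is assumed.

```r
library(igraph)

# Pairwise correlations between the columns of a plain data frame
cors <- cor(mtcars)

# Keep each strongly correlated pair once (upper triangle only)
idx <- which(abs(cors) > 0.8 & upper.tri(cors), arr.ind = TRUE)

# An edge list: one row per connection, with the correlation as an attribute
edges <- data.frame(
  from   = rownames(cors)[idx[, "row"]],
  to     = colnames(cors)[idx[, "col"]],
  weight = cors[idx]
)

# Variables become nodes, strong correlations become edges
gr <- graph_from_data_frame(edges, directed = FALSE)
```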

ggraph

ggraph is not the only game in town:

But…

ggraph

… ggraph offers a very powerful grammar extension that is not limited to node-edge diagrams.

Key concepts:

  • Layout
  • Edges
  • Nodes
  • Connections

Layouts

Mapping between hierarchical structure and position of nodes

Network example

Also for non-obvious cases

Treemap

Hive plot

Layouts

Often drives the interpretation of the network…

  • Nodes close to each other are more similar

… But be careful

  • Algorithm behavior is non-obvious for complex networks
  • Small changes in network structure can lead to large changes in layout
  • Can’t compare network layouts
  • You’re not smart because your visualization looks complex

For more hairball critique, read Martin Krzywinski's justification of hive plots

Layouts

In ggraph the layouts are defined as part of the setup:

library(igraph)  # graph construction
library(ggraph)  # plotting grammar
gr <- erdos.renyi.game(10, 0.5)
gr_p <- ggraph(gr, 'igraph', algorithm = 'kk')

… or created beforehand

layout <- createLayout(gr, 'igraph', algorithm = 'kk')
gr_p <- ggraph(data = layout)
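
Conceptually, a layout is nothing more than a table of node positions. The sketch below shows that idea using igraph directly rather than ggraph; `sample_gnp()` is used as the current igraph name for the same random graph model, and the seed and the `layout_df` name are assumptions made for the example.

```r
library(igraph)

set.seed(42)                  # random graphs and layouts depend on the RNG
gr <- sample_gnp(10, 0.5)     # same model as erdos.renyi.game(10, 0.5)

# Kamada-Kawai layout: a matrix with one row per node and x/y columns
coords <- layout_with_kk(gr)

# The structure-to-position mapping as a plain data frame
layout_df <- data.frame(
  node = seq_len(vcount(gr)),
  x    = coords[, 1],
  y    = coords[, 2]
)
head(layout_df)
```

Everything drawn afterwards (nodes, edges) is positioned from a table like this one, which is why the layout is fixed during setup.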

Edges

  • The connection between entities

  • Often a line of some sort

  • Not necessarily needed

Edges

Edges come in many types

  • geom_edge_link()
  • geom_edge_loop()
  • geom_edge_fan()
  • geom_edge_diagonal()
  • geom_edge_elbow()
  • …

… and flavors

  • geom_edge_link()
  • geom_edge_link2()
  • geom_edge_link0()
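
A rough sketch of how the three flavors differ, based on the ggraph documentation rather than on these slides: the base variant splits each edge into many segments and exposes an `index` variable along it, the `2` variant interpolates node values between the two end points, and the `0` variant draws a plain primitive and is the fastest. The code assumes the current CRAN release of ggraph, whose calling convention (`layout = `, `after_stat()`) differs from the one used elsewhere in this deck; the toy graph and the `group` attribute are made up for the example.

```r
library(igraph)
library(ggraph)

# A toy graph with a made-up node attribute
gr <- make_ring(6)
V(gr)$group <- rep(c("a", "b"), 3)

p <- ggraph(gr, layout = "circle")

# Base variant: a gradient along each edge via the computed `index`
p_base <- p + geom_edge_link(aes(alpha = after_stat(index)))

# 2 variant: edge colour interpolated between the end nodes' `group`
p_two <- p + geom_edge_link2(aes(colour = node.group))

# 0 variant: simple straight lines, no interpolation
p_zero <- p + geom_edge_link0()
```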

Edges

… and some are layout specific

  • geom_edge_arc()
  • geom_edge_hive()

Edges

gr <- graph_from_data_frame(highschool)
p <- ggraph(gr, 'igraph', algorithm = 'kk')
p + geom_edge_link()

Edges

p + geom_edge_arc()

Edges

p + geom_edge_fan()

Edges

p + geom_edge_fan0(arrow = arrow(length = unit(0.3, 'cm')))

Edges

p + geom_edge_fan(aes(colour = factor(year), alpha = ..index..)) + 
  scale_edge_alpha(guide = 'edge_direction')

Edges

hr <- as.dendrogram(hclust(dist(iris[, 1:4]), 'ward.D2'))
p <- ggraph(hr, 'dendrogram')
p + geom_edge_link()

Edges

p <- ggraph(hr, 'dendrogram', repel = TRUE)
p + geom_edge_link()

Edges

p + geom_edge_elbow()

Edges

p + geom_edge_diagonal()

Edges

# Set the class of a node to the class of its children if they are all of equal class
hr <- treeApply(hr, function(node, children, ...) {
  if (is.leaf(node)) {
    attr(node, 'nodePar') <- list(species=iris[as.integer(attr(node, 'label')),5])
    attr(node, 'nodePar')$class <- attr(node, 'nodePar')$species
  } else {
    classes <- lapply(children, attr, which = 'nodePar')
    classes <- unique(sapply(classes, `[[`, 'class'))
    if (length(classes) == 1 && !anyNA(classes)) {
      attr(node, 'nodePar')$class <- classes
    } else {
      attr(node, 'nodePar')$class <- NA
    }
  }
  node
}, direction = 'up')

Edges

p <- ggraph(hr, 'dendrogram', repel = TRUE)
p <- p + geom_edge_diagonal2(aes(colour = node.class), gEdges('long', nodePar = 'class'))
p

Nodes

The entities that are connected.

Nodes

Often points and/or labels. Size and colour are often used to show node attributes

  • geom_node_point()
  • geom_node_text()

Layouts can introduce new node types:

  • geom_node_treemap()

Nodes

  • Nodes use regular ggplot2 scales
  • Edges use separate edge scale

Don’t go crazy with colour!
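
A small sketch of the scale separation, assuming the current CRAN release of ggraph (its `layout = ` argument differs from the calls used elsewhere in this deck). The `popularity` attribute mirrors the one used later in the talk. Node aesthetics are controlled with ordinary ggplot2 scales, while edge aesthetics get their own `scale_edge_*()` family, so the two never compete for the same legend.

```r
library(igraph)
library(ggraph)

# highschool ships with ggraph: friendship edges with a `year` column
gr <- graph_from_data_frame(highschool)
V(gr)$popularity <- degree(gr, mode = "in")

p <- ggraph(gr, layout = "kk") +
  geom_edge_link(aes(colour = factor(year))) +    # edge aesthetic
  geom_node_point(aes(size = popularity)) +       # node aesthetic
  scale_edge_colour_brewer(name = "year", type = "qual") +  # edge scale
  scale_size(range = c(1, 5))                     # regular ggplot2 scale
```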

Nodes

p <- p + geom_node_point(aes(filter = leaf, colour = class)) + 
  geom_node_text(aes(filter = members > 25, label = members))
p

Nodes

ggraph(den_to_igraph(hr), 'treemap') + 
  geom_node_treemap(aes(filter = leaf, fill = species), colour = NA) + 
  geom_node_treemap(aes(size = members), colour = 'grey20') + 
  scale_size(range = c(0.5, 4), trans = 'sqrt', guide = 'none')

Connections

Relationship between nodes outside of network structure

Currently there is only support for hierarchical edge bundles

Connections

ggraph(flareGraph, 'dendrogram', circular = TRUE) +
  geom_edge_bundle(aes(alpha = ..index..), data = gCon(importFrom, importTo)) +
  geom_node_point(aes(filter = leaf, colour = class)) +
  coord_fixed()

Untangle the hairball

highGr <- graph_from_data_frame(highschool)
V(highGr)$community <- cluster_walktrap(highGr)$membership
V(highGr)$popularity <- degree(highGr, mode = 'in')
p <- ggraph(highGr, 'igraph', algorithm = 'kk') + 
  geom_edge_fan(aes(alpha = ..index..)) + 
  geom_node_point(aes(size = popularity))
p

p + facet_graph(nodes = community, edges = year)