Network visualization – part 6: D3 and R (networkD3)

I was never that much into JavaScript until I was introduced to D3.js. This open source JS library provides features for dynamic data manipulation and visualization and lets users become active participants in the data visualization process. As such, D3.js plots are not a static “as is” representation of the data; they let users explore data points and hierarchies among the data, filter data by groups, and so on.

As somebody who often uses R for data analysis, I was excited to see some of the libraries that link R and D3, such as plotly and networkD3 (or, previously, d3Network: https://cran.r-project.org/web/packages/d3Network/). Here, I will focus on the networkD3 package.

As before, the network I will use as an illustration is a weighted network of characters’ coappearances in Victor Hugo’s novel “Les Miserables” (from D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA, 1993). It consists of 77 nodes, corresponding to characters, and 254 weighted edges, whose weights correspond to the number of coappearances of two characters in the same chapter of the book. I used four properties to characterize this network (for the sole purpose of making the visualization more interesting): the nodes were characterized by degree and betweenness centrality, and the edges by weight and Dice similarity (for more details about these properties, see the Network Visualization part 1 blog post). To calculate the network properties, I used the igraph package.
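To make these properties a bit more concrete, here is a minimal sketch on a toy graph (four made-up nodes, not the Les Miserables data). It assumes that the igraph package is installed, and it uses igraph::similarity with method = "dice", which should be equivalent to the similarity.dice call in the full code at the end of this post.

```r
# Toy undirected graph: a triangle A-B-C plus a pendant node D attached to C.
# (Hypothetical data, used only for illustration.)
g <- igraph::graph_from_literal(A - B, B - C, C - A, C - D)

# Node properties
deg <- igraph::degree(g, mode = "all")           # number of neighbors of each node
bet <- igraph::betweenness(g, directed = FALSE)  # shortest paths passing through each node

# Pairwise Dice similarity; for a simple graph this is
# 2 * (number of common neighbors) / (degree(i) + degree(j))
dice <- igraph::similarity(g, method = "dice")
dimnames(dice) <- list(igraph::V(g)$name, igraph::V(g)$name)

print(deg)
print(bet)
print(round(dice, 2))
```

In this toy graph only node C lies on shortest paths between other nodes (A to D and B to D), so it is the only node with non-zero betweenness.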

The networkD3 package provides a function called igraph_to_networkD3 that converts an igraph object into a format that networkD3 uses to create a network representation. As I used an igraph object to store my network, including node and edge properties, I was hoping that this function alone would be enough to create a visualization of my network. However, it does not work exactly like that (which is not that surprising, given the differences between how D3.js works and how an igraph object is defined). Instead, it extracts the lists of nodes and edges from the igraph object, but not the information about all node and edge properties (the exception is a priori specified information about node membership in groups/clusters, which can be derived from one or more network properties, e.g., node degree). Additionally, the igraph_to_networkD3 function does not plot the network itself; it only extracts the data that are later passed to the forceNetwork function, which plots the network.
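A minimal sketch of that two-step workflow, on a made-up four-node graph, could look as follows. It assumes that igraph and networkD3 are installed; the column names "source", "target", "name" and "group" follow the example in the networkD3 documentation, so treat them as an assumption if your package version differs.

```r
# Toy undirected graph (hypothetical data, not the Les Miserables network)
g <- igraph::graph_from_literal(A - B, B - C, C - A, C - D)

# Use node degree as the a priori group information
grp <- unname(igraph::degree(g))

# Step 1: convert. The result is a list with two data frames, "links" and "nodes"
d3 <- networkD3::igraph_to_networkD3(g, group = grp)
str(d3$links)  # zero-based source/target indices
str(d3$nodes)  # node names plus the group column

# Step 2: the conversion draws nothing; plotting is a separate forceNetwork call
p <- networkD3::forceNetwork(Links = d3$links, Nodes = d3$nodes,
                             Source = "source", Target = "target",
                             NodeID = "name", Group = "group")
```

Any other node or edge attributes stored in the igraph object would have to be added to d3$nodes and d3$links by hand.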

So let’s focus on the forceNetwork function instead. This function creates a D3.js force directed network graph from two data frames, one containing information about network nodes and the other one containing information about network edges. In our case, these data frames, denoted as nodeList and edgeList, respectively, contain the following columns (for more details see the code at the end of this post):

nodeList: “ID”, “nName”, “nodeDegree”, and “nodeBetweenness”
edgeList: “SourceID”, “SourceName”, “TargetID”, “TargetName”, “Weight”, and “diceSim”
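As a rough illustration of this layout, the two data frames could look as follows for a hypothetical three-node network (all values are made up; only the column names follow the lists above):

```r
# Hypothetical three-node example; only the column names match the post.
nodeList <- data.frame(ID = 0:2,                    # zero-based node IDs
                       nName = c("A", "B", "C"),
                       nodeDegree = c(2, 1, 1),
                       nodeBetweenness = c(100, 0, 0),
                       stringsAsFactors = FALSE)

edgeList <- data.frame(SourceID = c(0, 0),          # refers to nodeList$ID
                       SourceName = c("A", "A"),
                       TargetID = c(1, 2),          # refers to nodeList$ID
                       TargetName = c("B", "C"),
                       Weight = c(3, 1),
                       diceSim = c(0, 0),
                       stringsAsFactors = FALSE)

# Sanity checks: IDs start at zero and are ordered,
# and every edge end point refers to a known node
stopifnot(identical(nodeList$ID, seq_len(nrow(nodeList)) - 1L))
stopifnot(all(c(edgeList$SourceID, edgeList$TargetID) %in% nodeList$ID))
```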

Given the information about nodes and edges stored in these data frames, we will use the forceNetwork function to create a network in which node size corresponds to node betweenness, node color corresponds to node degree, the distance between two nodes and the edge thickness correspond to edge weight, and edge color corresponds to Dice similarity. Each node will be labeled with its name.

The forceNetwork function expects the edge list to contain pairs of interacting nodes in the form of their IDs, starting from zero. Node attributes, stored in the nodeList data frame, are expected to be in increasing order of node ID, starting with the first node (ID = 0). Based on this ordering, the forceNetwork function knows which node ID to map to the specified node property (in our example, the node name, “nName”). To show more than a single property in the label, one has to combine two existing node properties into a new one. In practice, this means creating a new column in the nodeList data frame, because the forceNetwork function identifies this property by a column name (string).

Node color is defined by the “Group” variable: all nodes assigned to the same group get the same color. Hence, if we want every node to have a different color, each node has to be assigned to a different group (one can use the node ID as the group number). In our case, we colored nodes by their degree (“nodeDegree”), so all nodes with the same degree share a color. The Nodesize variable defines the size of a node; we used the “nodeBetweenness” column for it.

To define the link distance, we used a JS function that calculates the distance between two nodes from the edge weight. Such a function can only use variables already defined inside the forceNetwork function (e.g., Value or Nodesize), not variables/column names from the node and edge data frames. Thus, if you plan to use JS for any kind of mathematical operation, the choice of variables assigned to the forceNetwork arguments is important (and also a limiting factor).

To define link color, we will interpolate edge colors based on their Dice similarity values, using the “colorRampPalette” function (something similar was done and explained in one of the previous visualization blog posts, Network Visualization Part 2):

F2 <- colorRampPalette(c("#FFFF00", "#FF0000"), bias = nrow(edgeList), space = "rgb", interpolate = "linear")
colCodes <- F2(length(unique(edgeList$diceSim)))
edges_col <- sapply(edgeList$diceSim, function(x) colCodes[which(sort(unique(edgeList$diceSim)) == x)])
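To see what these three lines do, the same mapping can be tried on a handful of made-up Dice similarity values. This is only a small sketch: the vector dice is hypothetical, and match() is used as a shorter alternative to the sapply/which lookup, which should give the same result.

```r
# Made-up Dice similarity values for five edges (illustration only)
dice <- c(0.2, 0.8, 0.5, 0.2, 1.0)

# Palette function running from yellow (low similarity) to red (high similarity)
pal <- colorRampPalette(c("#FFFF00", "#FF0000"))

# One color per distinct value, ordered from the smallest to the largest value
lev <- sort(unique(dice))
colCodes <- pal(length(lev))

# Look up the color of each edge by the rank of its value
edges_col <- colCodes[match(dice, lev)]

# The same lookup written with sapply/which, as in the post
edges_col_post <- sapply(dice, function(x) colCodes[which(lev == x)])

print(data.frame(dice, edges_col))
```

Note that the colors are spaced evenly over the distinct values, so an edge is colored by the rank of its similarity value rather than in proportion to the value itself.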

We can also define node opacity, the opacity of node labels when they are inactive (no mouse pointer over their nodes), and whether zooming is enabled. Finally, the forceNetwork function provides an option to add further functionality in the form of a character string with a JavaScript expression that is evaluated when a node is clicked.
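As a hedged sketch of these options on a hypothetical three-node network (the data, the nLabel column and the alert text are made up for illustration): a combined label is built as a new column, and the clickAction argument receives a JavaScript expression as a string. The d.name field used in the expression follows the example in the networkD3 documentation; treat it as an assumption if your package version differs.

```r
# Hypothetical toy data (not the Les Miserables network)
nodes <- data.frame(nName = c("A", "B", "C"),
                    nodeDegree = c(2, 1, 1),
                    stringsAsFactors = FALSE)
links <- data.frame(SourceID = c(0, 0), TargetID = c(1, 2), Weight = c(3, 1))

# Combine two node properties into one label column
nodes$nLabel <- paste0(nodes$nName, " (degree ", nodes$nodeDegree, ")")

# JavaScript expression that is evaluated when a node is clicked
clickJS <- 'alert("You clicked " + d.name);'

toy_network <- networkD3::forceNetwork(
  Links = links, Nodes = nodes,
  Source = "SourceID", Target = "TargetID", Value = "Weight",
  NodeID = "nLabel", Group = "nodeDegree",
  opacity = 0.85,        # node opacity
  opacityNoHover = 0.1,  # label opacity when the mouse is not over a node
  zoom = TRUE,           # allow zooming
  clickAction = clickJS)
```

Typing toy_network at the R prompt then shows the widget, with labels such as "A (degree 2)".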

While the presented options allow us to create the network representation described above, the networkD3 package still lacks a number of features that the full D3.js library has and, as such, has limited applicability. For example, we cannot use different types of nodes (besides circles), edges (directed or undirected), or line styles (dashed, curved, etc.); we cannot assign edge labels or use multiple node labels; and there are no filtering or zoom-in/zoom-out options that would account for different levels of network structure (node clusters as a high-level visualization vs. nodes within clusters as a low-level, in-depth visualization).

Let's go back to our example. Given the above defined node and edge data frames, we can create a D3 object (denoted as D3_network_LM) as follows:

D3_network_LM <- networkD3::forceNetwork(Links = edgeList, Nodes = nodeList,
                        Source = "SourceID", Target = "TargetID",
                        Value = "Weight", NodeID = "nName",
                        Nodesize = "nodeBetweenness", Group = "nodeDegree",
                        height = 500, width = 1000, fontSize = 20,
                        linkDistance = networkD3::JS("function(d) { return 10*d.value; }"),
                        linkWidth = networkD3::JS("function(d) { return d.value/5; }"),
                        opacity = 0.85, zoom = TRUE, opacityNoHover = 0.1,
                        linkColour = edges_col)

To see the network we created, we just need to type its name:

D3_network_LM

D3 network representation

Since we enabled the zoom option, double-clicking on any node zooms in on the network and lets us see that node and its neighborhood in more detail:

D3 network representation - zoomed in

Alternatively, we can save it as an HTML file:
networkD3::saveNetwork(D3_network_LM, "D3_LM.html", selfcontained = TRUE)

and use it independently of R, for example by opening the exported HTML file in a web browser.

This example demonstrated that it is relatively easy to create a simple but still visually descriptive D3 network visualization from R with the networkD3 package. The ease of visualizing a network with networkD3 may be enough to make one overlook the lack of some features that are available when working directly with D3 but that require significant time spent learning D3 and designing a custom network visualization.

Finally, here is the code used to create the network:

############################################################################################
############################################################################################
# Plotting networks in R - an example of how to plot a network and
# customize its appearance using the networkD3 library
############################################################################################
############################################################################################
# Clear workspace 
# rm(list = ls())
############################################################################################

# Read a data set. 
# Data format: dataframe with 3 variables; variables 1 & 2 correspond to interactions; variable 3 is weight of interaction
edgeList <- read.table("lesmis.txt", header = FALSE, sep = "\t")
colnames(edgeList) <- c("SourceName", "TargetName", "Weight")

# Create a graph. Use simplify to ensure that there are no duplicated edges or self loops
gD <- igraph::simplify(igraph::graph.data.frame(edgeList, directed=FALSE))

# Create a node list object (actually a data frame object) that will contain information about nodes
nodeList <- data.frame(ID = c(0:(igraph::vcount(gD) - 1)), # because networkD3 library requires IDs to start at 0
                       nName = igraph::V(gD)$name)

# Map node names from the edge list to node IDs
getNodeID <- function(x){
  which(x == igraph::V(gD)$name) - 1 # to ensure that IDs start at 0
}
# And add them to the edge list
edgeList <- plyr::ddply(edgeList, .variables = c("SourceName", "TargetName", "Weight"), 
                        function (x) data.frame(SourceID = getNodeID(x$SourceName), 
                                                TargetID = getNodeID(x$TargetName)))

############################################################################################
# Calculate some node properties and node similarities that will be used to illustrate 
# different plotting abilities and add them to the edge and node lists

# Calculate degree for all nodes
nodeList <- cbind(nodeList, nodeDegree=igraph::degree(gD, v = igraph::V(gD), mode = "all"))

# Calculate betweenness for all nodes
betAll <- igraph::betweenness(gD, v = igraph::V(gD), directed = FALSE) / (((igraph::vcount(gD) - 1) * (igraph::vcount(gD)-2)) / 2)
betAll.norm <- (betAll - min(betAll))/(max(betAll) - min(betAll))
nodeList <- cbind(nodeList, nodeBetweenness=100*betAll.norm) # We are scaling the value by multiplying it by 100 for visualization purposes only (to create larger nodes)
rm(betAll, betAll.norm)

# Calculate Dice similarities between all pairs of nodes
dsAll <- igraph::similarity.dice(gD, vids = igraph::V(gD), mode = "all")

F1 <- function(x) {data.frame(diceSim = dsAll[x$SourceID +1, x$TargetID + 1])}
edgeList <- plyr::ddply(edgeList, .variables=c("SourceName", "TargetName", "Weight", "SourceID", "TargetID"), 
                           function(x) data.frame(F1(x)))

rm(dsAll, F1, getNodeID, gD)

############################################################################################
# We will also create a set of colors for the edges, based on their Dice similarity values.
# We'll interpolate edge colors using the "colorRampPalette" function, which returns a
# function that generates a palette of n interpolated colors (n is the argument passed to the
# returned function; here, the number of unique Dice similarity values). Note that "bias" does
# not set the number of colors: it controls how the colors are spaced along the ramp.
F2 <- colorRampPalette(c("#FFFF00", "#FF0000"), bias = nrow(edgeList), space = "rgb", interpolate = "linear")
colCodes <- F2(length(unique(edgeList$diceSim)))
edges_col <- sapply(edgeList$diceSim, function(x) colCodes[which(sort(unique(edgeList$diceSim)) == x)])

rm(colCodes, F2)
############################################################################################
# Let's create a network

D3_network_LM <- networkD3::forceNetwork(Links = edgeList, # data frame that contains info about edges
                        Nodes = nodeList, # data frame that contains info about nodes
                        Source = "SourceID", # ID of source node 
                        Target = "TargetID", # ID of target node
                        Value = "Weight", # value from the edge list (data frame) that will be used to value/weight relationship amongst nodes
                        NodeID = "nName", # value from the node list (data frame) that contains node description we want to use (e.g., node name)
                        Nodesize = "nodeBetweenness",  # value from the node list (data frame) that contains value we want to use for a node size
                        Group = "nodeDegree",  # value from the node list (data frame) that contains value we want to use for node color
                        height = 500, # Size of the plot (vertical)
                        width = 1000,  # Size of the plot (horizontal)
                        fontSize = 20, # Font size
                        linkDistance = networkD3::JS("function(d) { return 10*d.value; }"), # Function to determine distance between any two nodes, uses variables already defined in forceNetwork function (not variables from a data frame)
                        linkWidth = networkD3::JS("function(d) { return d.value/5; }"),# Function to determine link/edge thickness, uses variables already defined in forceNetwork function (not variables from a data frame)
                        opacity = 0.85, # node opacity
                        zoom = TRUE, # ability to zoom when clicking on a node
                        opacityNoHover = 0.1, # opacity of node labels when the mouse is not over a node
                        linkColour = edges_col) # edge colors

# Plot network
D3_network_LM 
         
# Save network as html file
networkD3::saveNetwork(D3_network_LM, "D3_LM.html", selfcontained = TRUE)

################################################################################
# sessionInfo()
#
# R version 3.3.1 (2016-06-21)
# Platform: x86_64-redhat-linux-gnu (64-bit)
# Running under: Fedora 24 (Workstation Edition)
# 
# locale:
#   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
# [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
# [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
# [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
# 
# attached base packages:
#   [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# loaded via a namespace (and not attached):
#   [1] htmlwidgets_0.7  plyr_1.8.4       magrittr_1.5     htmltools_0.3.5  tools_3.3.1      igraph_1.0.1    
# [7] yaml_2.1.13      Rcpp_0.12.7      jsonlite_1.1     digest_0.6.10    networkD3_0.2.13
# 
################################################################################
