In the fourth and final part of my graph visualization series, I’ll show how to create 3D network plots. 3D plots are more than just pretty plots – they allow you to rotate, scale, and zoom in and out of the network. These options may help identify interesting interaction patterns. Unfortunately, there are not many 3D network visualization tools (especially not free ones). I’m also not aware of any tools that have an R library.
So how are we going to create a 3D plot from R?
Well, we need to be clever: we will pretend that our graph represents a chemical structure and use Jmol, an open-source 3D viewer for chemical structures, to visualize it. As there is no direct link between R and Jmol, the only way to visualize our network is to create a corresponding Jmol file (.mol2 file) in R and open it in Jmol. This is similar to what we have done for Gephi.
As before, we will use the weighted network of characters’ coappearances in Victor Hugo’s novel “Les Miserables” (LesMiserables.txt). Unfortunately, the number of molecule (graph) properties that we can load in Jmol directly from a .mol2 file is limited, so we will only use one property – node degree – and we will use it to define the node size. Note that various molecule (graph) properties in Jmol can be set through Jmol scripts (see below), but we won’t focus on those in this post.
First, let’s load our network and calculate degrees of all nodes:
rm(list = ls())
library(igraph)
# Read data
dataSet <- read.table("lesmis.txt", header = FALSE, sep = "\t")
# Create a graph. Use simplify to ensure that there are no duplicated edges or self loops
gD <- simplify(graph.data.frame(dataSet, directed = FALSE))
# Calculate degree for all nodes
degAll <- degree(gD, v = V(gD), mode = "all")
Next, we'll calculate the node coordinates using a layout function from the igraph library:
coord3D <- layout.fruchterman.reingold(gD, dim = 3)
Now, we are ready to make the .mol2 file. We'll call it 3D_lesmis.mol2. We'll use the igraph library functions vcount(gD) and ecount(gD) to compute the number of nodes and edges. For more details about the .mol2 file format, see: mol2.pdf
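The header step can be sketched roughly as follows. This is a minimal sketch, assuming the usual Tripos layout of the MOLECULE record (name, atom and bond counts, molecule type, charge type). The helper name write_mol2_header and the toy counts are made up for illustration and are not the code originally used for this post.

```r
# Hypothetical helper: write the @<TRIPOS>MOLECULE record of a .mol2 file.
write_mol2_header <- function(path, mol_name, n_atoms, n_bonds) {
  header <- c(
    "@<TRIPOS>MOLECULE",
    mol_name,                  # free-text name of the "molecule" (our graph)
    paste(n_atoms, n_bonds),   # number of atoms (nodes) and bonds (edges)
    "SMALL",                   # molecule type
    "NO_CHARGES"               # charge type
  )
  writeLines(header, con = path)
  invisible(header)
}

# Toy counts written to a temporary file; with the real graph you would pass
# vcount(gD) and ecount(gD) and use "3D_lesmis.mol2" as the path.
mol2_path <- tempfile(fileext = ".mol2")
write_mol2_header(mol2_path, "LesMiserables", n_atoms = 4, n_bonds = 4)
header_lines <- readLines(mol2_path)
```

The atom and bond records would then be appended to the same file after this header.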
Next, we will add a list of nodes into our 3D_lesmis.mol2 file. Each node is represented by an atom, and the atom type will define the color of the node. We will define nodes as follows: nodes with degree one will be represented as helium atoms (light blue); degree two as sodium atoms (purple); degree 3-5 as oxygen atoms (red); degree 6-10 as gold (yellow); degree 11-15 as phosphorus (orange); degree larger than 15 as chlorine atoms (green). For a full color scheme, see: jscolors. In this step, we will also assign node coordinates to each atom:
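The degree-to-atom mapping and the atom records can be sketched like this. It is a minimal sketch on toy data, assuming the usual Tripos ATOM fields (id, name, x, y, z, atom type) and BOND fields (id, origin atom, target atom, bond type). The function degree_to_element and all toy values are hypothetical stand-ins for V(gD)$name, degAll, coord3D and the graph's edge list.

```r
# Map node degree to the element symbols listed above (the element sets the color).
# Intervals are right-closed: 1 -> He, 2 -> Na, 3-5 -> O, 6-10 -> Au, 11-15 -> P, >15 -> Cl.
degree_to_element <- function(deg) {
  as.character(cut(deg,
                   breaks = c(0, 1, 2, 5, 10, 15, Inf),
                   labels = c("He", "Na", "O", "Au", "P", "Cl")))
}

# Toy stand-ins for the node names, degrees, 3D coordinates and edge list
node_names <- c("Valjean", "Javert", "Cosette", "Marius")
deg        <- c(3, 1, 2, 2)
coords     <- matrix(c(0, 0, 0,
                       1, 0, 0,
                       0, 1, 0,
                       0, 0, 1), ncol = 3, byrow = TRUE)
edges      <- data.frame(from = c("Valjean", "Valjean", "Valjean", "Cosette"),
                         to   = c("Javert", "Cosette", "Marius", "Marius"),
                         stringsAsFactors = FALSE)

elements <- degree_to_element(deg)

# One ATOM line per node: id, name (whitespace replaced), x, y, z, atom type
atom_lines <- sprintf("%d %s %.4f %.4f %.4f %s",
                      seq_along(node_names), gsub("\\s+", "_", node_names),
                      coords[, 1], coords[, 2], coords[, 3], elements)

# One BOND line per edge: id, origin atom id, target atom id, bond type
bond_lines <- sprintf("%d %d %d 1",
                      seq_len(nrow(edges)),
                      match(edges$from, node_names),
                      match(edges$to, node_names))

mol2_body <- c("@<TRIPOS>ATOM", atom_lines, "@<TRIPOS>BOND", bond_lines)
# In the real workflow these lines would be appended to 3D_lesmis.mol2, e.g.
# cat(mol2_body, file = "3D_lesmis.mol2", sep = "\n", append = TRUE)
```

Note that a node with degree 0 would get no element with these breaks; a graph built from an edge list has no such nodes.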
Here is the resulting file: 3D_lesmis.mol2. The resulting network is shown in Figure 1A:
By default, the color of the edges is inferred from the color of the nodes. We can change that easily: right-click anywhere in the network window to open a pop-up menu, click on "Color," then "Bonds," and select the color of your choice (I used white). Now all edges are the same color (white), as shown in Figure 1B. I find the edges too thick compared to the node sizes. To change the edge thickness, go to the "Display" menu, select "Bond," and then the type you want (I chose "Wireframe").
As we can see in Figure 1C, our network looks nice. Unsurprisingly (as we used the same layout), it looks similar to the networks we created in Cytoscape and Gephi. However, here we can rotate the network and see the interactions from various angles and at various levels of detail without needing to create multiple network plots.
There is one more thing we are missing - node labels. They are part of the .mol2 file, but are not visible by default. To show labels, go to the "Display" menu, select "Label," and then select "Name." Super easy! The resulting network is shown in Figure 2A.
Jmol allows us to do additional customizations with Jmol scripts. The scripts are very intuitive and they can be used to define different node (atom) and edge (bond) sizes, edge colors, as well as the node label size. To use Jmol scripts, we need the Jmol console. You can find it under the "File" menu -> "Console."
When using Jmol scripts, the first thing to do is to select the nodes (or edges) we want to work on. Once they are selected, we can define their attributes. For example, to change the size of helium atoms, we first select them with the select command: "select _He" (there must be an underscore before the atom symbol). Then, we can define their size with the spacefill command. We can also define the thickness of the edges between any two helium atoms using the wireframe command, as well as the color of these edges with the color bonds command. Finally, we can define the size of the labels accompanying the atoms/nodes with the font label command.
So let's see how it works.
select _He; spacefill 0.2; wireframe 10; color bonds blue; font label 10;
select _Na; spacefill 0.4; wireframe 12; color bonds purple; font label 12;
select _O; spacefill 0.6; wireframe 14; color bonds red; font label 14;
select _Au; spacefill 0.8; wireframe 16; color bonds yellow; font label 16;
select _P; spacefill 1.0; wireframe 18; color bonds orange; font label 18;
select _Cl; spacefill 1.5; wireframe 20; color bonds green; font label 20;
Figure 2B shows the results of the script above (zoomed in). If we are not interested in all labels, e.g., for nodes that do not have many interacting partners, we can remove some of the labels using the label hide command (Figure 2C and 2D).
In the third part of the “how to quickly visualize networks directly from R” series, I’ll write about hive plots and the “HiveR” package. The concept of hive plots is fundamentally different from the Cytoscape and Gephi plots.
Cytoscape and Gephi use a number of layout algorithms to plot networks as node-edge diagrams in the Euclidean plane. The layout algorithms determine node (and edge) positions based on various criteria, e.g., the number of direct interacting partners, the smallest number of edge crossings, or similar edge length between all nodes. Clearly, the resulting plots are sensitive to changes, and even a small change in the underlying topology can lead to a change in the final layout. For this reason, it is hard to assess how similar (or different) two networks are solely based on their resulting layouts/plots. Additionally, standard network layouts generally work well for visualization of small/medium size networks, while visualization of large networks often results in “hairball” network plots that lack identifiable structural patterns.
In contrast to standard network plots (i.e., layout algorithms), the goal of hive plots is to capture and expose both trends and patterns in network structure that arise from a large number of nodes and edges, rather than solely representing network structure in the form of node-edge diagrams. Thus, in hive plots, nodes and edges matter less as individual elements and more as parts of a system.
Hive plots map nodes onto radially distributed linear axes, and edges between nodes are drawn as curved links that connect the axes. Nodes are assigned to an axis and to a position along that axis (denoted as the radius) based on their qualitative or quantitative properties, e.g., network structure, node or edge annotation, or any other meaningful property of the network. Thus, using hive plots, users can create their own rules for a mapping between the network properties of interest and the layout. As such, hive plots give users the ability to assess network structure using the network properties they are interested in, as well as the ability to compare two networks based on the selected properties.
To demonstrate how this works, I will use the same network I used to demonstrate network visualization in Cytoscape – the weighted network of characters’ coappearances in Victor Hugo’s novel “Les Miserables” (LesMiserables.txt). I will also use the same node and edge properties: the degree of a node, betweenness centrality of a node, Dice similarity of two nodes, and the coappearance weight. For more information, see Network visualization – part 1 and Network visualization – part 2.
Given a network in an edge list format (data from column 1 and column 2 correspond to the interacting pairs of nodes), we can use the “edge2HPD” command to create a hive object. For example, if the list of interactions is given in the data frame denoted as “dataSet.ext,” we can create a hive object as:
hive1 <- edge2HPD(edge_df = dataSet.ext)
This function will assign all nodes to a single axis. Additionally, all nodes will be assigned the same position along the axis (the same radius), the same color, and the same node size. If the data frame contains a third column, e.g., weights, the "edge2HPD" function will also assign the values from that column to the corresponding edges. To adjust the node radius, we will use the "mineHPD" function and its "rad <- tot.edge.count" option. This option will assign to each node a radius that corresponds to its degree:
hive2 <- mineHPD(hive1, option = "rad <- tot.edge.count")
We'll also use the "mineHPD" function (and its "axis <- source.man.sink" option) to assign nodes to different axes. The "axis <- source.man.sink" option assumes that the edges provided in the data frame represent directed edges, i.e., the first column in the data frame represents the "from" node and the second column represents the "to" node. This option examines the nodes and their corresponding edges to determine if a node is a source (has only outgoing edges), a sink (has only incoming edges), or a manager (has both types of edges). For now we will ignore the fact that our network is not directed and we'll use this function/option as follows:
hive3 <- mineHPD(hive2, option = "axis <- source.man.sink")
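To make the source/manager/sink idea concrete, here is a small base R sketch of the classification rule described above. It is not HiveR's implementation, and the function name classify_nodes and the toy edge lists are made up for illustration. It also shows why the result is arbitrary for undirected data: swapping the two columns of a single row changes the roles.

```r
# Classify nodes of a directed edge list (column 1 = "from", column 2 = "to").
classify_nodes <- function(edge_df) {
  from  <- as.character(edge_df[[1]])
  to    <- as.character(edge_df[[2]])
  nodes <- unique(c(from, to))
  has_out <- nodes %in% from   # node starts at least one edge
  has_in  <- nodes %in% to     # node ends at least one edge
  role <- ifelse(has_out & has_in, "manager",
                 ifelse(has_out, "source", "sink"))
  setNames(role, nodes)
}

toy <- data.frame(from = c("A", "A", "B"), to = c("B", "C", "C"),
                  stringsAsFactors = FALSE)
roles <- classify_nodes(toy)

# Same undirected graph, but the third row is written the other way round
toy_flipped <- toy
toy_flipped[3, ] <- c("C", "B")
roles_flipped <- classify_nodes(toy_flipped)
```

In the first edge list B is the manager and C the sink; after the flip the two swap roles, although nothing about the undirected network has changed.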
A hive plot requires that no edge starts and ends at the same node, and that no edge has zero length because the axis and radius of its start and end nodes are the same. We will use the "remove zero edge" option of the "mineHPD" function to remove any such edges (note that this will not influence the resulting plot).
hive4 <- mineHPD(hive3, option = "remove zero edge")
Finally, let's plot the hive plot using the "plotHive" function:
plotHive(hive4, method = "abs", bkgnd = "white", axLabs = c("source", "hub", "sink"), axLab.pos = 1)
Figure 1A shows the resulting (default) plot. We can see that most nodes are either sources or manager nodes. Unfortunately, this does not mean much for us, as our graph is undirected and the obtained visualization does not describe our data truthfully. We can try to customize the plot to see whether or not it will highlight some of the real properties of our data. To do so, we directly access the hive object elements that correspond to node color, node size, edge color, and edge weight: "hive4$nodes$color," "hive4$nodes$size," "hive4$edges$color," and "hive4$edges$weight," respectively. We assign node color based on the node degree, node size based on the node's betweenness centrality, edge color based on Dice similarity, and edge thickness based on the weight. Figure 1B-D shows the obtained results. The customization has brought out some patterns, but it still includes the "direction" bias.
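The customization step can be sketched as follows. To keep the example free of dependencies, it uses a plain list that only mimics the nodes$color, nodes$size, edges$color and edges$weight elements named above. The helper functions rescale and bin_colors, the color choices, and all property values are made up for illustration and are not the code behind Figure 1.

```r
# Stand-in for a hive object: only the pieces accessed in the text
hive <- list(
  nodes = data.frame(lab = c("A", "B", "C", "D"), size = 1, color = "black",
                     stringsAsFactors = FALSE),
  edges = data.frame(id1 = c(1, 1, 1, 3), id2 = c(2, 3, 4, 4),
                     weight = 1, color = "gray", stringsAsFactors = FALSE)
)

# Made-up per-node and per-edge properties
node_degree <- c(3, 1, 2, 2)
node_betw   <- c(2, 0, 0, 0)
edge_dice   <- c(0.0, 0.4, 0.4, 0.8)
edge_coapp  <- c(1, 5, 3, 9)

# Linearly rescale a numeric vector into the range given by `to`
rescale <- function(x, to) {
  rng <- range(x)
  if (diff(rng) == 0) return(rep(mean(to), length(x)))
  to[1] + (x - rng[1]) / diff(rng) * diff(to)
}

# Bin a numeric vector into n equal-width classes and pick one color per class
bin_colors <- function(x, low, high, n = 3) {
  pal <- colorRampPalette(c(low, high))(n)
  pal[cut(x, breaks = n, labels = FALSE)]
}

hive$nodes$color  <- bin_colors(node_degree, "lightblue", "darkblue")
hive$nodes$size   <- rescale(node_betw, to = c(0.5, 2))
hive$edges$color  <- bin_colors(edge_dice, "gray80", "red")
hive$edges$weight <- rescale(edge_coapp, to = c(0.5, 3))
```

With a real hive object the same assignments would be made on hive4, provided the property vectors are in the same order as the rows of hive4$nodes and hive4$edges.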
From default to customized hive plot (edge2HPD version)
HiveR also allows users to create a hive object from the adjacency matrix.
Using the "igraph" package, we can create a graph that corresponds to the data frame we used above, and we can specify that the graph is undirected. Next, we can extract an adjacency matrix from the graph. Given that all available HiveR functions assume that the underlying graphs are directed, we will keep only the upper triangle of the adjacency matrix. Finally, we will use the "adj2HPD" function to create a hive object.
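The upper-triangle step can be sketched like this. For the sake of a dependency-free example, the sketch builds the adjacency matrix in base R from a toy edge list rather than extracting it with igraph. The toy data and variable names are illustrative only.

```r
# Toy weighted, undirected edge list
edges <- data.frame(from   = c("A", "A", "B"),
                    to     = c("B", "C", "C"),
                    weight = c(1, 2, 3),
                    stringsAsFactors = FALSE)

# Symmetric weighted adjacency matrix with named rows and columns
nodes <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))
adj[cbind(edges$from, edges$to)] <- edges$weight
adj[cbind(edges$to, edges$from)] <- edges$weight

# Keep only the upper triangle so that every edge appears exactly once
adj_upper <- adj
adj_upper[lower.tri(adj_upper, diag = TRUE)] <- 0

# With HiveR loaded, the matrix would then be handed to adj2HPD, e.g.
# hive_adj <- adj2HPD(M = adj_upper)
```

Zeroing the lower triangle (and the diagonal) means each undirected edge is read as a single "from row to column" edge instead of two opposite ones.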
Repeating the same steps as above (for "edge2HPD"), we created the following hive plot:
Customized hive plot (adj2HPD version)
We can see that it is very similar to the plot created with edge2HPD. There are more interactions between source, manager, and sink nodes than before, but it is still hard to say what that observation means for an undirected graph such as ours.
Trying to overcome this problem, I wrote a few additional options for the "mineHPD" function (see "mod.mineHPD" function).
For example, I wanted to assign low connected nodes to one axis, medium connected nodes to another axis, and highly connected nodes to the third axis. I used the "axis <- deg_five_ten_more" function to do so. I decided that at this point I am not interested in node radius, so I assigned a random radius value to all nodes ("rad <- random" option). The resulting plot is in Figure 3A. This plot looks similar to the previous ones. However, from this plot we can say for sure that our data contains a large number of highly connected nodes that interact with each other.
To further evaluate this observation, I used the "axis <- split" option. This function splits each of the 3 axes into 2 new axes (thus resulting in 6 axes) and provides a better visualization of the interactions between the nodes on the same axis (in the original plot). Indeed, Figure 3B shows the "strength" of the interactions between the highly connected nodes.
Next, I wanted to create a plot in which the node radius corresponds to the node's betweenness centrality. To do so, I used the option "rad <- userDefined." This option requires a source (data frame) that contains the nodes and their corresponding (betweenness centrality) values, which will be used for the radius. As before, I wanted to assign nodes to axes based on their degree. In this case, I wanted nodes with degree 1 to be assigned to axis 1, nodes with degree 2 to be assigned to axis 2, and all other nodes to be assigned to axis 3. To do so, I used the "deg_one_two_more" function. The resulting plot and its "split" version are shown in Figure 3C and Figure 3D.
Customized hive plot (new customization functions)
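The degree-based axis assignment can be sketched in a few lines of base R. This is an illustrative sketch, not the code of the "mod.mineHPD" function. The helper name assign_axis_by_degree and its upper_bounds argument are hypothetical, and the toy degrees are made up.

```r
# Assign each node to an axis based on its degree.
# Nodes with degree <= upper_bounds[1] go to axis 1, nodes with
# degree <= upper_bounds[2] go to axis 2, and all remaining nodes to the last axis.
assign_axis_by_degree <- function(deg, upper_bounds) {
  as.integer(cut(deg, breaks = c(-Inf, upper_bounds, Inf), labels = FALSE))
}

deg <- c(1, 2, 3, 7, 12, 1)

# Degree 1 -> axis 1, degree 2 -> axis 2, everything else -> axis 3
axis_one_two_more <- assign_axis_by_degree(deg, upper_bounds = c(1, 2))

# Other cut-offs can be passed the same way, e.g. 5 and 10
axis_other <- assign_axis_by_degree(deg, upper_bounds = c(5, 10))
```

Keeping the cut-offs as an argument avoids writing a separate function for every choice of thresholds.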
These new functionalities can help us identify some additional patterns in the underlying interactions, especially in undirected ones. However, adding a new functionality every time we want to create a hive plot in a slightly different way is not necessarily optimal, as the ways to define node radius or axis assignment are unlimited (well, not exactly, but if properly configured, almost unlimited). To address this issue, I expanded the "edge2HPD" function (see the "mod.edge2HPD" function) to include options for automatic node color, size, radius, and axis assignment, as well as the automatic assignment of edge color and weight.
Using this function, we can create a hive plot in which nodes are assigned to axes randomly (Figure 4A). Notice how the structural patterns we observed previously are lost in this plot. This directly demonstrates the significance of an appropriate node-to-axis assignment. To test this, we clustered nodes based on their Dice similarity (using hierarchical clustering). We "cut" the resulting tree so that it yields six non-overlapping clusters, and assigned the nodes from each of the six clusters to one of six axes. Figure 4B captures the relationship between the interactions within and across clusters.
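The clustering step can be sketched as follows. It is a minimal sketch on a toy ring graph, assuming the standard Dice similarity (twice the number of shared neighbors divided by the sum of the two degrees) and the default complete linkage of hclust. The linkage actually used for Figure 4B is not stated here, and the function name dice_similarity is made up.

```r
# Toy undirected graph: a ring of 12 nodes
n <- 12
node_ids <- paste0("n", seq_len(n))
adj <- matrix(0, n, n, dimnames = list(node_ids, node_ids))
for (i in seq_len(n)) {
  j <- i %% n + 1          # next node on the ring (wraps around)
  adj[i, j] <- 1
  adj[j, i] <- 1
}

# Dice similarity from a 0/1 adjacency matrix without isolated nodes
dice_similarity <- function(adj) {
  common <- adj %*% t(adj)              # number of shared neighbors
  deg    <- rowSums(adj)
  sim    <- 2 * common / outer(deg, deg, "+")
  diag(sim) <- 1
  sim
}

sim  <- dice_similarity(adj)
tree <- hclust(as.dist(1 - sim))        # distance = 1 - similarity
clusters <- cutree(tree, k = 6)         # six non-overlapping clusters

# Cluster number doubles as the axis number of each node
axis_of_node <- clusters
```

The resulting axis_of_node vector is what an axis-assignment option would consume: one axis number between 1 and 6 per node.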
Customized hive plot (new edge2HPD function)
Here are the additional functions and the complete code used to create the hive plots: