Networks are used to describe and model various real-world phenomena, such as social relationships and communications, transportation routes, electrical power grids, and molecular interactions, so it is not surprising that network visualization is an active research problem.
However, network visualization is not a simple problem. A good network visualization should provide insights into the network’s structural patterns, help identify key nodes and edges, and lead to a better understanding of the mechanisms behind the phenomena it represents. The complexity of the problem becomes more evident with large-scale networks, which are often (poorly) visualized as big “hairballs,” preventing users from identifying any underlying structural patterns.
As somebody who works with large-scale networks, I know how tricky it can be to visualize a network in a comprehensible way. It often requires not only a decision about the most appropriate network layout, but also a decision about which node/edge properties to select in order to create the most informative network plot: a plot that is worth a thousand words. Often, the optimal network layout and node/edge properties are not obvious. Sometimes, more than one layout or property results in an informative plot. Clearly, step-by-step manual network visualization may not be the most efficient way to explore the space of possible visualizations of a given network. In this and the next couple of posts, I will show a few ways to quickly visualize networks directly from R, using different R packages.
For illustration purposes, I used a weighted network of characters’ co-appearances in Victor Hugo’s novel “Les Miserables” (from D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA, 1993). This network consists of 77 nodes, corresponding to characters, and 254 weighted edges, where each weight is the number of times two characters co-appear in the same chapter of the book. Note that this is not a large-scale network; however, this post is about visualization itself (not about identifying the optimal visualization layout/properties), and the example can be easily adjusted for the visualization of large-scale networks.
For all network manipulations, I used the “igraph” package. Given the Les Miserables network (LesMiserables.txt) in a three-column edge list format (column 1 = character 1, column 2 = character 2, column 3 = number of co-appearances between characters 1 and 2), I used the “graph.data.frame” command to create a network from the data frame and the “simplify” command to ensure that all edges in the network are unique and that there are no self-loops.
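As a minimal sketch of this step, the snippet below builds a small network from a three-column edge list and then simplifies it. The toy data frame and its weights are invented for illustration (they are not taken from LesMiserables.txt). The snippet uses the newer igraph name “graph_from_data_frame” rather than “graph.data.frame,” which assumes a reasonably recent igraph version. How merged duplicate edges should combine their weights is a choice; keeping the maximum is only one assumption.

```r
library(igraph)

# Toy edge list in the same three-column layout (all values are invented).
# Row 3 repeats row 1 in reverse order; row 5 is a self-loop.
edges <- data.frame(
  char1  = c("Valjean", "Valjean", "Cosette", "Valjean", "Javert"),
  char2  = c("Cosette", "Javert",  "Valjean", "Fantine", "Javert"),
  weight = c(31, 17, 31, 9, 1),
  stringsAsFactors = FALSE
)

# The first two columns define the edges; extra columns (here: weight)
# become edge attributes.
g_raw <- graph_from_data_frame(edges, directed = FALSE)

# Drop self-loops and merge duplicate edges. For merged edges, keep the
# largest weight and ignore any other edge attributes (an assumption).
g <- simplify(g_raw, remove.multiple = TRUE, remove.loops = TRUE,
              edge.attr.comb = list(weight = "max", "ignore"))

vcount(g)  # number of nodes left
ecount(g)  # number of unique, non-loop edges left
```

With the real file, the data frame would presumably come from something like “read.table” on LesMiserables.txt; the separator and header settings depend on how the file is stored.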
Furthermore, before plotting the network, I characterized each of its nodes with two properties, degree and betweenness centrality, and each of its edges with two properties, weight and Dice similarity. The degree of a node is the number of edges incident to that node, and the betweenness centrality tells us how central the node is, in terms of the number of shortest paths between all pairs of other nodes that pass through it. The edge weight represents the number of the two characters’ co-appearances (these values came with the data set), and the Dice similarity represents the similarity between the first neighborhoods of the edge’s two nodes, given as:
D(node1, node2) = 2 * number of mutual neighbours of node1 and node2 / (number of neighbours of node1 + number of neighbours of node2)
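To make these definitions concrete, here is a small sketch on a toy network (a triangle A-B-C with an extra node D attached to C). The toy graph and the helper “dice_by_hand” are made up for illustration; “similarity” with method = "dice" is the newer igraph name for this computation, and the snippet assumes a reasonably recent igraph version.

```r
library(igraph)

# Toy undirected network: triangle A-B-C plus a pendant node D attached to C.
toy <- data.frame(from = c("A", "A", "B", "C"),
                  to   = c("B", "C", "C", "D"),
                  stringsAsFactors = FALSE)
g <- graph_from_data_frame(toy, directed = FALSE)

# Node properties.
deg <- degree(g)                         # number of incident edges per node
btw <- betweenness(g, directed = FALSE)  # shortest paths passing through each node

# Dice similarity written out directly from the formula above
# (hypothetical helper, not part of igraph).
dice_by_hand <- function(graph, n1, n2) {
  nb1 <- as_ids(neighbors(graph, n1))
  nb2 <- as_ids(neighbors(graph, n2))
  2 * length(intersect(nb1, nb2)) / (length(nb1) + length(nb2))
}

# igraph's built-in version returns a node-by-node matrix.
dice_mat <- similarity(g, method = "dice")

# One Dice value per edge: look up each edge's two end points in the matrix.
edge_dice <- dice_mat[ends(g, E(g), names = FALSE)]
```

For example, A and B each have two neighbours and share exactly one (C), so D(A, B) = 2 * 1 / (2 + 2) = 0.5. C and D are connected but share no neighbours, so the Dice similarity of that edge is 0.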
All these properties can be easily computed using the igraph package. Once computed, we add them to the graph object using the “set.vertex.attribute” and “set.edge.attribute” commands. Next, we need to transform the network from igraph to graphNEL format, as RCytoscape requires networks to be in that format. The transformation can be easily done with the “igraph.to.graphNEL()” command. Although this command passes along the values of the properties/attributes, the attributes themselves need to be declared again explicitly. To do so, we use the “initNodeAttribute” and “initEdgeAttribute” commands.
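The attribute step can be sketched as follows on a toy graph. This is only an illustration under assumptions: the graph is invented, and the snippet uses the newer igraph names “set_vertex_attr” and “set_edge_attr” in place of “set.vertex.attribute” and “set.edge.attribute.” The conversion to graphNEL is left out here because it needs additional packages beyond igraph.

```r
library(igraph)

# Toy undirected network (invented): triangle A-B-C plus node D attached to C.
g <- graph_from_data_frame(
  data.frame(from = c("A", "A", "B", "C"),
             to   = c("B", "C", "C", "D"),
             stringsAsFactors = FALSE),
  directed = FALSE
)

# Store the computed node properties on the graph object itself.
g <- set_vertex_attr(g, "degree", index = V(g), value = degree(g))
g <- set_vertex_attr(g, "betweenness", index = V(g),
                     value = betweenness(g, directed = FALSE))

# Store one Dice similarity per edge (similarity of the edge's two end points).
dice_mat <- similarity(g, method = "dice")
g <- set_edge_attr(g, "similarity", index = E(g),
                   value = dice_mat[ends(g, E(g), names = FALSE)])

# Check which attributes the graph now carries.
vertex_attr_names(g)
edge_attr_names(g)
```

Keeping the properties on the graph object means they travel with the network when it is converted or exported, instead of living in separate vectors that have to be kept in the same order.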
Now we are ready to plot our network. First, we create a new Cytoscape window using the “new.CytoscapeWindow” command (don’t forget to activate the Cytoscape RPC plugin first). Then, we use the “displayGraph” command to send the graph to that window. At this point, all nodes are displayed at the same position and overlap each other, so we need to apply a layout. We can see the list of available layouts and their properties with the “getLayoutNames” and “getLayoutPropertyNames” commands. For illustration purposes, I selected the 18th layout in that list, a force-directed “Fruchterman-Reingold” layout, and used the Dice similarities to define the force between two nodes. Using the “layoutNetwork” command, I applied the selected layout to the network, and you can see the result below:
We can see that the selected layout put interconnected nodes closer to each other and that we can easily identify which characters in the novel are likely to appear together.
I am not a big fan of the default Cytoscape color scheme, so I defined my own using the “setDefault[BackgroundColor/EdgeColor/NodeColor,…]” commands. And here is what I got:
Looks nicer. However, the information we can get from this plot is still the same. As the network nodes represent characters, we can replace the somewhat boring circles with the characters’ images using the “setNodeImageDirect” command. For illustration purposes, I used images only for the characters of Jean Valjean, Cosette, Fantine, and Javert. This feature gives us a more personalized node representation.
To incorporate the rest of the node/edge properties in the network visualization, I used the “setNodeColorRule,” “setNodeSizeRule,” and “setEdgeColorRule” commands with the degree, betweenness centrality, and edge weight attributes. Specifically, I defined: 1) a node color based on the node degree (the larger the degree, the darker the node); 2) a node size based on the node betweenness centrality (the more central the node, the larger it is drawn); and 3) an edge color based on the edge weight (the larger the weight, the darker the edge). The resulting network visualization allows us to see not only which characters in the novel are likely to appear together, but also which characters within such groups appeared together more often, which characters had a central role, and which acted as a link between other characters.
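The idea behind such rules, mapping a numeric attribute onto a color or size scale by interpolation, can be sketched in plain R without Cytoscape. This is not how RCytoscape implements its rules; it is only an illustration of the mapping, and the node values, colors, and size range below are all invented.

```r
# Invented per-node values, for illustration only.
deg <- c(n1 = 1, n2 = 3, n3 = 7, n4 = 15)
btw <- c(n1 = 0, n2 = 0.25, n3 = 0.5, n4 = 1)

# Color: rescale degree to [0, 1], then interpolate from a light to a dark red.
ramp <- colorRamp(c("#FEE5D9", "#A50F15"))
deg_scaled <- (deg - min(deg)) / (max(deg) - min(deg))
node_color <- rgb(ramp(deg_scaled), maxColorValue = 255)

# Size: linear interpolation between a smallest and a largest node size.
node_size <- approx(x = range(btw), y = c(30, 100), xout = btw)$y

# Side-by-side view of the attribute values and their visual encodings.
data.frame(node = names(deg), degree = deg, color = node_color,
           betweenness = btw, size = node_size)
```

The node with the smallest degree gets the lightest color and the node with the largest degree gets the darkest one; sizes grow linearly with betweenness between the two chosen extremes.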
And here is the complete code: