Network visualization – part 1: Cytoscape

Networks are used to describe and model various real-world phenomena, such as social relationships or communications, transportation routes, electrical power grids, and molecular interactions, so it is not surprising that network visualization is a hot research problem.

However, network visualization is not a simple problem. A good network visualization should provide insight into the network's structural patterns, help identify key nodes and edges, and lead to a better understanding of the mechanisms behind the phenomena it represents. The complexity of the problem becomes more evident with large-scale networks. These are often (poorly) visualized as big “hairballs” that prevent users from identifying any underlying structural patterns.

As somebody who works with large-scale networks, I know how tricky it can be to visualize a network in a comprehensible way. It often requires deciding not only on the most appropriate network layout, but also on which node/edge properties to show in order to create the most informative network plot: a plot that is worth a thousand words. Often, the optimal layout and node/edge properties are not obvious. Sometimes more than one layout/property combination results in an informative plot. Clearly, step-by-step manual visualization is not the most efficient way to explore the space of possible visualizations of a given network. In this and the next couple of posts, I will show a few ways to quickly visualize networks directly from R, using different R packages.

For illustration purposes, I used a weighted network of characters’ co-appearances in Victor Hugo’s novel “Les Miserables” (from D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA, 1993). The network consists of 77 nodes, which correspond to characters, and 254 weighted edges. Each edge weight is the number of times the two characters co-appear in the same chapter of the book. Note that this is not a large-scale network. However, this post is about visualization itself (not about finding the optimal layout/properties), and the example can easily be adapted to large-scale networks.

This post is about plotting networks in Cytoscape using the “RCytoscape” package.

For all network manipulations, I used the “igraph” package. The Les Miserables network (LesMiserables.txt) is given in a three-column edge list format: column 1 = character 1, column 2 = character 2, column 3 = number of co-appearances of characters 1 and 2. I used the “graph.data.frame” command to create a network from the data frame. I then used the “simplify” command to ensure that all edges in the network are unique and that there are no self-loops.
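The sketch below shows this step on a hypothetical five-row toy edge list, not the Les Miserables data. It uses the current igraph names graph_from_data_frame() and simplify(). One choice differs from the full code at the end of this post, and it is my assumption. The third column is named “weight” and the combination rule is passed explicitly, so that simplify() sums the weights of duplicated edges instead of relying on the default attribute handling.

```r
library(igraph)

# Hypothetical toy edge list: one duplicated pair (A-B) and one self-loop (C-C)
edges <- data.frame(
  from   = c("A", "A", "B", "C", "A"),
  to     = c("B", "B", "C", "C", "C"),
  weight = c(1, 2, 3, 4, 5)
)

# The first two columns define the edges; remaining columns become edge attributes
g_raw <- graph_from_data_frame(edges, directed = FALSE)

# Remove multi-edges and self-loops; sum the weights of merged duplicate edges
g <- simplify(g_raw, remove.multiple = TRUE, remove.loops = TRUE,
              edge.attr.comb = list(weight = "sum", "ignore"))

ecount(g_raw)                 # 5 edges before simplification
ecount(g)                     # 3 edges after
E(g, P = c("A", "B"))$weight  # the two A-B rows merged: 1 + 2 = 3
```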

Before plotting the network, I characterize each of its nodes with two properties, degree and betweenness centrality, and each of its edges with two properties, weight and Dice similarity. The degree of a node is the number of edges incident to it. Its betweenness centrality measures how central it is, in terms of the number of shortest paths between all other pairs of nodes that pass through it. The edge weight is the number of co-appearances of the two characters (these values came with the data set). The Dice similarity measures how similar the first neighborhoods of the edge's two end nodes are, and is given as:

$$D(u, v) = \frac{2\,|N(u) \cap N(v)|}{|N(u)| + |N(v)|}$$

where N(u) is the set of neighbours of node u.
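Here is a small base-R sketch that applies the formula to two hypothetical neighbour sets with made-up node names. Let N(u) = {a, b, c} and N(v) = {b, c, d, e}. The two nodes share two neighbours, so D(u, v) = 2 * 2 / (3 + 4) = 4/7, about 0.571.

```r
# Dice similarity of two neighbour sets, following the formula above.
# Assumes each set lists every neighbour once (no duplicates).
dice <- function(n_u, n_v) {
  shared <- length(intersect(n_u, n_v))
  2 * shared / (length(n_u) + length(n_v))
}

n_u <- c("a", "b", "c")       # hypothetical neighbours of node u
n_v <- c("b", "c", "d", "e")  # hypothetical neighbours of node v

dice(n_u, n_v)  # 4/7, about 0.571
```

The measure is 1 for identical neighbourhoods and 0 for disjoint ones. It is undefined (NaN in this sketch) if both nodes have no neighbours.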

All these properties can be easily computed with the igraph package. Once they are computed, we add them to the graph object using the “set.vertex.attribute” and “set.edge.attribute” commands. Next, we need to convert the network from igraph to graphNEL format, because RCytoscape requires networks in that format. The conversion is easily done with the “igraph.to.graphNEL()” command. This command carries over the attribute values, but RCytoscape still needs each attribute to be declared explicitly (name, type, and default value). We do this with the “initNodeAttribute” and “initEdgeAttribute” commands.
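Below is a minimal sketch of these property computations on a hypothetical four-node toy graph, not the novel's network. It uses the current igraph names degree(), betweenness(), similarity(method = "dice") and ends(), instead of the older dotted names used in the full code below. It attaches each edge's Dice similarity by indexing the similarity matrix with the edge's end points. This vectorized lookup is my substitute for the ddply-based approach in the full code, not the original method.

```r
library(igraph)

# Hypothetical toy graph: a triangle A-B-C plus a pendant node D attached to C
g <- graph_from_literal(A - B, A - C, B - C, C - D)

# Node properties stored as vertex attributes
V(g)$degree      <- degree(g)
V(g)$betweenness <- betweenness(g)

# Dice similarity for every pair of vertices (an n x n matrix in vertex order)
sim <- similarity(g, method = "dice")

# Edge property: the similarity of each edge's two end nodes
end_ids <- ends(g, E(g), names = FALSE)
E(g)$similarity <- sim[cbind(end_ids[, 1], end_ids[, 2])]

E(g, P = c("A", "B"))$similarity  # N(A) = {B, C}, N(B) = {A, C}: 2 * 1 / (2 + 2) = 0.5
E(g, P = c("C", "D"))$similarity  # C and D share no neighbours: 0
```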

Now we are ready to plot our network. First, we create a new Cytoscape window using the “new.CytoscapeWindow” command (don’t forget to activate the CytoscapeRPC plugin first). Then, we use the “displayGraph” command to send the graph to that window. At this point, all nodes are displayed at the same position and overlap each other, so we need to apply a layout. The available layouts and their properties can be listed with the “getLayoutNames” and “getLayoutPropertyNames” commands. For illustration purposes, I selected the 18th layout, the force-directed Fruchterman-Reingold layout (listed by Cytoscape as “fruchterman-rheingold”). I used the Dice similarities, set through the layout’s edge_attribute property, to define the force between two nodes. Using the “layoutNetwork” command, I applied the selected layout to the network. You can see the result below:
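RCytoscape needs a running Cytoscape 2.x session with the CytoscapeRPC plugin. Without Cytoscape, you can preview a similar idea with igraph’s own Fruchterman-Reingold implementation, layout_with_fr(), by passing the Dice similarities as edge weights (a higher weight gives stronger attraction). This is a swapped-in igraph technique, not the Cytoscape layout used in this post, and it runs on a hypothetical toy graph. The 0.01 offset added to the weights is my assumption. An edge whose end nodes share no neighbours has a similarity of 0, and igraph expects strictly positive weights for this layout.

```r
library(igraph)

# Hypothetical toy graph with the Dice similarity of each edge's end nodes
g <- graph_from_literal(A - B, A - C, B - C, C - D)
sim <- similarity(g, method = "dice")
end_ids <- ends(g, E(g), names = FALSE)
E(g)$similarity <- sim[cbind(end_ids[, 1], end_ids[, 2])]

# Fruchterman-Reingold starts from random positions, so fix the seed
set.seed(42)

# Higher weight = stronger attraction; + 0.01 keeps every weight positive (assumption)
coords <- layout_with_fr(g, weights = E(g)$similarity + 0.01, niter = 1000)

dim(coords)  # one (x, y) row per node: 4 x 2
```

To view the result, plot(g, layout = coords) draws the graph in R’s graphics device.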

Figure 1: Default Cytoscape network visual style

We can see that the selected layout put interconnected nodes closer to each other and that we can easily identify which characters in the novel are likely to appear together.

I am not a big fan of the default Cytoscape color scheme, so I defined my own using “setDefault[BackgroundColor/EdgeColor/NodeColor,…]” commands. And here is what I got:

Figure 2: User-specified network visual style

It looks nicer, but the information we can get from this plot is still the same. Since the nodes represent characters, we can replace the somewhat boring circles with the characters’ images using the “setNodeImageDirect” command. For illustration purposes, I used images only for Jean Valjean, Cosette, Fantine, and Javert. This gives a more personalized node representation.

Figure 3: User-specified node visual style, circle vs. image

To incorporate the remaining node/edge properties into the visualization, I used the “setNodeColorRule,” “setNodeSizeRule,” and “setEdgeColorRule” commands with the degree, betweenness centrality, and edge weight attributes. Specifically, I defined: 1) node color based on node degree (the larger the degree, the darker the node); 2) node size based on betweenness centrality (the more central the node, the larger it is); and 3) edge color based on edge weight (the larger the weight, the darker the edge). With this visualization we can see which characters in the novel are likely to appear together. We can also see which characters within such groups appeared together more often, which characters had a central role, and which ones acted as links between characters.
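Conceptually, the “interpolate” mode in these rules is piecewise-linear interpolation between control points. The base-R sketch below mimics the node-size rule with approx(). It uses hypothetical control points for the minimum, mean, and maximum of normalized betweenness (0, 0.2, 1) and the middle three sizes from the code (45, 60, 80). From my reading of the RCytoscape documentation, I assume the first and last of the five sizes (30 and 100) apply only to values below the lowest or above the highest control point. Here the control points are the minimum and maximum, so no node falls outside them. This sketch illustrates the idea and is not RCytoscape’s internal code.

```r
# Hypothetical control points: min, mean, and max of normalized betweenness
control_points <- c(0, 0.2, 1)
# Node sizes assigned at those control points (middle three sizes from the code)
control_sizes <- c(45, 60, 80)

# Hypothetical normalized betweenness values of four nodes
betweenness_norm <- c(0, 0.1, 0.6, 1)

# Linear interpolation between neighbouring control points
node_sizes <- approx(control_points, control_sizes, xout = betweenness_norm)$y
node_sizes  # 45.0 52.5 70.0 80.0
```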

Figure 4: Full user customization of the network visual style

And here is the complete code:

# Plotting networks in R 
# An example of how to plot networks and customize their appearance in Cytoscape directly from R, using the RCytoscape package

############################################################################################
# Clear workspace 
rm(list = ls())

# Load libraries
library("igraph")
library("plyr")

# Read a data set. 
# Data format: dataframe with 3 variables; variables 1 & 2 correspond to interactions; variable 3 corresponds to the weight of interaction
dataSet <- read.table("lesmis.txt", header = FALSE, sep = "\t")

# Create a graph. Use simplify to ensure that there are no duplicated edges or self loops
gD <- simplify(graph.data.frame(dataSet, directed=FALSE))

# Print number of nodes and edges
# vcount(gD)
# ecount(gD)

############################################################################################
# Calculate some node properties and node similarities that will be used to illustrate 
# different plotting abilities

# Calculate degree for all nodes
degAll <- degree(gD, v = V(gD), mode = "all")

# Calculate betweenness for all nodes
betAll <- betweenness(gD, v = V(gD), directed = FALSE) / (((vcount(gD) - 1) * (vcount(gD)-2)) / 2)
betAll.norm <- (betAll - min(betAll))/(max(betAll) - min(betAll))
rm(betAll)

#Calculate Dice similarities between all pairs of nodes
dsAll <- similarity.dice(gD, vids = V(gD), mode = "all")

############################################################################################
# Add new node/edge attributes based on the calculated node properties/similarities

gD <- set.vertex.attribute(gD, "degree", index = V(gD), value = degAll)
gD <- set.vertex.attribute(gD, "betweenness", index = V(gD), value = betAll.norm)

# Check the attributes
# summary(gD)

F1 <- function(x) {data.frame(V4 = dsAll[which(V(gD)$name == as.character(x$V1)), which(V(gD)$name == as.character(x$V2))])}
dataSet.ext <- ddply(dataSet, .variables=c("V1", "V2", "V3"), function(x) data.frame(F1(x)))

gD <- set.edge.attribute(gD, "weight", index = E(gD), value = 0)
gD <- set.edge.attribute(gD, "similarity", index = E(gD), value = 0)

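# The order of edges in gD is not the same as the row order in dataSet.ext (ddply also re-sorts the rows),
# so the values cannot be assigned directly. E(gD)[V1 %--% V2] does not help either: with whole vectors,
# %--% selects every edge between the two vertex sets, not the row-by-row pairs.
# Instead, look up the ID of each V1-V2 edge pairwise and assign the values through those IDs.
eids <- get.edge.ids(gD, as.vector(rbind(as.character(dataSet.ext$V1), as.character(dataSet.ext$V2))))
gD <- set.edge.attribute(gD, "weight", index = eids, value = as.numeric(dataSet.ext$V3))
gD <- set.edge.attribute(gD, "similarity", index = eids, value = as.numeric(dataSet.ext$V4))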

# Check the attributes
# summary(gD)

####################################
# Print network in Cytoscape
# This requires RCytoscape package and CytoscapeRPC plugin

library("RCytoscape")

gD.cyt <- igraph.to.graphNEL(gD)

# We have to create attributes for graphNEL
# We'll keep the same name, so the values are passed from igraph

gD.cyt <- initNodeAttribute(gD.cyt, 'degree', 'numeric', 0) 
gD.cyt <- initNodeAttribute(gD.cyt, 'betweenness', 'numeric', 0) 
gD.cyt <- initEdgeAttribute (gD.cyt, "weight", 'integer', 0)
gD.cyt <- initEdgeAttribute (gD.cyt, "similarity", 'numeric', 0)

# Now we can create a new graph window in cytoscape
# Be sure that CytoscapeRPC plugin is activated
gDCW <- new.CytoscapeWindow("Les Miserables", graph = gD.cyt, overwriteWindow = TRUE)

# We can display graph, with defaults color/size scheme
displayGraph(gDCW)

# If you also want to choose a layout from R, the list of available layouts can be accessed as follows:
cy <- CytoscapeConnection()
hlp <- getLayoutNames(cy)
# We'll select layout number 18, the "fruchterman-rheingold" layout
# (its position in this list may differ between Cytoscape versions, so check hlp first)
# See properties for the given layout
# getLayoutPropertyNames(cy, hlp[18])
# Apply values to some of the properties and plot the layout
setLayoutProperties (gDCW, hlp[18], list (edge_attribute = 'similarity', iterations = 1000))
layoutNetwork(gDCW, hlp[18])

# Figure 1 made here

# Now, we can define our own default color/size scheme
setDefaultBackgroundColor(gDCW, '#FFFFFF')
setDefaultEdgeColor(gDCW, '#CDC9C9')
setDefaultEdgeLineWidth(gDCW, 4)
setDefaultNodeBorderColor(gDCW, '#000000')
setDefaultNodeBorderWidth(gDCW, 3)
setDefaultNodeShape(gDCW, 'ellipse')
setDefaultNodeColor(gDCW, '#87CEFA')
setDefaultNodeSize(gDCW, 60)
setDefaultNodeFontSize(gDCW, 20)
setDefaultNodeLabelColor(gDCW, '#000000')

# And we can replot it 
redraw(gDCW)       

# Figure 2 made here

# Now, we can replace some nodes with images
# You need to download images and put them in the "Images" folder 
# Or you can change the code to use online images (provide URLs)
img.nodes <- c('Cosette', 'Fantine', 'Javert', 'JeanValjean')
setNodeImageDirect(gDCW, img.nodes, sprintf('file://%s/Images/%s.jpg', getwd(), img.nodes))
redraw (gDCW)

# Figure 3 made here

# Finally, we can define rules for node colors, node sizes, and edge colors
setNodeColorRule(gDCW, 'degree', c(min(degAll), mean(degAll), max(degAll)), c('#F5DEB3', '#FFA500', '#FF7F50', '#FF4500', '#FF0000'), mode = 'interpolate')
setNodeSizeRule(gDCW, 'betweenness', c(min(betAll.norm), mean(betAll.norm), max(betAll.norm)), c(30, 45, 60, 80, 100), mode = 'interpolate')
setEdgeColorRule(gDCW, 'weight', c(min(as.numeric(dataSet.ext$V3)), mean(as.numeric(dataSet.ext$V3)), max(as.numeric(dataSet.ext$V3))), c('#FFFF00', '#00FFFF', '#00FF7F', '#228B22', '#006400'), mode='interpolate')
redraw (gDCW)

# Figure 4 made here
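A note on the betweenness scaling in the code above, which also came up in the comments. Dividing by (n - 1)(n - 2)/2 is the standard betweenness normalization: in an undirected graph, that is the number of node pairs that do not include a given node. The code then applies min-max rescaling. Min-max rescaling is unaffected by multiplying all values by the same positive constant, so betAll.norm would be identical without the first division. The sketch below checks this with made-up raw betweenness values. Only n = 77 comes from the Les Miserables network.

```r
n <- 77                          # number of nodes in the Les Miserables network
pairs <- (n - 1) * (n - 2) / 2   # node pairs not involving a given node
pairs                            # 2850

raw_bet <- c(0, 12.5, 40, 300)   # hypothetical raw betweenness values
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Same result with or without the division by the number of pairs
all.equal(min_max(raw_bet / pairs), min_max(raw_bet))  # TRUE
```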

This entry was posted in Graphs, Networks, Visualization.

9 Responses to Network visualization – part 1: Cytoscape

  1. Guillermo Reales says:

    Hi there, I was following your explanations with great interest but unhappily I realized that there’s no CytoscapeRPC plugin available for Cytoscape 3.1. Any suggestion on how to work this out?
    Thanks and Congrats for the post and blog!

    • admin says:

      Thanks!

      I still use the old version of Cytoscape to plot networks from R. I really hope that the CytoscapeRPC plugin will become available for Cytoscape 3 soon, so I don’t have to switch between versions.
      Did you contact the CytoscapeRPC authors?

  2. Bill Longabaugh says:

    The recommended way to communicate with Cytoscape in the Version 3 series is through the cyREST API. See: http://apps.cytoscape.org/apps/cyrest

  3. admin says:

    Thanks!

  4. Magesh says:

    Hi
    I am new to graph. I was trying to recreate your example. When I execute the code:

    degAll <- degree(gD, v = V(gD), mode = "all")

    I am getting the following error:

    Error in degree(gd, v = V(gd), mode = "all") :
    unused arguments (v = V(gd), mode = "all")

    Don't know how to proceed further. I am using R Version 3.2.0 and igraph 1.0.1 and plyr 1.8.3. Could you please help?

    • admin says:

      Hmmm… I am not sure what the problem is. I just ran the code from the blog and it works for me (R version 3.2.2, igraph version 1.0.1, and plyr version 1.8.3).

      Does it work if you do degAll <- degree(gD)?

    • Raja says:

      Use gD in place of gd and see if the error goes away.

  5. Nick says:

    Hi,
    Thank you for your clear and useful post. I learned a lot from it.
    I’m applying your way to my own data-set but I was just wondering about calculating betweenness:
    “…(((vcount(gD) - 1) * (vcount(gD) - 2)) / 2)”
    I cannot really figure out the reasons why you did this. Is it a common approach, in other words, should I apply it to my own data-set as well?

    Thank you again!

    • admin says:

      Yes, you can see more at https://en.wikipedia.org/wiki/Betweenness_centrality

      Here, I did it just for illustration purposes.
      Whether or not you should apply it to your own data depends on what you want to evaluate. If you want to evaluate whether some nodes are the key connectors in the network, in the sense that the shortest paths from all/many nodes to all/many other nodes pass through those connectors (e.g., are they transportation hubs?), then yes. Otherwise, probably not (but again, it depends on which question you want to answer).

      Hope this helps!
