Finding Subgraphs and Triads in Networks
Before we begin, a small warning. This particular post is of particular relevance to me as my main area of academic research is based on the use of subgraphs in social networks. Because I know a lot about this topic, there is a risk that I could just go on and on about it without really explaining it in simple terms. For the sake of everyone sanity, I will do my best to keep things short and sweet.
What are subgraphs?
A formal definition of subgraphs can be taken from Wikipedia :
A subgraph of a graph G is another graph formed from a subset of the vertices and edges of G . The vertex subset must include all endpoints of the edge subset, but may also include additional vertices.
Wikipedia editors
In other words, a subgraph is a smaller portion of a graph that can be found in a much larger graph. A subgraph can take the form of anything but often enough, subgraphs are a useful way for looking for a specific type of structure or group of nodes. This is where the next point comes in.
Why are subgraphs useful?
Subgraphs are useful as they give us a smaller window into the structures is embedded within the network. Subgraphs are essentially the building bricks of larger networks. Knowing this gives us a window into how the network is constructed. For example, the subgraph highlighted above is known as a feed-forward formation. This particular substructure is very important within the social science world as it resembles the “enemy of my enemy is my friend” type of interaction otherwise known as indirect reciprocity .
How are subgraphs extracted?
There are many ways in which subgraphs can be extracted but the most common approach is to count them according to their distinct structure. The general idea is that a subgraph is determined by the number of nodes features within it ranging anything from 2 to 100’s. For the sake of simplicity, it’s usually conventional to stick to a small number such as three – hence why triads are mentioned.
What are the challenges?
Finding subgraphs is what we call in the academic world an “NP-complete” problem (see subgraph isomorphism problem for more). In other words, the general rule is, the larger the number of nodes featured in a subgraph, the greater the number of combinations form. This is a really important concept to grasp as it will determine how long the analysis will take to perform. Below is a table that shows how many subgraphs can be found with a particular node size. As you can see it doesn’t scale well.
No. Nodes | No. Combinations (undirected) | No. Combinations (directed) |
---|---|---|
3 | 2 | 13 |
4 | 6 | 199 |
5 | 21 | 9,364 |
6 | 112 | 1,530,843 |
Implementations
networkx provides an easy to use method for counting the number of distinct triads in a network. In order for this to work, a network is required to have directed edges so triads can be extracted. As an example, counting each of the subgraphs can be achieved using a simple random graph generator with the following…
>>> G = nx.fast_gnp_random_graph(n=50, p=0.15, directed=True)
>>> nx.triadic_census(G)
{'003': 7538,
'012': 7896,
'102': 636,
'021D': 647,
'021U': 662,
'021C': 1375,
'111D': 220,
'111U': 217,
'030T': 227,
'030C': 79,
'201': 17,
'120D': 23,
'120U': 21,
'120C': 35,
'210': 7,
'300': 0}
So what exactly does this return? The data returned from the triadic census produces a dictionary keyed by a distinct triad. I won’t go into the reasoning of why the triads are named as they are but the codes can be explained here along with a visual lookup. Each triad is mapped to how many times it was found within the network.
Applications
There are many reasons why subgraphs are an important concept to grasp within network science. Here are just a few examples of how subgraphs can be used to analyse complex networks
Motifs
Motifs are under or over-represented subgraphs present within a network. The general idea is that subgraphs are counted and then offset by a random baseline to discover which particular structures are significant to that network. This approach has origins in biological systems to study statistically significant substructures within neural circuits however, recently this approach has been widely used to study online social networks including my own research.
Cliques
As we’ve mentioned before, cliques are an important part of a network as they give us an idea of how nodes are clustered. Subgraph analysis can be used to find very specific formations which may result in finding more cliques. As an example, the triad identified by the code 300
represents a fully connected subgraph of nodes.
Path Finding
Subgraph analysis can also be used to find links and chains between nodes within a network. This is particularly useful if you’re interested in finding simple linear connections or cycles among a group of nodes. The substructures give insight towards understanding how a connection can be made from node A to B.
Conclusion and Final Thoughts
I cannot overstate the fact that subgraphs are a really important component of any form of network analysis. Most of the things covered in this guide so far refer to global metrics (by that I mean metrics that represent the entire network). Using this approach we look at things on a smaller scale and it is the basis for understanding how a network is structured.