Social networks are perhaps known for their very community-centric structures. This makes sense considering it is quite common for like-minded people to group together and form their own clusters. This in turn can produce some rather interesting network structures.
There are many ways in which communities can be detected and in this blog post, we introduce just a few algorithms that can get the job done effectively. Before we start, let’s get a general idea of what these algorithms do.
How does community detection work?
The simplest possible explanation is that community detection algorithms clusters nodes into non-overlapping groups. To understand it’s a little bit more let’s consider using an example. The image below demonstrates a simple undirected graph of seven nodes.
Just by purely looking at the example network, you can tell there are two obvious communities of users. Nodes A, B and C are one community and nodes E, F, G H form another. It was relatively easy for us to pick out the structures, however, our strategy for selecting which node belongs to which group may be different. There are several tactics in which we can use to make community detection a little easier. Here are a few suggestions…
- Remove weak ties : The edge between C and G serves a nature breaking point to separate the two groups.
- Preserve strong ties : Nodes A, B and C and nodes E, F, G and H are fully connected suggesting that they have strong ties with each other.
- Distance : The distance between a pair of notes may indicate whether or not they should be part of the same group.
Now that we’ve seen a few ways in which community detection can be performed, we can now begin to look at some of the implementations that are widely used by the networkx. Here are some of the main algorithms.
Girvan Newman is perhaps one of the most widely used community detection algorithms out there. It is also one of the most effective (if not, the most effective) algorithms available. It operates by removing edges that link to nodes with a high betweenness centrality. We talked a little bit about that in the previous post .
Using this community detection approach on Python is a little more involved than what we are used to.
>>> comp = nx.girvan_newman(G)n
As opposed to specifying the number of communities you want to find, the networkx implementation removes edges one by one on each iteration. In doing so, it progressively increases the number of communities it finds.
If you wanted to enumerate over the first five iterations, this can be achieved using the following code.
>>> import itertools >>> comp = nx.girvan_newman(G) >>> k = 5 >>> for communities in itertools.islice(comp, k): ... print(tuple(sorted(c) for c in communities))
More examples can be found here .
K-clique is a method for detecting highly connected groups of nodes in large networks. A clique is a subgraph (or partition) of a network where all the nodes are fully connected to each other. An example of cliques are shown above, highlighted as Group 1 and 2.
To use k-clique in networkx, we need to specify the size of the clique k
>>> cliques = nx.k_clique_communities(G, k)
Unlike other community detection methods, in order for the community to be found, a group of nodes need to form a fully connected clique before it is detected.
The general idea behind tree partitioning is that clusters of nodes are formed based upon a combination of node and edge weights. The only catch to this approach is that the network must be structured like a tree. In other words, it has a hierarchical structure. One of the most popular tree partitioning algorithms is known as Lukes partitioning.
Using networkx, the algorithm requires a cut-off point
max_size such that a partition must not exceed this size.
>>> tree = nx.lukes_partitioning(G, max_size)n
Modularity has made an appearance on this blog several times now and it is perhaps one of the easiest and most convenient ways to detect communities in networks. It is by no means the most accurate but is one of the fastest ways to get what you need. Generally, this approach is used to get a rough idea as to whether the network can be broken down into communities.
Using networkx, the
label_propagation_communities function is used to label communities using a label propagation approach to estimate which node belongs to which community. Modularity is then used to verify the result. Putting this all together looks like this…
>>> import networkx.algorithms.community as nx_comm ... >>> comms = nx_comm.label_propagation_communities(G) >>> nx_comm.modularity(G, comms)
In this blog post, we quickly went over some of the ways in which communities can be detected in social network structures. Each of the algorithms featured in this post are all very different and are required for very different tasks. Nonetheless, the job of finding communities is basically the same.
Maybe I should do a blog post looking at the effectiveness of each approach in terms of speed and accuracy.