In the last blog post , we looked at simple metrics to describe the overall structure of a network. Most of what we’ve introduced in the series so far focuses on structures and metrics of the entire network however, in this post we will look at things from a different perspective.
In this post, we will be introducing the concept of centrality and its utility for understanding nodes within the network. We will cover what centrality actually means, why we use it and how it can be used within networkx.
What exactly is centrailty?
If you recall from the previous post, we introduced many different ways in which we can analyse a network. One of these involved looking at the in / out-degree.
Degree is a good example of centrality as it can be used to assess a node’s position within a network relative to others based upon the number of connections it has with others.
Why do we use centrality?
Centrality is used for a number of reasons but its main role is for ranking the value of nodes in a network. Here are a few examples of where centrality is used.
In a social network, where nodes represent people, centrality can be used to represent the popularity and influence of a user with respect to others. This may be in an online or offline setting where connections can represent many different things such as friendship or communication.
World Wide Web / Internet
The World Wide Web (if people still call it that these days) is a network where nodes represent websites and hyperlinks represent edges. There are various algorithms (one of which will be mention later) Where centrality is used to indicate the importance of a website based upon the number of links it receives from external websites.
This may not be as obvious but transportation is another example of a network. In this case, nodes represent things like bridges, stations and towns and the edges are the things that connect them e.g. roads. Centrality can be used to value the importance of a particular piece of infrastructure and how it is used by vehicles.
How can it be used?
There are many different ways in which centrality can be performed. In this blog post, we will look at just a few. As usual, the networkx implementation will be included too.
Degree centrality is perhaps the most simple measure of them all. Based upon what we learnt previously, degree centrality is based upon the number of connections (ingoing and outgoing if directed) of a node.
The calculation is based upon the ratio between a node’s degree divided by the maximum degree found in the network. In this case, nodes with many connections rank higher than others.
While this solution works to a certain extent, the ranking of a node is based upon a node’s immediate connections (its neighbours) without considering links beyond this.
Eigenvector centrality is a little harder to explain but, in short, its value is based upon the centrality of nodes it’s connected to. This calculation is performed based upon the adjacency matrix ( click here if you don’t know what an adjacency matrix is) where eigenvalues are extracted.
Eigenvector centrality works well for directed networks when nodes point to other nudes. This has an effect where if a group of nodes (with high centrality values) point to the same node (the target), it too will also have a high centrality measure. It’s almost like the spread of influence where one node having an effect on the other.
Unlike degree centrality, with this approach, the entire network is factored into the ranking of nodes. This comes at a cost of speed depending on the size of the network.
Betweenness and Closeness
Betweenness and closeness are two popular approaches for calculating the centrality of nodes using alternative techniques.
Betweenness centrality is based upon the number of shortest paths that a given node is a part of within the network as a whole. In other words, centrality, in this case, provides a measure for how important a node is based upon how often it appears to link other nodes together. For example, if it were a communication network, nodes with a high betweenness centrality are important as they carry more information between users – almost like an exchange point.
Here are just a few implementations in networkx
nx.betweenness_centrality(G): the basic implementation form betweenness centrality
nx.communicability_betweenness_centrality(G): uses triads (nodes in groups of 3) to simulate walks connecting between nodes.
As the name suggests, closeness is based upon how close a node is to other nodes in the network. Closeness centrality is based upon the sum of the distance between the target node and all other remaining nodes in the networks, by means of the shortest possible paths. A node with a high closeness centrality suggests that it’s closely knit to as many nodes a possible giving the impression that it’s more central.
A few implementations of these include…
nx.closeness_centrailty(G): used to compute the closeness centrality using the basic implementation.
nx.current_flow_closeness_centraility(G): also known as information centrality.
The approaches mentioned above are some of the most popular techniques in the widely used. However, there are a few other techniques that are lesser-known and deserve an honourable mention. Without going into too much detail these include the following…
nx.load_centraility(G): this approach uses a variation of betweenness centrality but the factors in weighted edges
nx.subgraph_centraility(G): uses subgraphs (a subset of notes within the same network) to detect the number of cycles that a node belongs to.
nx.harmonic_centraility(G): very similar to between the centrality but the algorithm for determining the shortest paths is different
There are many different ways in which centrality can be used to understand more about the nodes within a network. In this blog post, we mentioned some of the most popular solutions along with some other techniques which proved to be just as effective. Furthermore, we have also shown how to use centrality as a technique for profiling individual nodes within a complex network.