As you may have seen from some of the examples presented on this blog, social networks can vary in size from the simplest of interactions to full-on large-scale complex networks. This can pose a bit of a challenge for those wanting to understand what is going on in such a massive network.
Thankfully there is a solution to this and it’s known as core analysis – core analysis. Also known as degeneracy, this is the process of decomposing a large network down into its simplest form.
The most widely used algorithm is known as k-core which involves the process of repeatedly removing various edges from a node such that in order for the node to be present in the network, it requires a minimum number of edges k (its degree). In the context of this approach, the value of k represents the order of the core.
Here is a simple example. Let’s create a network composed of 10 nodes where the edges are formed with the probability of 0.4.
>>> G = nx.fast_gnp_random_graph(n=10, p=0.4)
As you can see this produces a fairly simple network where all nodes are linked to each other. By applying k-core where k= 2, we get the following result.
>>> G_k = G.copy()
>>> G_k = nx.k_core(G_k, k=2)
Hopefully, you can begin to see why k-core is a useful way of reducing the size of a graph. For much larger graphs value of k can get into its hundreds or thousands.
How is this different from community detection?
At this point, you might be thinking “how exactly is this any different to community detection?”. “I mean they’re basically looking at smaller groups of nodes, right?”. While this is true, k-core is designed for reducing the size and for examining the highly connected ‘core’ whereas community detection can be considered a form of clustering and/or grouping.
A rather fun analogy would be to think of k-core as like an onion. Increasing the value of k is like removing layers of an onion until you get to the most centralised/valuable part of the product.
While k-core is one of the most used and recognised algorithms to get to the core of the network, there are other alternatives that are listed below.
k-shell is somewhat similar to k-core however, k-shell is primarily used as a ranking method for the degree of nodes. The general idea is that a network can have shells ranking from k=1 to infinity. Where k equals one represents nodes which only have one edge connected to them. It’s more about extracting layers out of a network rather than a single component.
k-crust involves performing k-core however, rather than taking the core of a network at k , the discarded nodes and edges are attend instead. Essentially, it returns the outer components of the core known as the ‘crust’.
>>> nx.k_corona(G, k)
Don’t worry, this has got nothing to do with coronavirus or COVID-19 😉
k-corona is a method for extracting nodes within a core that have exactly k connections with neighbouring nodes. This returns all nodes and edges which satisfy the criteria.
In this blog post, we featured a very quick overview of how and why k-core analysis is used within large-scale networks. Additionally, we went through a select few algorithms which inherit some of the features from k-core and to be used in a similar manner. k-core is not exactly a straightforward topic to describe but it’s a really useful method for understanding the most valuable components of highly interactive networks.