Links are incredibly valuable for studying relationships in complex networks. Depending on the structure of the network, links can be analysed to identify important connections such as bridges and weak ties. Link analysis can also be used to reveal significant connections among a set of nodes of interest.
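To make the idea of a bridge concrete, here is a minimal sketch using networkx's `nx.bridges` on a small, hypothetical undirected graph: two tightly knit groups joined by a single edge. A bridge is an edge whose removal would split the graph into more connected components. The node labels and edges are made up purely for illustration.

```python
import networkx as nx

# Hypothetical graph: two triangles joined by a single edge (2, 3).
G = nx.Graph()
G.add_edges_from([
    (0, 1), (1, 2), (2, 0),  # first tightly knit group
    (3, 4), (4, 5), (5, 3),  # second tightly knit group
    (2, 3),                  # the only link between the two groups
])

# nx.bridges yields each edge whose removal would increase
# the number of connected components.
bridge_edges = list(nx.bridges(G))
print(bridge_edges)
```

Every edge inside a triangle has an alternative route around it, so only the edge joining the two groups is reported. Note that `nx.bridges` does not accept directed graphs, so a `DiGraph` would need converting with `G.to_undirected()` first.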
Link analysis has many applications, ranging from medical research to combatting fraud, but perhaps its most valuable contribution is found on the internet. The World Wide Web as we know it is essentially a large interconnected network of websites and hyperlinks. In this context, link analysis can be used to estimate the importance of a website based on the number of incoming hyperlinks that point to it.
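The simplest version of this idea is to count each page's incoming links, i.e. its in-degree. A minimal sketch, assuming a small hypothetical website in which each directed edge `(u, v)` means page `u` links to page `v`:

```python
import networkx as nx

# Hypothetical toy website: an edge (u, v) means page u links to page v.
web = nx.DiGraph()
web.add_edges_from([
    ("home", "about"),
    ("home", "blog"),
    ("blog", "about"),
    ("contact", "about"),
    ("about", "home"),
])

# A page's in-degree is the number of hyperlinks pointing at it.
incoming = dict(web.in_degree())

# Rank pages from most to least linked-to.
ranked = sorted(incoming.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

Raw counts treat every incoming link as equally valuable, which is the weakness PageRank addresses by also weighing where each link comes from.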
In networkx, two main algorithms are used to evaluate the links within a network: PageRank and HITS. They are defined as follows:
PageRank is based on a variation of eigenvector centrality (which we covered in an earlier post) and is widely used on the internet. A little history: PageRank was introduced by the founders of Google as a way of ranking the web pages indexed on their servers. It was designed to measure the value of a website based on how many other websites link to it and how important those websites are, by modelling the probability that a user clicking links at random will land on the page.
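The random-surfer idea can be sketched as a power iteration in plain Python. This is a simplified illustration, not the networkx implementation: `links` is a hypothetical dictionary mapping each page to the pages it links to, `alpha` is the damping factor (the probability that the surfer follows a link rather than jumping to a random page, commonly set to 0.85), and the rank of pages without outgoing links is spread evenly over all pages.

```python
def pagerank(links, alpha=0.85, tol=1e-9, max_iter=200):
    """Power-iteration PageRank over a dict mapping page -> list of linked pages."""
    nodes = set(links) | {v for targets in links.values() for v in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution

    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is shared evenly by all pages.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        base = (1.0 - alpha) / n + alpha * dangling / n
        new_rank = {node: base for node in nodes}

        # Each page passes a damped share of its rank along every outgoing link.
        for u, targets in links.items():
            if targets:
                share = alpha * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share

        change = sum(abs(new_rank[x] - rank[x]) for x in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy website: each page maps to the pages it links to.
links = {
    "home": ["about", "blog"],
    "blog": ["about"],
    "contact": ["about"],
    "about": ["home"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page:8s} {score:.3f}")
```

Here `home` and `blog` each receive a single incoming link, yet `home` ends up well ahead because its link comes from the highly ranked `about` page. With networkx, the equivalent call is `nx.pagerank(G, alpha=0.85)` on a `DiGraph`, which returns a dictionary of scores keyed by node.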
If you’d like to know more about the algorithm, why not check out the Wikipedia article, which goes into more depth than this blog post.
The HITS (Hyperlink-Induced Topic Search) algorithm is another technique for ranking webpages based on their hyperlinks. It calculates two metrics to determine the value of a node: an authority score and a hub score. The authority score estimates the value of a site’s content, while the hub score estimates the quality of its outgoing hyperlinks. The two reinforce each other: a good hub links to many good authorities, and a good authority is linked to by many good hubs.
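The mutual reinforcement between hubs and authorities can likewise be sketched as a power iteration. Again, this is an illustrative plain-Python version under stated assumptions rather than networkx's own code: `links` is a hypothetical dictionary of outgoing links, and both sets of scores are normalised to sum to 1 after each step.

```python
def hits(links, tol=1e-10, max_iter=100):
    """Power-iteration HITS over a dict mapping page -> list of linked pages.

    Returns (hubs, authorities), each normalised to sum to 1.
    """
    nodes = set(links) | {v for targets in links.values() for v in targets}
    hubs = {node: 1.0 for node in nodes}
    authorities = {node: 1.0 for node in nodes}

    for _ in range(max_iter):
        # Authority: sum of the hub scores of the pages that link to it.
        new_auth = {node: 0.0 for node in nodes}
        for u, targets in links.items():
            for v in targets:
                new_auth[v] += hubs[u]

        # Hub: sum of the authority scores of the pages it links to.
        new_hubs = {node: sum(new_auth[v] for v in links.get(node, [])) for node in nodes}

        # Normalise both score sets so that each sums to 1.
        auth_total = sum(new_auth.values()) or 1.0
        hub_total = sum(new_hubs.values()) or 1.0
        new_auth = {node: s / auth_total for node, s in new_auth.items()}
        new_hubs = {node: s / hub_total for node, s in new_hubs.items()}

        change = sum(abs(new_hubs[x] - hubs[x]) for x in nodes)
        hubs, authorities = new_hubs, new_auth
        if change < tol:
            break
    return hubs, authorities


# Hypothetical pages: three "hub" pages pointing at content pages.
links = {
    "portal": ["docs", "tutorial", "faq"],
    "blog": ["docs", "tutorial"],
    "forum": ["docs"],
}

hubs, authorities = hits(links)
print("best hub:", max(hubs, key=hubs.get))
print("best authority:", max(authorities, key=authorities.get))
```

`portal` links to every content page, so it comes out as the strongest hub, while `docs`, which all three hubs point to, is the strongest authority. In networkx, `nx.hits(G)` on a `DiGraph` returns results in the same shape: a tuple `(hubs, authorities)` of dictionaries keyed by node.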
Of all the blog posts in this series, this has got to be one of the shortest. It focuses on the World Wide Web, using webpages and hyperlinks to construct networks. I’d like to stress, however, that while this approach is mostly used on the web, it can be applied elsewhere too.