Building User-Hashtag Co-occurrence Network from Timelines on Mastodon

Hashtags are a useful feature for annotating posts and play an active role on microblogging platforms such as Twitter and Mastodon. They are particularly helpful for discovering new people to follow and finding interesting posts.

In a separate post, I used Twitter to build hashtag co-occurrence networks based upon a specific hashtag. The general idea was: given a specific hashtag, how is it used with other hashtags, and how can this be used to detect communities?

As mentioned previously, hashtag co-occurrence networks are useful for modelling the similarity between hashtags in terms of how they are used alongside others. In this case, a bipartite network is used to map users to the hashtags they have used previously. This could then be used to find like-minded users based on mutual hashtags and to discover clusters of hashtags that appear together.
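To make the "like-minded users" idea concrete, here is a minimal sketch that takes a user-to-hashtags mapping and lists the hashtags each pair of users has in common. The usernames, hashtags and the function name shared_hashtags are made up for illustration; this is one simple way to project the bipartite network onto users, not necessarily the approach used for the figures in this post.

```python
from itertools import combinations


def shared_hashtags(user_tags):
    """Return {(user_a, user_b): sorted shared hashtags} for pairs with overlap."""
    pairs = {}
    # Compare every unordered pair of users once (sorted for a stable order).
    for a, b in combinations(sorted(user_tags), 2):
        common = user_tags[a] & user_tags[b]
        if common:
            pairs[(a, b)] = sorted(common)
    return pairs


# Hypothetical data: each user mapped to the set of hashtags they have posted.
user_tags = {
    "alice": {"python", "fediverse"},
    "bob": {"python", "linux"},
    "carol": {"photography"},
}

pairs = shared_hashtags(user_tags)
print(pairs)  # {('alice', 'bob'): ['python']}
```

Comparing every pair of users is quadratic in the number of users, which is fine for a few hundred accounts but would need a smarter approach for a much larger network.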

If you want to know more about the theory of hashtag co-occurrence networks, check out this post.

Given Mastodon’s sudden surge in popularity, I thought I would make a new post covering how this can be done using Python and the Mastodon API based on a public timeline of a specific instance. Most of the code used in this post is explained in a separate post about scraping Mastodon timelines.

Code

In this post, a user-hashtag co-occurrence network is built based upon public activity on mastodonapp.uk within the past six hours. This was achieved with Python, making use of requests, json and pandas.

Unlike the previous example, this one scrapes the local timeline of the instance rather than the federated timeline, by setting the local URL parameter to True. This was done to keep the topics relevant to a specific instance (mastodonapp.uk, in this case) and to keep the data at a manageable size.

If any hashtags were featured in the post t, they were added to network, the list of results, along with the accompanying username, timestamp and ID of the post.
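A minimal sketch of a scraping loop along these lines is shown below. It assumes the public timeline endpoint /api/v1/timelines/public with the local, limit and max_id parameters, and that each status carries id, created_at, account.username and a tags list. The function names (fetch_page, extract_rows, scrape), the row keys and the lowercasing of hashtags are my own choices for illustration, and the sketch leaves out pandas to keep things short.

```python
from datetime import datetime, timedelta, timezone

BASE_URL = "https://mastodonapp.uk/api/v1/timelines/public"


def fetch_page(max_id=None, limit=40):
    """Fetch one page of the local public timeline (newest posts first)."""
    import requests  # third-party; imported here so the helpers below run without it

    params = {"local": "true", "limit": limit}
    if max_id is not None:
        params["max_id"] = max_id  # only return posts older than this ID
    response = requests.get(BASE_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def parse_time(value):
    """Parse an ISO 8601 timestamp such as 2022-11-20T12:00:00.000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def extract_rows(toots):
    """Return one row per (user, hashtag) pair found in a list of posts."""
    rows = []
    for t in toots:
        for tag in t.get("tags", []):
            rows.append({
                "user": t["account"]["username"],
                "hashtag": tag["name"].lower(),  # assumption: treat case variants as one tag
                "created_at": t["created_at"],
                "id": t["id"],
            })
    return rows


def scrape(hours=6, fetch=fetch_page):
    """Page backwards through the timeline until posts are older than the cutoff."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    network, max_id = [], None
    while True:
        toots = fetch(max_id)
        if not toots:
            break
        recent = [t for t in toots if parse_time(t["created_at"]) >= cutoff]
        network.extend(extract_rows(recent))
        if len(recent) < len(toots):
            break  # reached posts older than the cutoff
        max_id = toots[-1]["id"]
    return network
```

Calling scrape() with no arguments would query the live instance. Passing a different fetch function makes it easy to try the loop against saved data instead.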

Example

Running this code produced a graph with a total of 788 nodes and 761 edges, as shown below.

An example of a hashtag co-occurrence network based on activity from mastodonapp.uk

Hashtags are represented by yellow nodes and users by blue nodes. An edge between a user and a hashtag indicates that the user has posted that hashtag. Nodes are sized according to degree (number of connected edges), which for a hashtag gives an estimate of its popularity.
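The node, edge and degree counts can be illustrated with a small sketch. The rows below are made-up examples shaped like the results described earlier, and build_bipartite is a hypothetical helper. It also assumes that repeated use of the same hashtag by the same user counts as a single edge, which may differ from how the graph above was built. In practice a graph library such as networkx would do the same job.

```python
from collections import defaultdict


def build_bipartite(rows):
    """Return (user -> hashtags, hashtag -> users) adjacency maps."""
    user_tags = defaultdict(set)
    tag_users = defaultdict(set)
    for row in rows:
        # Sets collapse repeated (user, hashtag) pairs into a single edge.
        user_tags[row["user"]].add(row["hashtag"])
        tag_users[row["hashtag"]].add(row["user"])
    return user_tags, tag_users


# Made-up rows for illustration only.
rows = [
    {"user": "alice", "hashtag": "python"},
    {"user": "alice", "hashtag": "fediverse"},
    {"user": "bob", "hashtag": "python"},
    {"user": "alice", "hashtag": "python"},  # duplicate pair, ignored
    {"user": "carol", "hashtag": "photography"},
]

user_tags, tag_users = build_bipartite(rows)

n_nodes = len(user_tags) + len(tag_users)                # users plus hashtags
n_edges = sum(len(tags) for tags in user_tags.values())  # distinct user-hashtag pairs
tag_degree = {tag: len(users) for tag, users in tag_users.items()}

print(n_nodes, n_edges)      # 6 4
print(tag_degree["python"])  # 2
```

Users and hashtags are kept in separate maps so that a user and a hashtag sharing the same name are not merged into one node.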

Conclusions

These networks are useful if you want to understand how hashtags are used by others. They provide a certain level of context and can be used to discover similar users and topics based upon mutual connections. In this blog post, we focused on just one Mastodon instance; however, the same approach could be expanded to include all Mastodon instances within the fediverse, or applied to a different specific instance.