Finding New Followers on Mastodon Using Python and Network Science

I’m still relatively new to Mastodon and, so far, I love it. Ever since Twitter started going downhill (for reasons which I won’t cover yet again), Mastodon has been my go-to microblogging platform.

I’ve been on Mastodon for a couple of months now, and I’m on the lookout for new people to follow. I want to follow people whose interests are similar to mine, judging by the content and topics I promote.

Using User-Hashtag Networks

In a separate blog post, I demonstrated that it is possible to build a network of users and hashtags in which a connection is formed whenever a user features a hashtag in a status. This provides a basis for understanding a user’s use of hashtags and which hashtags they have in common with others (mutual connections). In another blog post, I applied these networks to #ClimateChange to see whether it is possible to discover other interesting hashtags around this topic based upon co-occurrences.

This got me thinking: could I use a similar approach to find new people to follow on Mastodon? The answer is: sort of. I tested it on my own profile and, because I’m still relatively new to the platform and haven’t posted much, I don’t have much data to work with.

Collecting Profile Hashtags

To get things going, I modified a Python script to collect only the hashtags I have used across all the statuses I’ve ever published on my Mastodon profile. I’ve covered how to do this in a blog post about using Python to scrape Mastodon timelines.

Building on this technique, the script focuses only on the hashtags embedded in each status and stores every user-hashtag pair as an edge in an undirected graph with the help of networkx.

...
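for t in toots:
    user = t['account']['acct']
    # The creation time is the same for every tag in a status, so parse it once.
    timestamp = pd.Timestamp(t['created_at'])
    for tag in t['tags']:
        # The API returns tag names without the leading '#'.
        tag = f"#{tag['name']}"
        hashtags.add(tag)
        # One edge per user-hashtag pair; a repeated pair updates the timestamp.
        G.add_edge(user, tag, timestamp=timestamp)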
...

As a result, this produced a graph centred around a single user profile (mine) along with the hashtags that I have used. This is what it looks like…

Example of a user-hashtag network centred around a single user.
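For anyone who wants to try the edge-collection step without the rest of my script, here is a minimal sketch using only the standard library. The function name toot_edges, the sample statuses and the instance example.social are all made up for illustration; the only thing assumed from Mastodon is the shape of a status (an account with an acct field, a list of tags with a name field, and a created_at value).

```python
def toot_edges(toots, instance):
    """Yield (user, hashtag, attributes) triples from Mastodon status dicts.

    `toot_edges` is a hypothetical helper, not part of any library.
    """
    for toot in toots:
        user = toot["account"]["acct"]
        # Local accounts carry no domain in 'acct', so append the instance.
        if "@" not in user:
            user = f"{user}@{instance}"
        for tag in toot.get("tags", []):
            yield user, f"#{tag['name']}", {"timestamp": toot["created_at"]}


# Made-up statuses that mimic the fields used above.
sample_toots = [
    {
        "account": {"acct": "alice"},
        "created_at": "2023-01-01T00:00:00Z",
        "tags": [{"name": "python"}, {"name": "networkscience"}],
    },
    {
        "account": {"acct": "bob@other.example"},
        "created_at": "2023-01-02T00:00:00Z",
        "tags": [],
    },
]

edges = list(toot_edges(sample_toots, "example.social"))
# With networkx installed, the triples could be loaded in one call:
#   G = nx.Graph(); G.add_edges_from(edges)
```

Keeping the extraction separate from the graph makes it easy to check the pairs on a handful of statuses before building anything larger.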

Searching for Users With Hashtags

With the hashtags shown above, the next step is to find users who have also posted using these hashtags. This again was fairly straightforward to do, and is something which I’ve covered before.

The general idea is that, for each of those hashtags, I find statuses which feature it and record each one as an edge in my network between the posting user and the hashtag.

I didn’t want to perform an exhaustive search, so I limited the results to the most recent 40 (the maximum value allowed by the Mastodon API); otherwise, this could go on for a while.

Much like before, this was done with the following code…

...
for t in toots:
    timestamp = pd.Timestamp(t['created_at'])
    user = t['account']['acct']
    # Accounts local to the queried instance have no domain in 'acct', so add it.
    if '@' not in user:
        user = f'{user}@{instance}'
    G.add_edge(user, tag, timestamp=timestamp)
...
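The excerpt above assumes that toots, tag and instance already exist. As a rough sketch of where the statuses could come from, the snippet below builds a request to Mastodon’s public hashtag timeline endpoint (/api/v1/timelines/tag/:hashtag) with the limit capped at 40. The helper names are my own for this illustration, mastodon.social is only an example instance, and some instances may require authentication for this endpoint, so treat it as a starting point rather than the exact code I ran.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

MAX_LIMIT = 40  # the per-request cap mentioned above


def tag_timeline_url(instance, tag, limit=MAX_LIMIT):
    """Build the hashtag timeline URL (hypothetical helper)."""
    name = tag.lstrip("#")  # the endpoint expects the name without '#'
    query = urlencode({"limit": min(limit, MAX_LIMIT)})
    return f"https://{instance}/api/v1/timelines/tag/{quote(name)}?{query}"


def fetch_tag_toots(instance, tag, limit=MAX_LIMIT):
    """Return the most recent statuses for a hashtag as a list of dicts.

    Performs a live HTTP request, so it is only defined here, not called.
    """
    with urlopen(tag_timeline_url(instance, tag, limit), timeout=10) as response:
        return json.load(response)


url = tag_timeline_url("mastodon.social", "#ClimateChange")
```

Calling fetch_tag_toots("mastodon.social", "#ClimateChange") would then give a list that can be fed to the loop above, with tag set to "#ClimateChange" and instance set to "mastodon.social".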

Even with the 40-status limit, this produced a total of 649 users! That’s quite a few considering how few hashtags I used!

The figure below is what the final network looked like. Users are coloured in blue and their hashtags are marked in yellow.

Filtering for Matches

Now, it’s fair to say that some of those 649 users will have very little in common with me (e.g. only one shared hashtag). For this reason, we need a way of measuring and ranking users by how similar they are in terms of their shared hashtags.

To do this, I used the Jaccard index to compute the similarity between myself and each user based upon our mutual connections. It is calculated by dividing the number of hashtags we have in common (the intersection) by the total number of distinct hashtags across both of us (the union). The closer the Jaccard index is to 1, the greater the similarity between the two users.
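To make the calculation concrete, here is a small sketch of the Jaccard index on plain Python sets. The function name and the two hashtag sets are invented for the example and are not taken from my data.

```python
def jaccard_index(a, b):
    """Jaccard index of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets share nothing; treat the similarity as zero.
        return 0.0
    return len(a & b) / len(union)


# Made-up hashtag sets for two users.
mine = {"#python", "#networkscience", "#mastodon"}
theirs = {"#python", "#mastodon", "#rstats", "#dataviz"}

# 2 shared hashtags out of 5 distinct ones in total.
score = jaccard_index(mine, theirs)
```

In a networkx graph, the two sets would simply be the neighbours of each user node, e.g. set(G.neighbors(user)).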

By ranking all users according to the Jaccard index, I discovered that the highest score for a user is 0.222. To make things a little easier to manage, I will only take the top five users ranked according to their index. Without revealing their usernames, this is what the final network looks like…
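The ranking step can be sketched along the same lines. Below, a dictionary maps each user to their set of hashtags, every other user is scored against one profile, and the k best are kept. The function top_matches and all of the data are hypothetical and only serve to show the idea.

```python
import heapq


def top_matches(me, hashtags_by_user, k=5):
    """Return the k users most similar to `me` as (user, score) pairs."""
    mine = hashtags_by_user[me]
    scores = {}
    for user, tags in hashtags_by_user.items():
        if user == me:
            continue
        union = mine | tags
        # Jaccard index: shared hashtags divided by all distinct hashtags.
        scores[user] = len(mine & tags) / len(union) if union else 0.0
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Toy data, not real accounts.
hashtags_by_user = {
    "me": {"#python", "#mastodon", "#networkscience"},
    "a": {"#python", "#mastodon"},
    "b": {"#python"},
    "c": {"#rstats"},
}

ranked = top_matches("me", hashtags_by_user, k=2)
```

Here ranked holds user "a" first and user "b" second, since they share two and one hashtags with "me" respectively, while "c" shares none.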

Conclusions

In this blog post, I showed you how I made use of basic principles taken from network science to find potential users of interest to follow based upon mutual hashtags. While this example used a relatively small dataset, this serves as a proof of concept of what could be used on a larger scale for recommending new people to follow.

As I mentioned before, I tested this out on my own profile, and the main limitation is that I haven’t used enough hashtags to produce reliable findings. With so few hashtags, it was harder to find possible matches; more are needed.

Do you want to see more of this sort of stuff? Why not follow me on Mastodon @jrashf?