How to Build Tag Clouds From Mastodon Hashtags

Hashtags are an important part of microblogging and help posts reach a wider audience of interested people. As on Twitter, hashtags are widely used on Mastodon, and it’s common for users to include as many as possible in their posts to maximise audience reach. Also, as shown in a previous post, hashtags co-occur with one another, since a single tweet or toot can carry several of them.

Introducing Tag Clouds

To get a high-level overview of which hashtags are used together, a tag cloud (also known as a word cloud) can be used to visually depict related terms. Each word or tag can be coloured, and is sized according to its frequency, with the most frequent appearing largest. This technique is used to get a feel for what people are talking about with respect to a particular topic or hashtag.

This blog post covers the basics of generating tag clouds from Mastodon hashtags with the help of wordcloud, a simple Python package for generating fancy tag cloud visualisations. The process involves scraping a hashtag timeline centred around a single hashtag of interest. For the purposes of this tutorial, I will keep things simple and use #coffee as the “seed” hashtag because, well, who doesn’t like coffee?

The Code

To get started, you’ll need to make sure that the wordcloud package is installed. It’s as simple as …

pip install wordcloud

Using code taken from a previous blog post for scraping Mastodon timelines, we’ll need to begin with a few imports and variables.

Set up and Initialisation

# For scraping
import json
import requests
import time
import pandas as pd

# For visualisation
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Set tag name
tag = "coffee"

# Set instance domain
instance = "mastodon.social"

# URL and parameters
URL = f'https://{instance}/api/v1/timelines/tag/{tag}'
params = {
    'limit': 40
}
...

Scraping Mastodon

With the packages imported and the variables defined, we can now begin scraping. In this case, we go through the most recent 24 hours of toots and store all the hashtags in one big list.

...
# Store hashtags
hashtags = []
# Set time limit
since = pd.Timestamp('now', tz='utc') - pd.DateOffset(hours=24)
is_end = False
while True:
    r = requests.get(URL, params=params)
    try:
        toots = json.loads(r.text)
    except Exception as e:
        print(e)
        print(r.text)
        break
    if len(toots) == 0:
        break
    
    for t in toots:
        timestamp = pd.Timestamp(t['created_at'], tz='utc')
        if timestamp <= since:
            is_end = True
            break
        
        # Collect all hashtags and append to list
        tags = [f"#{ht['name']}" for ht in t['tags']]
        hashtags.extend(tags)
    
    if is_end:
        break
    
    max_id = toots[-1]['id']
    params['max_id'] = max_id
    
    time.sleep(1)
...
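
To make the stopping logic of the loop easier to follow, here is a minimal sketch of its inner part pulled out into a function and run on made-up data instead of a live request. The names `parse_created_at`, `collect_hashtags` and `sample_page` are hypothetical and not part of the original code, and the sketch uses the standard library's `datetime` rather than pandas so it runs without extra packages.

```python
from datetime import datetime, timedelta, timezone


def parse_created_at(value):
    """Parse an ISO 8601 timestamp such as '2022-11-20T09:30:00.000Z'."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat(),
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def collect_hashtags(toots, since):
    """Return (hashtags, reached_cutoff) for one page of toots, newest first."""
    found = []
    for toot in toots:
        if parse_created_at(toot["created_at"]) <= since:
            # This toot and everything after it is older than the cut-off.
            return found, True
        found.extend(f"#{tag['name']}" for tag in toot["tags"])
    return found, False


# Hypothetical sample page (made-up data, not real API output).
now = datetime(2022, 11, 20, 12, 0, tzinfo=timezone.utc)
sample_page = [
    {"created_at": "2022-11-20T09:30:00.000Z",
     "tags": [{"name": "coffee"}, {"name": "caffeine"}]},
    {"created_at": "2022-11-19T18:00:00.000Z",
     "tags": [{"name": "coffee"}, {"name": "espresso"}]},
    {"created_at": "2022-11-18T08:00:00.000Z",
     "tags": [{"name": "coffee"}, {"name": "tea"}]},
]

page_tags, reached_cutoff = collect_hashtags(sample_page, now - timedelta(hours=24))
print(page_tags)
print(reached_cutoff)
```

The third sample toot falls outside the 24-hour window, so its hashtags are skipped and the function reports that the cut-off was reached. This is the same condition that sets `is_end` in the loop.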

Now that all the hashtags have been collected, we can start building our tag cloud. To do this, we need to join the list of hashtags into a single space-separated string, as if it were one continuous sentence.

...
hashtags_str = ' '.join(hashtags)
...
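
As an alternative to joining everything into one string, the counts can be computed explicitly and handed to wordcloud as frequencies. As far as I can tell, the package's default text processing drops the leading "#" and may merge neighbouring words into two-word phrases, so explicit counts can give a cloud closer to the raw data. The sketch below is only an illustration: the sample list is made up, `seed` and `build_cloud` are hypothetical names, and removing the seed hashtag is an optional choice rather than something the original code does.

```python
from collections import Counter

# Made-up sample standing in for the scraped `hashtags` list.
hashtags = ["#coffee", "#caffeine", "#coffee", "#espresso", "#coffee", "#caffeine"]

# The seed hashtag appears in every toot on its own timeline, so it tends to
# dominate the cloud. Dropping it here is optional.
seed = "#coffee"

frequencies = Counter(hashtags)
frequencies.pop(seed, None)

print(frequencies.most_common())


def build_cloud(freqs):
    """Render a cloud from explicit counts (requires the wordcloud package)."""
    # Imported lazily so the counting above works without wordcloud installed.
    from wordcloud import WordCloud
    cloud = WordCloud(width=1600, height=800, background_color="white")
    return cloud.generate_from_frequencies(freqs)
```

Calling `build_cloud(frequencies)` should return an object that can be passed to `plt.imshow` in the same way as the result of `generate`.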

Building (Hash)Tag Clouds

We can now build the tag cloud with some help from matplotlib. Feel free to adjust the width, height and background_color to your liking.

...
wordcloud = WordCloud(width=1600, height=800, background_color='white').generate(hashtags_str)
plt.figure(figsize=(20,10))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

Using #coffee as my starting point, this is what I got in return.

A simple tag cloud based upon Mastodon toots which mention #coffee and others

It’s interesting to see all the different hashtags which emerge from a single hashtag. It looks like #caffeine and #goodmorning frequently appear alongside #coffee.

Similarly, I thought I would try #twitter as a seed hashtag to see what comes up. This always appears to be a talking point on Mastodon.

As expected, hashtags associated with Elon Musk appear quite a lot with a few relating to the ongoing #twitterexodus.

Conclusion

Overall, this blog post provides a basic overview of building simple tag clouds from Mastodon hashtags as a technique for finding related hashtags and building a bigger picture of their context. Moving forward, this code could be extended with different features. For example, hashtags could be colour-coordinated according to average sentiment, and tag clouds could be generated from the hashtags trending on a given instance. There’s a lot to play with, so feel free to come up with something creative.
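
As a starting point for the trending idea, here is a rough sketch that turns a list of trending tags into frequencies for a tag cloud. It assumes the instance exposes Mastodon's `/api/v1/trends/tags` endpoint and that each entry carries a `name` plus a `history` list whose `uses` values are strings. The function name `trend_frequencies` is hypothetical and the sample data is made up, so treat it as an illustration rather than tested scraping code.

```python
def trend_frequencies(trends):
    """Map each trending tag to its total uses across the reported days."""
    return {
        f"#{tag['name']}": sum(int(day["uses"]) for day in tag["history"])
        for tag in trends
    }


# Made-up sample, shaped like the assumed trends response (not real data).
sample_trends = [
    {"name": "coffee",
     "history": [{"day": "1668902400", "uses": "12", "accounts": "9"},
                 {"day": "1668816000", "uses": "7", "accounts": "6"}]},
    {"name": "caturday",
     "history": [{"day": "1668902400", "uses": "30", "accounts": "25"}]},
]

trend_counts = trend_frequencies(sample_trends)
print(trend_counts)
```

The resulting dictionary could then be passed to wordcloud's `generate_from_frequencies` method in place of the joined string.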