How to Build Tag Clouds From Mastodon Hashtags
Hashtags are an important part of microblogging and are used to reach a wider audience of interested people. Much like on Twitter, hashtags are widely used on Mastodon, and it's common for users to include as many hashtags as possible in their posts to maximise their reach. Also, as shown in a previous post, hashtags co-occur with one another, since a user can tweet or toot posts containing multiple hashtags.
Introducing Tag Clouds
To get a basic, high-level overview of which hashtags are used alongside others, a tag cloud (also known as a word cloud) can be used to visually depict related terms. Each word or tag can be colour-coded and sized according to its frequency, with the most frequent tags appearing largest. This technique gives a feel for what people are talking about with respect to a particular topic or hashtag.
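To make "sized according to frequency" concrete, here is a minimal sketch that counts tags and scales each one's font size linearly between a minimum and a maximum. The function `font_sizes` and its size range are hypothetical and chosen purely for illustration; `wordcloud` uses its own, more involved sizing (tunable through its `relative_scaling` option) and also takes care of the layout.

```python
from collections import Counter


def font_sizes(tags, min_size=10, max_size=100):
    """Give each tag a font size that grows linearly with its frequency.

    Illustrative only (hypothetical helper): wordcloud's real sizing and
    layout are more involved.
    """
    counts = Counter(tags)
    if not counts:
        return {}
    top = max(counts.values())  # count of the most frequent tag
    return {
        tag: round(min_size + (max_size - min_size) * count / top)
        for tag, count in counts.items()
    }


sizes = font_sizes(["#coffee", "#coffee", "#coffee", "#caffeine", "#tea"])
print(sizes)  # {'#coffee': 100, '#caffeine': 40, '#tea': 40}
```

The most frequent tag always gets `max_size`, and a tag's size shrinks towards `min_size` as its count falls relative to that top tag.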
This blog post covers the basics of generating tag clouds from Mastodon hashtags with the help of `wordcloud`, a simple Python package for generating fancy tag cloud visualisations. The process involves scraping a hashtag timeline centred on a single hashtag of interest. For the purposes of this tutorial, I will keep things simple and use #coffee as the “seed” hashtag because, well, who doesn't like coffee?
The Code
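To get started, you'll need to make sure that the `wordcloud` package is installed. It's as simple as:

```shell
pip install wordcloud
```

The code below also relies on `requests`, `pandas` and `matplotlib`, which can be installed the same way if you don't already have them.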
Reusing code from a previous blog post on scraping Mastodon timelines, we begin with a few imports and variables.
Setup and Initialisation
```python
# For scraping
import json
import requests
import time
import pandas as pd

# For visualisation
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Set tag name (without the leading #)
tag = "coffee"

# Set instance domain
instance = "mastodon.social"

# Hashtag timeline endpoint and query parameters
URL = f'https://{instance}/api/v1/timelines/tag/{tag}'
params = {
    'limit': 40  # toots per request (the API caps this at 40)
}
```
Scraping Mastodon
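With the packages imported and the variables defined, we can now begin scraping. In this case, we page backwards through the toots posted in the last 24 hours and store all of their hashtags in one big global list. Each request returns up to 40 toots, newest first; the ID of the last toot in a page is passed as `max_id`, so the next request returns older toots. The loop stops at the first toot older than the 24-hour cutoff, or when a page comes back empty.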
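```python
# Store hashtags
hashtags = []

# Set time limit: only collect toots from the last 24 hours
since = pd.Timestamp('now', tz='utc') - pd.DateOffset(hours=24)
is_end = False

while True:
    r = requests.get(URL, params=params)
    try:
        r.raise_for_status()  # stop on HTTP errors such as rate limiting
        toots = json.loads(r.text)
    except Exception as e:
        print(e)
        print(r.text)
        break

    # An empty page means there are no older toots left
    if len(toots) == 0:
        break

    for t in toots:
        # created_at is an ISO 8601 string in UTC, e.g. "2022-11-20T09:30:00.000Z"
        timestamp = pd.to_datetime(t['created_at'], utc=True)
        if timestamp <= since:
            is_end = True
            break

        # Collect all hashtags and append to list
        tags = [f"#{ht['name']}" for ht in t['tags']]
        hashtags.extend(tags)

    if is_end:
        break

    # Ask for the next (older) page of toots
    max_id = toots[-1]['id']
    params['max_id'] = max_id
    time.sleep(1)  # pause between requests to go easy on the instance
```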
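The stopping and paging logic is easier to see without the network in the way. The sketch below is a hypothetical, offline restatement of the scraping loop: `collect_hashtags` takes a `fetch_page(max_id)` callback in place of `requests.get`, uses the standard library's `datetime` instead of pandas, and is exercised against a small made-up timeline (`TIMELINE` and `fake_fetch_page` are illustrative stand-ins, not real API data).

```python
from datetime import datetime, timedelta, timezone


def parse_created_at(value):
    """Parse an ISO 8601 'created_at' string such as '2024-01-02T11:00:00.000Z'."""
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 onwards
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def collect_hashtags(fetch_page, since):
    """Page backwards through a hashtag timeline, gathering tags newer than `since`.

    `fetch_page(max_id)` (hypothetical) returns a list of toot dicts, newest
    first, each with 'id', 'created_at' and 'tags' ([{'name': ...}, ...]).
    """
    hashtags = []
    max_id = None  # None means "start from the newest toot"
    while True:
        toots = fetch_page(max_id)
        if not toots:  # empty page: no older toots left
            return hashtags
        for t in toots:
            if parse_created_at(t["created_at"]) <= since:
                return hashtags  # toots are newest first, so the rest are older too
            hashtags.extend(f"#{ht['name']}" for ht in t["tags"])
        max_id = toots[-1]["id"]  # next request: toots older than this one


# Made-up timeline for illustration, newest first
TIMELINE = [
    {"id": "5", "created_at": "2024-01-02T11:00:00.000Z", "tags": [{"name": "coffee"}, {"name": "caffeine"}]},
    {"id": "4", "created_at": "2024-01-02T02:00:00.000Z", "tags": [{"name": "coffee"}]},
    {"id": "3", "created_at": "2024-01-01T13:00:00.000Z", "tags": [{"name": "coffee"}, {"name": "goodmorning"}]},
    {"id": "2", "created_at": "2024-01-01T09:00:00.000Z", "tags": [{"name": "coffee"}, {"name": "tea"}]},
]


def fake_fetch_page(max_id, limit=2):
    """Stand-in for the API call: up to `limit` toots with an id below max_id."""
    older = [t for t in TIMELINE if max_id is None or int(t["id"]) < int(max_id)]
    return older[:limit]


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
result = collect_hashtags(fake_fetch_page, since=now - timedelta(hours=24))
print(result)  # ['#coffee', '#caffeine', '#coffee', '#coffee', '#goodmorning']
```

Because toots arrive newest first, the first toot older than `since` implies that every remaining toot is older too, so it is safe to stop there rather than keep paging and filtering.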
Now that all the hashtags have been collected, we can start building our tag cloud. To do this, we join the list of hashtags into a single space-separated string, as if it were one continuous sentence.
```python
hashtags_str = ' '.join(hashtags)
```
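One caveat with joining the tags into a string: `WordCloud.generate()` re-tokenises the text with a regular expression, so the leading `#` is dropped, and with its default `collocations=True` it may also add two-word phrases built from neighbouring tags (which, after joining, can span different toots). If you would rather keep the tags intact, one alternative is to count them yourself and pass the counts to `WordCloud.generate_from_frequencies()`. The minimal sketch below does that, assuming case variants such as #Coffee and #coffee should count as one tag; the helper names are hypothetical.

```python
from collections import Counter


def hashtag_frequencies(hashtags):
    """Count hashtags, folding case so that #Coffee and #coffee are one tag."""
    return Counter(tag.lower() for tag in hashtags)


def build_cloud(frequencies, width=1600, height=800, background_color="white"):
    """Render a tag cloud directly from {tag: count}, skipping text tokenisation."""
    # Imported here so that hashtag_frequencies() works without wordcloud installed
    from wordcloud import WordCloud

    wc = WordCloud(width=width, height=height, background_color=background_color)
    return wc.generate_from_frequencies(frequencies)


freqs = hashtag_frequencies(["#coffee", "#Coffee", "#caffeine", "#goodmorning", "#coffee"])
print(freqs["#coffee"], freqs["#caffeine"])  # 3 1
```

With the scraped data, this would be `wordcloud = build_cloud(hashtag_frequencies(hashtags))`, displayed with the same `plt.imshow(...)` calls as before.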
Building (Hash)Tag Clouds
We can now build the tag cloud with some help from `matplotlib`. Feel free to adjust the `width`, `height` and `background_color` to your liking.
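```python
# Generate the tag cloud from the space-separated string of hashtags
wordcloud = WordCloud(width=1600, height=800, background_color='white').generate(hashtags_str)

# Display it with matplotlib, hiding the axes
plt.figure(figsize=(20, 10))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```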
Using #coffee as my starting point, this is what I got in return.
It's interesting to see all the different hashtags that emerge from a single hashtag. It looks like #caffeine and #goodmorning frequently appear alongside #coffee.
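To put numbers behind observations like this, the tags can also be ranked directly. Every toot on a hashtag timeline carries the seed tag itself, so #coffee would otherwise tower over everything else; the minimal sketch below therefore drops the seed before counting. `top_cooccurring` is a hypothetical helper, and the sample list is made up for illustration.

```python
from collections import Counter


def top_cooccurring(hashtags, seed, n=10):
    """Return the n most frequent hashtags other than the seed itself.

    Case is folded so that #Coffee and #coffee are treated as the same tag.
    """
    seed_tag = f"#{seed}".lower()
    counts = Counter(tag.lower() for tag in hashtags if tag.lower() != seed_tag)
    return counts.most_common(n)


sample = ["#coffee", "#caffeine", "#coffee", "#goodmorning", "#coffee", "#caffeine"]
print(top_cooccurring(sample, "coffee", n=2))  # [('#caffeine', 2), ('#goodmorning', 1)]
```

With the scraped data, `top_cooccurring(hashtags, tag)` lists the ten most common companions of the seed hashtag, and `dict(top_cooccurring(hashtags, tag, n=100))` could be passed to `WordCloud(...).generate_from_frequencies()` to draw a cloud without the seed.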
Similarly, I thought I would try #twitter as a seed hashtag to see what comes up, since Twitter always seems to be a talking point on Mastodon.
As expected, hashtags associated with Elon Musk appear quite a lot, along with a few relating to the ongoing #twitterexodus.
Conclusion
Overall, this blog post provides a basic overview of building simple tag clouds from Mastodon hashtags, as a technique for finding related hashtags and building a bigger picture of the context in which they are used. Moving forward, this code could be extended with new features. For example, hashtags could be colour-coded by average sentiment, and tag clouds could be generated from the hashtags that are trending on a given instance. There's a lot to play with, so feel free to come up with something creative.