On Mastodon, users can reply to another user’s post. Replies are designed to encourage engagement and help users build connections with others. A post together with the replies it attracts forms what is known as a discussion thread.
From a social and network science perspective, discussion threads are useful for following a conversation and studying how the discussion builds over time as more people participate. Similar work has been done on this blog before using Reddit.
A discussion thread can be treated as a tree in which the original post is the “root” and each subsequent post is a node, connected by an edge to the post it replies to. An example looks like this…
The Code in Two Lines
Before starting, please read my previous post on scraping timelines, as this code follows on from it. Once that is done, discussion threads from Mastodon can be modelled using the in_reply_to_id attribute. To focus on the discussion threads only, we need to filter the pandas data frame for posts where this attribute is not null.
df_edges = df[~df['in_reply_to_id'].isnull()]
Applying this filter reduces the data frame to only the posts that are replies. We can now build a reply tree directly from the data frame using the from_pandas_edgelist function from networkx, specifying the source and target nodes, which correspond to the id of the status and of its parent (in_reply_to_id).
G = nx.from_pandas_edgelist(df_edges, source='id', target='in_reply_to_id', create_using=nx.DiGraph)
As a result, these two lines of code turn discussion threads into reply trees, which can then be visualised, here using the reply activity on a specific Mastodon instance over the past few weeks.
Please forgive the poor design. I’m sure there are more creative ways to display replies, but this was the best I could do for a proof-of-concept.
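For readers without the scraped data at hand, the two lines above can be tried on a toy data frame. This is only a sketch: the five statuses are made up, the ids are assumed to be strings, and only the two columns used in this post are included.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy data standing in for the scraped timeline.
# Statuses "1" and "5" are original posts; the rest are replies.
df = pd.DataFrame(
    {
        "id": ["1", "2", "3", "4", "5"],
        "in_reply_to_id": [None, "1", "1", "2", None],
    }
)

# Line one: keep only the rows that are replies.
df_edges = df[df["in_reply_to_id"].notna()]

# Line two: build a directed graph in which each edge points
# from a reply to the status it replies to.
G = nx.from_pandas_edgelist(
    df_edges, source="id", target="in_reply_to_id", create_using=nx.DiGraph
)

print(sorted(G.edges()))
print(sorted(G.nodes()))
```

Note that status "5" does not appear in the graph: it is neither a reply nor replied to, so the filter drops it and no edge brings it back.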
This blog post shows how easy it is to generate a reply tree from data pulled from the Mastodon API. With this model, you can begin to query things like discussion length and number of threads, to name a few. It could also provide the basis for understanding how a conversation develops over time.
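As a rough illustration of such queries, the sketch below counts threads and measures the longest reply chain. It assumes edges point from reply to parent, as in the graph built earlier, and uses a small hand-made graph in place of real data. Treating each weakly connected component as one thread is my own simplification; it will split a thread if an intermediate post is missing from the scrape.

```python
import networkx as nx

# Hypothetical reply edges (reply -> parent), standing in for
# the graph G built from the data frame.
G = nx.DiGraph([("2", "1"), ("3", "1"), ("4", "2"), ("7", "6")])

# Number of threads: each weakly connected component is treated
# as one discussion thread.
threads = list(nx.weakly_connected_components(G))
n_threads = len(threads)

# Thread sizes, counted in posts (root included).
thread_sizes = sorted(len(t) for t in threads)

# Discussion length: the longest chain of replies in the whole
# graph, counted in edges. Reply graphs contain no cycles, so
# the DAG helper applies.
longest_chain = nx.dag_longest_path_length(G)

print(n_threads, thread_sizes, longest_chain)
```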