Modelling Discussion Threads From Mastodon Timelines Using Python and Networkx

On Mastodon, users can reply to another user’s post. This feature is designed to encourage users to engage and build connections with others. A post, together with the replies it attracts, forms what is known as a discussion thread.

From a social and network science perspective, discussion threads are useful for following a conversation and studying how the discussion builds over time as more people participate. Similar work has been done on this blog before using Reddit.

A discussion thread can be treated as a tree where the original post sits at the “root” and the subsequent posts are represented as nodes, connected by edges that represent replies. An example is shown below.

A simple example of a reply tree where the root “initiator” node is shown in red.

The Code in Two Lines

Before starting, please read my previous post on scraping timelines, as this code follows on from it. Once that is done, discussion threads can be modelled using the in_reply_to_id attribute of each post, which holds the id of the post being replied to. To focus on the discussion threads only, we filter the pandas data frame (df) for posts where this attribute is not null.

df_edges = df[~df['in_reply_to_id'].isnull()]

Applying this filter reduces the data frame to posts that are replies. We can now build the reply trees directly from the data frame using the from_pandas_edgelist function from networkx (imported as nx). The source and target nodes correspond to the id of the status and the id of its parent (in_reply_to_id) respectively, so each edge points from a reply to the post it answers.

G = nx.from_pandas_edgelist(df_edges, source='id', target='in_reply_to_id', create_using=nx.DiGraph)
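To see what these two lines do, here is a minimal, self-contained sketch on a made-up data frame. The ids and the column values are invented for illustration and do not come from a real instance; only the id and in_reply_to_id column names follow the text above.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy timeline: posts "1" and "5" are original posts,
# the others are replies (ids are invented for illustration).
df = pd.DataFrame({
    "id":             ["1", "2", "3", "4", "5", "6"],
    "in_reply_to_id": [None, "1", "2", "1", None, "5"],
})

# Keep only the replies, i.e. rows that have a parent post.
df_edges = df[df["in_reply_to_id"].notnull()]

# Each edge points from a reply to the post it answers.
G = nx.from_pandas_edgelist(
    df_edges, source="id", target="in_reply_to_id", create_using=nx.DiGraph
)

# In this orientation a root (initiator) is a node with no outgoing edge.
roots = [n for n in G.nodes if G.out_degree(n) == 0]
print(sorted(G.edges))
print(sorted(roots))
```

Note that a parent post still becomes a node even if it was filtered out of df_edges (or was never captured in the timeline), because it appears as the target of an edge.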

These two lines of code turn the discussion threads in the data frame into reply trees, which can then be visualised. The example here is based on the reply activity on a specific Mastodon instance over the past few weeks.

In this example, most reply trees are simple chains, with a few branching off into separate threads.

Please forgive the poor design, as I’m sure there are more creative ways to display replies, but this was the best I could do for a proof-of-concept.
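A plot of this kind can be sketched with matplotlib and the networkx drawing helpers. This is not necessarily how the figure above was produced; the toy graph, the colours, the layout seed and the node_colours helper are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical toy reply graph; each edge points from a reply to its parent.
G = nx.DiGraph([("2", "1"), ("3", "2"), ("4", "1"), ("6", "5")])

def node_colours(graph, root_colour="red", reply_colour="lightgrey"):
    """Give thread initiators (nodes with no outgoing edge) their own colour."""
    return [
        root_colour if graph.out_degree(n) == 0 else reply_colour
        for n in graph.nodes
    ]

colours = node_colours(G)

# A force-directed layout; the seed only makes the picture reproducible.
pos = nx.spring_layout(G, seed=42)

fig, ax = plt.subplots(figsize=(6, 4))
nx.draw(G, pos, ax=ax, node_color=colours, node_size=80,
        arrows=True, with_labels=False)
```

Calling fig.savefig with a file name of your choice, or plt.show() with an interactive backend, then renders the picture.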

Conclusions

This blog post shows how easy it is to generate a reply tree using data pulled from the Mastodon API. With this model, you can begin to query things like discussion length and number of threads, to name a few. It could also provide the basis for understanding how a conversation develops over time.
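As a rough sketch of such queries, the snippet below counts threads and measures their length on a made-up graph. It assumes that a thread is a weakly connected component and that discussion length means the longest reply chain; both definitions, and the thread_summary helper, are my own choices for illustration rather than anything fixed by networkx or Mastodon.

```python
import networkx as nx

# Hypothetical toy reply graph; each edge points from a reply to its parent.
G = nx.DiGraph([("2", "1"), ("3", "2"), ("4", "1"), ("6", "5")])

def thread_summary(graph):
    """Return one record per thread (weakly connected component)."""
    rows = []
    for nodes in nx.weakly_connected_components(graph):
        tree = graph.subgraph(nodes)
        # Assumes a proper reply tree, so exactly one node has no parent.
        root = next(n for n in tree if tree.out_degree(n) == 0)
        rows.append({
            "root": root,
            "posts": tree.number_of_nodes(),
            # Number of edges on the longest reply chain in this thread.
            "length": nx.dag_longest_path_length(tree),
        })
    return rows

summary = thread_summary(G)
n_threads = len(summary)
print(n_threads, summary)
```

Other definitions are just as reasonable, for example counting posts rather than edges, or treating every branch as its own thread.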