Modelling Discussion Threads From Mastodon Timelines Using Python and NetworkX
On Mastodon, users can reply to another user’s post. This feature is designed to encourage users to engage and build connections with others. A post, together with the chain of replies that follows it, forms what is known as a discussion thread.
From a social and network science perspective, discussion threads are useful for following a conversation and studying how the discussion builds over time as more people participate. Similar work has been done on this blog before using Reddit.
A discussion thread can be treated as a tree in which the original post sits at the “root” and the subsequent posts are nodes, connected by edges that represent replies. An example looks like this…
The Code in Two Lines
Before starting, please read my previous post on scraping timelines, as this code follows on from it. Once that is done, discussion threads from Mastodon can be modelled using the in_reply_to_id attribute. To focus on the discussion threads only, we need to filter the pandas data frame for posts where this attribute is not null.
df_edges = df[~df['in_reply_to_id'].isnull()]
Applying this filter will reduce the data frame down to only include posts that are replies. We can now build a reply tree directly off the data frame using the from_pandas_edgelist function from networkx, specifying the source and target nodes, which correspond to the id of the status and of its parent (in_reply_to_id) respectively.
G = nx.from_pandas_edgelist(df_edges, source='id', target='in_reply_to_id', create_using=nx.DiGraph)
As a result, these two lines of code turn the discussion threads in the data frame into reply trees, which can then be visualised. The visualisation here is based on the reply activity on a specific Mastodon instance over the past few weeks.
Please forgive the poor design, as I’m sure there are more creative ways to display replies, but this was the best I could do for a proof-of-concept.
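To make the two lines concrete, here is a minimal, self-contained sketch that runs them on a toy data frame instead of a scraped timeline. The ids are invented for illustration, and I am assuming they are stored as strings in both columns; if id and in_reply_to_id end up with different types in your data frame, the nodes may not line up.

```python
import networkx as nx
import pandas as pd

# Toy stand-in for the scraped timeline (ids are made up).
# Post "1" is an original post, "2" and "3" reply to it,
# "4" replies to "2", and "5" is an unrelated post with no replies.
df = pd.DataFrame(
    {
        "id": ["1", "2", "3", "4", "5"],
        "in_reply_to_id": [None, "1", "1", "2", None],
    }
)

# Line one: keep only the posts that are replies.
df_edges = df[~df["in_reply_to_id"].isnull()]

# Line two: build a directed graph with one edge per reply,
# pointing from the reply (id) to its parent (in_reply_to_id).
G = nx.from_pandas_edgelist(
    df_edges, source="id", target="in_reply_to_id", create_using=nx.DiGraph
)

print(sorted(G.edges()))
```

Note that the edges point from each reply to its parent, so the root of a thread is the node with no outgoing edge. Post "1" still appears in the graph even though it was filtered out of df_edges, because it is the target of a reply; post "5" has no replies and is absent.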
Conclusions
This blog post shows how easy it is to generate a reply tree using data pulled from the Mastodon API. With this model, you can begin to query things like discussion length and number of threads, to name a few. It could also provide the basis for understanding how a conversation develops over time.
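As a rough sketch of those two queries, the snippet below counts threads and measures their length on a small hypothetical graph whose edges point from reply to parent, the same direction used in the from_pandas_edgelist call. Treating each weakly connected component as one thread, and the longest reply chain as the discussion length, are my own assumptions about how to define these measures rather than the only options.

```python
import networkx as nx

# Hypothetical reply graph (ids are made up), edges point reply -> parent.
G = nx.DiGraph([("2", "1"), ("3", "1"), ("4", "2"), ("7", "6")])

# Assumption: each weakly connected component is one discussion thread.
threads = list(nx.weakly_connected_components(G))
n_threads = len(threads)

thread_stats = {}
for nodes in threads:
    sub = G.subgraph(nodes)
    # The root is the only post in the thread that replies to nothing.
    root = next(n for n in sub if sub.out_degree(n) == 0)
    thread_stats[root] = {
        # Number of posts in the thread, root included.
        "posts": sub.number_of_nodes(),
        # Longest chain of replies, counted in edges.
        "depth": nx.dag_longest_path_length(sub),
    }

print(n_threads)
print(thread_stats)
```

One caveat: if the post being replied to was not captured in the scraped timeline, the "root" found here is simply the earliest post the data refers to, which may not be the true start of the conversation.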