Have you ever used Reddit to learn a programming language? There are many subreddits dedicated to specific programming languages, and they are great for finding project ideas, learning new topics and getting inspired. Many (if not most) programming subreddits have an active community of users who are willing to help others by replying to their submissions and opening up discussions.
What are reply networks?
In this post, we introduce the notion of Reddit reply networks, in which simple user-to-user interactions are captured from the replies between users. The concept is simple: a directed edge is drawn from one user to another whenever the first user replies to the second.
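As a minimal sketch of this construction, assume each submission and comment has already been fetched into a hypothetical `(item_id, author, parent_id)` tuple, with `parent_id` set to `None` for submissions. The sketch counts one directed edge from the replier to the author of the parent item. Treating a reply to a submission as a reply to its author, and skipping self-replies and deleted authors, are assumptions for illustration rather than details of the original method:

```python
from collections import Counter

def build_reply_network(items):
    """Count directed replies between users.

    `items` is a hypothetical list of (item_id, author, parent_id) tuples
    covering submissions (parent_id is None) and comments (parent_id is the
    id of the submission or comment being replied to).
    Returns a Counter mapping (replier, replied_to) -> number of replies.
    """
    author_of = {item_id: author for item_id, author, _ in items}
    edges = Counter()
    for _, author, parent_id in items:
        if parent_id is None:
            continue  # a submission is not a reply
        parent_author = author_of.get(parent_id)
        # Assumption: skip deleted authors, unknown parents and self-replies.
        if author is None or parent_author is None or author == parent_author:
            continue
        edges[(author, parent_author)] += 1
    return edges

items = [
    ("s1", "alice", None),   # submission by alice
    ("c1", "bob", "s1"),     # bob replies to alice's submission
    ("c2", "alice", "c1"),   # alice replies to bob
    ("c3", "carol", "c1"),   # carol replies to bob
    ("c4", "bob", "c2"),     # bob replies to alice again
]
edges = build_reply_network(items)
print(edges)
```

Taking `set(edges)` gives the unweighted network; keeping the counts allows edges to be weighted by the number of replies if needed.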
Reply networks can help us identify meaningful connections within a community. For example, they offer a window into conversational dynamics such as the following:
- Who replies the most? (out degree)
- Who receives the most replies? (in degree)
- Who are the most influential commenters? (node centrality)
- How often do users reply to each other? (reciprocity)
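The first two questions reduce to counting edges. A small sketch, assuming the network is stored as a set of hypothetical `(replier, replied_to)` tuples; `degree_rankings` is an illustrative name, not part of any library:

```python
from collections import Counter

def degree_rankings(edges):
    """Return (out_degree, in_degree) Counters for a set of (replier, replied_to) edges."""
    out_degree = Counter(replier for replier, _ in edges)       # users this user replied to
    in_degree = Counter(replied_to for _, replied_to in edges)  # users who replied to this user
    return out_degree, in_degree

edges = {("bob", "alice"), ("alice", "bob"), ("carol", "bob"),
         ("dave", "bob"), ("carol", "alice")}
out_degree, in_degree = degree_rankings(edges)
print("Replies the most:", out_degree.most_common(1))          # [('carol', 2)]
print("Receives the most replies:", in_degree.most_common(1))  # [('bob', 3)]
```

With an unweighted edge set, these degrees count distinct users replied to (or replied by) rather than the total number of replies. In networkx, `DiGraph.out_degree` and `DiGraph.in_degree` provide the same counts.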
For each subreddit, a reply network was produced by aggregating all of the comments from its 500 most recent submissions. If you're interested in how these are created, I have written a separate blog post explaining how this is done using PRAW (the Python Reddit API Wrapper).
On average, the subreddit reply networks have 1,185 nodes and 3,410 edges. The complete data can be found at the end of the post.
As a result, the reply networks used in this study are fairly complex, featuring many interactions between users. Below is an example from the r/Python subreddit, where nodes are coloured by modularity class and sized by eigenvector centrality.
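Eigenvector centrality scores a user highly when they are connected to other well-connected users. The sketch below computes it by power iteration in plain Python. It ignores edge direction, which is an assumption (tools differ in how they treat directed graphs), and it iterates with `(A + I)` rather than `A`, a common trick that networkx's pure-Python `eigenvector_centrality` uses in a similar way to guarantee convergence:

```python
import math

def eigenvector_centrality(edges, max_iter=1000, tol=1e-10):
    """Eigenvector centrality by power iteration, ignoring edge direction."""
    neighbours = {}
    for u, v in edges:  # assumption: a reply in either direction links u and v
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    if not neighbours:
        return {}
    nodes = sorted(neighbours)
    x = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # Multiply by (A + I); the identity shift stops the iteration from
        # oscillating on bipartite graphs such as star-shaped threads.
        new = {n: x[n] + sum(x[m] for m in neighbours[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - x[n]) for n in nodes) < tol * len(nodes):
            return new
        x = new
    raise RuntimeError("power iteration did not converge")

# One user (bob) receiving replies from three others: bob should score highest.
scores = eigenvector_centrality({("alice", "bob"), ("carol", "bob"), ("dave", "bob")})
print(max(scores, key=scores.get))  # bob
```

In a plot like the one described above, these scores would be mapped to node sizes.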
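To compare the subreddits, three network-level metrics were used:
- Reciprocity: Helpful users are likely to reply to questions, so two-way (reciprocated) connections are important for discussion.
- Density: The proportion of all possible edges that are actually present, used to understand how well-connected the discussion is.
- Transitivity: The proportion of connected triples of users that close into triangles. Small communities of users (triads) help to identify strong conversations among multiple users.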
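The three metrics can be sketched in plain Python over a set of hypothetical `(replier, replied_to)` edges. Density uses the directed formula m / n(n-1), reciprocity is the fraction of edges whose reverse edge also exists, and transitivity is computed on the undirected version of the network (3 × triangles / connected triples). These definitions should agree with networkx's `density`, `reciprocity` and `transitivity` (the last applied to `G.to_undirected()`), but whether the original study used exactly these variants is an assumption. The code also assumes there are no self-loops:

```python
from itertools import combinations

def density(edges):
    """Directed density: edges present divided by the n(n-1) possible ordered pairs."""
    nodes = {user for edge in edges for user in edge}
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0

def reciprocity(edges):
    """Fraction of directed edges whose reverse edge also exists."""
    if not edges:
        return 0.0
    return sum((v, u) in edges for u, v in edges) / len(edges)

def transitivity(edges):
    """Global clustering coefficient on the undirected network:
    3 * triangles / connected triples. Assumes no self-loops."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    closed = triples = 0
    for ns in nbrs.values():
        k = len(ns)
        triples += k * (k - 1) // 2  # pairs of neighbours centred on this node
        # Each triangle is counted once at each of its three corners.
        closed += sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
    return closed / triples if triples else 0.0

edges = {("alice", "bob"), ("bob", "alice"), ("carol", "bob"),
         ("carol", "alice"), ("dave", "bob")}
print(density(edges), reciprocity(edges), transitivity(edges))  # 5/12, 0.4 and 0.6
```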
As mentioned earlier, reciprocity is one of the most important metrics for identifying two-way conversations. The results indicate that r/racket, r/matlab, r/visualbasic and r/Rlanguage rank among the highest subreddits for reciprocity.
With respect to density, r/forth, r/Delphi, r/perl and r/d_language had the highest proportion of possible edges present in their networks. Bear in mind that these numbers are quite small, which is typical for density. There is a strong likelihood that these communities are much smaller than popular ones like r/Python.
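To see why density values are so small, recall that a directed network with $n$ nodes has $n(n-1)$ possible edges. As an illustration, plugging in the average network size reported above (1,185 nodes and 3,410 edges) gives the density of a hypothetical network of average size. This is not the same as the average of the per-subreddit densities, and it assumes the directed form of the formula:

$$
D = \frac{m}{n(n-1)} = \frac{3410}{1185 \times 1184} = \frac{3410}{1\,403\,040} \approx 0.0024
$$

Because the denominator grows roughly with $n^2$ while the number of replies grows far more slowly, larger subreddits will tend to have lower density.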
Similar to density, r/forth, r/perl and r/Delphi rank quite highly for transitivity, which suggests that you're more likely to find triads of interacting users in these subreddits than elsewhere. These are strong clues for detecting cliques of users.
As mentioned earlier, it is the smaller communities that rank highest on these metrics. This makes sense: in a smaller community you are more likely to keep engaging with the same users, so reciprocated ties and transitivity tend to be high.
The design of the network may affect the results, depending on how the discussions are modelled. Because we collapse hierarchical discussion trees into user-to-user interactions, we may be missing data that points to different types of conversation. For example, a reply network does not capture debates between users: a long back-and-forth between two users can collapse into the same pair of edges as a single exchange, whereas a reply tree would show the depth of the discussion.
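To make this limitation concrete, here is a small sketch with a hypothetical six-comment thread in which two users argue back and forth. The reply tree records a depth of six, while the collapsed reply network keeps only two directed edges, the same unweighted edges that a three-comment exchange between the same users would produce. `thread_depths` and `collapse_to_edges` are illustrative helpers, not part of PRAW:

```python
from collections import Counter

def thread_depths(thread):
    """Depth of each comment in a reply tree; top-level comments have depth 1."""
    depth = {}

    def resolve(cid):
        if cid not in depth:
            _, parent = thread[cid]
            depth[cid] = 1 if parent is None else resolve(parent) + 1
        return depth[cid]

    for cid in thread:
        resolve(cid)
    return depth

def collapse_to_edges(thread):
    """Collapse the tree into (replier, replied_to) reply counts."""
    edges = Counter()
    for author, parent in thread.values():
        if parent is not None:
            edges[(author, thread[parent][0])] += 1
    return edges

# Hypothetical debate: comment_id -> (author, parent_comment_id)
debate = {
    "c1": ("alice", None),
    "c2": ("bob", "c1"),
    "c3": ("alice", "c2"),
    "c4": ("bob", "c3"),
    "c5": ("alice", "c4"),
    "c6": ("bob", "c5"),
}
print(max(thread_depths(debate).values()))  # 6: the tree keeps the depth
print(set(collapse_to_edges(debate)))       # only two directed edges remain
```

Keeping the reply counts as edge weights recovers some of this information, although the nesting depth itself is still lost.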
I remember reading that only a very small subset of users actually engage with the content posted on Reddit, and that a very small percentage of users contribute meaningful replies to users with questions. I think this is a factor to consider when studying these networks.
I found the results of this experiment interesting, but I think it's important to keep an open mind about how we model these networks going forward. After all, this is how science advances.
If you’re interested in the numbers and the subreddits used, these are as follows…
| Subreddit | No. Nodes | No. Edges | Density | Reciprocity | Transitivity |
|---|---|---|---|---|---|