Will They Reply? Analysing the Reply Networks of 32 Programming Language Subreddits

Have you ever used Reddit to learn a programming language? There are many subreddits dedicated to specific programming languages. They are great for finding project ideas, learning new topics and getting inspired. Many (if not most) programming subreddits have an active community of users who are willing to support others who start discussions.

I’ve always wondered whether it is possible to find out which subreddits are better than others in terms of user engagement, and by how much. I don’t know about you, but I would like to know if there is a strong chance that someone is going to reply to my query. This matters because it can help moderators and discussion starters alike understand how effective communities are at engaging with others. As it turns out, I think there is a way, and the answer lies in reply networks.

What are reply networks?

In this blog, we introduced the notion of Reddit reply networks, where simple user-to-user interactions are captured based upon replies between users. The concept is simple: a directed edge is formed between two users when one user replies to the other.
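As a minimal sketch of this idea, the snippet below builds a weighted, directed edge list from (replier, recipient) pairs. The usernames and the helper name `build_reply_edges` are hypothetical, and ignoring self-replies is an assumption rather than something the study specifies.

```python
from collections import Counter

def build_reply_edges(replies):
    """Count directed edges (replier -> recipient) from reply pairs."""
    edges = Counter()
    for replier, recipient in replies:
        if replier == recipient:
            continue  # assumption: a user replying to themselves is ignored
        edges[(replier, recipient)] += 1
    return edges

# Hypothetical replies: each pair reads "first user replied to second user".
replies = [
    ("bob", "alice"),
    ("alice", "bob"),
    ("carol", "alice"),
    ("bob", "alice"),
]

edges = build_reply_edges(replies)
nodes = {user for edge in edges for user in edge}
print(len(nodes), "nodes,", len(edges), "distinct edges")
```

Keeping a count per edge is optional. It simply preserves how often one user replied to another, which an unweighted network would discard.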

Reply networks can help us understand meaningful connections between users. For example, they provide a window into conversational dynamics such as the following:

  • How often do users reply to each other? (reciprocity)

The Data

To get things going, the 500 most recent submissions were collected from each of 32 programming-related subreddits. These include Go, Python, Java, JavaScript and C++, among many others.

For each subreddit, a reply network was produced by aggregating all of the comments from its 500 most recent submissions. If you’re interested in how these are created, I put together a blog post explaining how this is done using PRAW.
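The aggregation step might look something like the sketch below. It does not use PRAW itself: the nested dictionaries stand in for already-downloaded submissions, and their field names (`author`, `comments`, `replies`) are hypothetical. Treating top-level comments as replies to the submission author, and skipping deleted accounts (author `None`), are both assumptions.

```python
def tree_edges(parent_author, comments, edges):
    """Walk a comment tree, adding a (replier, parent author) edge per comment."""
    for comment in comments:
        author = comment["author"]
        # Skip edges that involve a deleted account or a self-reply.
        if author is not None and parent_author is not None and author != parent_author:
            edges.add((author, parent_author))
        tree_edges(author, comment.get("replies", []), edges)
    return edges

def subreddit_network(submissions):
    """Merge the comment trees of many submissions into one edge set."""
    edges = set()
    for submission in submissions:
        tree_edges(submission["author"], submission["comments"], edges)
    return edges

# Two hypothetical submissions from the same subreddit.
submissions = [
    {"author": "alice", "comments": [
        {"author": "bob", "replies": [
            {"author": "alice", "replies": []},
        ]},
        {"author": None, "replies": [  # deleted account
            {"author": "carol", "replies": []},
        ]},
    ]},
    {"author": "carol", "comments": [
        {"author": "bob", "replies": []},
    ]},
]

network = subreddit_network(submissions)
print(sorted(network))
```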

The subreddit reply networks have a combined average of 1185 nodes and 3410 edges. The complete data can be found at the end of the post.

The Results

As a result, the reply networks used in this study are quite complex, as they feature many interactions. Below is an example of the r/python subreddit, where nodes are coloured according to modularity and sized according to eigenvector centrality.

The networks were compared on three metrics:

  • Reciprocity: Helpful users are likely to reply to questions, meaning that two-way reciprocated connections are important for discussion.
  • Density: Used to understand how well-connected the discussion is.
  • Transitivity: Small communities of users (triads) are helpful for discerning strong conversations among multiple users.

Reciprocity

As mentioned earlier, reciprocity is one of the most important metrics for determining two-way conversations. The results from the study indicate that r/racket, r/matlab, r/visualbasic and r/Rlanguage are among the most highly-ranked subreddits for reciprocity.
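To make the metric concrete, here is a small sketch that uses one common definition of reciprocity for directed networks: the share of edges whose reverse edge also exists. Whether the study used exactly this definition is an assumption, and the usernames are hypothetical.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) for which (v, u) is also present."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

# bob and alice reply to each other; carol and dave get no reply back.
edges = {
    ("bob", "alice"),
    ("alice", "bob"),
    ("carol", "alice"),
    ("dave", "alice"),
}

score = reciprocity(edges)
print(score)
```

Under this reading, a score close to 0.59 (as reported for r/racket) would mean that roughly 59% of reply edges are returned in the other direction.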

With respect to density, r/forth, r/Delphi, r/perl and r/d_language had the highest proportion of occupied edges within the network. Bear in mind that these numbers are quite small (which is often expected for density), and there is a strong likelihood that these communities are much smaller than popular ones like r/Python.
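The density values in the table at the end of the post are consistent with the standard formula for a directed network without self-loops, density = edges / (nodes × (nodes − 1)). The sketch below applies it to the node and edge counts reported for r/racket and r/python. The function name is hypothetical.

```python
def directed_density(n_nodes, n_edges):
    """Share of all possible directed edges (no self-loops) that exist."""
    if n_nodes < 2:
        return 0.0
    return n_edges / (n_nodes * (n_nodes - 1))

# Node and edge counts taken from the results table.
racket = directed_density(452, 1368)
python_sub = directed_density(2370, 4101)

print(f"r/racket: {racket:.6f}")
print(f"r/python: {python_sub:.6f}")
```

Because the denominator grows roughly with the square of the number of users, a larger community needs far more replies to reach the same density, which is one reason the small subreddits lead on this metric.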

Similar to density, r/forth, r/perl and r/Delphi rank quite highly for transitive ties, which suggests that you’re more likely to find triad-like communities in these subreddits than elsewhere. These are big clues for detecting cliques of users.
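For illustration, transitivity can be sketched as the share of connected triples (two edges sharing a user) that are closed into a triangle, computed with edge direction ignored. Treating the network as undirected here is an assumption; the study may have handled direction differently. The usernames are hypothetical.

```python
from itertools import combinations

def transitivity(edges):
    """Closed triples divided by all connected triples, ignoring direction."""
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot form triads
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    closed = 0
    triples = 0
    for adjacent in neighbours.values():
        # Every pair of neighbours forms a triple centred on this user.
        for a, b in combinations(adjacent, 2):
            triples += 1
            if b in neighbours[a]:
                closed += 1  # the pair is also linked, so the triple is a triangle
    return closed / triples if triples else 0.0

# a, b and c form a triangle; d only talks to a.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
score = transitivity(edges)
print(score)
```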

Overall, I believe that the results speak for themselves. If I’m honest, I wasn’t expecting lesser-known subreddits such as r/forth and r/Delphi to have such strong results compared to popular ones such as r/Python or r/JavaScript. I think this is down to a few reasons, which I’ve summarised below…

Small communities

As mentioned earlier, it is the smaller communities that appear to be the most engaged. This makes sense: if you reduce the size of the community, there is a strong possibility that you’re going to engage with the same users again, meaning that reciprocity and transitivity are going to be quite high.
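One rough way to check this intuition is to correlate community size with reciprocity. The sketch below does so for a hand-picked subset of eight rows from the results table, so it illustrates the approach rather than analysing the full data set. The helper `pearson` is a plain implementation of the Pearson correlation coefficient.

```python
from math import sqrt

# (subreddit, number of nodes, reciprocity): a subset of the results table.
rows = [
    ("r/racket", 452, 0.589181),
    ("r/forth", 417, 0.563728),
    ("r/perl", 615, 0.504848),
    ("r/haskell", 1315, 0.484714),
    ("r/rust", 2348, 0.463020),
    ("r/python", 2370, 0.453548),
    ("r/cpp", 2938, 0.422393),
    ("r/javascript", 2668, 0.378836),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

sizes = [nodes for _, nodes, _ in rows]
scores = [recip for _, _, recip in rows]
r = pearson(sizes, scores)
print(f"correlation between size and reciprocity: {r:.2f}")
```

For these rows the coefficient comes out negative, in line with the observation that larger communities tend to have lower reciprocity.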

Network design

The design of the network may have an impact on the results, depending on how the discussions are modelled. Considering that we are collapsing hierarchical discussion trees into user-to-user interactions, there is a possibility that we are missing important data which could point to different types of conversation. For example, a reply network doesn’t capture extended debates between users, whereas a reply tree would show the depth of the discussion.
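The information loss can be seen in a small sketch. A hypothetical six-comment back-and-forth between two users has a depth of six as a reply tree, but collapses to just two edges in an unweighted reply network. The dictionary layout and function names are assumptions made for the example.

```python
def depth(comment):
    """Number of comments on the longest chain starting at this comment."""
    return 1 + max((depth(reply) for reply in comment["replies"]), default=0)

def collapse(parent_author, comment, edges):
    """Flatten a comment tree into unweighted (replier, recipient) edges."""
    edges.add((comment["author"], parent_author))
    for reply in comment["replies"]:
        collapse(comment["author"], reply, edges)
    return edges

# bob and alice debate back and forth under alice's submission.
authors = ["bob", "alice", "bob", "alice", "bob", "alice"]
thread = None
for author in reversed(authors):
    thread = {"author": author, "replies": [thread] if thread else []}

tree_depth = depth(thread)
edges = collapse("alice", thread, set())
print(tree_depth, "levels deep, but only", len(edges), "edges")
```

Weighting each edge by the number of replies would recover part of this information, although the order and nesting of the replies would still be lost.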

Limited engagement

I remember reading somewhere that only a very small subset of users actually engage with content on Reddit, and that only a very small percentage leave meaningful replies for users with questions. I think this might be a factor to consider when studying these networks.

As an experiment, I think these results are interesting, but it’s important to keep an open mind about how we model these networks going forward. After all, this is how science advances.

Subreddits

If you’re interested in the numbers and the subreddits used, these are as follows…

Subreddit        Nodes  Edges  Density   Reciprocity  Transitivity
r/racket         452    1368   0.006711  0.589181     0.072255
r/matlab         749    1627   0.002904  0.583897     0.021305
r/visualbasic    519    1761   0.006550  0.579216     0.029903
r/Rlanguage      794    2003   0.003181  0.574139     0.018270
r/scheme         501    1604   0.006403  0.566085     0.065525
r/forth          417    1781   0.010267  0.563728     0.115033
r/delphi         373    1144   0.008245  0.552448     0.072368
r/ocaml          564    1635   0.005149  0.539450     0.032446
r/asm            734    1843   0.003426  0.538253     0.025814
r/fortran        790    2532   0.004062  0.526066     0.031309
r/d_language     384    1057   0.007187  0.524125     0.066078
r/lisp           903    3886   0.004771  0.513124     0.066695
r/rstats         1040   2357   0.002181  0.509122     0.022743
r/perl           615    3094   0.008194  0.504848     0.101755
r/clojure        730    2125   0.003993  0.491294     0.040505
r/latex          963    2417   0.002609  0.489863     0.016770
r/lua            810    2389   0.003646  0.489745     0.022349
r/haskell        1315   4939   0.002858  0.484714     0.041025
r/erlang         483    913    0.003922  0.484118     0.025435
r/fsharp         613    2087   0.005563  0.482990     0.065214
r/Kotlin         1170   2850   0.002084  0.480702     0.028571
r/sql            1417   3349   0.001669  0.472977     0.018622
r/ruby           940    2322   0.002631  0.472007     0.036358
r/scala          990    3359   0.003431  0.465019     0.038037
r/c_programming  1575   5069   0.002045  0.464786     0.031794
r/swift          1134   2382   0.001854  0.464316     0.015414
r/rust           2348   6125   0.001111  0.463020     0.023427
r/golang         1849   4245   0.001242  0.457008     0.018304
r/python         2370   4101   0.000730  0.453548     0.031596
r/php            2542   9350   0.001448  0.449198     0.050137
r/csharp         2137   5346   0.001171  0.436214     0.021638
r/cpp            2938   10521  0.001219  0.422393     0.028237
r/java           2458   9054   0.001499  0.412194     0.047729
r/javascript     2668   5311   0.000746  0.378836     0.019625