My Initial Thoughts on Lemmy – A Reddit Alternative for the Fediverse

I’m a huge fan of Reddit. In fact, I use it every day. There’s something about the community-oriented structure of the platform that makes it easy to reach like-minded people and discuss things that interest me. As of April 2023, Reddit receives around 1.7 billion visits per month; however, recent changes … Read more

How to Find Datasets: Top 5 Popular Data Repositories in 2022

Finding the ideal dataset is not always easy. One option is to put together your own dataset, suited to your needs and those of others, using techniques such as web and API scraping. This might be necessary depending on your circumstances; however, it does come at a cost. Collecting data of your own … Read more

Will They Reply? Analysing the Reply Networks of 32 Programming Language Subreddits

Have you ever used Reddit to learn a programming language? There are many subreddits dedicated to specific programming languages. They are great for finding project ideas, learning new topics and getting inspired. Many (if not most) programming subreddits have an active community of users who are willing to provide support to others who post submissions … Read more

Creating Reply Networks from Reddit Comment Threads

In the previous blog post, we learned how to use PRAW to scrape and process data from Reddit. We finished off by looking at how to collect top-level comments from a post submission and briefly mentioned how to collect replies. However, due to the complexity of modelling nested replies, I thought it would be … Read more

How to Collect Data From Reddit – Introducing PRAW

If you’re a nerd like me, you’ll probably be very familiar with Reddit. Reddit describes itself as “the front page of the internet”, which is certainly true for me and many others. I use it pretty much on a daily basis for anything from tech news digests to niche topics and tech support. There’s … Read more