Building User-Hashtag Co-occurrence Network from Timelines on Mastodon

Hashtags are an incredibly useful feature for annotating posts and play an active role on microblogging platforms such as Twitter and Mastodon. They are particularly helpful for discovering new people to follow and finding interesting posts. In a separate post, I used Twitter to build hashtag co-occurrence networks based upon a specific hashtag. The general […]


How to Scrape Mastodon Timelines Using Python and Pandas

Over the past few months, Mastodon, the federated microblogging alternative to Twitter, has gained a lot of traction in light of the events surrounding Elon Musk’s purchase of Twitter. Upon further inspection, it turns out most instances have a public-facing REST API for allowing users to interact with their services using third-party […]


Interacting With REST APIs in Python With 5 Lines of Code

An essential skill for any web scraper or data scientist is knowing how to collect information from a publicly available REST API. In short, a REST API is a very simple web service where simple HTTP requests (just like those a web browser sends) are used to collect data, usually in the form of a JSON […]
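
As a rough illustration of the idea in this teaser, here is a minimal sketch of a REST call: build a URL, send an HTTP GET request, and decode the JSON that comes back. It uses only Python's standard library rather than whatever the full post uses, and the helper names `build_url` and `fetch_json`, as well as the `api.example.com` endpoint, are hypothetical and not taken from the post.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_url(base_url, path, params=None):
    """Join a base URL, a path, and optional query parameters."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_json(url, timeout=10):
    """Send an HTTP GET request and decode the JSON response body."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical endpoint, used only for illustration; swap in a real API.
    url = build_url("https://api.example.com", "/v1/items", {"limit": 5})
    print(url)
    # data = fetch_json(url)  # uncomment once `url` points at a real service
```

Keeping URL construction separate from the network call makes the request easy to inspect before anything is sent.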


Best Web Scraping Practices for 2022

There is an abundance of freely available datasets on the internet for pretty much anything and everything you can think of. In a previous blog post, we covered five different platforms for finding useful datasets. However, as mentioned in the post, there are times when you may need to collect data where it may be […]


Essential Python Packages for the Web Scraping Toolbox in 2022

Python is known for its versatility as a programming language. One thing Python is particularly well-known for is its ability to retrieve data over the Web using the requests package (more information can be found here). The requests package enables users to perform basic HTTP requests […]


How to Scrape and Extract Hyperlink Networks with BeautifulSoup and NetworkX

If you use the internet often enough (which, if you’re reading this blog post, I imagine you do), you will soon realise that hyperlinks are everywhere you go. They are a fundamental component of the World Wide Web. They allow us to navigate through web pages with a series of simple clicks. If you’ve been on […]
