Interacting With REST APIs in Python With 5 Lines of Code
An essential skill for any web scraper or data scientist is knowing how to collect information from a publicly available REST API. In short, a REST API is a simple web service where plain HTTP requests (the same kind a web browser sends) are used to collect data, usually in the form of a JSON or XML object.
With the help of Python, collecting data and interacting with REST APIs is surprisingly easy to do and can be done with very few lines of code.
Getting Started
To get started, you’ll need the json and requests libraries. The json library is part of Python’s standard library, but you’ll need to install requests if you haven’t done so already.
$ pip install requests
Basic example
As an example, say I wanted to find a Wikipedia article programmatically using their search function. Using Wikipedia’s REST API, this can be done with the following…
import requests
import json

r = requests.get('https://en.wikipedia.org/w/api.php?action=opensearch&search=Coffee')
data = json.loads(r.text)
What does this do? The get method from the requests library performs an HTTP GET request to the URL passed as its argument. The response is stored in the variable r. Using the loads method from json, the raw text of the response body (JSON data) is converted into a Python object, which for this endpoint is a list.
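Which Python type loads returns depends on the top-level JSON value: a JSON object becomes a dict and a JSON array becomes a list. A minimal sketch, using two made-up JSON strings rather than a live response:

```python
import json

# Made-up JSON strings for illustration, not real API responses.
object_text = '{"action": "opensearch", "search": "Coffee"}'
array_text = '["Coffee", ["Coffee", "Coffeehouse"]]'

as_dict = json.loads(object_text)  # JSON object -> dict
as_list = json.loads(array_text)   # JSON array  -> list

print(type(as_dict).__name__, as_dict["search"])
print(type(as_list).__name__, as_list[1])
```

The requests library also provides r.json(), which parses the response body in one step.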
Let’s check the contents of data.
>>> data
['Coffee', ['Coffee', 'Coffeehouse', 'Coffee Lake', 'Coffee preparation', 'Coffee bean', 'Coffee culture', 'Coffee with Kadhal', 'Coffee production in Indonesia', 'Coffee and Cigarettes', 'Coffea arabica'], ['', '', '', '', '', '', '', '', '', ''], ['https://en.wikipedia.org/wiki/Coffee', 'https://en.wikipedia.org/wiki/Coffeehouse', 'https://en.wikipedia.org/wiki/Coffee_Lake', 'https://en.wikipedia.org/wiki/Coffee_preparation', 'https://en.wikipedia.org/wiki/Coffee_bean', 'https://en.wikipedia.org/wiki/Coffee_culture', 'https://en.wikipedia.org/wiki/Coffee_with_Kadhal', 'https://en.wikipedia.org/wiki/Coffee_production_in_Indonesia', 'https://en.wikipedia.org/wiki/Coffee_and_Cigarettes', 'https://en.wikipedia.org/wiki/Coffea_arabica']]
Looks like it works!
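The output above is a list of four elements: the search term, the matching titles, a list of descriptions (empty strings here), and the article URLs. Assuming that layout holds, titles and URLs can be paired up as sketched below. The raw string is a shortened copy of the response shown above, so the snippet runs without a network call.

```python
import json

# Shortened copy of the response body shown above, as raw JSON text.
raw = (
    '["Coffee", '
    '["Coffee", "Coffeehouse"], '
    '["", ""], '
    '["https://en.wikipedia.org/wiki/Coffee", '
    '"https://en.wikipedia.org/wiki/Coffeehouse"]]'
)

# Unpack the four top-level elements of the list.
query, titles, descriptions, urls = json.loads(raw)

# Pair each title with the URL at the same position.
links = dict(zip(titles, urls))
for title, url in links.items():
    print(f"{title}: {url}")
```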
With Parameters
Now that we’ve got a very simple script for interacting with the Wikipedia API, let’s make a few improvements.
You may have noticed that the URL passed to the get method carries two query parameters, which have been formatted specifically for the request.
?action=opensearch&search=Coffee
To simplify this, we can represent the same information more cleanly as a key-value dictionary. This can be done like so…
params = {
    'action': 'opensearch',
    'search': 'Coffee'
}
This makes it much easier to add new parameters and adjust existing ones. We can pass the params variable to the get method as follows.
r = requests.get('https://en.wikipedia.org/w/api.php', params=params)
data = json.loads(r.text)
This will give the same result.
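One way to see why the result is the same is to build the request without sending it and inspect the URL that requests produces. The sketch below uses requests.Request and its prepare method for that purpose. It makes no network call, and it assumes the requests library is installed.

```python
import requests

params = {
    'action': 'opensearch',
    'search': 'Coffee'
}

# Build the request without sending it, so no network access is needed.
prepared = requests.Request(
    'GET', 'https://en.wikipedia.org/w/api.php', params=params
).prepare()

# The dictionary has been encoded into the query string of the URL.
print(prepared.url)
```

The printed URL matches the hand-written one from the basic example, with the parameters appended after the question mark.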
With headers (using a user agent)
Sometimes the REST API may need additional information like bearer tokens and user agents which aren’t included as URL parameters. This can be achieved by adding custom header parameters which are part of the request.
To demonstrate the use of headers, let’s set an alternative user agent to convince the web server that we are actually a web browser running Safari on an iPad. As with URL parameters, header parameters can be added using a key-value dictionary.
headers = {
    'User-Agent': 'Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405'
}

r = requests.get('https://en.wikipedia.org/w/api.php', params=params, headers=headers)
data = json.loads(r.text)
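The same dictionary approach covers the bearer tokens mentioned earlier. The sketch below is a hypothetical helper (build_headers is not part of requests) that adds an Authorization header using the standard Bearer scheme. The search call in this post works without a token, so the user agent string and token here are placeholders for an API that does require one.

```python
def build_headers(user_agent, token=None):
    """Return a dictionary suitable for the headers= argument of requests.get."""
    headers = {'User-Agent': user_agent}
    if token is not None:
        # Standard "Bearer" scheme for the Authorization header.
        headers['Authorization'] = f'Bearer {token}'
    return headers


# 'my-script/0.1' and 'EXAMPLE_TOKEN' are placeholders, not real credentials.
headers = build_headers('my-script/0.1', token='EXAMPLE_TOKEN')
print(headers)
```

The resulting dictionary can be passed as headers=headers in the same way as the user agent example above.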
Conclusions
Overall, this post demonstrates how easy it is to collect data from a REST API with very few lines of code using just two libraries. This code can be scaled with relative ease and extended to perform other, more complex tasks. It really is that simple! Furthermore, the requests library is not limited to the GET method; it can also perform other HTTP requests such as POST, PUT and DELETE.
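As a rough illustration of those other methods, the sketch below builds a POST and a DELETE request without sending them. The endpoint https://api.example.com/articles and the payload are hypothetical and stand in for whatever API you are working with.

```python
import requests

# Hypothetical endpoint and payload, used only to show the request shape.
url = 'https://api.example.com/articles'
payload = {'title': 'Coffee'}

# prepare() builds each request without sending it over the network.
post_req = requests.Request('POST', url, json=payload).prepare()
delete_req = requests.Request('DELETE', url + '/1').prepare()

print(post_req.method, post_req.url)
print(post_req.headers['Content-Type'])
print(delete_req.method, delete_req.url)
```

To actually send such requests, requests.post(url, json=payload), requests.put(url, json=payload) and requests.delete(url) are the counterparts of requests.get.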