How to Stream from the Mastodon API via HTTP in Python

As I’ve mentioned in many of my blog posts before this one, Mastodon has a publicly available API which can be used to scrape statuses, collect hashtags and much more.

One feature that I have yet to mention on this blog is streaming: Mastodon features a streaming API for processing statuses in real time. More documentation on the streaming API can be found here.

This is useful for a number of reasons as you may wish to:

  • Watch an instance
  • Track certain hashtags
  • Monitor an account

Note: It’s worth mentioning that streaming may not be enabled on some Mastodon instances.

In this post, I’ll cover the basics of streaming Mastodon statuses using basic Python tools such as the requests package, which features the ability to stream data via HTTP.

Streaming in Python

Scroll down to the end for the complete code.

To get started, we need to import a few packages to make HTTP requests to the Mastodon API and to convert the JSON response into a workable Python dictionary.

import requests
import json
...

Create a separate function for processing the stream. In this case, we pass in a few variables. instance is the Mastodon instance to stream from. path is the specific URL path that determines what to stream (more on this here and mentioned later on). params is an optional dictionary of query parameters.

...
def stream(instance, path, params=None):
	...

Inside stream we need to initiate the HTTP request using the streaming API endpoint of our chosen Mastodon instance.

    ...
    s = requests.Session()
    url = f'https://{instance}/api/v1/streaming{path}'
    ...
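
As a rough illustration of how instance, path and params fit together, here is a minimal sketch that builds the full streaming URL using only the standard library. The helper name build_stream_url is hypothetical, and the https scheme and the /hashtag path with a tag parameter are assumptions based on the Mastodon documentation rather than part of the code in this post.

```python
from urllib.parse import urlencode

def build_stream_url(instance, path, params=None):
    """Hypothetical helper: assemble a streaming URL for an instance."""
    url = f'https://{instance}/api/v1/streaming{path}'
    if params:
        # urlencode escapes the values so they are safe in a query string
        url += '?' + urlencode(params)
    return url

print(build_stream_url('mstdn.social', '/hashtag', {'tag': 'python'}))
```

In the stream function itself, requests does this step for us when params is passed to the request.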

We also need to set a few headers. connection is set to keep-alive to ask that the connection isn’t closed on our side. content-type is set to application/json to signal that we are expecting a JSON object in return. And transfer-encoding is set to chunked to signal that the data will arrive as a continuous stream rather than in one piece.

    ...
    headers = {'connection': 'keep-alive', 
               'content-type': 'application/json', 
               'transfer-encoding': 'chunked'}
    ...
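
Some instances may only allow streaming for authenticated clients. A minimal sketch of adding an access token to the headers might look like this. build_headers and the token value are hypothetical, and whether a token is needed at all depends on the instance.

```python
def build_headers(token=None):
    """Hypothetical helper: the headers used above, plus an optional token."""
    headers = {'connection': 'keep-alive',
               'content-type': 'application/json',
               'transfer-encoding': 'chunked'}
    if token:
        # Mastodon accepts OAuth access tokens as a Bearer token
        headers['authorization'] = f'Bearer {token}'
    return headers

print(build_headers('YOUR_ACCESS_TOKEN'))
```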

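Now we make the request by passing the URL, headers and parameters. Setting stream=True tells requests to hand us the response as it arrives instead of waiting for it to finish.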

    ...
    req = requests.Request("GET", url,
                           headers=headers,
                           params=params).prepare()

    resp = s.send(req, stream=True)
    ...

A simple for loop can be used to process each new line of data as it’s received from the server. We also need to convert each line from a stream of bytes into a readable string.

This would look something like this:

for line in resp.iter_lines():
    line = line.decode('UTF-8')
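
To try the decoding step without a live connection, a list of byte strings can stand in for resp.iter_lines(). This is only a sketch: decode_lines is a hypothetical helper and the sample bytes are made up. It also skips the empty lines that separate events in the stream.

```python
def decode_lines(raw_lines):
    """Decode an iterable of byte strings, skipping empty lines."""
    for raw in raw_lines:
        if not raw:
            # blank lines separate events in the stream; nothing to decode
            continue
        yield raw.decode('UTF-8')

# A stand-in for resp.iter_lines(); the bytes here are made up.
sample = [b'event: update', b'data: {"id": "1"}', b'']
print(list(decode_lines(sample)))
```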

Since we are using the Mastodon API, this is where things start to get a bit involved as we need to process the data from the stream in two stages.

The Mastodon Streaming API returns two fields for each post. The event field records the event type (a complete set of event types can be found here) and the data field stores the JSON object for the status.

An example of the Mastodon stream output looks like this…

:)
event: update
data: {...}

...
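
Each line in this output is either a comment (it starts with a colon, like the :) above) or a field name followed by a colon and a value. A minimal sketch of splitting one line into its parts could look like this, where split_field is a hypothetical helper name.

```python
def split_field(line):
    """Split 'name: value' into a tuple; return None for comments or blanks."""
    if not line or line.startswith(':'):
        return None
    name, _, value = line.partition(':')
    # a single space after the colon is part of the separator, not the value
    if value.startswith(' '):
        value = value[1:]
    return name, value

for line in [':)', 'event: update', 'data: {"note": "a: b"}']:
    print(repr(line), '->', split_field(line))
```

Splitting on the first colon only means that colons inside the JSON payload are left alone.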

Using the for loop, we need to iterate through each field and combine the results. The JSON data from the data field is converted into a Python dictionary which is returned using the yield statement.

I’m sure there’s a better way of sorting this, but this is what I put together.

    ...
    event_type = None
    
    for line in resp.iter_lines():
        line = line.decode('UTF-8')
        
        # Match the field name at the start of the line only, so a
        # status whose text contains 'event: ' is not misread.
        key = 'event: '
        if line.startswith(key):
            event_type = line[len(key):]
        
        key = 'data: '
        if line.startswith(key):
            data = dict()
            data['event'] = event_type
            data['data'] = json.loads(line[len(key):])
            yield data
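
Not every event carries a status. According to the Mastodon documentation, the payload of a delete event is just the ID of the deleted status rather than a JSON object, so one option is to decode the payload based on the event type. This is a sketch under that assumption, and decode_payload is a hypothetical helper that is not part of the code above.

```python
import json

def decode_payload(event_type, raw):
    """Hypothetical helper: decode a data field based on its event type."""
    if event_type == 'delete':
        # keep the ID as a string instead of letting json.loads turn it into an int
        return raw
    return json.loads(raw)

print(decode_payload('update', '{"id": "1", "content": "<p>Hi</p>"}'))
print(decode_payload('delete', '109876543210'))
```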

And so, putting this all together gives us this…

import requests
import json

def stream(instance, path, params=None):
    s = requests.Session()
    url = f'https://{instance}/api/v1/streaming{path}'
    
    headers = {'connection': 'keep-alive', 
               'content-type': 'application/json', 
               'transfer-encoding': 'chunked'}
    
    req = requests.Request("GET", url,
                           headers=headers,
                           params=params).prepare()

    resp = s.send(req, stream=True)
    event_type = None
    
    for line in resp.iter_lines():
        line = line.decode('UTF-8')
        
        # Match the field name at the start of the line only, so a
        # status whose text contains 'event: ' is not misread.
        key = 'event: '
        if line.startswith(key):
            event_type = line[len(key):]
        
        key = 'data: '
        if line.startswith(key):
            data = dict()
            data['event'] = event_type
            data['data'] = json.loads(line[len(key):])
            yield data

Example

Using the code above, we can stream activity by printing the username, text and timestamp of each status as it arrives. This is what it could look like…

import re
            
for line in stream(instance='mstdn.social', path='/public/remote'):
    if 'update' in line['event']:
        t = line['data']
        acct = t['account']['acct']
        text = t['content']
        text = re.sub('<[^<]+?>', '', text)
        timestamp = t['created_at']
        
        print(f"@{acct}: {text} ({timestamp}) \n")
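
The regular expression above removes tags but leaves HTML entities such as &amp;amp; in the text. A small sketch that also unescapes entities with the standard library's html module might look like this. html_to_text is a hypothetical helper name and the sample string is made up.

```python
import html
import re

def html_to_text(content):
    """Strip tags with the same regex as above, then unescape entities."""
    text = re.sub('<[^<]+?>', '', content)
    return html.unescape(text)

print(html_to_text('<p>Tea &amp; <a href="#">toast</a></p>'))
```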

In this example, we’re streaming from the mstdn.social instance using the /public/remote path, which streams public statuses from remote servers as they appear in real time.

If you wanted to focus on streaming statuses just from that single server, you would replace the path with /public/local. A complete list of streaming functions can be found here.

You could also slot a simple if statement into the loop to monitor for a certain keyword or phrase.
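
As a minimal sketch of such a check, assuming the text has already had its HTML tags removed, a case-insensitive match could be written like this. The mentions helper and the sample posts are hypothetical.

```python
def mentions(text, keyword):
    """Case-insensitive check for a keyword or phrase in plain text."""
    return keyword.casefold() in text.casefold()

# Made-up sample posts standing in for statuses from the stream.
posts = ['Learning Python today', 'Nice weather', 'python 3 is out']
for text in posts:
    if mentions(text, 'python'):
        print(text)
```

Inside the streaming loop, the same condition would wrap the print call.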

What next?

To conclude, as I’m sure you can see, there are a lot of different ways you can use Mastodon’s streaming API for your needs.

A few applications might include:

  • Building an interactive dashboard to monitor posts
  • Saving posts to a file or storing them in a database
  • Monitoring for certain keywords with alerts
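
As one possible starting point for saving posts to a file, each item from the stream could be appended as a line of JSON. This is only a sketch: append_jsonl, the statuses.jsonl filename and the sample record are all hypothetical.

```python
import json

def append_jsonl(path, record):
    """Append one record to a file as a single line of JSON."""
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(record, ensure_ascii=False) + '\n')

# A made-up record standing in for an item yielded by the stream.
append_jsonl('statuses.jsonl', {'event': 'update', 'data': {'id': '1'}})
```

One record per line keeps the file easy to read back later, since each line can be passed to json.loads on its own.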