How to Stream from the Mastodon API via HTTP in Python
As I’ve mentioned in many of my blog posts before this one, Mastodon has a publicly available API which can be used to scrape statuses, collect hashtags and much more.
One feature that I have yet to mention on this blog is streaming: Mastodon provides a streaming API for processing statuses in real time. More documentation on the streaming API can be found here.
This is useful for a number of reasons as you may wish to:
- Watch an instance
- Track certain hashtags
- Monitor an account
Note: It’s worth mentioning that streaming may not be enabled on some Mastodon instances.
In this post, I’ll cover the basics of streaming Mastodon statuses using basic Python tools such as the requests package, which can stream data over HTTP.
Streaming in Python
Scroll down to the end for the complete code.
To get started, we need to import a few packages to make HTTP requests to the Mastodon API and to convert the JSON response into a workable Python dictionary.
import requests
import json
...
Create a separate function for processing the stream. In this case, we pass in a few variables. instance is the Mastodon instance to stream from. path is the specific URL path that determines what to stream (more on this here and mentioned later on). params is an optional dictionary of query parameters.
...
def stream(instance, path, params=None):
...
Inside stream, we need to initiate the HTTP request using the streaming API endpoint of our chosen Mastodon instance.
...
    s = requests.Session()
    url = f'https://{instance}/api/v1/streaming{path}'
...
We also need to set a few headers. connection is set to keep-alive to ensure that the connection doesn’t end on our side. content-type is set to application/json to indicate that we are expecting a JSON object in return. And transfer-encoding is set to chunked, which is an instruction for HTTP to process the request as a continuous stream.
...
    headers = {'connection': 'keep-alive',
               'content-type': 'application/json',
               'transfer-encoding': 'chunked'}
...
And now we make the request by passing the URL and headers.
...
    req = requests.Request("GET", url,
                           headers=headers,
                           params=params).prepare()
    resp = s.send(req, stream=True)
...
A simple for loop can be used to process each new line of data as it’s received from the server. We also need to convert each line from a stream of bytes into a readable string.
This would look something like this:
    for line in resp.iter_lines():
        line = line.decode('UTF-8')
Since we are using the Mastodon API, this is where things start to get a bit involved as we need to process the data from the stream in two stages.
The Mastodon Streaming API returns two fields for each post. The event field records the event type (a complete set of event types can be found here) and the data field stores the JSON object for the status.
An example of the Mastodon stream output looks like this…
:)
event: update
data: {...}
...
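To make the format concrete, here is a minimal sketch of how a single line of the stream could be split into a field name and a value. The helper name split_field is my own and is not part of Mastodon or requests. It assumes that each field line separates the name and value with a colon, and that lines starting with a colon (such as the :) above) are comments that carry no data.

```python
def split_field(line):
    """Split one line of the stream into (field, value).

    Returns (None, None) for blank lines and for comment lines,
    which start with a colon (for example ":)").
    """
    if not line or line.startswith(':'):
        return None, None
    field, _, value = line.partition(':')
    # Drop the single space that follows the colon, if present.
    if value.startswith(' '):
        value = value[1:]
    return field, value


print(split_field('event: update'))      # ('event', 'update')
print(split_field('data: {"id": "1"}'))  # ('data', '{"id": "1"}')
print(split_field(':)'))                 # (None, None)
```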
Using the for loop, we need to iterate through each line and combine the two fields. The JSON data from the data field is converted into a Python dictionary, which is returned using the yield statement.
I’m sure there’s a better way of sorting this, but this is what I put together.
...
    event_type = None
    for line in resp.iter_lines():
        line = line.decode('UTF-8')
        # Remember the event type until its data line arrives
        key = 'event: '
        if line.startswith(key):
            event_type = line[len(key):]
        # Parse the JSON payload and pair it with the event type
        key = 'data: '
        if line.startswith(key):
            data = dict()
            data['event'] = event_type
            data['data'] = json.loads(line[len(key):])
            yield data
And so, putting this all together gives us this…
import requests
import json


def stream(instance, path, params=None):
    s = requests.Session()
    url = f'https://{instance}/api/v1/streaming{path}'
    headers = {'connection': 'keep-alive',
               'content-type': 'application/json',
               'transfer-encoding': 'chunked'}
    req = requests.Request("GET", url,
                           headers=headers,
                           params=params).prepare()
    resp = s.send(req, stream=True)
    event_type = None
    for line in resp.iter_lines():
        line = line.decode('UTF-8')
        # Remember the event type until its data line arrives
        key = 'event: '
        if line.startswith(key):
            event_type = line[len(key):]
        # Parse the JSON payload and pair it with the event type
        key = 'data: '
        if line.startswith(key):
            data = dict()
            data['event'] = event_type
            data['data'] = json.loads(line[len(key):])
            yield data
Example
Using the code above, we can stream activity by printing the username, text and timestamp of each status as it arrives. This is what it could look like…
import re

for line in stream(instance='mstdn.social', path='/public/remote'):
    if 'update' in line['event']:
        t = line['data']
        acct = t['account']['acct']
        text = t['content']
        # Strip HTML tags from the status content
        text = re.sub('<[^<]+?>', '', text)
        timestamp = t['created_at']
        print(f"@{acct}: {text} ({timestamp}) \n")
In this example, we’re streaming from the mstdn.social instance using the /public/remote path, which streams public statuses from remote servers (the remote part of the federated timeline) as they appear in real time.
If you wanted to focus on statuses posted on that server only, you would replace the path with /public/local. A complete list of streaming functions can be found here.
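To show how the instance, path and params fit together, here is a small sketch that only builds the URL and doesn’t connect to anything. The build_stream_url helper is a name I made up for illustration, and the /hashtag path with its tag parameter is taken from my reading of the streaming documentation, so check it against the docs for your instance.

```python
from urllib.parse import urlencode


def build_stream_url(instance, path, params=None):
    """Return the full streaming URL for an instance and path."""
    url = f'https://{instance}/api/v1/streaming{path}'
    if params:
        # Query parameters are appended after a question mark
        url += '?' + urlencode(params)
    return url


print(build_stream_url('mstdn.social', '/public/local'))
# https://mstdn.social/api/v1/streaming/public/local
print(build_stream_url('mstdn.social', '/hashtag', {'tag': 'python'}))
# https://mstdn.social/api/v1/streaming/hashtag?tag=python
```

With the stream function from this post, the equivalent call would be stream(instance='mstdn.social', path='/hashtag', params={'tag': 'python'}), since requests builds the query string from params for us.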
You could also slot a simple if statement into the loop to monitor for a certain keyword or phrase.
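As a rough sketch of what that check could look like, the snippet below tests a status against a list of keywords. The strip_html and mentions helpers are my own names, the sample status is hand-written rather than real output, and the match is a plain case-insensitive substring test, so treat it as a starting point only.

```python
import re


def strip_html(content):
    """Remove HTML tags using the same regular expression as the example."""
    return re.sub('<[^<]+?>', '', content)


def mentions(status, keywords):
    """Return True if the status text contains any keyword (case-insensitive)."""
    text = strip_html(status['content']).lower()
    return any(keyword.lower() in text for keyword in keywords)


# A hand-written stand-in for a status from the stream
sample = {'content': '<p>Learning <b>Python</b> today</p>'}
print(mentions(sample, ['python']))  # True
print(mentions(sample, ['rust']))    # False
```

Inside the streaming loop, this would become something like if mentions(line['data'], ['python']): before the print call.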
What next?
To conclude, as I’m sure you can see, there are a lot of different ways you can use Mastodon’s streaming API for your needs.
A few applications might include:
- Building an interactive dashboard to monitor posts
- Saving posts to a file or storing them in a database
- Monitoring for certain keywords with alerts
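As one possible starting point for the file idea, here is a sketch that appends new statuses to a JSON Lines file, one status per line. The save_statuses name, the limit argument and the file name are all assumptions of mine. It expects the same dictionaries that the stream function in this post yields, with an event key and a data key.

```python
import json


def save_statuses(events, filename, limit=100):
    """Append up to `limit` new statuses to a JSON Lines file.

    Returns the number of statuses written.
    """
    if limit < 1:
        return 0
    count = 0
    with open(filename, 'a', encoding='utf-8') as f:
        for event in events:
            if event['event'] != 'update':
                continue  # skip deletions and other event types
            f.write(json.dumps(event['data']) + '\n')
            count += 1
            if count >= limit:
                break
    return count
```

You could then call it with something like save_statuses(stream(instance='mstdn.social', path='/public/local'), 'statuses.jsonl') and read the file back line by line with json.loads.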