There is no doubt that Wikipedia as a service is super, super useful. I
don’t know how many times I’ve used it to learn about a new topic.
Wikipedia is perhaps best well-known for its service as a collaborative
platform where users (or editors) of any background can come and
contribute towards sharing their knowledge and ensure that a given
Wikipedia article maintains a high quality.

I feel that little appreciation goes towards the editors themselves.
After all, they are the ones who are making the contributions!

To understand a little bit more about how editors interact on Wikipedia,
this post lists the basic methods used to derive a network graph to
model the collaborative interactions of Wikipedia editors from a
time-series format to a static graph. While this approach is used for
research purposes, further alterations and suggestions are encouraged to
form a more accurate representation of the task at hand.

Problem definition

The data is represented as a network structure meaning that the
underlining principles of graph theory are in use. In this example, a
Wikipedia editor is represented as a vertex and a directed edge is used
to pair editors together as A edits B. In this context, a graph is
modelled based from the order users edit a Wikipedia article. In this
case, the most recent editor edits the previous revision.


To query data from Wikipedia, data can be accessed from a single
endpoint URL. This can be accessed at Additionally, parameters are
needed in the GET request to further refine results. A complete list can
be accessed here. For example, the URL for accessing the most recent
revision for the article “Coffee” can be located at:

Data Formats

The revision of any Wikipedia article can be reproduced as either as an
XML file or JSON document, accessed by a single URL endpoint API. For
this exercise, the JSON format will be selected as this offers a more
convenient option for loading data structures into the Python
programming language. Bellow illustrates a sample revision document from
the API.

    "continue": {
        "rvcontinue": "20181017170220|864501192",
        "continue": "||"
    "query": {
        "pages": {
            "604727": {
                "pageid": 604727,
                "ns": 0,
                "title": "Coffee",
                "revisions": [
                        "revid": 864501853,
                        "parentid": 864501313,
                        "user": "Heroeswithmetaphors",
                        "timestamp": "2018-10-17T17:07:50Z",
                        "comment": "/* Caffeine content */links"

The Algorithm

The Python programming environment conveniently provides the packages
needed to model the data as a network graph. The package networkx
offers convenient features used to generate and manipulate network
graphs. In addition to this, the library matplotlib can be used to
plot the resulting visualisation. Providing that the user has an article
title they wish to analyse, the pseudo-code for the algorithm can be
described as follows:

  1. Get article title
  2. Set URL parameters and load JSON
  3. For each revision entry in document:
    1. Get user and store as node
    2. Get next user in list and store as node
    3. If user pair does not exist: store edge between two users
  4. Output edge pairs as CSV
  5. Plot user graph


While the algorithm provided in this solution is far from complete, it
provides the foundations to analyse the basic graph structure of
Wikipedia editors by means of capturing their interactions. Furthermore,
the output of the resulting graph only considers interactions that
accumulate over time within a single article. Contributors are welcome
to modify the algorithm to model interactions in a more meaningful