There is no doubt that Wikipedia as a service is super, super useful. I
don’t know how many times I’ve used it to learn about a new topic.
Wikipedia is perhaps best known as a collaborative platform where users
(or editors) of any background can contribute their knowledge and help
ensure that a given Wikipedia article maintains a high quality.
I feel that little appreciation goes towards the editors themselves.
After all, they are the ones who are making the contributions!
To understand a little more about how editors interact on Wikipedia,
this post outlines the basic methods used to convert the time-series
record of article edits into a static network graph that models the
collaborative interactions between Wikipedia editors. While this approach
is used for research purposes, further alterations and suggestions are
encouraged to form a more accurate representation of the task at hand.
Problem definition
The data is represented as a network structure, meaning that the
underlying principles of graph theory apply. In this example, each
Wikipedia editor is represented as a vertex, and a directed edge from
editor A to editor B indicates that A edits B. The graph is modelled on
the order in which users edit a Wikipedia article: the most recent editor
is treated as editing the previous revision, and therefore the editor who
made it.
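As a minimal sketch, assuming the editor names of an article's revisions are already available as a list ordered oldest to newest, the edge rule can be expressed in Python as follows. The function name `build_edges`, the use of edge weights to count repeated interactions, and the choice to drop self-loops when an editor revises their own edit are illustrative assumptions, not part of the method described above.

```python
from collections import Counter


def build_edges(editors, skip_self_loops=True):
    """Turn editor names ordered oldest-first into weighted directed edges.

    Each key (A, B) means "A edits B": A made the revision immediately
    after B's. The value counts how many times that pairing occurs.
    """
    edges = Counter()
    # Walk consecutive revision pairs: (earlier editor, later editor).
    for previous, current in zip(editors, editors[1:]):
        # Assumption: an editor revising their own edit is not an interaction.
        if skip_self_loops and previous == current:
            continue
        # The later editor edits the revision left by the earlier editor.
        edges[(current, previous)] += 1
    return edges


if __name__ == "__main__":
    # Hypothetical revision history, oldest first.
    history = ["Alice", "Bob", "Alice", "Carol", "Carol", "Bob"]
    for (source, target), weight in build_edges(history).items():
        print(f"{source} -> {target} (weight {weight})")
```

Counting edges rather than storing duplicates keeps the result a static graph while still recording how often two editors interacted; the weighted pairs could be loaded into a graph library such as NetworkX if further analysis is needed.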
Parameters
To query data from Wikipedia, requests are sent to a single endpoint
URL: https://en.wikipedia.org/w/api.php. Parameters are then added to
the GET request to refine the results; a complete list of parameters is
available in the API documentation. For example, the URL for retrieving
the most recent revisions of the article “Coffee” is:
```
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&format=json&rvlimit=max&titles=Coffee
```
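A minimal sketch of issuing this request from Python with only the standard library follows. It assumes the default `format=json` response layout, where pages are keyed by page ID under `query.pages` and revisions are listed newest first. The function names and the `User-Agent` string are hypothetical placeholders; Wikimedia asks API clients to identify themselves with a descriptive user agent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"
# Hypothetical placeholder: replace with a name and contact address of your own.
USER_AGENT = "EditorNetworkSketch/0.1 (contact@example.com)"


def revisions_url(title):
    """Build the revisions query URL for a single article title."""
    params = {
        "action": "query",
        "prop": "revisions",
        "format": "json",
        "rvlimit": "max",
        "titles": title,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def fetch_revisions(title):
    """Download and decode the JSON response (requires network access)."""
    request = Request(revisions_url(title), headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response:
        return json.load(response)


def editors_from_response(data):
    """Return editor names, oldest first, from a format=json response."""
    editors = []
    for page in data["query"]["pages"].values():
        # Missing pages carry no "revisions" key.
        revisions = page.get("revisions", [])
        # The API lists revisions newest first, so walk them in reverse.
        for rev in reversed(revisions):
            # Revisions with a hidden username have no "user" key; skip them.
            if "user" in rev:
                editors.append(rev["user"])
    return editors
```

For example, `editors_from_response(fetch_revisions("Coffee"))` would return the editors of the fetched revisions in chronological order. Note that `rvlimit=max` still caps a single response (typically 500 revisions for ordinary clients); longer histories require following the `continue` values returned by the API, which this sketch omits.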
Data formats
The revision history of any Wikipedia article can be retrieved as either
an XML or a JSON document from the same API endpoint. For this exercise,
JSON is selected, as it offers a more convenient option for loading data
structures into the Python programming language. Below is a sample
revision document from the API.