One of the features which makes networkx really useful is the ability to import and export data from a variety of sources. This is especially useful as data can come in all different shapes and sizes which may not always be consistent. The purpose of this blog post is to walk through some of the standard techniques for reading and writing graphs using networkx and pandas.
To allow for more flexibility and control, networkx supports the ability to convert to and from pandas data frames. When combined this allows for more options when reading and writing data. Using pandas alone produces a total of 399 distinct combinations.
Before we get into how to import/export data it’s worth going through some of the ways in which graphs can be represented in data. As mentioned in the previous blog posts, networks (also known as graphs) are a collection of nodes and edges. We are essentially representing two things – an entity (node) and a relationship (an edge).
An edge list is exactly what it says – it’s a list of edges. Simple. They usually come in the form of a table with two columns. One column for source, and one for the target. Depending on the type of graph it might feature multiple columns which contain attributes relating to an edge. This may include things like a timestamp.
An adjacency matrix is an n-by-n square matrix used to indicate the presence of an edge between nodes. Reading the matrix by row and then by column, a ‘1’ in a cell indicates an edge from the node for that row to the node for that column, and a ‘0’ indicates no edge.
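To make the two representations concrete, here is a minimal sketch that builds a small directed graph and shows it both as an edge list and as an adjacency matrix. The four edges are illustrative, and using `nx.to_pandas_adjacency` with a sorted node list and `dtype=int` is just one convenient way to print a labelled matrix, not the only one.

```python
import networkx as nx

# A small directed graph (illustrative edges only).
G = nx.DiGraph()
G.add_edges_from([("A", "B"), ("A", "C"), ("C", "D"), ("D", "C")])

# Edge list: one (source, target) pair per edge.
edges = list(G.edges())

# Adjacency matrix: rows are sources, columns are targets.
# Sorting the node list fixes the row/column order.
nodes = sorted(G.nodes())
adjacency = nx.to_pandas_adjacency(G, nodelist=nodes, dtype=int)

print(edges)
print(adjacency)
```

Because the graph is directed, the matrix is not symmetric: the cell at row A, column B is 1, while the cell at row B, column A is 0.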
As shown above, there are quite a few ways for importing and exporting networks. To keep things simple we will go through some of the most popular functions. To begin, we’ll look at some of the features which are integrated into networkx.
Example 1: Reading and Writing Edge Lists
By far the easiest and simplest approach is to store data in a simple text file. This can be achieved using the read_edgelist and write_edgelist functions within networkx. To save an edge list to file, the write_edgelist function takes a graph as input, and the path of the output file (‘example.edgelist’). Here’s a simple example using the graph above.
import networkx as nx

G = nx.DiGraph()
G.add_edge('A', 'B')
G.add_edge('A', 'C')
G.add_edge('C', 'D')
G.add_edge('D', 'C')
nx.write_edgelist(G, 'example.edgelist', data=False)
Note: This function also takes other parameters to control things such as edge attributes. In our case, we set data=False because our edges have no attributes to save. You can also adjust things such as the delimiter. By default, columns are separated by a space.
The output of this graph looks something like this…
A B
A C
C D
D C
Now that the data has been saved, we can read this using the read_edgelist function. This is as simple as doing the following.
>>> H = nx.read_edgelist('example.edgelist', create_using=nx.DiGraph)
>>> H.edges()
OutEdgeView([('A', 'B'), ('A', 'C'), ('C', 'D'), ('D', 'C')])
Note: When reading in a graph it’s important to ensure that you’ve got the right graph type defined. By default, networkx uses a simple undirected graph (nx.Graph), whereas in our case we explicitly state that this is a directed graph by setting create_using=nx.DiGraph.
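As a sketch of the other parameters mentioned in the note above, the following writes an edge list with a single edge attribute and a comma delimiter, then reads it back. The `weight` attribute, its values, and the temporary file name are assumptions made for illustration and are not part of the example graph.

```python
import os
import tempfile

import networkx as nx

# Hypothetical weighted edges, used only to demonstrate attributes.
G = nx.DiGraph()
G.add_edge("A", "B", weight=1.5)
G.add_edge("A", "C", weight=2.0)

# Write to a temporary location so the sketch leaves no files behind.
path = os.path.join(tempfile.mkdtemp(), "weighted.edgelist")

# data=["weight"] writes only that attribute as a third column.
nx.write_edgelist(G, path, delimiter=",", data=["weight"])

# When reading, name each extra column and give the type to convert it to.
H = nx.read_edgelist(
    path,
    delimiter=",",
    create_using=nx.DiGraph,
    data=[("weight", float)],
)

print(list(H.edges(data=True)))
```

The same delimiter has to be passed on both sides, otherwise each line is parsed as a single token and the edges are not recovered.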
Example 2: GEXF
In some cases when you’re exporting a graph you’re doing so with the intention of analysing it with other software. For example, many users use Gephi to visualise their networks as this provides a whole suite of tools to allow them to create presentable graphs quickly and easily. Gephi has made an appearance on this blog before.
Networkx allows us to import and export graphs directly in GEXF, a file format that Gephi can read and write.
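A minimal sketch of a GEXF round trip, assuming the same four-edge directed graph used earlier. It uses the networkx functions `write_gexf` and `read_gexf`; the temporary file path is only there to keep the example self-contained.

```python
import os
import tempfile

import networkx as nx

G = nx.DiGraph()
G.add_edges_from([("A", "B"), ("A", "C"), ("C", "D"), ("D", "C")])

# Write to a temporary location; in practice this would be a file you open in Gephi.
path = os.path.join(tempfile.mkdtemp(), "example.gexf")
nx.write_gexf(G, path)

# The GEXF file records whether edges are directed, so no create_using is needed here.
H = nx.read_gexf(path)

print(H.is_directed())
print(sorted(H.edges()))
```

Unlike a plain edge list, the file itself carries the graph type, along with any node and edge attributes.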
Example 3: JSON
One of the more complex ways of exporting graphs is to use JSON to serialise a network. This approach is typically used by those who wish to use graphs on the Internet, either through an API or an interactive visualisation package such as D3.js.
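One way to do this, sketched below, is the node-link format from `networkx.readwrite.json_graph`, which turns a graph into a plain dictionary that the standard `json` module can serialise. This is an assumption about which JSON layout to use; networkx offers others, and the exact key names in the dictionary can differ between networkx versions.

```python
import json

import networkx as nx
from networkx.readwrite import json_graph

G = nx.DiGraph()
G.add_edges_from([("A", "B"), ("A", "C"), ("C", "D"), ("D", "C")])

# Convert the graph to a JSON-serialisable dictionary, then to a string.
data = json_graph.node_link_data(G)
text = json.dumps(data)

# Reverse the process: parse the string and rebuild the graph.
H = json_graph.node_link_graph(json.loads(text))

print(H.is_directed())
print(sorted(H.edges()))
```

The dictionary records whether the graph is directed, so the rebuilt graph keeps the original type.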
As mentioned previously, pandas provides multiple ways to import and export data. Pandas is primarily used to provide interactive data frames within a Python environment. These data frames represent tabular data, which is ideal considering we are working with edge lists. To export a graph to a pandas data frame, it’s as simple as using to_pandas_edgelist.
>>> G = nx.DiGraph()
>>> G.add_edge('A', 'B')
>>> G.add_edge('A', 'C')
>>> G.add_edge('C', 'D')
>>> G.add_edge('D', 'C')
>>> df = nx.to_pandas_edgelist(G)
>>> df
  source target
0      A      B
1      A      C
2      C      D
3      D      C
Why use pandas? Now that we’ve got a pandas data frame, we can do additional processing such as filtering and querying our edge list. For example, if we wanted to examine edges where ‘A’ is the source…
>>> df[df['source'] == 'A']
  source target
0      A      B
1      A      C
By using pandas, you can perform more complex operations, but for the purpose of this example we will keep things simple. Let’s say we want to read this edge list back into a networkx graph. All we need to do is use from_pandas_edgelist. Note: As mentioned before, it’s important to make sure we get the graph type correct, which is why we’re using create_using=nx.DiGraph.
>>> df_new = df[df['source'] == 'A']
>>> G = nx.from_pandas_edgelist(df_new, create_using=nx.DiGraph)
>>> G.edges()
OutEdgeView([('A', 'B'), ('A', 'C')])
As we can see, we now have a new graph which we modified using pandas data frames. Also, it’s worth pointing out that by using pandas we’ve also opened up our opportunities to export our graphs into many other formats (see above).
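As a sketch of exporting through pandas to another format, the example below sends an edge list to CSV and reads it back into a graph. The `weight` attribute and its values are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import io

import networkx as nx
import pandas as pd

# Hypothetical weighted edges, used only for illustration.
G = nx.DiGraph()
G.add_edge("A", "B", weight=1.5)
G.add_edge("C", "D", weight=3.0)

# Edge attributes become extra columns after 'source' and 'target'.
df = nx.to_pandas_edgelist(G)

# Write the data frame as CSV; a real file path would work the same way.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read the CSV back and rebuild the graph, keeping the weight column.
buffer.seek(0)
df_loaded = pd.read_csv(buffer)
H = nx.from_pandas_edgelist(df_loaded, edge_attr="weight", create_using=nx.DiGraph)

print(list(H.edges(data=True)))
```

Passing index=False keeps the data frame’s row index out of the file, so the CSV contains only the edge columns.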
Final Thoughts and Conclusions
In this blog post, we explored a few ways in which graphs can be imported and exported to different formats. We also covered some of the ways in which graphs can be represented using edge lists and adjacency matrices.
This blog post provides a very basic overview of how to import and export data, with a few simple transformations made with the aid of pandas. By using this approach, there are many more operations we can perform, as shown in the figures above.