Gephi is an amazing piece of software for building and visualising different types of networks. It is considered the go-to software for network analytics and visualisation and is widely used by those who study social networks, such as myself. I use Gephi on an almost daily basis.
As well as producing some amazing visuals, Gephi is loaded with features, including statistical analysis, multiple layout algorithms and, most importantly, the ability to import and export different formats.
This blog post will cover the basics for importing a network into Gephi using data stored in a spreadsheet. Why a spreadsheet? Spreadsheets are useful for storing tabulated data, in this case, an edge list. In its basic form, an edge list can be represented with two columns – a source column and a target column.
In my experience, this is a really useful feature, as it allows me to import data from an external source, such as the output of a Python script using pandas. Gephi also works with most spreadsheet formats, including Excel and CSV files.
To get things going, you’ll need to ensure that your spreadsheet has at least two columns named “source” and “target”. Without these names, Gephi won’t be able to figure out which are the source and target nodes. You can also include additional columns to represent things like a timestamp, weight or any other edge attributes.
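As a rough illustration of the kind of script I mean, here is a minimal pandas sketch that writes an edge list with the two required columns plus one extra attribute. The user names, the weight values and the edges.csv filename are all made up for the example.

```python
import pandas as pd

# Hypothetical retweet pairs: (user who retweeted, user who was retweeted).
retweets = [
    ("user_1", "user_2"),
    ("user_3", "user_2"),
    ("user_1", "user_4"),
]

# The two required columns, named "source" and "target".
edges = pd.DataFrame(retweets, columns=["source", "target"])

# An optional extra column; here every retweet simply counts once.
edges["weight"] = 1

# index=False keeps the pandas row index out of the file,
# so the spreadsheet only contains the columns defined above.
edges.to_csv("edges.csv", index=False)
```

The resulting edges.csv can then be used as the spreadsheet in the steps that follow.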
With your spreadsheet, go ahead and open Gephi (assuming that you have already installed it). You should be presented with a welcome box (just close it for now) and a blank screen similar to the following.
Step 1: Locate Spreadsheet
Next, locate “File” in the top menu and select “Import spreadsheet…”
This will open up a new dialogue prompting you to locate the spreadsheet file you wish to import. Select the file, then go ahead and click “Open”. In my case, I’ll be importing a retweet network based upon tweets which mention the word “London”. The source and target will represent anonymized users.
Step 2: Preview Data
Once the file has been selected, Gephi will load it. This may take a while depending on the size of the network. Once it has finished, you’ll get a nice little preview window so you can check that things look right. Make sure that “Import as:” is set to “Edges table”.
Step 3: Specify Columns
If you have multiple columns like me, you will be presented with the opportunity to select which columns you would like to include and their data types. In my case, because I had a column for timestamp, it was automatically detected as a “Timestamp Set”. Click “Finish” and wait as it loads in.
You’ll then be presented with another dialogue listing a bunch of warnings. Just ignore them for now. You will also be presented with some basic stats, such as the number of nodes and edges (wow, my network was bigger than I thought).
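If you want a rough idea of what numbers to expect in those stats, you can count the nodes and edges yourself before importing. This is only a sketch using made-up data read from an in-memory string. The edge count Gephi shows may differ depending on how duplicate edges are handled.

```python
import io

import pandas as pd

# Made-up edge list; in practice this would be your own spreadsheet file.
csv_text = """source,target
user_1,user_2
user_3,user_2
user_1,user_4
user_1,user_2
"""
edges = pd.read_csv(io.StringIO(csv_text))

# Every distinct name in either column is a node.
n_nodes = pd.concat([edges["source"], edges["target"]]).nunique()

# Each row is one edge; repeated source/target pairs are duplicates.
n_rows = len(edges)
n_unique_edges = len(edges.drop_duplicates(subset=["source", "target"]))

print(n_nodes, n_rows, n_unique_edges)  # prints: 4 4 3
```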
At this stage, you can specify whether you wish to use a directed or undirected graph under “Graph Type:”. You can also specify an “edge merge strategy”, which is used to handle duplicate edges. In this example, I selected “Sum” to count all duplicate edges.
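To give a feel for what summing duplicates means, here is a small pandas sketch that collapses repeated source/target pairs and adds up their weights. It is my own approximation of the idea using made-up data, not Gephi’s actual implementation, and it assumes a directed graph.

```python
import pandas as pd

# Made-up edge list in which user_1 -> user_2 appears twice.
edges = pd.DataFrame(
    {
        "source": ["user_1", "user_1", "user_3"],
        "target": ["user_2", "user_2", "user_2"],
        "weight": [1, 1, 1],
    }
)

# Treat rows with the same (source, target) pair as one edge and sum the
# weights. Direction matters here: a -> b and b -> a stay separate.
merged = edges.groupby(["source", "target"], as_index=False)["weight"].sum()

print(merged["weight"].tolist())  # prints: [2, 1]
```

With every weight set to 1, the summed weight is simply the number of times each pair occurred, which is why I think of “Sum” as counting duplicates.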
Once done, go ahead and click “OK”. You have now successfully loaded a network into Gephi from a spreadsheet.
As stated earlier, Gephi can do many things, but for me, by far the most valuable feature is the ability to import data from spreadsheets. This makes it possible to visualise and analyse custom datasets, including things like retweet networks. As you have seen, importing data from a spreadsheet is fairly straightforward, and the edge list can be extended with additional edge attributes to form some rather interesting graphs.