There are so many research tools available these days that make mundane tasks like data collection and visualisation much easier. This post touches on some of the essential software and tools you need to know for this type of research. It is broken down into Programming Languages, Modelling Tools and Visualisation.
You can’t really go wrong with Python. It is one of the most versatile programming languages and I use it on a daily basis. Python can do everything from the initial data collection to the final output. It is also widely supported, so a lot of the tools that I use are regularly maintained (more on this later). There are many packages out there to get the job done but here are a few that I couldn’t live without.
- networkx: A comprehensive package for network modelling
- pandas: Used for importing and exporting data frames in multiple formats including CSV, TXT, HTML, Excel, pickle and many more.
- matplotlib: Provides the necessary tools to make great data visualisations programmatically
- scikit-learn: Contains tools needed to perform machine learning and advanced data analytics
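To give a feel for the import/export side of pandas, here is a minimal sketch of a CSV round trip. The edge list and its column names (`source`, `target`, `weight`) are made up for illustration, and the CSV is kept in memory rather than written to disk.

```python
import io

import pandas as pd

# Hypothetical edge list; the column names and values are illustrative only.
edges = pd.DataFrame(
    {
        "source": ["a", "a", "b"],
        "target": ["b", "c", "c"],
        "weight": [1.0, 2.5, 0.5],
    }
)

# Export to CSV text held in memory (pass a file path instead to write to disk).
buffer = io.StringIO()
edges.to_csv(buffer, index=False)

# Rewind the buffer and import the same data back into a new data frame.
buffer.seek(0)
loaded = pd.read_csv(buffer)

print(loaded)
```

The same pattern applies to the other formats: pandas pairs most `read_*` functions with a matching `to_*` method, for example `read_pickle` and `to_pickle`.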
I haven’t used an awful lot of R but it is another respected language for data analysis and exploration. It works well for many things; however, I feel that the market is predominantly occupied by the Python community.
As before, there are quite a lot of tools out there but knowing the right ones will certainly help. Just to clarify, when I talk about modelling tools, I am primarily focused on ones that I use within Python. I’m sure there are others out there but these are the ones which are used the most:
As mentioned before, networkx is perhaps one of the most well-known packages for modelling networks. It provides multiple ways of generating and configuring networks as well as exporting them. This is my preferred choice above all else as it offers the most in terms of ease of use and functionality.
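As a rough illustration of that generate, configure and export workflow, here is a minimal sketch. The graph size, edge probability, seed, attribute name and file name are all arbitrary choices for the example, and GraphML is just one of several formats networkx can write.

```python
import tempfile
from pathlib import Path

import networkx as nx

# Generate: a small random graph (size, probability and seed are arbitrary).
G = nx.erdos_renyi_graph(n=20, p=0.2, seed=42)

# Configure: label every node and add one weighted edge by hand.
nx.set_node_attributes(G, {n: f"node-{n}" for n in G.nodes}, name="label")
G.add_edge(0, 1, weight=3.0)

# Export: write the graph to a GraphML file in a temporary directory,
# then read it back to check that nothing was lost on the way.
with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "random_graph.graphml")
    nx.write_graphml(G, path)
    H = nx.read_graphml(path, node_type=int)

print(G.number_of_nodes(), "nodes and", G.number_of_edges(), "edges written")
print(H.number_of_nodes(), "nodes and", H.number_of_edges(), "edges read back")
```

Passing `node_type=int` when reading matters here because GraphML stores node identifiers as text.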
From what I know, graph-tool was designed with speed and efficiency in mind: its core code is written in C++, which makes it extremely good at running intensive tasks. It also makes use of parallel processing.
Much like graph-tool, igraph makes the most of speed and efficiency as its core is written in C, a low-level programming language. It is also packaged for both R and Python, making it a great choice for those who are familiar with both.
No matter what you are doing, visualisation is an important aspect of any data analytics task. Knowing how to represent the data is just as important. Here are a few options.
As mentioned before, matplotlib is the leading data visualisation package for Python. networkx integrates with matplotlib to provide basic network drawing tools.
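Here is a minimal sketch of that integration, using the Zachary karate club graph that ships with networkx. The layout seed, colour map, figure size and output file name are arbitrary, and the `Agg` backend is selected only so the script also runs on a machine without a display.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import networkx as nx

# A small built-in example graph.
G = nx.karate_club_graph()

# Compute node positions once so the drawing is reproducible.
pos = nx.spring_layout(G, seed=1)

# Colour each node by its degree.
degrees = [G.degree(n) for n in G.nodes]

fig, ax = plt.subplots(figsize=(6, 6))
nx.draw_networkx(
    G,
    pos=pos,
    ax=ax,
    node_color=degrees,
    cmap=plt.cm.viridis,
    node_size=200,
    font_size=7,
)
ax.set_axis_off()

# Save the figure to disk (the file name is an arbitrary choice).
out_path = Path("karate_club.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
```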
Gephi is a dedicated piece of software for generating excellent network visualisations. It also contains various tools for calculating basic metrics like degree, centrality etc. It does, however, underperform on larger graphs.
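Gephi can open GEXF files, so one way of handing a graph over from Python is to export it with networkx. The sketch below also computes degree centrality in Python first and stores it on the nodes; the attribute name and file name are arbitrary, and this is only one possible workflow rather than the required one.

```python
from pathlib import Path

import networkx as nx

# A small built-in example graph.
G = nx.karate_club_graph()

# Degree centrality: the fraction of other nodes each node is connected to.
centrality = nx.degree_centrality(G)

# Store the values on the nodes so they are exported as a node attribute.
nx.set_node_attributes(G, centrality, name="degree_centrality")

# Write a GEXF file, which can then be opened from Gephi's File > Open dialog.
gexf_path = Path("karate_club.gexf")
nx.write_gexf(G, str(gexf_path))

most_central = max(centrality, key=centrality.get)
print("Most central node:", most_central)
```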
yEd is very similar to Gephi but it allows for greater customisation and has an improved layout engine too. Unfortunately it lacks Gephi’s capability of running basic stats.
Graphviz is a CLI (command line interface) tool for generating and outputting graph visualisations. It does so using custom files known as DOT files (.dot). It’s much faster (for some graphs) and offers multiple layout algorithms and customisation options.
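To show what a DOT file looks like, here is a minimal sketch that writes one from Python using nothing but the standard library. The `to_dot` helper, the edge list and the file name are hypothetical and exist only for this example; the rendering commands in the comments assume Graphviz is installed and on your path.

```python
from pathlib import Path

# Hypothetical undirected edge list, for illustration only.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]


def to_dot(edge_list, name="example"):
    """Return the text of an undirected DOT graph for the given edges."""
    lines = [f"graph {name} {{"]
    # In DOT, "--" joins the two endpoints of an undirected edge.
    lines += [f'    "{u}" -- "{v}";' for u, v in edge_list]
    lines.append("}")
    return "\n".join(lines) + "\n"


dot_path = Path("example.dot")
dot_path.write_text(to_dot(edges))

# Render the file from the shell with one of the Graphviz layout programs:
#   dot   -Tpng example.dot -o example.png
#   neato -Tsvg example.dot -o example.svg
print(dot_path.read_text())
```

Swapping `dot` for `neato` in the command changes the layout algorithm without touching the DOT file itself.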