The Python programming language has become the go-to language for data science projects. It supports a vast number of third-party packages for everything from data collection to data visualisation. With so much choice, it can be a little confusing to know where to start.
This post is the first in a series that aims to be a simple guide to analysing networks with Python.
Two of the most important packages for network analysis are networkx and pandas. These packages were briefly touched on in another blog post, but little information has been provided about their utility.
Why Networkx and Pandas?
There are many reasons why I prefer networkx and pandas over the alternatives, but the simplest one is community. As the most popular packages in their areas, they also have the largest communities around them, which makes it relatively easy to find help and support.
Here is a simple breakdown of why you should use them:
networkx
- Comprehensive : Provides a wide range of tools and algorithms built-in.
- Performance : Small and medium-sized graphs can be analysed quickly, although very large graphs may take longer because networkx is written in pure Python.
- Ease-of-use : Tools are well laid out and categorised such that it’s easy to find what you need.
pandas
- Modelling tabular data : Pandas is ideal for modelling network edge lists.
- Import from external sources : Data can be read directly from a URL as well as from local files.
- Exchangeable data types : Data can be manipulated and exported into multiple formats.
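To make the points above a little more concrete, here is a minimal sketch of how the two packages can work together. The edge list, the names in it and the weight column are made up purely for illustration. The CSV is read from an in-memory string so the example runs offline, but pd.read_csv also accepts a URL in place of the file object.

```python
import io

import networkx as nx
import pandas as pd

# Hypothetical edge list in CSV form; pd.read_csv would also accept a URL here.
csv_source = io.StringIO(
    "source,target,weight\n"
    "alice,bob,1\n"
    "alice,carol,2\n"
    "bob,carol,3\n"
)
edges = pd.read_csv(csv_source)

# Build an undirected graph, keeping the weight column as an edge attribute.
graph = nx.from_pandas_edgelist(
    edges, source="source", target="target", edge_attr="weight"
)

print(graph.number_of_nodes(), "nodes,", graph.number_of_edges(), "edges")

# Convert the graph back into a DataFrame and export it as CSV text.
exported = nx.to_pandas_edgelist(graph)
csv_text = exported.to_csv(index=False)
```

Each row of the table becomes one edge in the graph, which is why an edge list is such a natural fit for a DataFrame.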
Installation
Installing these packages is incredibly straightforward. There are essentially two methods of installation: via pip or conda.
If you wish to install the packages directly into your existing Python environment, pip is the way to go. This is as easy as…
For pandas:
pip install pandas
For networkx:
pip install networkx
If you would like to use virtual environments, I would strongly encourage you to use Anaconda, as it comes bundled with useful features, especially for data analytics. The Anaconda distribution already includes pandas by default.
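Once you have set up your virtual environment, I would recommend installing the scipy package, as networkx relies on it for several of its algorithms. scipy does not bundle pandas, so install pandas alongside it if your new environment does not already have it.
conda install scipy pandas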
Now for networkx…
conda install networkx
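Whichever method you used, a quick way to check that everything installed correctly is to import both packages and print their versions. This is only a small sanity check; the version numbers you see will depend on your environment.

```python
import networkx as nx
import pandas as pd

# If either import fails, the package is not installed in the active environment.
print("pandas", pd.__version__)
print("networkx", nx.__version__)
```

If you get a ModuleNotFoundError, make sure you are running Python from the same environment you installed the packages into.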
Conclusion
Overall, installing the relevant packages is relatively straightforward. This post only scratches the surface when it comes to installing packages, and there are many other ways to do it. For the scope of this post, I’ve tried to keep things as simple as possible.
In the next post, we will start to create some basic graphs.