Best Data Visualisation Packages for Python in 2022 - The Pros, Cons and Cross Comparison

Data visualisation is an incredibly important component of data science. By using visual representations of data, we can help provide meaningful insights which are easy to understand. Conveniently, for those who use Python as their main programming language, there are many options to explore; however, not all of them will suit our needs.

The right choice depends on your needs. For example, will the visualisation be used in a Jupyter Notebook or a web app? Does it need to be interactive? In this blog post, we will explore five of the most popular data visualisation packages in Python, weighing the pros and cons of each, followed by a cross-comparison of their features.

Matplotlib

Matplotlib is perhaps one of the best-known Python packages overall and is widely used in the community for producing many different types of data visualisations and exporting them to formats such as PNG and SVG. Matplotlib is also versatile and can produce anything from basic line charts to complex network diagrams.

Pros

  • Popular choice and widely used among the community
  • Figures can be exported in many different formats
  • Versatile and can be used to plot different visualisation types

Cons

  • Requires a lot of knowledge to build complex plots
  • Documentation can be hard to navigate
  • Fairly low-level
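As a rough illustration of both the export flexibility and the fairly low-level style, here is a minimal sketch that draws a simple line chart and saves the same figure as PNG, SVG and PDF. The data values, file names and temporary output directory are hypothetical; note that every title, label and export step has to be spelled out explicitly.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files without a display
import matplotlib.pyplot as plt

# Hypothetical example data: monthly values for a simple line chart
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
values = [3, 5, 4, 7, 6, 8]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(months, values, marker="o")
ax.set_title("Monthly values")
ax.set_xlabel("Month")
ax.set_ylabel("Value")
fig.tight_layout()

# Export the same figure to several formats; the file extension selects the format
output_dir = tempfile.mkdtemp()
saved_paths = []
for ext in ("png", "svg", "pdf"):
    path = os.path.join(output_dir, f"monthly_values.{ext}")
    fig.savefig(path)
    saved_paths.append(path)

plt.close(fig)  # free the figure's memory once it has been saved
```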

Seaborn

Seaborn inherits many of matplotlib's features, as it describes itself as “a Python data visualization library based on matplotlib.” Seaborn attempts to make it easier to produce “attractive and informative” statistical graphics by providing a high-level interface that matplotlib lacks.

Pros

  • Easy to get started
  • Features a wider selection of ready-made statistical visualisations than matplotlib
  • Requires little customisation and is ready out of the box

Cons

  • Mostly suited to complex statistical data rather than simple plotting tasks
  • In some cases, the data may need to be pre-processed before plotting
  • Limited customisation options
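The pre-processing point can be sketched as follows. Many seaborn functions work most naturally with "long-form" data (one row per observation, one column per variable), so wide tables are often reshaped with pandas first. This is a minimal sketch with made-up data and a hypothetical output file name; seaborn can also accept some wide-form inputs directly, so the reshape is not always required.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical "wide" data: one column per group, one row per observation
wide_df = pd.DataFrame({
    "group_a": [4.1, 5.0, 5.3, 4.8, 5.6],
    "group_b": [6.2, 5.9, 6.8, 7.1, 6.5],
})

# Pre-processing: reshape to long form so seaborn can map columns to plot roles by name
long_df = wide_df.melt(var_name="group", value_name="score")

sns.set_theme()  # apply seaborn's default styling to matplotlib output
fig, ax = plt.subplots(figsize=(5, 4))
sns.boxplot(data=long_df, x="group", y="score", ax=ax)
ax.set_title("Score distribution by group")

output_path = os.path.join(tempfile.mkdtemp(), "scores_boxplot.png")
fig.savefig(output_path)
plt.close(fig)
```

A single `sns.boxplot` call handles grouping, box drawing and styling, which is the "high-level interface" seaborn adds on top of matplotlib.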

Plotly

Plotly is a company that provides an open-source Python package for creating complex, web-based, interactive data visualisations, such as charts, maps and dashboards, which can be integrated with its online service.

Pros

  • Used across multiple programming languages, not just Python
  • Allows users to create interactive plots
  • Customisable plots

Cons

  • Users may be persuaded into signing up for a paid service
  • Can be challenging to use and takes time to learn
  • Can require a lot of code

Bokeh

Bokeh is a Python package which specialises in “creating interactive visualisations for modern web browsers”, rendering them with JavaScript. Bokeh allows you to create anything from simple plots to complex dashboards using a variety of datasets.

Pros

  • Can be used to create many different data visualisations
  • Heavily customisable
  • Supports many different web browsers

Cons

  • Not as well known as some of the alternatives
  • Complex framework for modelling data
  • Resulting plots don’t look as polished as those from other packages
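For comparison, here is a minimal Bokeh sketch that builds an interactive line chart with a hover tooltip and saves it as a standalone HTML page. Data lives in a `ColumnDataSource`, Bokeh's column-oriented data container, which glyphs and tools reference by column name. The data and file name are hypothetical, and loading BokehJS from the CDN (`resources=CDN`) is just one option; a page saved this way needs an internet connection to fetch it.

```python
import os
import tempfile

from bokeh.io import save
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
from bokeh.resources import CDN

# Hypothetical example data stored column by column
source = ColumnDataSource(data={
    "day": [1, 2, 3, 4, 5, 6, 7],
    "visitors": [120, 135, 128, 160, 172, 150, 190],
})

p = figure(title="Daily visitors", x_axis_label="Day", y_axis_label="Visitors")
p.line(x="day", y="visitors", source=source, line_width=2)
p.scatter(x="day", y="visitors", source=source, size=8)

# Interactive hover tooltip; "@column" reads values from the data source
p.add_tools(HoverTool(tooltips=[("Day", "@day"), ("Visitors", "@visitors")]))

# Write a standalone HTML page that loads BokehJS from the CDN when opened
output_path = os.path.join(tempfile.mkdtemp(), "daily_visitors.html")
save(p, filename=output_path, resources=CDN, title="Daily visitors")
```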

Streamlit

Much like Plotly, Streamlit is a Python package and service designed to help create interactive data visualisation apps that can be shared with others through a web browser. It provides GUI widgets that let users adjust datasets and redraw multiple figures in real time, all from a single Python script.

Pros

  • Easy and simple to use
  • Apps are easy to share with others
  • Faster re-runs thanks to built-in caching

Cons

  • Only designed for web-based interactions
  • Sharing apps through Streamlit's hosted service requires creating an account
  • No easy option for self-hosting

Conclusions

There are many options to consider, and some will be better suited to your needs than others. While packages like matplotlib and seaborn are considered to be good all-rounders, they certainly lack the interactive features found in Plotly, Bokeh and Streamlit. Ultimately, the decision is yours! This is just one opinion of many within the data science community, and I’m sure you will disagree on some points; however, I have had the opportunity to use all five at some point in my career, so these simply serve as recommendations.