A beginner's guide to data visualization using Python and Seaborn

Data visualization is a technique that data scientists can use to turn raw data into charts and graphs that provide valuable insights. Charts reduce the complexity of the data and make it easier for everyone to understand.

There are many data visualization tools available, such as Tableau, Power BI, ChartBlocks, and more, which are no-code tools. They are very powerful, and each has its audience. However, if you are working with raw data that requires transformation and you want a good playground for that data, Python is an excellent choice.

Python is more complicated and requires programming skills, but it lets you manipulate, transform, and visualize your data in any way you need. It's ideal for data scientists.

There are many reasons why Python is a strong choice for data science, but one of the most important is its ecosystem of libraries. Many great libraries are available for working with data in Python, such as numpy, pandas, matplotlib, and TensorFlow.

Matplotlib is probably the best-known plotting library for Python. Its level of customization and usability is what puts it in first place. However, some actions or customizations can be awkward to deal with when using it.

Developers have created a new library based on matplotlib called Seaborn. Seaborn is as powerful as matplotlib, while also providing an abstraction that simplifies plots and adds some unique features.

In this article, we'll focus on how to work with Seaborn to create best-in-class plots. If you'd like to follow along, you can create your own project or simply check out my Seaborn guide project on GitHub.

What is Seaborn?

Seaborn is a library for creating statistical graphics in Python. It builds on top of matplotlib and integrates closely with pandas data structures.

Seaborn's design lets you explore and understand your data quickly. It works by taking entire data frames or arrays containing all of your data and performing the internal functions necessary for semantic mapping and statistical aggregation to turn that data into informative plots.

It abstracts the complexity and allows you to design your plots according to your requirements.

Install Seaborn

Installing Seaborn is as simple as installing any library with your favorite Python package manager. When you install Seaborn, its dependencies are installed along with it, including matplotlib, pandas, and numpy (and scipy, depending on the Seaborn version).

Let's install Seaborn and, of course, the notebook package as well, to get access to our data playground.

pipenv install seaborn notebook

Additionally, we will import some modules before we start.

import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Building your first plots

Before we can start plotting anything, we need data. The beauty of Seaborn is that it works directly with pandas DataFrames, which makes it super convenient. The library also comes with some sample datasets that you can load from code, with no need to download files manually.

Let's see how this works by loading a dataset that contains information about flights.

flights_data = sns.load_dataset("flights")

Scatter plot

A scatter plot is a chart that displays points based on two dimensions of the dataset. Creating a scatter plot with Seaborn takes just one line of code.

sns.scatterplot(data=flights_data, x="year", y="passengers")

Example scatter plot

Very easy, right? The scatterplot function expects the dataset we want to plot and the columns that represent the x and y axes.
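To make the role of the three arguments concrete, here is a minimal sketch. It assumes a small hand-made DataFrame (the numbers are invented, only the column names mirror the flights dataset) and relies on the fact that scatterplot returns the matplotlib Axes it drew on.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import pandas as pd
import seaborn as sns

# Invented stand-in data; only the column names match the flights dataset.
sample = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950, 1951, 1951],
    "passengers": [112, 118, 115, 126, 145, 150],
})

# data is the DataFrame, x and y are column names inside it.
ax = sns.scatterplot(data=sample, x="year", y="passengers")

# Seaborn labels the axes with the column names it was given.
print(ax.get_xlabel(), ax.get_ylabel())

# The returned object is a regular matplotlib Axes, so it can be customized further.
ax.set_title("Passengers per year (sample data)")
```

Because the result is a plain matplotlib Axes, any matplotlib method can be applied to it afterwards.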

Line plot

This chart draws a line that represents the evolution of continuous or categorical data. It's a popular and well-known type of chart, and it's super easy to create. Just as before, we use the lineplot function with the dataset and the columns that represent the x and y axes. Seaborn will do the rest.

sns.lineplot(data=flights_data, x="year", y="passengers")

Example line diagram

Bar plot

It's probably the best-known type of chart, and as you may have predicted, we can plot it with Seaborn the same way we did the line and scatter plots, this time using the barplot function.

sns.barplot(data=flights_data, x="year", y="passengers")

Example bar chart

It's very colorful, I know. We'll learn how to customize it later in the tutorial.

Extending with matplotlib

Seaborn builds on top of matplotlib, extending its functionality and abstracting away complexity. Even so, it does not limit matplotlib's capabilities. Any Seaborn chart can be customized with functions from the matplotlib library. This comes in handy for certain operations and lets Seaborn harness the power of matplotlib without having to rewrite all of its functions.

For example, suppose you want to plot several charts at the same time with Seaborn; you could then use the subplot function from matplotlib.

diamonds_data = sns.load_dataset("diamonds")
plt.subplot(1, 2, 1)
sns.countplot(x="carat", data=diamonds_data)
plt.subplot(1, 2, 2)
sns.countplot(x="depth", data=diamonds_data)

Example plot with subplots

With the subplot function, we can draw more than one chart in a single figure. The function takes three parameters: the first is the number of rows, the second is the number of columns, and the last is the plot number.

We render a Seaborn chart in each subplot, mixing matplotlib with Seaborn functions.
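As an alternative to calling plt.subplot before each chart, many Seaborn plotting functions accept an ax argument that tells them which matplotlib Axes to draw on. The following sketch assumes a tiny invented DataFrame (not the real diamonds dataset) and uses plt.subplots to create both Axes up front.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented stand-in data with two categorical columns.
sample = pd.DataFrame({
    "cut": ["Ideal", "Premium", "Ideal", "Good", "Ideal", "Premium"],
    "color": ["E", "E", "G", "H", "G", "E"],
})

# One row, two columns: plt.subplots returns the figure and both Axes.
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))

# Each Seaborn call is pointed at a specific Axes through the ax argument.
sns.countplot(data=sample, x="cut", ax=left)
sns.countplot(data=sample, x="color", ax=right)

fig.tight_layout()  # matplotlib helper that keeps the labels from overlapping
```

Passing ax explicitly makes it obvious which chart lands in which position, which helps once a figure has more than a couple of panels.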

Seaborn loves pandas

We've talked about it before, but Seaborn loves pandas to such an extent that all of its functions build on top of the pandas DataFrame. So far, we have seen examples of using Seaborn with pre-loaded data, but what if we want to draw a plot from data we have already loaded with pandas?

drinks_df = pd.read_csv("data/drinks.csv")
sns.barplot(x="country", y="beer_servings", data=drinks_df)

Example plot with pandas
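Since Seaborn takes a plain DataFrame, any pandas transformation can run before plotting. As a sketch, assuming an invented table with the same two column names (the country labels and numbers are placeholders, not real figures), we can keep only the top entries so the bar chart stays readable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import pandas as pd
import seaborn as sns

# Placeholder rows; a real file would have one row per country.
drinks = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C", "Country D"],
    "beer_servings": [120, 300, 45, 210],
})

# Plain pandas: sort by the value and keep the three largest rows.
top3 = drinks.sort_values("beer_servings", ascending=False).head(3)
print(top3)

# The filtered DataFrame is passed to Seaborn exactly like the full one.
ax = sns.barplot(data=top3, x="country", y="beer_servings")
```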

Making beautiful plots with styles

Seaborn lets you change the look of your plots and offers five styles out of the box: darkgrid, whitegrid, dark, white, and ticks.

sns.set_style("darkgrid")
sns.lineplot(data=flights_data, x="year", y="passengers")

Example plot in darkgrid style

Here is another example:

sns.set_style("whitegrid")
sns.lineplot(data=flights_data, x="year", y="passengers")

Example plot in whitegrid style

Cool use cases

Now that we know the basics of Seaborn, let's put them into practice by building multiple charts over the same dataset. In our case, we will use the tips dataset, which you can load directly with Seaborn.

First, load the dataset.

tips_df = sns.load_dataset("tips")
tips_df.head()

I like to print the first few rows of the dataset to get a feel for the columns and the data itself. I usually use some pandas functions to fix data issues such as null values and to add information to the dataset that may be helpful. You can read more about this in the guide to working with pandas.

Let's create an additional column in the dataset with the percentage that the tip amount represents over the total of the bill.

tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"]

Next we can draw some diagrams.

Understanding tip percentages

First, let's try to understand the distribution of tip percentages. For that, we can use histplot, which generates a histogram.

sns.histplot(tips_df["tip_percentage"], binwidth=0.05)

Understanding tip percentages plot

That reads well. We had to adjust the binwidth property to make the chart more readable, but now we can quickly get a sense of the data. Most customers tip between 15 and 20%, and there are a few edge cases where the tip is over 70%. These values are anomalies, and it is always worth investigating whether they are errors or not.

It would also be interesting to know if the tip percentage changes depending on the time of day.

sns.histplot(data=tips_df, x="tip_percentage", binwidth=0.05, hue="time")

Understanding tip percentages by time plot

This time we loaded the chart with the full dataset instead of just one column, and then we set the hue property to the column time. This makes the chart use a different color for each value of time and adds a legend.

Total tips per weekday

Another interesting metric is how much money in tips the staff can expect depending on the day of the week.

sns.barplot(data=tips_df, x="day", y="tip", estimator=np.sum)

Understanding tip percentages per day plot

It looks like Friday is a good day to stay at home.
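By default, barplot shows the mean of y for each category; the estimator argument swaps in a different aggregation. A quick way to sanity-check the bars is to compute the same totals with a pandas groupby. The numbers below are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Invented stand-in rows; only the column names match the tips dataset.
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat", "Sat"],
    "tip": [2.0, 3.0, 1.5, 1.0, 4.0, 2.5, 3.5],
})

# The totals each bar should show, computed directly with pandas.
totals = tips.groupby("day", sort=False)["tip"].sum()
print(totals)

# estimator=np.sum makes each bar the sum of tips for that day instead of the mean.
ax = sns.barplot(data=tips, x="day", y="tip", estimator=np.sum)
```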

Influence of table size and day on the tip

Sometimes we want to understand how variables interact to determine the output. For example, how do the day of the week and table size affect the tip percentage?

To draw the next chart, we use the pivot_table function from pandas to pre-process the information and then draw a heatmap.

pivot = tips_df.pivot_table(index=["day"], columns=["size"], values="tip_percentage", aggfunc=np.average)
sns.heatmap(pivot)

Understanding tip percentages per day and table size plot
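It helps to look at the pivoted table before plotting it, because the heatmap is just a colored version of that grid: one row per day, one column per table size, and the average tip percentage in each cell. The sketch below uses invented rows and the string "mean" as the aggregation; annot and fmt are optional heatmap arguments that print the value inside each cell.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import pandas as pd
import seaborn as sns

# Invented stand-in rows; only the column names match the tips dataset.
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "size": [2, 4, 2, 4, 2, 2],
    "tip_percentage": [0.15, 0.10, 0.20, 0.12, 0.18, 0.22],
})

# Rows become days, columns become table sizes, cells hold the mean tip percentage.
pivot = tips.pivot_table(index="day", columns="size", values="tip_percentage", aggfunc="mean")
print(pivot)

# Combinations with no data (Saturday, size 4 here) are NaN and are left blank in the heatmap.
ax = sns.heatmap(pivot, annot=True, fmt=".0%")
```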

Conclusion

Of course, there is much more we can do with Seaborn, and you can learn more use cases by visiting the official documentation. I hope you enjoyed this article as much as I enjoyed writing it.

This article was originally published on Live Code Stream by Juan Cruz Martinez (Twitter: @bajcmartinez), founder and publisher of Live Code Stream, entrepreneur, developer, author, speaker, and doer of things.