
Visualizing data – overlaying charts in python

Visualizing data is vital to analyzing data.  If you can’t see your data – and see it in multiple ways – you’ll have a hard time analyzing that data.  There are quite a few ways to visualize data and, thankfully, with pandas, matplotlib and/or seaborn, you can make some pretty powerful visualizations during analysis.

One of the things I like to do when I get a new dataset is plot data points against each other to see if anything jumps out at me. To do this, I like to overlay charts on top of each other to find any patterns in the data. With matplotlib, this is pretty easy to do, but working with dual axes can be a bit confusing at first.
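To make the dual-axis idea a little less confusing, here's a minimal sketch of how it works. The numbers are made up purely for illustration (they are not the sales data used later in this post), and the variable names are my own. The key call is twinx(), which creates a second Axes that shares the x-axis with the first but keeps its own y-axis scale.

```python
import matplotlib.pyplot as plt

# Made-up numbers purely for illustration (not the sales data used in this post)
months = [1, 2, 3, 4, 5, 6]
revenue = [1200, 1500, 1100, 1800, 2100, 1900]   # large values, left y-axis
quantity = [12, 15, 10, 19, 22, 18]              # small values, right y-axis

fig, ax_left = plt.subplots()
ax_right = ax_left.twinx()   # second Axes: same x-axis, independent y-axis

ax_left.plot(months, revenue, color='tab:blue')                  # line on the left scale
ax_right.bar(months, quantity, alpha=0.2, color='tab:orange')    # bars on the right scale

ax_left.set_ylabel('Revenue')
ax_right.set_ylabel('Quantity')
```

Each y-axis gets its own scale, so the small quantity values aren't flattened by the much larger revenue values. Outside of a notebook, you'd add plt.show() at the end to display the figure.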


One chart that I like to look at for data that I know has a relationship – like sales revenue and number of widgets sold – is the dual overlay of revenue vs quantity.  An example of one of my go-to approaches for visualizing data is in Figure 1 below.

Figure 1: Visualizing data — Revenue vs Quantity chart overlay

In this chart, the Monthly Sales Revenue chart (blue line) is overlaid on the Number of Items Sold chart (multi-colored bar chart). This type of chart lets me quickly see if there are any obvious patterns in revenue vs. number of items.

I’ve not found a quick/easy way to build the multi-colored bar chart without hacking the data and building each colored section manually…so if you know a better way than what I share below, let me know.

An example

Here’s my code for building this chart using this data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# needed for jupyter notebooks (keep this comment on its own line; a line magic treats trailing text as arguments)
%matplotlib inline
plt.rcParams['figure.figsize']=(20,10) # set the figure size
plt.style.use('fivethirtyeight') # using the fivethirtyeight matplotlib theme
sales = pd.read_csv('examples/sales.csv') # Read the data in
sales.Date = pd.to_datetime(sales.Date) #set the date column to datetime
sales.set_index('Date', inplace=True) #set the index to the date column
# now the hack for the multi-colored bar chart: 
# create fiscal year dataframes covering the timeframes you are looking for. In this case,
# the fiscal year covered October - September.
# --------------------------------------------------------------------------------
# Note: This should be set up as a function, but for this small amount of data,
# I just manually built each fiscal year. This is not very pythonic and would
# suck to do if you have many years of data, but it isn't bad for a few years of data. 
# --------------------------------------------------------------------------------
fy10_all = sales[(sales.index >= '2009-10-01') & (sales.index < '2010-10-01')]
fy11_all = sales[(sales.index >= '2010-10-01') & (sales.index < '2011-10-01')]
fy12_all = sales[(sales.index >= '2011-10-01') & (sales.index < '2012-10-01')]
fy13_all = sales[(sales.index >= '2012-10-01') & (sales.index < '2013-10-01')]
fy14_all = sales[(sales.index >= '2013-10-01') & (sales.index < '2014-10-01')]
fy15_all = sales[(sales.index >= '2014-10-01') & (sales.index < '2015-10-01')]
# Let's build our plot
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # set up the 2nd axis
ax1.plot(sales.Sales_Dollars) #plot the Revenue on axis #1
# the next few lines plot the fiscal year data as bar plots and change the color for each.
ax2.bar(fy10_all.index, fy10_all.Quantity,width=20, alpha=0.2, color='orange')
ax2.bar(fy11_all.index, fy11_all.Quantity,width=20, alpha=0.2, color='gray')
ax2.bar(fy12_all.index, fy12_all.Quantity,width=20, alpha=0.2, color='orange')
ax2.bar(fy13_all.index, fy13_all.Quantity,width=20, alpha=0.2, color='gray')
ax2.bar(fy14_all.index, fy14_all.Quantity,width=20, alpha=0.2, color='orange')
ax2.bar(fy15_all.index, fy15_all.Quantity,width=20, alpha=0.2, color='gray')
ax2.grid(False) # turn off grid #2 (the old b= keyword was removed in newer matplotlib releases)
ax1.set_title('Monthly Sales Revenue vs Number of Items Sold Per Month')
ax1.set_ylabel('Monthly Sales Revenue')
ax2.set_ylabel('Number of Items Sold')
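# Set the x-axis labels to be more meaningful than just some random dates.
labels = ['FY 2010', 'FY 2011','FY 2012', 'FY 2013','FY 2014', 'FY 2015']
# Assumption: place one tick at the start of each fiscal year (October 1) so the six labels line up.
ax1.set_xticks(pd.to_datetime(['2009-10-01', '2010-10-01', '2011-10-01', '2012-10-01', '2013-10-01', '2014-10-01']))
ax1.set_xticklabels(labels)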
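The note in the code above says the fiscal-year slices should really be set up as a function. One possible way to avoid building each year by hand is to compute a fiscal year for every row and turn that into a list of bar colors, so a single bar() call does the work. This is only a sketch under a few assumptions: the demo DataFrame below is synthetic data standing in for sales.csv (same column names, made-up values), and the names demo, fiscal_year and bar_colors are my own.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic monthly data standing in for sales.csv (values are made up)
idx = pd.date_range('2009-10-01', '2015-09-01', freq='MS')
rng = np.random.default_rng(0)
demo = pd.DataFrame({'Sales_Dollars': rng.integers(200_000, 600_000, len(idx)),
                     'Quantity': rng.integers(10_000, 40_000, len(idx))},
                    index=idx)

# The fiscal year runs October - September, so October through December
# belong to the *next* calendar year's fiscal year.
fiscal_year = np.where(demo.index.month >= 10, demo.index.year + 1, demo.index.year)

# Alternate the two colors by fiscal year: even years orange, odd years gray
bar_colors = ['orange' if fy % 2 == 0 else 'gray' for fy in fiscal_year]

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()                               # set up the 2nd axis
ax1.plot(demo.index, demo.Sales_Dollars)        # revenue on axis #1
ax2.bar(demo.index, demo.Quantity, width=20, alpha=0.2, color=bar_colors)
ax2.grid(False)                                 # turn off grid #2
```

Because bar() accepts one color per bar, this scales to any number of years without adding more lines. It gives the same orange/gray alternation as the manual version, with FY 2010 in orange and FY 2011 in gray.
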

This is just one way of visualizing data with Python. Hopefully it’s a good example of a different approach that you may not have thought about.