Forecasting Time-Series data with Prophet – Part 2

In Forecasting Time-Series data with Prophet – Part 1, I introduced Facebook’s Prophet library for time-series forecasting. In this article, I want to share how I work with the data after the forecasts are made. Specifically, I’ll share some tips on how I visualize Prophet forecasts using matplotlib rather than relying on the default Prophet charts (which I’m not a fan of).

Just like Part 1, I’m going to be using the retail sales example CSV file I found on GitHub.

For this work, we’ll need to import matplotlib and set up some basic parameters to format our plots nicely (unlike the hideous default matplotlib format).

For this setup, we import fbprophet, numpy, pandas and matplotlib. Additionally, since I’m working in a Jupyter notebook, I add the %matplotlib inline instruction so the charts created during the session are displayed in the notebook. Lastly, I set my figure size and set the style to ‘ggplot’.
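A minimal sketch of that setup is below. The 20 x 10 inch figure size is my assumption (pick whatever suits your screen), and the Prophet import is left to the Part 1 steps. `%matplotlib inline` is IPython magic rather than Python, so it appears only as a comment to keep the block valid as a plain script.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# In a Jupyter notebook, also run this IPython magic in a cell:
# %matplotlib inline

# Apply the 'ggplot' style first so it can't override the size set below.
plt.style.use('ggplot')

# Default figure size in inches (width, height); 20 x 10 is an assumed value.
plt.rcParams['figure.figsize'] = (20, 10)
```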

Since I’ve already described the analysis phase with Prophet, I’m not going to provide commentary on it here. You can jump back to Part 1 for a walk-through.

At this point, your data should look like this:

sample output of sales forecast


Now, let’s plot the output using Prophet’s built-in plotting capabilities.



Plot from fbprophet

While this is a nice chart, it is kind of ‘busy’ for me. Additionally, I like to view my forecasts with the original data first and the forecasts appended to the end (this ‘might’ make sense in a minute).

First, we need to get our data combined and indexed appropriately to start plotting. We are only interested (at least for the purposes of this article) in the ‘yhat’, ‘yhat_lower’ and ‘yhat_upper’ columns from the Prophet forecasted dataset. Note: There are much more pythonic ways to do these steps, but I’m breaking them out for ease of understanding.

You don’t need to delete the ‘y’ and ‘index’ columns, but doing so makes for a cleaner dataframe.
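Here is one way to do the combining step with pandas, as a sketch rather than the exact code behind this post. `sales_df` and `forecast_data` are hypothetical stand-ins filled with made-up values: the history has ‘ds’, a log-scale ‘y’ and the raw ‘y_orig’, and the forecast mimics Prophet’s ‘yhat’ columns for the history plus six future months. The name `viz_df` is also mine. No ‘index’ column is created in this version (that one typically comes from a `reset_index()` call), so only ‘y’ is dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the Part 1 objects (made-up values).
hist_dates = pd.date_range('2015-01-01', periods=12, freq='MS')
all_dates = pd.date_range('2015-01-01', periods=18, freq='MS')
y_orig = np.linspace(400_000, 460_000, len(hist_dates))
sales_df = pd.DataFrame({'ds': hist_dates, 'y': np.log(y_orig), 'y_orig': y_orig})
yhat = np.log(np.linspace(400_000, 475_000, len(all_dates)))
forecast_data = pd.DataFrame({
    'ds': all_dates,
    'yhat': yhat,
    'yhat_lower': yhat - 0.02,
    'yhat_upper': yhat + 0.02,
})

# Index both frames by date so rows line up on the same timestamps.
sales_idx = sales_df.set_index('ds')
forecast_idx = forecast_data.set_index('ds')

# An outer join keeps the future dates that exist only in the forecast;
# their 'y_orig' values come through as NaN.
viz_df = sales_idx.join(forecast_idx[['yhat', 'yhat_lower', 'yhat_upper']], how='outer')

# The log-scale 'y' column isn't needed for plotting.
viz_df = viz_df.drop(columns=['y'])

print(viz_df.tail())
```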

If you ‘tail’ your dataframe, your data should look something like this:

final dataframe for visualization

You’ll notice that the ‘y_orig’ column is full of “NaN” here. This is because there is no original data for the ‘future date’ rows.

Now, let’s take a look at how to visualize this data a bit better than the Prophet library does by default.

First, we need to get the last date in the original sales data. This will be used to split the data for plotting.
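A small sketch of that step, assuming the history lives in a ‘ds’ column of a hypothetical `sales_df` (made-up values). Taking `.max()` rather than the last row means it still works if the rows aren’t sorted.

```python
import pandas as pd

# Hypothetical history frame standing in for sales_df (made-up values).
sales_df = pd.DataFrame({
    'ds': pd.date_range('2015-01-01', periods=12, freq='MS'),
    'y_orig': range(400_000, 460_000, 5_000),
})

# Make sure the dates are real datetimes (a CSV load may leave them as strings),
# then take the latest one: the boundary between actuals and forecast.
last_date = pd.to_datetime(sales_df['ds']).max()
print(last_date)
```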

To plot our forecasted data, we’ll set up a function (for reusability, of course). The code imports a couple of extra items from Python’s datetime module (timedelta, for subtracting dates) and then sets up the function.

This function does a few simple things. It finds the 2nd to last row of original data and then creates a new set of data (predict_df) with only the ‘future data’ included. It then creates a plot with confidence bands along the predicted data.
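A sketch of such a function is below; it is my reconstruction of the idea, not the original code. The function name `plot_forecast`, the column layout (‘y_orig’ on the original scale, ‘yhat’ columns still on the log scale) and the demo data are assumptions. Subtracting four weeks from the last actual date puts the cutoff just before the final actual month, so the forecast segment also covers that month and the dotted line starts where the actual line ends instead of leaving a gap.

```python
from datetime import timedelta

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_forecast(viz_df, end_date):
    """Plot actual sales, then the forecast and its band from end_date onward.

    Assumes viz_df is indexed by date with 'y_orig' (actual sales) and
    'yhat' / 'yhat_lower' / 'yhat_upper' on the log scale.
    """
    # Back up about four weeks so the forecast rows include the last actual month.
    start = end_date - timedelta(weeks=4)
    predict_df = viz_df.loc[viz_df.index > start]

    fig, ax = plt.subplots()
    ax.plot(viz_df.index, viz_df['y_orig'], label='Actual Sales')
    # np.exp() undoes the log transform used for modeling.
    ax.plot(predict_df.index, np.exp(predict_df['yhat']),
            color='black', linestyle=':', label='Forecasted Sales')
    # Shade the back-transformed uncertainty interval around the forecast.
    ax.fill_between(predict_df.index,
                    np.exp(predict_df['yhat_lower']),
                    np.exp(predict_df['yhat_upper']),
                    color='darkgray', alpha=0.5)
    ax.set_title('Actual Sales vs Forecasted Sales')
    ax.set_xlabel('Date')
    ax.set_ylabel('Dollar Sales')
    ax.legend()
    return fig, ax


# Usage with hypothetical data in the combined-dataframe layout described above.
dates = pd.date_range('2015-01-01', periods=18, freq='MS')
yhat = np.log(np.linspace(400_000, 475_000, 18))
demo_df = pd.DataFrame({
    'y_orig': np.concatenate([np.linspace(400_000, 460_000, 12), np.full(6, np.nan)]),
    'yhat': yhat,
    'yhat_lower': yhat - 0.02,
    'yhat_upper': yhat + 0.02,
}, index=dates)
fig, ax = plot_forecast(demo_df, dates[11])
# In Jupyter the figure renders inline; in a script, call plt.show().
```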

The plot should look something like this:

Actual Sales vs Forecasted Sales

Hopefully you’ve found some useful information here. Check back soon for Part 3 of my Forecasting Time-Series data with Prophet series.

Visualizing data – overlaying charts in python

Visualizing data is vital to analyzing data.  If you can’t see your data – and see it in multiple ways – you’ll have a hard time analyzing that data.  There are quite a few ways to visualize data and, thankfully, with pandas, matplotlib and/or seaborn, you can make some pretty powerful visualizations during analysis.

One of the things I like to do when I get a new dataset is to visualize data points against each other to see if anything jumps out at me. To do this, I like to overlay charts on each other to find any patterns in the data. With matplotlib, this is pretty easy to do, but working with dual axes can be a bit confusing at first.


One chart that I like to look at for data that I know has a relationship – like sales revenue and number of widgets sold – is the dual overlay of revenue vs quantity.  An example of one of my go-to approaches for visualizing data is in Figure 1 below.

Figure 1: Visualizing data — Revenue vs Quantity chart overlay

In this chart, the Monthly Sales Revenue (blue line) is overlaid on the Number of Items Sold (multi-colored bar chart). This type of chart lets me quickly see whether there are any obvious patterns in revenue vs. number of items sold.

I haven’t found a quick/easy way to build the multi-colored bar chart without hacking the data and building each colored section manually… so if you know a better way than what I share below, let me know.

An example

Here’s my code for building this chart using this data.
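The following is a sketch of the same kind of overlay with made-up monthly data, not the code behind Figure 1. The key piece is `twinx()`, which adds a second y-axis sharing the x-axis so revenue and quantity each get their own scale. The post doesn’t say what the bar colors encode, so one color per month from matplotlib’s `tab20` colormap is my assumption; passing a list of colors to `bar()` is one way to get multi-colored bars without splitting the data into separate series.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical monthly data: number of items sold and the resulting revenue.
months = pd.date_range('2016-01-01', periods=12, freq='MS')
rng = np.random.default_rng(0)
items = rng.integers(800, 1200, size=len(months))
revenue = items * rng.uniform(45, 55, size=len(months))

fig, ax_qty = plt.subplots(figsize=(12, 6))

# Bars: quantity on the left-hand y-axis, one color per bar.
x = np.arange(len(months))
colors = plt.cm.tab20(x)  # integer inputs pick entries from the colormap's table
ax_qty.bar(x, items, color=colors, alpha=0.7)
ax_qty.set_ylabel('Number of Items Sold')
ax_qty.set_xticks(x)
ax_qty.set_xticklabels(months.strftime('%b %Y'), rotation=45)

# Line: revenue on a second y-axis (right-hand side) sharing the same x-axis.
ax_rev = ax_qty.twinx()
ax_rev.plot(x, revenue, color='blue', marker='o', linewidth=2)
ax_rev.set_ylabel('Monthly Sales Revenue')
ax_rev.grid(False)  # avoid a second, misaligned set of gridlines

ax_qty.set_title('Revenue vs Quantity')
fig.tight_layout()
```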

This is just one way of visualizing data with Python. Hopefully it’s a good example of a different approach that you may not have thought about.

Forecasting Time-Series data with Prophet – Part 1

This is part 1 of a series where I look at using Prophet for time-series forecasting in Python.

A lot of what I do in my data analytics work is understanding time-series data, modeling that data and trying to forecast what might come next in that data. Over the years I’ve used many different approaches, libraries and modeling techniques for modeling and forecasting with some success… and a lot of failure.

Recently, I’ve been looking for a simpler approach for my initial modeling and think I’ve found a very nice library in Facebook’s Prophet (available for both Python and R). While this particular library isn’t terribly robust, it is quick and gives some very good results for that initial pass at modeling / forecasting time-series data. An added bonus with Prophet for those who like to understand the theory behind things is this white paper with a very good description of the math / statistical approach behind Prophet.


Installing Prophet

To get started with Prophet, you’ll first need to install it (of course).

Installation instructions can be found here, but it should be as easy as doing the following (if you have an existing system that has the proper compilers installed):

For those running conda, you can install prophet via conda-forge using the following command:

Note: Prophet requires pystan, so you may also need to do the following (although in my case, it was installed as a requirement of fbprophet):

Pystan documentation can be found here.

Getting started

Using Prophet is extremely straightforward. You import it, load some data into a pandas dataframe, set the data up into the proper format and then start modeling / forecasting.

First, import the module (plus some other modules that we’ll need):

Now, let’s load up some data. For this example, I’m going to be using the retail sales example CSV file I found on GitHub.

Now, we have a pandas dataframe with our data that looks something like this:

Prophet pandas dataframe example

Note the format of the dataframe. This is the format that Prophet expects to see. There needs to be a ‘ds’ column that contains the datetime field and a ‘y’ column that contains the value we want to model/forecast.
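Loading a CSV and reshaping it into this ‘ds’/‘y’ layout can be sketched with pandas as follows. The inline CSV holds made-up values so the snippet runs on its own, and the ‘date’ and ‘sales’ column names are assumptions about the file; adjust them (and swap in your file path) to match your data.

```python
import io

import pandas as pd

# Made-up stand-in for the retail sales CSV; replace io.StringIO(csv_text)
# with the path to your file, e.g. pd.read_csv('retail_sales.csv', ...).
csv_text = """date,sales
2015-01-01,1000
2015-02-01,1100
2015-03-01,1250
2015-04-01,1200
"""
sales_df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

# Prophet expects a 'ds' column (datetimes) and a 'y' column (values).
df = sales_df.rename(columns={'date': 'ds', 'sales': 'y'})
print(df.dtypes)
```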

Before we can do any analysis with this data, we need to log-transform the ‘y’ variable to try to make non-stationary data closer to stationary (a log transform mainly stabilizes variance that grows with the level of the series). This also converts exponential-looking trends to more linear trends (see this website for more info). This isn’t always a perfect way to handle time-series data, but it works often enough that it can be tried initially without much worry.

To log-transform the data, we can use np.log() on the ‘y’ column like this:
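A minimal sketch of that transform on a hypothetical dataframe with made-up values. Copying the raw values into ‘y_orig’ first keeps them around for plotting on the original scale later.

```python
import numpy as np
import pandas as pd

# Hypothetical Prophet-ready frame (made-up values).
df = pd.DataFrame({
    'ds': pd.date_range('2015-01-01', periods=4, freq='MS'),
    'y': [1000.0, 1100.0, 1250.0, 1200.0],
})

# Keep the untransformed values, then replace 'y' with its natural log.
df['y_orig'] = df['y']
df['y'] = np.log(df['y'])

# np.exp() reverses the transform (up to floating-point error).
print(np.exp(df['y']).round(6).tolist())
```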

Your dataframe should now look like the following:

log transformed data for Prophet

It’s time to start the modeling. You can do this easily with the following command:

If you are running with monthly data, you’ll most likely see the following message after you run the above commands:

You can ignore this message since we are running monthly data.

Now it’s time to start forecasting. With Prophet, you start by building some future time data with the following command:

In this line of code, we are creating a pandas dataframe with 6 (periods = 6) future data points with a monthly frequency (freq = ‘m’). If you’re working with daily data, you wouldn’t need to include freq=’m’, since daily is the default.

Now we forecast using the ‘predict’ command:

If you take a quick look at the data using .head() or .tail(), you’ll notice there are a lot of columns in the forecast_data dataframe. The important ones (for now) are ‘ds’ (datetime), ‘yhat’ (forecast), and ‘yhat_lower’ and ‘yhat_upper’ (the lower and upper bounds of the uncertainty interval).

You can view only these columns in a .tail() by running the following command.
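The selection itself is plain pandas column indexing. In this sketch, `forecast_data` is a hypothetical stand-in with made-up values: the real output of `predict()` has many more columns, and only two of them (‘trend’ and ‘yearly’) are mocked here to show that the extra columns get filtered out.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Prophet's predict() output (made-up values).
n = 10
yhat = np.log(np.linspace(1000, 1300, n))
forecast_data = pd.DataFrame({
    'ds': pd.date_range('2015-01-01', periods=n, freq='MS'),
    'trend': yhat,
    'yearly': np.zeros(n),
    'yhat_lower': yhat - 0.05,
    'yhat_upper': yhat + 0.05,
    'yhat': yhat,
})

# Keep just the date, the forecast and its uncertainty bounds; show the last 5 rows.
summary = forecast_data[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
print(summary)
```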

Your dataframe should look like:

prophet forecasted data

Let’s take a look at a graph of this data to get an understanding of how well our model is working.

fbprophet forecast graph

That looks pretty good. Now, let’s take a look at the seasonality and trend components of our data/model/forecast.

Prophet component plot for seasonality and trend

Since we are working with monthly data, Prophet will plot the trend and the yearly seasonality but if you were working with daily data, you would also see a weekly seasonality plot included.

From the trend and seasonality, we can see that the trend is playing a large part in the underlying time series, and seasonality comes into play more toward the beginning and the end of the year.

So far so good.  With the above info, we’ve been able to quickly model and forecast some data to get a feel for what might be coming our way in the future from this particular data set.

Before we go on to tweaking this model (which I’ll talk about in my next post), I wanted to share a little tip for getting your forecast plot to display your ‘original’ data so you can see the forecast in ‘context’ and in the original scale rather than the log-transformed scale. You can do this by using np.exp() to get the original values back.
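A sketch of the back-transform on a hypothetical log-scale forecast (made-up values; `forecast_data_orig` is an illustrative name). Using `.copy()` matters: a plain assignment like `forecast_data_orig = forecast_data` only creates a second name for the same dataframe, so exponentiating it would also overwrite the log-scale forecast. Because exp() is monotonic, the transformed lower and upper bounds still bracket the forecast.

```python
import numpy as np
import pandas as pd

# Hypothetical log-scale forecast standing in for model.predict() output.
yhat = np.log(np.array([1000.0, 1100.0, 1250.0]))
forecast_data = pd.DataFrame({
    'ds': pd.date_range('2016-01-01', periods=3, freq='MS'),
    'yhat': yhat,
    'yhat_lower': yhat - 0.05,
    'yhat_upper': yhat + 0.05,
})

# Work on a real copy so the log-scale forecast stays intact.
forecast_data_orig = forecast_data.copy()
for col in ['yhat', 'yhat_lower', 'yhat_upper']:
    forecast_data_orig[col] = np.exp(forecast_data_orig[col])

print(forecast_data_orig.round(1))
```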

Let’s take a look at the forecast with the original data:

fbprophet forecast data with original data

Something looks wrong (and it is)!

The forecast is now drawn on the original scale, but the black dots (the dark line at the bottom of the chart) are still our log-transformed ‘y’ data. For this to make any sense, we need to get our original ‘y’ data points plotted on this chart. To do this, we just need to rename our ‘y_orig’ column in the sales_df dataframe to ‘y’ so the right data is plotted. Be careful here… you want to make sure you don’t continue analyzing the non-log-transformed data.
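The column swap itself is plain pandas. This sketch does it on a copy (the hypothetical `plot_df`) so the log-scale frame used for modeling stays untouched, which addresses the “be careful” point. One hedge: recent Prophet releases draw the black dots from the fitted model’s own `history` attribute rather than from a dataframe you pass in, so if the dots don’t change in your version, that is where to look.

```python
import numpy as np
import pandas as pd

# Hypothetical training frame after the log transform (made-up values):
# 'y' is log-scale and 'y_orig' keeps the raw values.
raw = np.array([1000.0, 1100.0, 1250.0, 1200.0])
df = pd.DataFrame({
    'ds': pd.date_range('2015-01-01', periods=4, freq='MS'),
    'y': np.log(raw),
    'y_orig': raw,
})

# Separate plotting copy: keep the log values in 'y_log' and put the
# original-scale values in 'y', leaving df itself unchanged for modeling.
plot_df = df.copy()
plot_df['y_log'] = plot_df['y']
plot_df['y'] = plot_df['y_orig']
```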

And…plot it.

fbprophet original corrected

There we go…a forecast for retail sales 6 months into the future (you have to look closely at the very far right-hand side for the forecast). It looks like the next six months will see sales between 450K and 475K.

Check back soon for my next post on using Prophet for forecasting time-series data, where I talk about how to tweak the models that come out of Prophet.