Forecasting Time-Series data with Prophet – Part 2

In Forecasting Time-Series data with Prophet – Part 1, I introduced Facebook’s Prophet library for time-series forecasting.   In this article, I wanted to take some time to share how I work with the data after the forecasts. Specifically, I wanted to share some tips on how I visualize the Prophet forecasts using matplotlib rather than relying on the default prophet charts (which I’m not a fan of).

Just like part 1, I’m going to be using this retail sales example csv file find on github.

For this work, we’ll need to import matplotlib and set up some basic parameters to be format our plots in a nice way (unlike the hideous default matplotlib format).

With this chunk of code, we import fbprophet, numpy, pandas and matplotlib. Additionally, since I’m working in jupyter notebook, I want to add the %matplotlib inline instruction to view the charts that are created during the session. Lastly, I set my figuresize and sytle to use the ‘ggplot’ style.

Since I’ve already described the analysis phase with Prophet, I’m not going to provide commentary on it here. You can jump back to Part 1 for a walk-through.

At this point, your data should look like this:

sample output of sales forecast

 

Now, let’s plot the output using Prophet’s built-in plotting capabilities.

 

 

Plot from fbprophet

While this is a nice chart, it is kind of ‘busy’ for me.  Additionally, I like to view my forecasts with original data first and forecasts appended to the end (this ‘might’ make sense in a minute).

First, we need to get our data combined and indexed appropriately to start plotting. We are only interested (at least for the purposes of this article) in the ‘yhat’, ‘yhat_lower’ and ‘yhat_upper’ columns from the Prophet forecasted dataset.  Note: There are much more pythonic ways to these steps, but I’m breaking them out for each of understanding.

You don’t need to delete the ‘y’and ‘index’ columns, but it makes for a cleaner dataframe.

If you ‘tail’ your dataframe, your data should look something like this:

final dataframe for visualization

You’ll notice that the ‘y_orig’ column is full of “NaN” here. This is due to the fact that there is no original data for the ‘future date’ rows.

Now, let’s take a look at how to visualize this data a bit better than the Prophet library does by default.

First, we need to get the last date in the original sales data. This will be used to split the data for plotting.

To plot our forecasted data, we’ll set up a function (for re-usability of course). This function imports a couple of extra libraries for subtracting dates (timedelta) and then sets up the function.

This function does a few simple things. It finds the 2nd to last row of original data and then creates a new set of data (predict_df) with only the ‘future data’ included. It then creates a plot with confidence bands along the predicted data.

The ploit should look something like this:

Actual Sales vs Forecasted Sales


Hopefully you’ve found some useful information here. Check back soon for Part 3 of my Forecasting Time-Series data with Prophet.

Eric D. Brown , D.Sc. has a doctorate in Information Systems with a specialization in Data Sciences, Decision Support and Knowledge Management. He writes about utilizing python for data analytics at pythondata.com and the crossroads of technology and strategy at ericbrown.com

Stockstats – Python module for various stock market indicators

I’m always working with stock market data and stock market indicators. During this work, there’s times that I need to calculate things like Relative Strength Index (RSI), Average True Range (ATR), Commodity Channel Index (CCI) and other various indicators and stats.

My go-to for this type of work is TA-Lib and the python wrapper for TA-Lib but there’s times when I can’t install and configure TA-Lib on a computer. When this occurs, I then have to go find the various algorithms to calculate the various indicators / stats that I need.  When this happens, I usually end up making mistakes and/or taking longer than I really should to get these algo’s built to use in a project.  Of course I re-use what I can when I can but many times I’ve forgotten that I built an RSI function in the past and recreate.

I found myself in this situation today. I need an RSI calculation for some work I’m doing.  I couldn’t get TA-Lib installed and working on the machine I was working on (no clue what was wrong either) so I decided to write my own indicator.  While looking around the web for a good algorithm to use, I ran across a new module that I hadn’t see before called stockstats.

Stockstats is a wrapper for pandas dataframes and provides the ability to calculate many different stock market indicators / statistics.  The fact that it is a simple wrapper around pandas is ideal since I do 99% of my work within pandas.

To use stockstats, you simply to to ‘convert’ a pandas dataframe to a stockstats dataframe. This can be done like so:

Then, to calculate the RSI for this dataframe, all you need to do is pass a command into the stockstats dataframe.

The above calculates the 14-day RSI for the entire dataframe.

Let’s look at a full example using data from yahoo.

First, import the modules we’ll need:

Pull down all the historical data for the S&P 500 ETF (SPY):

Taking a look at the ‘tail’ of the data gives us something like the data in Table 1.

SPY historical data - stock market indicators
Table 1: SPY Historical Data

To calculate RSI, retype the pandas dataframe into a stockstats dataframe and then calculate the 14-day RSI.

With this approach, you end up with some extra columns in your dataframe. These can easily be removed with the ‘del’ command.

With these extra columns removed, you now have the 14-day RSI values a column titled “rsi”.

SPY Historical Data with RSI  - stock market indicators
Table 2: SPY Historical Data with RSI

One caveat on this approach – stockstats seems to take the ‘close’ column. This might or might not be an issue for you if you are wanting to use the Adj Close column provided by yahoo. This is a simple fix (delete the ‘close’ and rename ‘adj close’ to ‘close’).

Stockstats currently has about 26 stats and stock market indicators included. Definitely not as robust as TA-Lib, but it does have the basics. If you are working with stock market data and need some quick indicators / statistics and can’t (or don’t want to) install TA-Lib, check out stockstats.

Eric D. Brown , D.Sc. has a doctorate in Information Systems with a specialization in Data Sciences, Decision Support and Knowledge Management. He writes about utilizing python for data analytics at pythondata.com and the crossroads of technology and strategy at ericbrown.com

pandas Cheat Sheet (via yhat)

pandas cheat sheetThe folks over at yhat just released a cheat sheet for pandas.  You can download the cheat sheet in PDF for here.

There’s a couple important functions that I use all the time missing from their cheat sheet (actually….there are a lot of things missing, but its a great starter cheat sheet).

A few things that I use all the time with pandas dataframes that are worth collecting in one place are provided below.

Renaming columns in a pandas dataframe:

Iterating over a pandas dataframe:

Splitting pandas dataframe into chunks:

The function plus the function call will split a pandas dataframe (or list for that matter) into NUM_CHUNKS chunks. I use this often when working with the multiprocessing libary.

Accessing the value of a specific cell:

This will give you the value of the last row’s “COLUMN” cell.  This may not be the ‘best’ way to do it, but it gets the value

Getting rows matching a condition:

The below will get all rows in a pandas dataframe that match the criteria.  In addition to finding equality, you can do all the logical operators.

Getting rows matching multiple conditions:

This gets rows that match a criteria in COLUMN1 and those that match another criteria in COLUMN2

Eric D. Brown , D.Sc. has a doctorate in Information Systems with a specialization in Data Sciences, Decision Support and Knowledge Management. He writes about utilizing python for data analytics at pythondata.com and the crossroads of technology and strategy at ericbrown.com