This is the fourth in a series of posts about forecasting time series data with Prophet. The other parts can be found here:
- Forecasting Time Series data with Prophet – Part 1
- Forecasting Time Series data with Prophet – Part 2
- Forecasting Time Series data with Prophet – Part 3
In those previous posts, I looked at forecasting monthly sales data 24 months into the future using some example sales data that you can find here.
In this post, I want to look at the output of Prophet to see how we can apply some metrics to measure ‘accuracy’. When we start looking at the ‘accuracy’ of forecasts, we can do a whole lot of harm by using the wrong metrics and the wrong data to measure accuracy. That said, it’s good practice to always compare your predicted values with your actual values to see how well (or how poorly) your model(s) are performing.
For the purposes of this post, I’m going to build on the data from the previous posts. We are using fbprophet version 0.2.1, and we’ll also need scikit-learn and scipy installed to calculate some of the metrics.
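If you don’t already have fbprophet installed, you can grab it with pip (pinning to the version used here is optional; your environment may differ):

pip install fbprophet==0.2.1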
Note: While I’m using Prophet to generate the models, these metrics and tests for accuracy can be used with just about any modeling approach.
Since the majority of the work has been covered in Part 3, I’m going to skip down to the metrics section…you can see the entire code and follow along with the jupyter notebook here.
In the notebook, we’ve loaded the data. The visualization of the data looks like this:
Our Prophet model forecast looks like this:
Again…you can see all the steps in the jupyter notebook if you want to follow along step by step.
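If you’d rather not open the notebook, here’s a minimal sketch of the setup from Part 3 that gets us to a forecast. The file name and column names below are assumptions on my part – adjust them to match whatever you used in the earlier posts.

import pandas as pd
from fbprophet import Prophet

# Load the monthly sales data (file name and column names are assumptions --
# use whatever you set up in the earlier posts)
df = pd.read_csv('retail_sales.csv')
df.columns = ['ds', 'y']                 # Prophet requires 'ds' (date) and 'y' (value) columns
df['ds'] = pd.to_datetime(df['ds'])

# Fit the model and forecast 24 months into the future, as in Part 3
model = Prophet()
model.fit(df)
future = model.make_future_dataframe(periods=24, freq='M')
forecast = model.predict(future)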
Now that we have a Prophet forecast for this data, let’s combine the forecast with our original data so we can compare the two data sets.
metric_df = forecast.set_index('ds')[['yhat']].join(df.set_index('ds').y).reset_index()
The above line of code takes the forecast values (‘yhat’) from the forecast dataframe, sets the index to ‘ds’ on both dataframes (to allow us to combine the forecast with the original data set) and then joins the forecasts to the original data. Lastly, we reset the index to get back to the non-date index that we’ve been working with (this isn’t necessary…just a step I took).
The new dataframe looks like this:
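If you’re following along in code rather than looking at the screenshot, a quick look at the tail of the dataframe shows the same thing:

metric_df.tail()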
You can see from the above that the last part of the dataframe has “NaN” for ‘y’…that’s fine because we are only concerned with comparing the forecast values to the actual values, so we can drop these “NaN” rows.
metric_df.dropna(inplace=True)
Now, we have a dataframe with just the original data (in the ‘y’ column) and the forecasted data (in the ‘yhat’ column) to compare.
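Before computing any metrics, it can be useful to eyeball the two series together. This quick plot isn’t part of the original notebook – it’s just an optional sanity check:

import matplotlib.pyplot as plt

# Plot actual sales vs the in-sample Prophet predictions
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(metric_df['ds'], metric_df['y'], label='actual')
ax.plot(metric_df['ds'], metric_df['yhat'], label='predicted')
ax.legend()
plt.show()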
Now, we are going to take a look at a few metrics.
Metrics for measuring modeling accuracy
If you ask 100 different statisticians, you’ll probably get at least 50 different answers on ‘the best’ metrics to use for measuring the accuracy of models. For most cases, using R-Squared, Mean Squared Error, or Mean Absolute Error (or a combination of all three) will give you a good enough measure of the accuracy of your model.
For me, I like to use R-Squared and Mean Absolute Error (MAE). With these two measures, I feel like I get a really good sense of how well (or poorly) my model is doing.
Python’s scikit-learn has some good / easy methods for calculating these values. To use them, you’ll need to import them (and have scikit-learn and scipy installed). If you don’t have scikit-learn and scipy installed, you can install them with the following command:
pip install scikit-learn scipy
Now, you can import the metrics with the following command:
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
To calculate R-Squared, we simply do the following:
r2_score(metric_df.y, metric_df.yhat)
For this data, we get an R-Squared value of 0.99. Now…this is an amazing value…it can be interpreted to mean that 99% of the variance in this data is explained by the model. Pretty darn good (but also a very naive way of thinking about it). When I see an R-Squared value like this, I immediately think that the model has been overfit. If you want to dig into a good read on what R-Squared means and how to interpret it, check out this post.
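If you want to see what r2_score is doing under the hood, the same number can be computed directly from the residuals (this should match the scikit-learn value):

# R-Squared "by hand": 1 minus (residual sum of squares / total sum of squares)
ss_res = ((metric_df.y - metric_df.yhat) ** 2).sum()
ss_tot = ((metric_df.y - metric_df.y.mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot
print(r_squared)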
Now, let’s take a look at MSE.
mean_squared_error(metric_df.y, metric_df.yhat)
The MSE turns out to be 11,129,529.44. That’s a huge value…an MSE of 11 million tells me this model isn’t all that great, which isn’t surprising given the low number of data points used to build the model. That said, a high MSE isn’t necessarily a bad thing, but it gives you a good feel for the accuracy you can expect to see.
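One thing to keep in mind about that 11 million number: MSE is in squared units of the data, so the errors here are in ‘dollars squared’. Taking the square root (RMSE) puts the error back on the same scale as the sales figures, which is often easier to reason about:

# MSE by hand, then RMSE to get back to the original (dollar) scale
mse = ((metric_df.y - metric_df.yhat) ** 2).mean()
rmse = mse ** 0.5
print(mse, rmse)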
Lastly, let’s take a look at MAE.
mean_absolute_error(metric_df.y, metric_df.yhat)
For this model / data, the MAE turns out to be 2,601.15, which really isn’t all that bad. What that tells me is that, on average, the magnitude of my error per data point is roughly $2,600, which is pretty reasonable when we are looking at sales values in the $300K to $500K range. BTW – if you want to take a look at an interesting comparison of MAE and RMSE (Root Mean Squared Error), check out this post.
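MAE is also easy to sanity-check by hand, and dividing it by the average sales value gives a rough ‘percent error’ feel for the model (the exact numbers will depend on your data):

# MAE by hand: average of the absolute errors
mae = (metric_df.y - metric_df.yhat).abs().mean()

# Rough percent-error feel relative to average sales
mae_pct = mae / metric_df.y.mean() * 100
print(mae, mae_pct)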
Hopefully this has been helpful. It wasn’t the intention of this post to explain the intricacies of these metrics, but hopefully you’ve seen a bit about how to use metrics to measure your models. I may go into more detail on modeling / forecasting accuracies in the future at some point. Let me know if you have any questions on this stuff…I’d be happy to expand if needed.
Note: In the jupyter notebook, I show the use of a new metrics library I found called ML-Metrics. Check it out…it’s another way to run some of these metrics.
If you want to learn more about time series forecasting, here are a few good books on the subject. These are Amazon links…I’d appreciate it if you used them if you purchase these books, as the little bit of income that comes from these links helps pay for the server this blog runs on.
- Introduction to Time Series and Forecasting
- Time Series Analysis: Forecasting and Control
- Applied Predictive Modeling