This is the fourth in a series of posts about using Forecasting Time Series data with Prophet. The other parts can be found here:

- Forecasting Time Series data with Prophet – Part 1
- Forecasting Time Series data with Prophet – Part 2
- Forecasting Time Series data with Prophet – Part 3

In those previous posts, I looked at forecasting monthly sales data 24 months into the future using some example sales data that you can find here.

In this post, I want to look at the output of Prophet to see how we can apply some metrics to measure ‘accuracy’. When we start looking at ‘accuracy’ of forecasts, we can really do a whole lot of harm by using the wrong metrics and the wrong data to measure accuracy. That said, it’s good practice to always compare your predicted values with your actual values to see how well or how poorly your model(s) are performing.

For the purposes of this post, I’m going to expand on the data in the previous posts. For this post we are using fbprophet version 0.2.1. Also – we’ll need scikit-learn and scipy installed for looking at some metrics.

Note: While I’m using Prophet to generate the models, these metrics and tests for accuracy can be used with just about any modeling approach.

Since the majority of the work has been covered in Part 3, I’m going to skip down to the metrics section…you can see the entire code and follow along with the jupyter notebook here.

In the notebook, we’ve loaded the data. The visualization of the data looks like this:

Our prophet model forecast looks like:

Again…you can see all the steps in the jupyter notebook if you want to follow along step by step.

Now that we have a prophet forecast for this data, let’s combine the forecast with our original data so we can compare the two data sets.

metric_df = forecast.set_index('ds')[['yhat']].join(df.set_index('ds').y).reset_index()

The above line of code takes the forecast data ‘yhat’ from the forecast dataframe, sets the index to ‘ds’ on both dataframes (to allow us to combine them with the original data set) and then joins these forecasts with the original data. Lastly, we reset the index to get back to the non-date index that we’ve been working with (this isn’t necessary…just a step I took).

The new dataframe looks like this:

You can see from the above, that the last part of the dataframe has “NaN” for ‘y’…that’s fine because we are only concerned about checking the forecast values versus the actual values so we can drop these “NaN” values.

metric_df.dropna(inplace=True)

Now, we have a dataframe with just the original data (in the ‘y’ column) and forecasted data (in the yhat column) to compare.
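If you want to see exactly what that combine-and-drop step does, here’s a toy version with made-up numbers (in the real notebook, the forecast dataframe comes from Prophet):

```python
import pandas as pd

# Made-up 'actuals' and a forecast that extends one period past them.
df = pd.DataFrame({'ds': pd.to_datetime(['2017-01-01', '2017-02-01']),
                   'y': [100.0, 110.0]})
forecast = pd.DataFrame({'ds': pd.to_datetime(['2017-01-01', '2017-02-01', '2017-03-01']),
                         'yhat': [98.0, 112.0, 120.0]})

# Same join as in the post: index both on 'ds', join, reset the index.
metric_df = forecast.set_index('ds')[['yhat']].join(df.set_index('ds').y).reset_index()

# The future date (2017-03-01) has no actual, so 'y' is NaN there...
metric_df.dropna(inplace=True)  # ...and gets dropped before we score anything
```

After the dropna, only the rows where we have both an actual and a forecast remain.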

Now, we are going to take a look at a few metrics.

### Metrics for measuring modeling accuracy

If you ask 100 different statisticians, you’ll probably get at least 50 different answers on ‘the best’ metrics to use for measuring the accuracy of models. In most cases, using R-Squared, Mean Squared Error, or Mean Absolute Error (or a combination of them all) will give you a good enough measure of your model’s accuracy.

For me, I like to use R-Squared and Mean Absolute Error (MAE). With these two measures, I feel like I can get a really good feel for how well (or poorly) my model is doing.

Python’s scikit-learn has some good / easy methods for calculating these values. To use them, you’ll need to import them (and have scikit-learn and scipy installed). If you don’t have scikit-learn and scipy installed, you can do so with the following command:

pip install scikit-learn scipy

Now, you can import the metrics with the following command:

from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

To calculate R-Squared, we simply do the following:

r2_score(metric_df.y, metric_df.yhat)

For this data, we get an R-Squared value of 0.99. Now…this is an amazing value…it can be interpreted to mean that 99% of the variance in this data is explained by the model. Pretty darn good (but also a very naive way of thinking about it). When I see an R-Squared value like this, I *immediately* think that the model has been overfit. If you want to dig into a good read on what R-Squared means and how to interpret it, check out this post.

Now, let’s take a look at MSE.

mean_squared_error(metric_df.y, metric_df.yhat)

The MSE turns out to be 11,129,529.44. That’s a huge value…an MSE of 11 million tells me this model isn’t all that great, which isn’t surprising given the low number of data points used to build the model. That said, a high MSE isn’t necessarily a bad thing, but it gives you a good feel for the accuracy you can expect to see.

Lastly, let’s take a look at MAE.

mean_absolute_error(metric_df.y, metric_df.yhat)

For this model / data, the MAE turns out to be 2,601.15. What that tells me is that, on average, the magnitude of each forecast’s error is roughly $2,600, which isn’t bad at all when we are looking at sales values in the $300K to $500K range. BTW – if you want to take a look at an interesting comparison of MAE and RMSE (Root Mean Squared Error), check out this post.
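To see all three metrics side by side (and to put MAE in context against the scale of the data), here’s a small self-contained sketch. The actuals and forecasts below are made up for illustration – they are not the sales data from this post:

```python
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Invented actuals (y) and forecasts (yhat), standing in for metric_df.
metric_df = pd.DataFrame({
    'y':    [100.0, 200.0, 300.0, 400.0],
    'yhat': [110.0, 190.0, 310.0, 390.0],
})

r2  = r2_score(metric_df.y, metric_df.yhat)
mse = mean_squared_error(metric_df.y, metric_df.yhat)
mae = mean_absolute_error(metric_df.y, metric_df.yhat)

# Putting MAE in context: average error as a share of the average actual.
mae_pct = mae / metric_df.y.mean()
```

The `mae_pct` step is the same sanity check done informally above: a $2,600 average error against $300K–$500K sales is under 1%, which is why the MAE reads as “not bad” even though the MSE looks enormous.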

Hopefully this has been helpful. It wasn’t the intention of this post to explain the intricacies of these metrics, but hopefully you’ve seen a bit about how to use metrics to measure your models. I may go into more detail on modeling / forecasting accuracies in the future at some point. Let me know if you have any questions on this stuff…I’d be happy to expand if needed.

Note: In the jupyter notebook, I show the use of a new metrics library I found called ML-Metrics. Check it out…it’s another way to run some of the metrics.

If you want to learn more about time series forecasting, here are a few good books on the subject. *These are Amazon links…I’d appreciate it if you used them if you purchase these books as the little bit of income that comes from these links helps pay for the server this blog runs on.*

- Introduction to Time Series and Forecasting
- Time Series Analysis: Forecasting and Control
- Applied Predictive Modeling


Thanks for your posts on Prophet! they are very illustrative.

Now, how do you actually get the coefficients for each predictor, and the type of model (linear, logistic, etc) used for each one?

Also, how do you get the dates for the changepoints?

Prophet is a self-contained modeling library so you aren’t going to get things like coefficients easily. Mostly, you are going to get the actual predicted values along with a lot of other info like seasonality, etc. The model is built on an additive regression model and is linear by default but you can run a logistic version of it.

Regarding changepoints, you can read more about that here: https://pythondata.com/forecasting-time-series-data-prophet-trend-changepoints/

Phenomenal series of posts. Thank you for taking your time with this and going into detail and preferences! This is especially applicable on this last post, where you get into the concept of overfitting. Can you share a resource or your ideas on how to account for and avoid overfitting when fitting this model? I have seen some toggle with the ‘changepoint_prior_scale’ parameter. Do you recommend an iterative pass through a series of values there and subsequent calculating of the MAE and R-squared?


Hello Eric, you have described how to get fbprophet running, how to use the tool, how to visualize the data and how to measure accuracy. That is a great help. I am curious how to compare the fbprophet results of multiple time series programmatically regarding their seasonal behavior by considering accuracy of the model and the significance of the result (or the strength of the seasonal effect). For example, you have a shop and you are selling different products. You want to find out if the sales of the products have a significant seasonality during the year without considering their…

The normalized seasonal strength would be a tough one to make recommendations on, since that can mean so many different things to different people. The process itself would be straightforward (I haven’t tested this, but just thinking it through). Using a for loop: run the prophet forecast; measure accuracy using your preferred method(s) (e.g., MSE, R-squared, etc.); calculate seasonal strength parameters per your preferred method/approach; add the product name, accuracy values, and seasonal strengths into a list or dataframe – I prefer using lists and then converting the list into a dataframe for further use after the ‘for loop’; repeat for all…
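A rough sketch of that loop might look like the following. The per-product numbers here are invented – in practice each ‘yhat’ column would come from a Prophet fit inside the loop, and you’d add your seasonal strength calculation alongside the accuracy metrics:

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score

# Stand-ins for per-product actuals/forecasts; normally you'd fit Prophet
# per product inside the loop and build metric_df as shown in the post.
products = {
    'widgets': pd.DataFrame({'y': [10.0, 20.0, 30.0], 'yhat': [11.0, 19.0, 31.0]}),
    'gadgets': pd.DataFrame({'y': [5.0, 15.0, 25.0], 'yhat': [6.0, 14.0, 26.0]}),
}

rows = []
for name, metric_df in products.items():
    rows.append({
        'product': name,
        'mae': mean_absolute_error(metric_df.y, metric_df.yhat),
        'r2': r2_score(metric_df.y, metric_df.yhat),
        # seasonal strength would be appended here, per your chosen measure
    })

# One row per product, ready for filtering and comparison.
results = pd.DataFrame(rows)
```

The list-of-dicts then one `pd.DataFrame` call at the end is the “lists first, dataframe after the loop” approach described above.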

Hello Eric, thank you for the quick response. My question is actually less regarding the programming process and more about determining if a calculated result is useful or not. FBProphet will give me a result in any case, whether it makes sense or not. In tutorial 3 you have described possibilities to determine the accuracy of a result. This information could be used as a filter for a comparison of different time series. I am struggling with the part how to figure out the significance of a seasonal pattern and how to compare the result of a single time series…

Gotcha. Sorry about that misunderstanding.

I like this approach.

The z-score is something worth considering. In fact, it might be useful to use standard deviations (lower bound, etc.) and z-scores as measures of specific-month differences vs the rest of the year. I use z-score values a lot of the time when looking at this type of stuff.

Hello Eric, I will give it a try, thank you very much. If you do not mind I would like to ask an additional question. You can check the accuracy of a model following your 3rd tutorial of the FBProphet series. FBProphet decomposes a series by default into a trend, a weekly and a yearly seasonality component. I would assume if the history of the data is sufficient and the accuracy of the model is high enough the yearly seasonality component will be OK if a significant pattern can be observed. However, what about the weekly component? How do you…

I think the first thing I would do is look at seasonality and trend at the weekly level to see what you can see. There are a couple of good tests you can do for that in my most recent post (https://pythondata.com/stationary-data-tests-for-time-series-forecasting/). Specifically, running a histogram plot and lag plot will let you see what type of distribution and correlation you have on the weekly timeframe. If the histogram looks good and lag plot shows correlation, you could then be comfortable that the weekly data isn’t noise and is providing value and can be comfortable that your weekly modeling is…
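The lag plot’s visual check also has a numeric companion: lag-1 autocorrelation, which pandas has built in. The weekly figures below are invented just to show the calls:

```python
import pandas as pd

# Invented weekly figures with a mild upward trend.
weekly = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0])

# Near 0 suggests noise; closer to 1 suggests structure worth modeling.
lag1_corr = weekly.autocorr(lag=1)

# For the visual checks described above:
#   weekly.hist()                  # distribution check
#   pd.plotting.lag_plot(weekly)   # y(t) vs y(t+1)
```

If the lag-1 correlation is meaningfully positive and the histogram looks reasonable, that supports treating the weekly component as signal rather than noise.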

I appreciate your effort and help, Eric. Thanks a lot for all the information.