Note: There have been some questions (and some issues with my original code). I’ve uploaded a Jupyter notebook with corrected code for Part 1 and Part 2. The notebook can be found here.
This is part 1 of a series where I look at using Prophet for time-series forecasting in Python.
A lot of what I do in my data analytics work is understanding time series data, modeling that data and trying to forecast what might come next in that data. Over the years I’ve used many different approaches, libraries and modeling techniques for modeling and forecasting with some success…and a lot of failure.
Recently, I’ve been looking for a simpler approach for my initial modeling and think I’ve found a very nice library in Facebook’s Prophet (available for both python and R). While this particular library isn’t terribly robust, it is quick and gives some very good results for that initial pass at modeling / forecasting time series data. An added bonus with Prophet for those that like to understand the theory behind things is this white paper with a very good description of the math / statistical approach behind Prophet.
If you are interested in learning more about time-series forecasting, check out the books / websites below.
- Introduction to Time Series and Forecasting
- Modeling Techniques in Predictive Analytics with Python and R: A Guide to Data Science
- Forecasting: principles and practice
- Time-Critical Decision Making for Business Administration
To get started with Prophet, you’ll first need to install it (of course).
Installation instructions can be found here, but it should be as easy as doing the following (if you have an existing system that has the proper compilers installed):
pip install fbprophet
For those running conda, you can install prophet via conda-forge using the following command:
conda install -c conda-forge fbprophet
Note: Prophet requires pystan, so you may need to also do the following (although in my case, it was installed as a requirement of fbprophet):
pip install pystan
Pystan documentation can be found here.
Using Prophet is extremely straightforward. You import it, load some data into a pandas dataframe, set the data up into the proper format and then start modeling / forecasting.
First, import the module (plus some other modules that we’ll need):
from fbprophet import Prophet
import numpy as np
import pandas as pd
Now, let’s load up some data. For this example, I’m going to be using the retail sales example csv file found on GitHub.
sales_df = pd.read_csv('../examples/retail_sales.csv')
Now, we have a pandas dataframe with our data that looks something like this:
Note the format of the dataframe. This is the format that Prophet expects to see. There needs to be a ‘ds’ column that contains the datetime field and a ‘y’ column that contains the value we want to model/forecast.
Before we can do any analysis with this data, we need to log transform the ‘y’ variable to try to convert non-stationary data to stationary. This also converts trends to more linear trends (see this website for more info). This isn’t always a perfect way to handle time-series data, but it works often enough that it can be tried initially without much worry.
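To make the "more linear trends" point concrete, here is a small sketch on a made-up series (the 5% growth rate and starting value are assumptions for illustration, not the retail sales data). A series that grows by a fixed percentage takes bigger and bigger steps in raw units, but equal-sized steps once you take the log:

```python
import numpy as np

# Hypothetical series: starts at 100 and grows 5% per period
t = np.arange(12)
y = 100.0 * 1.05 ** t

# Period-to-period changes in the raw data keep getting larger...
steps_raw = np.diff(y)

# ...while the changes in the log-transformed data are all the same size
steps_log = np.diff(np.log(y))

print(steps_raw[:3], steps_raw[-3:])
print(steps_log[:3], steps_log[-3:])
```

Real data is rarely this clean, so treat this as intuition for why the transform can help rather than a guarantee that it will.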
To log-transform the data, we can use np.log() on the ‘y’ column like this:
sales_df['y_orig'] = sales_df['y']  # to save a copy of the original data..you'll see why shortly.
# log-transform y
sales_df['y'] = np.log(sales_df['y'])
Your dataframe should now look like the following:
It’s time to start the modeling. You can do this easily with the following command:
model = Prophet()  # instantiate Prophet
model.fit(sales_df);  # fit the model with your dataframe
If you are running with monthly data, you’ll most likely see the following message after you run the above commands:
Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
You can ignore this message since we are running monthly data.
Now it’s time to start forecasting. With Prophet, you start by building some future time data with the following command:
future_data = model.make_future_dataframe(periods=6, freq = 'm')
In this line of code, we are creating a pandas dataframe with 6 (periods = 6) future data points with a monthly frequency (freq = ‘m’). If you’re working with daily data, you wouldn’t want to include freq=’m’.
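If the frequency strings are new to you, the sketch below shows what a few of them produce in plain pandas. This only illustrates the pandas aliases, not what Prophet does internally, and the start date is made up. I'm using 'MS' (month start) here rather than the month-end alias from the command above, because the month-end spelling has differed between pandas versions.

```python
import pandas as pd

start = "2016-06-01"  # hypothetical start date, for illustration only

# Three future dates at each frequency
monthly = pd.date_range(start, periods=3, freq="MS")  # first day of each month
weekly = pd.date_range(start, periods=3, freq="W")    # weekly, anchored on Sundays
daily = pd.date_range(start, periods=3, freq="D")     # every calendar day

print(monthly)
print(weekly)
print(daily)
```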
Now we forecast using the ‘predict’ command:
forecast_data = model.predict(future_data)
If you take a quick look at the data using .head() or .tail(), you’ll notice there are a lot of columns in the forecast_data dataframe. The important ones (for now) are ‘ds’ (datetime), ‘yhat’ (forecast), ‘yhat_lower’ and ‘yhat_upper’ (uncertainty levels).
You can view only these columns in a .tail() by running the following command.
forecast_data[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
Your dataframe should look like:
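Let’s take a look at a graph of this data to get an understanding of how well our model is working. Prophet models have a built-in plot() method for this, called as model.plot(forecast_data).
That looks pretty good. Now, let’s take a look at the seasonality and trend components of our data/model/forecast, which model.plot_components(forecast_data) will draw.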
Since we are working with monthly data, Prophet will plot the trend and the yearly seasonality but if you were working with daily data, you would also see a weekly seasonality plot included.
From the trend and seasonality, we can see that the trend is playing a large part in the underlying time series and seasonality comes into play more toward the beginning and the end of the year.
So far so good. With the above info, we’ve been able to quickly model and forecast some data to get a feel for what might be coming our way in the future from this particular data set.
Before we go on to tweaking this model (which I’ll talk about in my next post), I wanted to share a little tip for getting your forecast plot to display your ‘original’ data so you can see the forecast in ‘context’ and in the original scale rather than the log-transformed data. You can do this by using np.exp() to get our original data back.
forecast_data_orig = forecast_data.copy()  # work on a copy so the log-scale forecast data is preserved
forecast_data_orig['yhat'] = np.exp(forecast_data_orig['yhat'])
forecast_data_orig['yhat_lower'] = np.exp(forecast_data_orig['yhat_lower'])
forecast_data_orig['yhat_upper'] = np.exp(forecast_data_orig['yhat_upper'])
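One thing to watch for here: in pandas, a plain assignment such as new_df = old_df does not copy anything, so both names point at the same dataframe and changing one changes the other. The sketch below uses a tiny made-up stand-in for Prophet's output (the values are hypothetical) to show the back-transform done on an explicit copy, which leaves the log-scale forecast intact:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a Prophet forecast made on log-transformed data
forecast_data = pd.DataFrame({
    "ds": pd.to_datetime(["2016-06-30", "2016-07-31"]),
    "yhat": np.log([450000.0, 460000.0]),
    "yhat_lower": np.log([440000.0, 450000.0]),
    "yhat_upper": np.log([460000.0, 470000.0]),
})

# .copy() gives an independent dataframe; plain assignment would only alias it
forecast_data_orig = forecast_data.copy()

# Undo the log transform on the forecast and its uncertainty bounds
cols = ["yhat", "yhat_lower", "yhat_upper"]
forecast_data_orig[cols] = np.exp(forecast_data_orig[cols])

print(forecast_data_orig)  # original scale
print(forecast_data)       # still log scale
```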
Let’s take a look at the forecast with the original data:
Something looks wrong (and it is)!
Our forecast is now drawn in the original scale, but the black dots (the dark line at the bottom of the chart) are our log-transformed ‘y’ data. For this to make any sense, we need to get our original ‘y’ data points plotted on this chart. To do this, we just need to rename our ‘y_orig’ column in the sales_df dataframe to ‘y’ to have the right data plotted. Be careful here…you want to make sure you don’t continue analyzing data with the non-log-transformed data.
sales_df['y_log'] = sales_df['y']  # copy the log-transformed data to another column
sales_df['y'] = sales_df['y_orig']  # copy the original data to 'y'
There we go…a forecast for retail sales 6 months into the future (you have to look closely at the very far right-hand side for the forecast). It looks like the next six months will see sales between 450K and 475K.
Check back soon for my next post on using Prophet for forecasting time-series data where I talk about how to tweak the models that come out of prophet.
A very helpful post. Just tried Prophet out on some atmospheric measurements. Looks promising.
Glad it was helpful Geoff. I’ve been using Prophet more and more these days with really good results
Thank you for providing this tutorial. Can you tell me what is the black on the first chart?
The black dots are the sales values (actuals)
hi Eric, this is very helpful.
I would also like to know, how can we incorporate it with multiple SKUs, therefore I would like to get the forecast result for each SKU, so I have 3 columns, [SKU], [ds] and [y] in one single CSV.
You will need to run a forecast for each SKU. Currently (as far as I’m aware), Prophet is univariate only.
I wonder if we make a function definition (def), we might be able to run multiple SKUs through a loop. But I’m still not sure how to do this as I am new to Python.
You’d need to run a for loop to run a prophet forecast for each sku. You could do that with a function.
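A rough sketch of that loop is below. It assumes a dataframe with the three columns described above (SKU, ds and y). The function names forecast_per_sku and prophet_forecast are mine, not part of Prophet, and the forecasting step is passed in as a function so the loop itself can be tried out separately from the model.

```python
import pandas as pd


def forecast_per_sku(df, fit_and_predict, sku_col="SKU"):
    """Run one independent forecast per SKU and stack the results."""
    results = []
    for sku, group in df.groupby(sku_col):
        # Each SKU gets its own history in the ds / y layout Prophet expects
        history = group[["ds", "y"]].sort_values("ds").reset_index(drop=True)
        forecast = fit_and_predict(history).copy()
        forecast[sku_col] = sku  # label the rows so the SKUs can be told apart
        results.append(forecast)
    return pd.concat(results, ignore_index=True)


def prophet_forecast(history, periods=6, freq="MS"):
    """Fit a fresh Prophet model on one SKU's history (assumes fbprophet is installed)."""
    from fbprophet import Prophet  # imported here so the loop above works without it

    model = Prophet()
    model.fit(history)
    future = model.make_future_dataframe(periods=periods, freq=freq)
    return model.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```

With a dataframe called sku_df loaded from the CSV (ds parsed as dates), the call would be something like all_forecasts = forecast_per_sku(sku_df, prophet_forecast). A new model is created for every SKU because a Prophet model can only be fit once.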
Hi Eric, thanks for the very nice tutorial! Extremely helpful!
One problem I seem to have is that when I transform back to the original data and plot the forecast, I do not get your final plot, rather still get the data points for the log-transformed data.
Do you have any idea about why that should be the case?
Hi Alberto –
I can’t really say what the issue is there. It could be a matter of over-riding your original data but I can’t say for certain. I’ve started taking the approach outlined in this post to view Prophet data -> http://pythondata.wpengine.com/forecasting-time-series-data-with-prophet-part-2/.
Great post! Just one small issue. Finally while transforming the original data from log to actual scale you modify the input data frame: sales_df.
sales_df['y_log'] = sales_df['y']  # copy the log-transformed data to another column
sales_df['y'] = sales_df['y_orig']  # copy the original data to 'y'
However, shouldn’t you modify forecast_data_orig? I think that’s the reason Alberto’s code is not displaying the original y values in normal scale.
I think you are right Matt. I’ll find some time to go back through and review/revise this at some point.
Quick typo correction, at
> Let’s take a look at the forecast with the original data:
`forecast_data = m.predict(future_data)`
I think it should be:
`forecast_data = model.predict(future_data)`
Also, I don’t fully understand the need to log transform the data first. It appears I get nearly the same results on my dataset (Weekly Gross Sales, 3 years of back data).
Any ideas why this could be or where my naiveté is?
The log transform is a way to de-trend time series data. If your data doesn’t have any trends in it, taking a log transform may not do anything noticeable.
Hi everyone – I found a few mistakes in this code. Take a look at this jupyter notebook for full working code (and explanations).
plot() was throwing error, any ideas on how to resolve this issue ( Environment : Jupyter with python 3 )
TypeError: ufunc ‘isfinite’ not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ”safe”
Check out my latest jupyter notebook here -> https://github.com/urgedata/pythondata/blob/master/fbprophet/fbprophet_part_one.ipynb. This worked with Python 2.7…haven’t tested it on 3.x
Does Prophet work with hourly data? I have hourly data for a building’s power consumption and want to forecast values for the next 24 hours. I have daily seasonality, and apparently daily_seasonality is not supported, but I could be wrong.
I appreciate your feedback!
That’s a good question. I haven’t looked at hourly data in Prophet yet. I’ll play around with it and see what I can see.
I tried this:
model = Prophet()
future_data = model.make_future_dataframe(periods=24, freq='H')
but the results don’t look right…
It looks like the current release version doesn’t handle hourly data (https://github.com/facebookincubator/prophet/issues/118) but v0.2 will provide hourly data support (https://github.com/facebookincubator/prophet/issues/29).
Thanks, V0.2 was released on Sep 12 🙂
I gave it a try. I think it works but my forecast values don’t look accurate. They seem to follow the same pattern (trend) every day. Any idea what could be happening? Sorry, I tried to add an image to my comment but it looks like that’s not supported.
Just sent you an email. Happy to take a look at the chart/image if you want to send it over.
What are the other options with “freq” in the below line :
future_data = model.make_future_dataframe(periods=6, freq = 'm')
Does w stand for weekly? And for annually, is it “a” or “y”?
Thanks in advance
it can be any valid frequency for the pandas ‘date_range’. d=daily, m=monthly, w=weekly, etc. You can read more here. http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
how can the accuracy of predictions done by fbprophet be shown??
or does it have a standard accuracy methods
prophet has no built in metrics but you can still calculate MAE, etc following these steps http://184.108.40.206/forecasting-time-series-data-prophet-part-4/
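As a minimal sketch of what such a calculation might look like (my own helper functions and made-up numbers, not code from the linked post), here are MAE and MAPE computed with numpy from actual values and the matching yhat values:

```python
import numpy as np


def mean_absolute_error(actual, predicted):
    """Average size of the forecast errors, in the units of the data."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


def mean_absolute_percentage_error(actual, predicted):
    """Average error as a percentage of the actual values (actuals must be non-zero)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)


# Hypothetical actuals and forecasts for three periods
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]

mae = mean_absolute_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
print(mae, mape)
```

If the model was fit on log-transformed data, both series should be put back on the original scale with np.exp() first so the errors are in meaningful units.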
If I log transform my data, the prediction values do not make any sense. But without log transformation, I have pretty good graphs, What could be the reason for that?
You have to re-transform the data back to your original scale to make sense of the data. If you log transform with log() then ‘untransform’ with exp() to get the original data back.
Your graphs might look nice without log transforming – and the forecasts might be correct – but you can’t know for sure without doing some additional statistical analysis whether you’ve considered stationarity, etc (Which log transforming helps with).
Is Prophet suitable for data sampled every 5 seconds over a day? I need to forecast 60 to 300 seconds out.
Prophet has the ability to set the frequency. https://facebook.github.io/prophet/docs/non-daily_data.html
I’ve not used it for sub-daily data but the functionality is there.
Here fbprophet only gives results on the training cases, but what about test cases?
prophet allows you to run training data and test data. All you have to do is split your data appropriately and compare predicted values versus actual values.
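One way to do that split, sketched with plain pandas. The helper names are hypothetical; the idea is to hold out the most recent rows, fit on the rest, and then line the predictions up against the held-out actuals by date.

```python
import pandas as pd


def split_by_time(df, n_test):
    """Hold out the last n_test rows (by date) as the test set; n_test must be at least 1."""
    ordered = df.sort_values("ds").reset_index(drop=True)
    return ordered.iloc[:-n_test], ordered.iloc[-n_test:]


def compare_to_actuals(forecast, test):
    """Match predictions to held-out actuals on 'ds' and compute the error per date."""
    merged = test[["ds", "y"]].merge(forecast[["ds", "yhat"]], on="ds", how="inner")
    merged["error"] = merged["yhat"] - merged["y"]
    return merged
```

In use, you would fit Prophet on the train part only, build a future dataframe that covers the test dates, and pass the resulting forecast together with the test part to compare_to_actuals. Splitting by time rather than randomly matters here, since a random split would let the model see data from after the dates it is being tested on.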
I’m using daily data and would like to allow users to input future dates and ratios that would impact the forecast. Will Prophet allow for it? Do changepoints allow users to input the future dates?
You can run prophet with any inputs. You’d need to re-run Prophet if data changes. Changepoints can be entered by anyone but you’d need to have a front-end that would allow users to input data into the system to then have a forecast run via Prophet.
Does Prophet work on Multiple regression model?
Prophet is Univariate only. There’s been talk of adding multivariate to it but I haven’t seen anything on that yet.
The tutorial was very helpful, Sir how can I perform the multivariate time series analysis. If yes which model(Arima or VAR) to apply and how to do it. I need some information about this from your side.
I would suggest you go read up on the various approaches for time series forecasting. There’s not one ‘right’ answer (e.g., use Arima, etc). If I have some time, I might add some tutorials on the site on forecasting with various methods.
Does Prophet support use of additional predictor variables such a promotion discount, events and holidays that are national or regional?
you can use holidays, but you can only use one variable for predicting. It is not multi-variate (although I have heard people talking about the possibility of multi-variate coming)
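For the holidays part, Prophet takes a dataframe with a 'holiday' column (a name for the event) and a 'ds' column (its date), with optional 'lower_window' and 'upper_window' columns to extend the effect to the days around it. The dates below are made up for illustration, and fit_with_holidays is just a hypothetical wrapper around the Prophet calls:

```python
import pandas as pd

# Hypothetical promotion dates; replace these with your own events
promotions = pd.DataFrame({
    "holiday": "promotion",
    "ds": pd.to_datetime(["2015-11-27", "2016-11-25"]),
    "lower_window": 0,  # no effect before the event date
    "upper_window": 1,  # effect carries over to the following day
})


def fit_with_holidays(history, holidays):
    """Fit a Prophet model that accounts for the given events (assumes fbprophet is installed)."""
    from fbprophet import Prophet

    model = Prophet(holidays=holidays)
    model.fit(history)
    return model
```

Any future occurrences you want reflected in the forecast need to be in the holidays dataframe as well, not just the past ones.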