# Month: December 2016

## Stockstats – Python module for various stock market indicators

I’m always working with stock market data and stock market indicators. During this work, there are times when I need to calculate things like Relative Strength Index (RSI), Average True Range (ATR), Commodity Channel Index (CCI) and various other indicators and stats.

My go-to for this type of work is TA-Lib and the python wrapper for TA-Lib, but there are times when I can’t install and configure TA-Lib on a computer. When this occurs, I have to go find the algorithms to calculate the indicators / stats that I need. I usually end up making mistakes and/or taking longer than I really should to get these algorithms built for use in a project. Of course, I re-use what I can when I can, but many times I’ve forgotten that I built an RSI function in the past and end up recreating it.

I found myself in this situation today. I needed an RSI calculation for some work I’m doing. I couldn’t get TA-Lib installed and working on the machine I was using (no clue what was wrong either), so I decided to write my own indicator. While looking around the web for a good algorithm to use, I ran across a new module that I hadn’t seen before called stockstats.

Stockstats is a wrapper for pandas dataframes and provides the ability to calculate many different stock market indicators / statistics.  The fact that it is a simple wrapper around pandas is ideal since I do 99% of my work within pandas.

To use stockstats, you simply need to ‘convert’ a pandas dataframe to a stockstats dataframe. This can be done by passing the pandas dataframe to StockDataFrame.retype().

Then, to calculate the RSI for this dataframe, all you need to do is pass a command into the stockstats dataframe. Requesting the ‘rsi_14’ column, for example, calculates the 14-day RSI for the entire dataframe.
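For readers wondering what a 14-day RSI calculation involves, here is a minimal pure-python sketch of the commonly used Wilder-smoothed form. The function name `wilder_rsi` is made up for this illustration, and stockstats may seed or smooth its averages differently, so treat this as a reference for the idea rather than a reproduction of the library's output.

```python
def wilder_rsi(closes, period=14):
    """Return a list of RSI values, with None until enough data is available."""
    if len(closes) <= period:
        return [None] * len(closes)

    # Day-over-day changes, split into gains and losses (both non-negative).
    changes = [curr - prev for prev, curr in zip(closes, closes[1:])]
    gains = [max(change, 0.0) for change in changes]
    losses = [max(-change, 0.0) for change in changes]

    def to_rsi(avg_gain, avg_loss):
        # With no losses in the window, RSI is taken to be 100 by convention.
        if avg_loss == 0:
            return 100.0
        return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

    # Seed the averages with a simple mean over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    rsi = [None] * period + [to_rsi(avg_gain, avg_loss)]

    # Wilder smoothing: each new average keeps (period - 1) parts of the old one.
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
        rsi.append(to_rsi(avg_gain, avg_loss))
    return rsi


# Synthetic, steadily rising prices: every change is a gain.
prices = [float(p) for p in range(1, 31)]
print(wilder_rsi(prices)[-1])  # 100.0
```

The result has one entry per input price, so it can be assigned straight into a dataframe column of the same length.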

Let’s look at a full example using data from yahoo.

First, import the modules we’ll need:

Pull down all the historical data for the S&P 500 ETF (SPY):

Taking a look at the ‘tail’ of the data gives us something like the data in Table 1.

To calculate RSI, retype the pandas dataframe into a stockstats dataframe and then calculate the 14-day RSI.

With this approach, you end up with some extra columns in your dataframe. These can easily be removed with the ‘del’ command.

With these extra columns removed, you now have the 14-day RSI values in a column titled “rsi”.
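The cleanup might look something like the sketch below. The helper column names listed here are assumptions (they vary by stockstats version, so check `data.columns` on your own frame), and the numbers are placeholders rather than real prices or RSI values.

```python
import pandas as pd

# Placeholder frame standing in for the data after the RSI calculation.
# The helper column names are assumed; inspect data.columns to see yours.
data = pd.DataFrame({
    "close": [1.0, 2.0, 3.0],
    "close_-1_s": [0.0, 1.0, 2.0],
    "close_-1_d": [0.0, 1.0, 1.0],
    "rs_14": [0.0, 0.0, 0.0],
    "rsi_14": [0.0, 0.0, 0.0],
})

# Keep the RSI under a friendlier name, then drop the intermediate columns.
data["rsi"] = data["rsi_14"]
helper_columns = ["close_-1_s", "close_-1_d", "rs_14", "rsi_14"]
for column in helper_columns:
    if column in data.columns:
        del data[column]  # same effect as data.drop(columns=[column])

print(list(data.columns))  # ['close', 'rsi']
```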

One caveat on this approach – stockstats seems to use the ‘close’ column for its calculations. This might or might not be an issue for you if you want to use the Adj Close column provided by yahoo. The fix is simple: delete the ‘close’ column and rename ‘adj close’ to ‘close’ before calculating the indicator.
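A minimal sketch of that swap, using a tiny made-up frame with lower-cased column names (the values are placeholders, not real quotes):

```python
import pandas as pd

# Placeholder frame with both a raw and an adjusted close column.
data = pd.DataFrame({
    "close": [10.0, 11.0],
    "adj close": [9.5, 10.4],
    "volume": [100, 120],
})

# Drop the raw close, then let the adjusted close take its place.
del data["close"]
data = data.rename(columns={"adj close": "close"})

print(list(data.columns))  # ['close', 'volume']
```

Doing this before the RSI step means the indicator is calculated from the adjusted prices.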

Stockstats currently has about 26 stats and stock market indicators included. Definitely not as robust as TA-Lib, but it does have the basics. If you are working with stock market data and need some quick indicators / statistics and can’t (or don’t want to) install TA-Lib, check out stockstats.

Eric D. Brown , D.Sc. has a doctorate in Information Systems with a specialization in Data Sciences, Decision Support and Knowledge Management. He writes about utilizing python for data analytics at pythondata.com and the crossroads of technology and strategy at ericbrown.com

## Book Review – Machine Learning With Random Forests And Decision Trees by Scott Hartshorn

I just finished reading Machine Learning With Random Forests And Decision Trees: A Mostly Intuitive Guide, But Also Some Python.

### The short review

This is a great introductory book for anyone looking to learn more about Random Forests and Decision Trees. You won’t be an expert after reading this book, but you’ll understand the basic theory and how to implement random forests in python.

### The long(ish) review

This is a short book – only 76 pages. But…those 76 pages are full of good, introductory information on Random Forests and Decision Trees.  Even though I’ve been using random forests and other machine learning approaches in python for years, I can easily see value for people that are just starting out with machine learning and/or random forests. That said, there were a few things in the book that I had either forgotten or didn’t know (Entropy Criteria for example).

While the entire book is excellent, the section on Feature Importance is the best in the book.  This section provides a very good description of the ‘why’ and the ‘how’ of feature importance (and therefore, feature selection) for use in random forests and decision trees.  There are some very good points made in this section regarding how to get started with feature selection and cross validation.

Additionally, the book provides a decent overview of the idea of ‘out-of-sample’ (or ‘Out-of-bag’) data.  I’m a huge believer in keeping some data out of your initial training data set to use for validation after you’ve built your models.

If you’re looking for a good introductory book on random forests and decision trees, pick this one up. It’s only \$2.99 for the Kindle version. As I mentioned earlier, this book won’t make you an expert, but it will provide a solid grounding to get started on the topic of random forests, decision trees and machine learning.

One negative comment I have is that there is very little python in the book. It isn’t marketed as strictly a python book, but I would have expected a bit more runnable code to help drive home some of the theory. That said, this is a very small negative overall.


## Vagrant on Windows

There are many different ways to install python and work with python on Windows. You can install Canopy or Anaconda to have an entire python ecosystem self-contained or you can install python directly onto your machine and configure all the bits and bytes yourself. My current recommendation is to use Vagrant on Windows combined with Virtualbox to virtualize your development environment.

While I use a Mac for the majority of my development, I do find myself using Windows 10 more and more, and may be moving to a Windows machine in the future for my day-to-day laptop. I have used and still use Canopy and/or Anaconda, but I’ve recently moved the majority of my python development on Windows into Linux (Ubuntu) virtual machines using Vagrant and Virtualbox. You can use other products like VMWare’s virtual machine platform, but Virtualbox is free and does a good enough job for day-to-day development.

One caveat: if you’re doing processor / memory intensive development with python, this may not be the best option for you. That said, it can work for those types of development efforts if you configure your virtual machine with enough RAM and processors.

To get started, you’ll need to download and install Vagrant and Virtualbox for your machine. I am using Vagrant 1.9.0 and Virtualbox 5.1.10 at the time of this post.

Feel free to ‘run’ either of the programs, but there’s no need to enter either program just yet. To really use Vagrant and the Linux virtual machine, you’ll need to download a *nix emulator to allow you to do the things you need to with vagrant.

I use Git’s “bash” to interface with my virtual machines and Vagrant.  You could use putty or CygWin or any other emulator, but I’ve found Git’s bash to be the easiest and simplest to install and use.  Jump over and download git for your machine and install it. At the time of writing, I’m using Git 2.11.0.

While installing Git, I recommend leaving everything checked on the ‘select’ components window if you don’t currently have any git applications installed. If you want to use other git applications, you can uncheck the “associate .git* configuration files…” option.  There is one ‘gotcha’ when installing git that you should be aware of.

On the “adjusting your path” section (see figure 1), you’ll need to think about how you want to use git on the command line.

I selected the third option when installing git. I do not use git from the windows command line though…I use a git GUI along with git from the command line within my virtual environment.

Another screen to consider is the “Configuring the terminal emulator…” screen (figure 2).  I selected and use the MinTTY option because it gives me a much more *nix feel. This is personal preference. If you are going to be doing a lot of interactive python work in the console, you might want to select the 2nd option to use the windows default console window.

During the remainder of the installation, I left the rest of the options at the defaults.

Now that git (and bash) are installed, you can launch Git Bash to start working with Vagrant. You should see a window similar to Figure 3.

From this point, you can do your ‘vagrant init’, ‘vagrant up’ and ‘vagrant ssh’ to initialize, create and ssh into your vagrant machine.

### Setting up Vagrant on Windows

For those of you that haven’t used Vagrant in the past, here’s how I set it up and use it. I generally use vagrant in this way to run jupyter, so I’ll walk you through setting things up for jupyter, pandas, etc.

First, set up a directory for your project. At the Bash command line, change into the directory you want to work from and type “mkdir vagrant_project” (or whatever name you want to use). Now, change into the new directory and initialize your vagrant project by typing “vagrant init”.

This will create a Vagrantfile in the directory you’re in. This will allow you to set the configuration of your virtual machine and Vagrant. The Vagrantfile should look something like this:

Before we go any further, open up your Vagrantfile and find the line that sets config.vm.box to “base”. Change “base” to “ubuntu/xenial64” to run Ubuntu 16.04. The line should now read config.vm.box = "ubuntu/xenial64".

If you want to run other flavors of linux or other OS’s, you can find others at https://atlas.hashicorp.com/search.

Since I’m setting this VM up to work with jupyter, I also want to configure port forwarding in the Vagrantfile. Look for the commented-out config.vm.network "forwarded_port" line and add a line directly below it that reads config.vm.network "forwarded_port", guest: 8888, host: 8888.

This addition creates a forwarded port on your system from port 8888 on your host (your windows machine) to port 8888 on your guest (the virtual machine). This will allow you to access your jupyter notebooks from your Windows browser.

At this point, you could also configure lots of other vagrant options, but these are the bare minimums that you need to get started.

At the Bash command line, you can now type “vagrant up” to build your virtual machine. Since this is the first time you’ve run the command on this directory, it will go out and download the ubuntu/xenial64 ‘box’ and then build the virtual machine with the defaults.  You might see a Windows alert asking to ‘approve’ vagrant to make some changes…go ahead and allow that.

Once the ‘vagrant up’ command is complete, you should see something similar to Figure 5 below.

Now, you can ‘vagrant ssh’ to get into the virtual machine. You should then see something similar to Figure 6. Now you’re running vagrant on windows!

One of the really cool things that vagrant does by default is set up shared folders. This allows you to do your development work in your favorite IDE or editor and have the changes show up automatically in your vagrant virtual machine.

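At the Bash command line inside the virtual machine, list the shared folder (mounted at /vagrant by default) by typing “ls /vagrant”.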

You should see a directory listing that has your Vagrantfile and a log file. If you visit your project directory using windows explorer, you should see the same two files. Shared folders for the win! I know it’s just a small thing, but it makes things easier for initial setup.

You now have vagrant on windows!

### Configure the Python Environment

Time to set up your python environment.

First, install pip.

Even though you’ve set up a virtual machine for development, it is still a good idea to use virtualenv to separate multiple projects’ requirements. Install virtualenv with the following command:

In your project directory, set up your virtual environment by typing “virtualenv env”.

Note: You may run into an error while running this command. It will be something like the message below:

If this happens, delete the ‘env’ folder and then add ‘--always-copy’ to the command and re-run it. See here for more details.

Activate your virtualenv by typing “source env/bin/activate”.

We’re ready to install pandas and jupyter by typing “pip install pandas jupyter”. This will install both modules as well as their dependencies.

Now you’re ready to run jupyter by typing “jupyter notebook --ip=0.0.0.0”.

In the above command, we start jupyter notebook with an extra config line of ‘--ip=0.0.0.0’. This tells jupyter to listen on any IP address. It may not always be necessary, but I find it cuts out a lot of issues when I’m running it in vagrant like this.
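To make the idea of listening on ‘any IP address’ a little more concrete, here is a small python sketch that opens one listener on the loopback address only and one on 0.0.0.0. The helper name `open_listener` is made up for this example. My understanding is that a forwarded port reaches the guest through its network interface rather than its loopback address, which would explain why a server bound only to 127.0.0.1 inside the VM can be unreachable from the host, but treat that as an assumption rather than a guarantee.

```python
import socket


def open_listener(host):
    """Bind a TCP listener on an OS-chosen free port and return the socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS for any free port
    sock.listen(1)
    return sock


# Reachable only from the same machine (the loopback interface).
loopback_only = open_listener("127.0.0.1")
# Reachable on every IPv4 interface the machine has.
all_interfaces = open_listener("0.0.0.0")

loopback_addr = loopback_only.getsockname()[0]
any_addr = all_interfaces.getsockname()[0]
print(loopback_addr, any_addr)

loopback_only.close()
all_interfaces.close()
```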

In your windows browser, visit ‘http://localhost:8888/tree’ and  – assuming everything went the way it should – you should see your jupyter notebook tree.
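If the page doesn’t load, a quick way to narrow things down is to check whether anything is accepting connections on the forwarded port at all. The sketch below is one way to do that from python on the Windows side; the function name `port_is_open` is made up for this example, and it only tests that a TCP connection can be opened, not that jupyter itself is healthy.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if port_is_open():
    print("Something is listening on localhost:8888")
else:
    print("Nothing answered on localhost:8888; check 'vagrant up' and the port forwarding line")
```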

From here, you can create your notebooks and run them just like you would with any other platform.
