Vagrant on Windows

There are many different ways to install python and work with python on Windows. You can install Canopy or Anaconda to have an entire python ecosystem self-contained or you can install python directly onto your machine and configure all the bits and bytes yourself. My current recommendation is to use Vagrant on Windows combined with Virtualbox to virtualize your development environment.

While I use a mac or the majority of my development, I do find myself using Windows 10 more and more, and may be moving to a Windows machine in the future for my day-to-day laptop.  I have and do use Canopy and/or Anaconda but I’ve recently moved the majority of my python development on Windows into Linux (Ubuntu) virtual machines using Vagrant and Virtualbox. You can use other products like VMWare’s virtual machine platform, but Virtualbox is free and does a good enough job for day-to-day development.1

One Caveat: if you’re doing processor / memory intensive development with python, this may not be the best option for you. That said, it can work for those types of development efforts if you configure your virtual machine with enough RAM and processors.

To get started, you’ll need to download and install Vagrant and Virtualbox for your machine.   I am using Vagrant 1.90 and Virtualbox 5.1.10 at the time of this post.

Feel free to ‘run’ either of the programs, but there’s no need to enter either program just yet.    To really use the Vagrant and the linux virtual machine, you’ll need to download a *nix emulator to allow you to do the things you need to with vagrant.

I use Git’s “bash” to interface with my virtual machines and Vagrant.  You could use putty or CygWin or any other emulator, but I’ve found Git’s bash to be the easiest and simplest to install and use.  Jump over and download git for your machine and install it. At the time of writing, I’m using Git 2.11.0.

While installing Git, I recommend leaving everything checked on the ‘select’ components window if you don’t currently have any git applications installed. If you want to use other git applications, you can uncheck the “associate .git* configuration files…” option.  There is one ‘gotcha’ when installing git that you should be aware of.

On the “adjusting your path” section (see figure 1), you’ll need to think about how you want to use git on the command line.

git command line path
Figure 1: Adjusting your path

I selected the third option when installing git. I do not use git from the windows command line though…I use a git GUI along with git from the command line within my virtual environment.

Another screen to consider is the “Configuring the terminal emulator…” screen (figure 2).  I selected and use the MinTTY option because it gives me a much more *nix feel. This is personal preference. If you are going to be doing a lot of interactive python work in the console, you might want to select the 2nd option to use the windows default console window.

Configuring your terminal emulator
Figure 2: Configuring your terminal emulator

During the remainder of the installation, I left the rest of the options at the defaults.

Now that git (and bash) is installed, you can launch Git Bash to start working with Vagrant. You should see a window similar to Figure 3.

Git Bash
Figure 3: Git Bash

From this point, you can do your ‘vagrant init’, ‘vagrant up’ and ‘vagrant ssh’ to initialize, create and ssh into your vagrant machine.

Setting up Vagrant on Windows

For those of you that haven’t used Vagrant in the past, here’s how I set it up and use it. I generally use vagrant in this way to run jupyter, so I’ll walk you through setting things up for jupyter, pandas, etc.

First, set up a directory for your project. At the Bash command line, change into the directory you want to work from and type “mkdir vagrant_project” (or whatever name you want to use). Now, initialize your vagrant project by typing:

vagrant init

This will create a Vagrantfile in the directory you’re in. This will allow you to set the configuration of your virtual machine and Vagrant. The Vagrantfile should look something like this:

VagrantFile Example
Figure 4: VagrantFile Example

Before we go any further, open up your Vagrantfile and change the following line:

config.vm.box = "base"

change “base” to “ubuntu/xenial64” to run Ubuntu 16.04. The line should now read:

config.vm.box = "ubuntu/xenial64"

If you want to run other flavors of linux or other OS’s, you can find others at https://atlas.hashicorp.com/search.

Since I’m setting this VM up to work with jupyter, I also want to configure port forwarding in the Vagrantfile. Look for the line that reads:

# config.vm.network "forwarded_port", guest: 80, host: 8080

and add a line directly below that line to read:

config.vm.network "forwarded_port", guest: 8888, host: 8888

This addition creates a forwarded port on your system from port 8888 on your host (your windows machine) to port 8888 on your guest (the virtual machine). This will allow you to access your jupyter notebooks from your Windows browser.

At this point, you could also configure lots of other vagrant options, but these are the bare minimums that you need to get started.

At the Bash command line, you can now type “vagrant up” to build your virtual machine. Since this is the first time you’ve run the command on this directory, it will go out and download the ubuntu/xenial64 ‘box’ and then build the virtual machine with the defaults.  You might see a Windows alert asking to ‘approve’ vagrant to make some changes…go ahead and allow that.

Once the ‘vagrant up’ command is complete, you should see something similar to Figure 5 below.

Output of Vagrant Up
Figure 5: Output of ‘vagrant up’

Now, you can ‘vagrant ssh’ to get into the virtual machine.  You should then see something similar to Figure 6. Now your running vagrant on windows!

Vagrant SSH - Vagrant on Windows
Figure 6: Output of ‘vagrant ssh’

One of the really cool things that vagrant does by default is set up shared folders. This allows you to do your development work in your favorite IDE or editor and have the changes show up automatically in your vagrant virtual machine.

At the Bash command line, type:

cd /vagrant

You should see a directory listing that has your Vagrantfile and a log file. If you visit your project directory using windows explorer, you should see the same two files. Shared folders for the win! I know its just a small thing, but it makes things easier for initial setup.

You now have vagrant on windows!

Configure the Python Environment

Time to set up your python environment.

First, install pip.

sudo apt install python-pip

Even though you’ve set up a virtual machine for development, it is still a good idea to use virtualenv to separate multiple projects requirements.  Install install virtualenv  with the following command:

sudo apt install virtualenv

In your project directory, set up your virtual environment by typing:

virtualenv env

Note: You may run unto an error while running this command. It will be something like like the message below:

OSError: [Errno 71] Protocol error

If this happens, delete the ‘env’ folder and then add ‘–always-copy’ to the command and re-run it. See here for more details.

#### run this only if you had an error in the step above
virtualenv env --always-copy

Activate your virtualenv by typing:

source env/bin/activate

We’re ready to install pandas and jupyter using the command below. This will install both modules as well as their dependencies.

pip install pandas jupyter

Now you’re ready to run jupyter.

jupyter notebook --ip=0.0.0.0

In the above command, we start jupyter notebook with an extra config line of ‘–ip=0.0.0.0’. This tells jupyter to listen on any IP address. It may not always be necessary, but I find it cuts out a lot of issues when I’m running it in vagrant like this.

In your windows browser, visit ‘http://localhost:8888/tree’ and  – assuming everything went the way it should – you should see your jupyter notebook tree.

Jupyter via Vagrant VM
Figure 7: Jupyter via Vagrant VM

From here, you can create your notebooks and run them just like you would with any other platform.

 

pandas Cheat Sheet (via yhat)

pandas cheat sheetThe folks over at yhat just released a cheat sheet for pandas.  You can download the cheat sheet in PDF for here.

There’s a couple important functions that I use all the time missing from their cheat sheet (actually….there are a lot of things missing, but its a great starter cheat sheet).

A few things that I use all the time with pandas dataframes that are worth collecting in one place are provided below.

Renaming columns in a pandas dataframe:

df.rename(columns={'col1': 'Column_1', 'col2': 'Column_2'}, inplace=True)

Iterating over a pandas dataframe:

for index, row in df.iterrows():
 * DO STUFF

Splitting pandas dataframe into chunks:

The function plus the function call will split a pandas dataframe (or list for that matter) into NUM_CHUNKS chunks. I use this often when working with the multiprocessing libary.

# This function creates chunks and returns them
def chunkify(lst,n):
    return [ lst[i::n] for i in xrange(n) ]

chunks = chunkify(df, NUMCHUNKS)

Accessing the value of a specific cell:

This will give you the value of the last row’s “COLUMN” cell.  This may not be the ‘best’ way to do it, but it gets the value

df.COLUMN.tail(1).iloc[0]

Getting rows matching a condition:

The below will get all rows in a pandas dataframe that match the criteria.  In addition to finding equality, you can do all the logical operators.

df[df.COLUMN == Criteria]

Getting rows matching multiple conditions:

This gets rows that match a criteria in COLUMN1 and those that match another criteria in COLUMN2

 df[(df.COLUMN1 == Criteria) & (df.COLUMN2 == Criteria_2) ]

Getting the ‘next’ row of data in a pandas dataframe

I’m currently working with stock market trade data that is output from a backtesting engine (I’m working with backtrader currently) in a pandas dataframe.  The format of the ‘transcations’ data that is provided out of the backtesting engine is shown below.

amountpricevalue
date
2016-01-07 00:00:00+00:0079.017119195.33-15434.413883
2016-09-07 00:00:00+00:00-79.017119218.8417292.106354
2016-09-20 00:00:00+00:0082.217609214.41-17628.277649
2016-11-16 00:00:00+00:00-82.217609217.5617887.263119

The data provided gives four crucial pieces of information:

  • Date – The date of a transaction.
  • Amount – the number of shares purchased (positive number) or sold (negative number)  during the transaction.
  • Price – the price received or paid at the time of the sale.
  • Value – the cash value of the transaction.

Backtrader’s transactions dataframe is comprised of two rows make for one one transaction (the first is the ‘buy’ the second is the ‘sell).  For example, in the data above, the first two rows (Jan 7 2016 and Sept 7th 2016) are the ‘buy’ data and ‘sell’ data for one transaction. What I need to do with this data is transform it (using that term loosely) into one row of data for each transaction to store into  database for use in another analysis.

I could leave it in its current form, but I prefer to store transactions in one row when dealing with market backtests.

There are a few ways to attack this particular problem.  You could iterate over the dataframe and manually pick each row. That would be pretty straightforward, but not necessarily the best way.

While looking around the web for some pointers, I stumbled across this answer that does exactly what I need to do.   I added the following code to my script and — voila — I have my transactions transformed from two rows per transaction to one row.

trade_output=[]
from itertools import tee, izip

def next_row(iterable):
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)

for (buy_index, buy_data), (sell_index, sell_data) in next_row(transactions.iterrows()):
    if buy_data['amount']>0:
        trade_output.append((buy_index.date(), buy_data["amount"], buy_data['price'],\
                             sell_index.date(), sell_data["amount"], sell_data['price']))

Note: In the above, I only want to build rows that start with a positive amount in the ‘amount’ column because amount in the first row of a transactions is always positive. I then append each transformed transaction into an array to be used for more analysis down at a later point in the script.