
Jupyter with Vagrant

I’ve written in earlier posts about using vagrant for 99.9% of my python work. In addition to vagrant, I use jupyter notebooks for 99.9% of the work that I do, so I figured I’d spend a little time describing how I use jupyter with vagrant.

First off, you’ll need to have vagrant set up and running (see the earlier posts for linux, MacOS and Windows). Once you have vagrant installed, we need to make a few changes to the Vagrantfile to allow port forwarding from the vagrant virtual machine to the browser on your computer. If you followed the Vagrant on Windows post, you’ve already set up the configuration that vagrant needs to forward the port for jupyter. For those who haven’t read that post, below are the tweaks you need to make.

My default Vagrantfile is shown in figure 1 below.

Figure 1: Vagrantfile Example

You only need to change one line to get port forwarding working. Find the line that reads:

# config.vm.network "forwarded_port", guest: 80, host: 8080

and change it to the following:

config.vm.network "forwarded_port", guest: 8888, host: 8888

This line forwards port 8888 on the guest to port 8888 on the host. If you aren’t using jupyter’s default port of 8888, change ‘8888’ to the port you wish to use.
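If you’d rather script that edit than make it by hand, here is a minimal sketch in python. The function name enable_jupyter_port and the sample text are made up for illustration; all it does is swap the commented-out default line for an active forwarding rule.

```python
import re

# The commented-out default line found in a stock Vagrantfile.
DEFAULT_LINE = re.compile(
    r'^([ \t]*)#[ \t]*config\.vm\.network[ \t]+"forwarded_port",'
    r'[ \t]*guest:[ \t]*80,[ \t]*host:[ \t]*8080[ \t]*$',
    re.MULTILINE,
)


def enable_jupyter_port(vagrantfile_text, port=8888):
    """Return the Vagrantfile text with the default forwarding line
    replaced by an active rule for `port` (hypothetical helper)."""
    replacement = r'\1config.vm.network "forwarded_port", guest: %d, host: %d' % (port, port)
    return DEFAULT_LINE.sub(replacement, vagrantfile_text)


# Made-up one-line sample; a real Vagrantfile has many more lines.
sample = '  # config.vm.network "forwarded_port", guest: 80, host: 8080\n'
print(enable_jupyter_port(sample), end="")
```

Text that doesn’t contain the default line comes back unchanged, so running it twice does no harm.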

Now that the Vagrantfile is ready to go, run ‘vagrant up’ and then ‘vagrant ssh’ to start your vagrant VM and log into it. Next, set up any virtual environments that you want / need (I use virtualenv to set up a virtual environment for every project). You can skip this step if you wish, but it is recommended.

If you set up a virtual environment, go ahead and activate it so that you are using a clean environment, and then run the command below to install jupyter. If you didn’t, just run the same command as-is.

pip install jupyter

You are all set.  Jupyter should be installed and ready to go. To run it so it is accessible from your browser, just run the following command:

jupyter notebook --ip=0.0.0.0

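The ‘--ip=0.0.0.0’ option tells jupyter to listen on all of the VM’s network interfaces rather than only on localhost, which is what allows the forwarded port to reach it.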

In your browser on the host, you should be able to reach your newfangled jupyter (via vagrant) instance at the following url (0.0.0.0 is the address jupyter listens on inside the VM; from the host, use localhost):

http://localhost:8888/tree
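If the page doesn’t load, it helps to know whether anything is answering on the forwarded port at all. Below is a minimal sketch you can run on the host; the function name port_is_open is mine, and it assumes python 3 and the default port of 8888.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_is_open():
        print("port 8888 is reachable")
    else:
        print("nothing is answering on port 8888; "
              "check the Vagrantfile and that jupyter is running")
```

A False result usually means either the port forwarding line is still commented out or jupyter was started without ‘--ip=0.0.0.0’.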

Now you’re ready to go with jupyter with vagrant.


Note: If you want or need to learn Jupyter, I highly recommend Learning IPython for Interactive Computing and Data Visualization (amazon affiliate link). I recommend it to all my clients who are just getting started with jupyter and ipython.

 


 

Installing Python on OSX (and the necessary modules)

If you need help installing python on OSX, read on.

For the last three years, I’ve used a mac for all my development. I love the fact that everything ‘just works’ on the platform. That said, when you get into scientific computing and data analytics, especially with python, you can run into some issues.

As on linux, python is included with the operating system. Unlike on linux, relying on it can cause long-term problems because of upgrades and changes that Apple may make to the system python.

On OS X, I recommend that those of you starting out go with Anaconda or Enthought Canopy. As I said in “Installing python on Windows”, I prefer Canopy over Anaconda for scientific computing / data analytics, but either will work for you. Installing Canopy on the mac is very similar to installing it on Windows, so I’ll let that post be your guide for installing Canopy.

If you want to get into the nitty-gritty and install and configure python and the modules yourself, you can easily do so, but be prepared to spend some time on the command line.

Before we get started installing python on your Mac, we need to install homebrew, which is a package manager for OS X (it works much like the ‘apt’ package manager on ubuntu / debian).

To install homebrew, open a terminal and paste the following:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

This command installs the homebrew ecosystem onto your machine and preps your machine to be ready to install various packages, including python.

Installing Python on OSX

Step 1: Let’s get python installed via homebrew.  In your terminal, type:

brew install python

This will install a version of python onto your machine and set up your environment to use that version. This helps mitigate any issues you might have down the road if / when Apple makes changes to the system-provided python. Additionally, brew installs pip alongside it to make it easy to get the necessary modules onto your machine.
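To confirm that you are now running the homebrew python and not Apple’s, you can ask the interpreter where it lives. This is a rough sketch: the helper is_system_python is mine, and the list of system prefixes is an assumption about where Apple keeps its copy (homebrew normally installs under /usr/local or /opt/homebrew).

```python
import sys

# Locations where the OS-provided python is assumed to live.
SYSTEM_PREFIXES = ("/usr/bin/", "/System/Library/")


def is_system_python(executable):
    """Return True if `executable` looks like the OS-provided python."""
    return executable.startswith(SYSTEM_PREFIXES)


print("running:", sys.executable)
print("system python?", is_system_python(sys.executable))
```

If it reports the system python, open a new terminal so the updated PATH takes effect and try again.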

From this point on, we are generally going to follow exactly the same steps that I outline in Installing Python on Linux except we don’t need to install any additional tools.

Step 2: Not required, but highly recommended – install a virtual environment.  I recommend virtualenv. Install it with this command:

pip install virtualenv

When you are ready to get started on a new project, type the below command to install python into a new virtual environment (the ‘env’ is the name of the environment). You only have to do this once per project. Note: You should use a folder per project to keep your virtual environments separated.

virtualenv env

Whenever you want to work on a specific project, change into that folder and type the following. This will set up your environment with all of your installed python modules:

source env/bin/activate

For the purpose of this walk-through let’s create a new directory, set up a new virtual environment and then install the necessary modules.

  • Create a folder in your home directory called ‘projects’: type “mkdir projects” from the command line.
  • Change into that folder (“cd projects”) and then type “mkdir install_example” to create another folder inside the projects folder.
  • Change into the new folder with “cd install_example”.
  • Type “virtualenv env” to create your virtual environment.
  • Type “source env/bin/activate” to begin using this environment.
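Once the environment is activated, you can double check from inside python that you really are in it. Here is a minimal sketch; the function name in_virtualenv is mine, and it covers both older virtualenv releases and the newer venv-style environments.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("inside a virtual environment:", in_virtualenv())
print("environment root:", sys.prefix)
```

When the environment is active, the environment root should point at the ‘env’ folder inside your project.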

Now that we have our environment ready to go, we need to install some of the modules that are most often used when doing data work inside python. These modules are:

  • pandas
  • scipy
  • scikit-learn
  • statsmodels
  • sympy
  • matplotlib
  • jupyter

The above modules can be installed with one pip command.

pip install pandas scipy scikit-learn statsmodels sympy matplotlib jupyter
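After pip finishes, a quick way to confirm that everything landed in the environment is to ask python whether it can find each module. This is a small sketch assuming python 3; the helper name missing_modules is mine. Note that the scikit-learn package is imported under the name ‘sklearn’.

```python
import importlib.util

# Import names for the packages installed above
# (scikit-learn is imported as "sklearn").
DATA_MODULES = ["pandas", "scipy", "sklearn", "statsmodels",
                "sympy", "matplotlib", "jupyter"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(DATA_MODULES)
print("missing:", ", ".join(missing) if missing else "none")
```

Anything listed as missing can be installed on its own with pip while the environment is active.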

You’re ready to start working with python for data analysis on your mac. Just remember, for each virtualenv you create, you’ll need to reinstall these modules if you wish to use them.

Check back here often for more information on using the above modules to actually DO something.

Installing Python on Linux (and the necessary modules)

Need help installing python on linux?  I can hopefully help.  To get started installing python on linux, there are a couple of options for you. The first option – which is most likely the easiest with the least headaches – is to go download Anaconda or Enthought Canopy.  Either of these routes will get python installed and configured in such a way that will allow you to step right in and just use it.

As I said in “Installing python on Windows”, I prefer Canopy over Anaconda for scientific computing / data analytics, but either will work for you. Installing Canopy on linux is very similar to installing it on Windows, so I’ll let that post be your guide for installing Canopy.

The second option for python is to install all the pieces yourself using the command line and/or the package manager provided by your linux distribution. I’m a fan of the command line and will provide that overview here.

I’m going to assume that you are on a recent flavor of ubuntu for this (I’m using 16.04.1). If you are on another distribution, contact me and I can give you the instructions for those distros.

Before we get started, you should know that every linux distribution that I know of has python already installed, and most have python 2.7 installed.  I prefer 2.7 for data analytics so we’ll stick with that during this installation process.

Installing python on Linux (ubuntu) from the command line

Step 1 – open a terminal window and type “python”.  Tada…you’re done! (not really). As I said above, python 2.7 is installed on most (all?) linux distributions. There’s more to getting ready to use python for scientific applications / data science than just having python though.

Type “exit()” into the python interpreter if you haven’t already closed it.

Step 2 – install various development tools for python that may not be installed. These include the ‘build-essential’ tools for linux, python’s ‘pip’ tool (to make python module installations easier) and ‘python-dev’ (needed for python headers, etc). In your terminal, type the following (note that ‘-y’ tells the apt-get command to install the items without asking for confirmation):

sudo apt-get install build-essential python-pip python-dev -y
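If you want to confirm that the tools are now available, you can check for them on your PATH. A minimal sketch, assuming python 3; the helper name missing_tools and the choice of commands to look for are mine (gcc and make come with build-essential, pip with python-pip).

```python
import shutil


def missing_tools(tools):
    """Return the command names from `tools` that are not on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# gcc and make come from build-essential; pip comes from python-pip.
print("missing:", missing_tools(["gcc", "make", "pip"]) or "none")
```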

Step 3: Install a ‘virtual environment’. This isn’t a requirement, but I strongly recommend it as it allows you to segregate the various types of installations and versions of your python modules. For example, let’s say you do some development in python using pandas version 0.19 on all your projects. In six months, pandas upgrades and deprecates something that causes your code to break. You downgrade to pandas 0.19 to keep your code working but then see that pandas 0.21 contains an absolute ‘must have’ for a new project. What do you do? Re-write all of your code to use 0.21 or stay with 0.19? With a virtual environment, you can do both.
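To make the pandas example concrete, a project can check at startup that its environment holds the release series it was written against. The sketch below works on plain version strings; the function names are mine, and in real use the installed version would come from something like pandas.__version__.

```python
import re


def major_minor(version):
    """Extract (major, minor) from a version string such as '0.19.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version: %r" % version)
    return int(match.group(1)), int(match.group(2))


def matches_pin(installed, pinned):
    """Return True when `installed` is in the same major.minor series as `pinned`."""
    return major_minor(installed) == major_minor(pinned)


# Hypothetical projects: one written against pandas 0.19, one needing 0.21.
print(matches_pin("0.19.2", "0.19"))  # True
print(matches_pin("0.21.0", "0.19"))  # False
```

With one virtual environment per project, both checks can pass at the same time on the same machine.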

I use and recommend ‘virtualenv’. There are other options out there (using docker, individual virtual machines, etc) but virtualenv is the simplest / quickest way to get things done.  With virtualenv installed, you can install specific versions of python modules for a project while using other versions of modules for other projects.

To install virtualenv type the following (note that we are using ‘pip’ now rather than apt-get):

sudo pip install virtualenv

Now, whenever you start a new project, type the following to install python into a new virtual environment (the ‘env’ is the name of the environment). You only have to do this once per project. Note: You should use a folder per project to keep your virtual environments separated.

virtualenv env

Whenever you want to work on a specific project, change into that folder and type the following. This will set up your environment with all of your installed python modules:

source env/bin/activate

For the purpose of this walk-through, perform the following steps:

  • Create a folder in your home directory called ‘projects’: type “mkdir projects” from the command line.
  • Change into that folder (“cd projects”) and then type “mkdir install_example” to create another folder inside the projects folder.
  • Change into the new folder with “cd install_example”.
  • Type “virtualenv env” to create your virtual environment.
  • Type “source env/bin/activate” to begin using this environment.
  • Your prompt should now start with ‘(env)’, similar to the figure below.

Figure: Installing Python on linux - virtualenv
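If you ever need to create environments from a script rather than by hand, python 3 ships a ‘venv’ module in the standard library that does a similar job. To be clear, this is a different tool from the virtualenv command used in this walk-through, and it needs python 3; the helper name create_env is mine.

```python
import os
import tempfile
import venv


def create_env(project_dir, name="env"):
    """Create a virtual environment called `name` inside `project_dir`
    and return its path."""
    env_path = os.path.join(project_dir, name)
    # with_pip=False keeps the sketch fast and offline; pass True to
    # have pip bootstrapped into the new environment.
    venv.create(env_path, with_pip=False)
    return env_path


# Demonstrate in a throwaway directory rather than a real project folder.
with tempfile.TemporaryDirectory() as project:
    path = create_env(project)
    print("created:", os.path.exists(os.path.join(path, "pyvenv.cfg")))
```

For a real project you would pass your project folder in place of the temporary directory and then activate the environment as usual.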

Now that we have our environment ready to go, we need to install some of the modules that are most often used when doing data work inside python. These modules are:

  • pandas
  • scipy
  • scikit-learn
  • statsmodels
  • sympy
  • matplotlib
  • jupyter

The above modules can be installed with one pip command.

pip install pandas scipy scikit-learn statsmodels sympy matplotlib jupyter

You’re ready to start working with python for data analysis. Just remember, for each virtualenv you create, you’ll need to reinstall these modules if you wish to use them.
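Since every virtualenv carries its own copies of these modules, it is handy to be able to see which versions a given environment holds. Here is a small sketch using the standard library’s importlib.metadata, which assumes python 3.8 or later; the helper name installed_versions is mine.

```python
from importlib import metadata  # python 3.8+


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for package in packages:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None
    return versions


PACKAGES = ["pandas", "scipy", "scikit-learn", "statsmodels",
            "sympy", "matplotlib", "jupyter"]
for name, version in installed_versions(PACKAGES).items():
    print("%-14s %s" % (name, version or "not installed"))
```

Run it with the environment activated; a fresh virtualenv will report everything as not installed until you run the pip command above.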

Check back here often for more information on using the above modules to actually DO something.