Need help installing Python on Linux? I can hopefully help. There are a couple of ways to get started. The first option, which is most likely the easiest and causes the fewest headaches, is to download Anaconda or Enthought Canopy. Either one will install and configure Python so that you can step right in and use it.
As I said in “Installing python on Windows”, I prefer Canopy over Anaconda for scientific computing and data analytics, but either will work for you. Installing Canopy on Linux is very similar to installing it on Windows, so I’ll let that post be your guide for installing Canopy.
The second option is to install all the pieces yourself using the command line and/or the package manager provided by your Linux distribution. I’m a fan of the command line, so that is the approach I’ll walk through here.
I’m going to assume that you are on a recent version of Ubuntu (I’m using 16.04.1). If you are on another distribution, contact me and I can give you instructions for it.
Before we get started, you should know that every Linux distribution I know of comes with Python already installed, and most include Python 2.7. I prefer 2.7 for data analytics, so we’ll stick with it throughout this installation.
Installing Python on Linux (Ubuntu) from the command line
Step 1: Open a terminal window and type “python”. Tada…you’re done! (Not really.) As I said above, Python 2.7 is installed on most (all?) Linux distributions. There’s more to getting ready for scientific applications and data science than just having Python, though.
Type “exit()” at the Python prompt if you haven’t already closed the interpreter.
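If you would rather check the version from a script than from the interactive prompt, the short Python sketch below prints which interpreter is running and whether it is 2.7. It is written to run under both Python 2.7 and Python 3; the function names are my own, not part of any library. Save it as, say, check_python.py (a hypothetical file name) and run “python check_python.py”.

```python
# Minimal sketch: report which Python interpreter is running and whether it is 2.7.
# Works on both Python 2.7 and Python 3.
from __future__ import print_function

import sys


def python_version():
    """Return the running interpreter's version as 'major.minor.micro'."""
    return "{0}.{1}.{2}".format(*sys.version_info[:3])


def is_python27():
    """True if this interpreter is a Python 2.7.x release."""
    return sys.version_info[:2] == (2, 7)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)  # full path of the python binary
    print("Version:", python_version())
    print("Python 2.7?", is_python27())
```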
Step 2: Install the development tools for Python that may be missing. These include the ‘build-essential’ package (the compiler and related build tools for Linux), Python’s ‘pip’ tool (which makes installing Python modules easier) and ‘python-dev’ (the Python header files needed to compile modules that include C code). In your terminal, type the following (the ‘-y’ flag tells apt-get to install the packages without asking for confirmation):
sudo apt-get install build-essential python-pip python-dev -y
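To confirm that Step 2 worked, the sketch below checks two things from inside Python: whether the pip module can be imported, and whether the Python.h header (which the python-dev package provides on Ubuntu) exists in the include folder this interpreter reports through the standard sysconfig module. The helper names are hypothetical; treat this as a quick sanity check rather than an official verification tool.

```python
# Minimal sketch: check that pip is importable and that the Python headers
# (Python.h, provided by python-dev on Ubuntu) exist for this interpreter.
from __future__ import print_function

import os
import sysconfig


def has_pip():
    """True if the pip module can be imported by this interpreter."""
    try:
        import pip  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


def python_header_path():
    """Path where this interpreter expects Python.h to live."""
    return os.path.join(sysconfig.get_paths()["include"], "Python.h")


def has_python_headers():
    """True if Python.h exists, i.e. C extensions can be compiled against it."""
    return os.path.isfile(python_header_path())


if __name__ == "__main__":
    print("pip available:", has_pip())
    print("Python.h at", python_header_path(), "found:", has_python_headers())
```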
Step 3: Install a ‘virtual environment’ tool. This isn’t a requirement, but I strongly recommend it because it lets you keep the modules (and module versions) for each project separate. For example, let’s say you develop all of your projects against pandas version 0.19. Six months later, a new pandas release deprecates something and your code breaks, so you downgrade back to 0.19 to keep it working. Then you see that pandas 0.21 contains an absolute ‘must have’ for a new project. What do you do? Rewrite all of your code for 0.21, or stay with 0.19? With virtual environments, you can do both.
I use and recommend ‘virtualenv’. There are other options (Docker containers, individual virtual machines, etc.), but virtualenv is the simplest and quickest way to get things done. With virtualenv, you can install specific versions of Python modules for one project while other projects use different versions.
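To see that separation for yourself, the sketch below lists the folders on sys.path where third-party modules are installed (“site-packages”, or “dist-packages” for Ubuntu’s system Python). Run it once with the system Python and again inside an activated virtualenv: the folders differ, which is why installing a module in one environment doesn’t affect another. The function name is hypothetical.

```python
# Minimal sketch: list the third-party module folders this interpreter searches.
# Compare the output with and without a virtualenv activated.
from __future__ import print_function

import sys


def module_folders():
    """Return the sys.path entries where third-party modules get installed."""
    # Debian/Ubuntu's system Python uses "dist-packages"; most other
    # installations (including virtualenvs) use "site-packages".
    return [p for p in sys.path if p.endswith(("site-packages", "dist-packages"))]


if __name__ == "__main__":
    for folder in module_folders():
        print(folder)
```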
To install virtualenv, type the following (note that we are now using ‘pip’ rather than apt-get):
sudo pip install virtualenv
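You can check that this put a virtualenv command on your PATH with “which virtualenv”. The Python sketch below does the same search by hand (it avoids shutil.which because that function only exists in Python 3): it walks each folder in the PATH environment variable and returns the first executable file with the given name. find_executable is a hypothetical helper written for this example.

```python
# Minimal sketch of a PATH lookup, similar to the shell's "which" command.
# Written to run on both Python 2.7 and Python 3.
from __future__ import print_function

import os


def find_executable(name):
    """Return the full path of the executable `name` on PATH, or None."""
    for folder in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(folder, name)
        # Must be a regular file that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    location = find_executable("virtualenv")
    if location:
        print("virtualenv installed at", location)
    else:
        print("virtualenv not found on PATH; re-run the pip install step")
```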
Now, whenever you start a new project, change into the project’s folder and type the following to create a new virtual environment with its own copy of Python (‘env’ is the name of the environment). You only have to do this once per project. Note: use a separate folder for each project to keep your virtual environments apart.
virtualenv env
Whenever you want to work on a specific project, change into that folder and type the following. This activates the environment, so the python and pip commands use the environment’s own interpreter and installed modules:
source env/bin/activate
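Once the environment is active, “which python” should point inside the env folder. From Python itself, the sketch below reports whether the running interpreter belongs to a virtual environment. It relies on two well-known behaviors: the classic virtualenv releases of the Python 2.7 era set sys.real_prefix, while Python 3’s venv-style environments make sys.prefix differ from sys.base_prefix. in_virtualenv is a hypothetical helper name.

```python
# Minimal sketch: detect whether this interpreter runs inside a virtual environment.
from __future__ import print_function

import sys


def in_virtualenv():
    """True if the running interpreter belongs to a virtual environment."""
    # Classic virtualenv (Python 2.7 era) records the original prefix here.
    if hasattr(sys, "real_prefix"):
        return True
    # venv-style environments: sys.prefix points at the environment while
    # sys.base_prefix keeps pointing at the base installation.
    # (sys.base_prefix does not exist on Python 2, hence the getattr default.)
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtualenv?", in_virtualenv())
```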
For this walk-through, follow these steps:
- Create a folder called ‘projects’ in your home directory. From the command line, type “mkdir ~/projects”.
- Change into that folder (“cd ~/projects”) and type “mkdir install_example” to create another folder inside it.
- Change into the new folder with “cd install_example”.
- Type “virtualenv env” to create your virtual environment.
- Type “source env/bin/activate” to begin using this environment.
- Your prompt should now begin with “(env)”, which shows that the environment is active.
Now that our environment is ready, we need to install the modules most often used for data work in Python. These modules are:
- pandas
- numpy (installed when you install pandas with pip)
- scipy
- scikit-learn
- statsmodels
- jupyter (formerly known as IPython Notebook)
- SymPy
- matplotlib
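All of these modules can be installed with one pip command. Because the virtual environment is active, run pip without ‘sudo’; with sudo, the system-wide pip would run instead and install the modules outside your environment.
pip install pandas scipy scikit-learn statsmodels sympy matplotlib jupyter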
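With the environment active, the sketch below tries to import each module and prints its version, or NOT IMPORTABLE if the import fails. Two assumptions to note: scikit-learn is imported under the name sklearn, and jupyter has no single module of its own, so the sketch checks for notebook (the Jupyter Notebook package that the jupyter install pulls in) instead. check_modules is a hypothetical helper.

```python
# Minimal sketch: verify the data-analysis modules import, and show their versions.
from __future__ import print_function

import importlib

# Import names for the pip packages above. scikit-learn imports as "sklearn";
# "notebook" stands in for jupyter, which has no single importable module.
MODULES = ["pandas", "numpy", "scipy", "sklearn", "statsmodels",
           "sympy", "matplotlib", "notebook"]


def check_modules(names):
    """Map each module name to its version string, or None if it fails to import."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:  # missing module, or a broken install raising another error
            results[name] = None
        else:
            results[name] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    for name, version in sorted(check_modules(MODULES).items()):
        print("{0:12} {1}".format(name, version or "NOT IMPORTABLE"))
```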
You’re now ready to start working with Python for data analysis. Just remember that each virtualenv you create starts without these modules, so you’ll need to install them again in every environment where you want to use them.
Check back here often for more information on using the above modules to actually DO something.