Setting Up Python for Science

22 August 2014

This is about how I set up python. In particular, I think that a language is useless without ready libraries to do almost anything under the sun. I also firmly believe that python is a tool. I don’t care what programming language I am using, as long as I can get things done fast. I want the programming language to get out of my way, and let me focus on the interesting things. Therefore, I think the listed libraries should be default parts of python, and I treat them as such here.

Here are the components of my “bare-bones python installation”: NumPy, SciPy, matplotlib, IPython, Numba, and Theano.

Linux

Installing the above-mentioned components is easy on most linux systems. I don’t include details, since if you’re using linux, you probably already know how to use a terminal.

If you use Arch, pacman -S python2-numpy python2-scipy python2-matplotlib python2-ipython should get the basics out of the way, leaving python2-numba-git and python2-theano, which can be found in the AUR. At the time of writing, python2-numba-git is broken, but should be fixed any day now.

On Ubuntu or Debian, start with sudo apt-get install python-numpy python-scipy python-matplotlib, then follow the theano install instructions. I am not sure how to install numba, since I have not used debian-based systems for a couple of years now. Google can probably lead you to instructions.

Once everything is installed, skip down to “Does it work?” to check that things behave the way they are supposed to.

Windows

First things first: you need to get yourself some python! Go to the python website, and download the “Python 2.7.x Windows X86-64 Installer”. I am assuming that you know how to install things. The default settings are by far the best, and changing the installation directory is not recommended. The only thing that might be useful is making sure that “add python to system search PATH” is checked (I forgot the exact words used).
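Everything below assumes a 64 bit interpreter, so it is worth checking that this is what actually got installed. Here is a minimal sketch of such a check; the function name interpreter_bits is my own, not something from the installer or the standard library.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer size of the running interpreter in bits."""
    # struct.calcsize("P") is the size in bytes of a C pointer for this build,
    # so it is 8 on a 64 bit python and 4 on a 32 bit one.
    return struct.calcsize("P") * 8


print("python %d.%d, %d bit" % (sys.version_info[0],
                                sys.version_info[1],
                                interpreter_bits()))
```

If this reports 32 bit, the win-amd64 installers mentioned below will not match your python.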

Once python itself is installed, you will want to get the rest of your libraries. Fortunately, www.lfd.uci.edu/~gohlke/pythonlibs/ includes 64 bit installers for all of the libraries we want!

Pick the installer for each library, making sure that the file names are of the form nameoflibrary.win-amd64-py2.7.exe.

We’re not done yet. That was the easy part. Next comes actually getting Theano to work with 64 bit python.

Getting Theano to work

The main problem with theano, which is also its main benefit, is that it actually compiles stuff, so you need a 64 bit MinGW build! I myself got it working after reading this, but I had to make several modifications, so I wrote them down here. Also, all the computers with nice GPUs to which I have access use linux, so I cannot give instructions on enabling CUDA support in windows.

Getting the MinGW compiler

The MinGW project provides the standard GNU compilers for C and C++ (the same ones that you use on linux). 64 bit python is being used here, so 64 bit MinGW is needed. The only real annoyance here is that MinGW-64 comes with its own python environment. I ended up installing it with python, and then deleting all of the files associated with python.

Lastly, make sure that the bin directory of MinGW-64 (the folder that has gcc.exe) is in your PATH. You do that by editing the PATH environment variable (searching for “environment variables” in the start menu search should get you there).
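To confirm that the compiler really is reachable, you can search the PATH yourself. The sketch below is a hand-rolled lookup (find_on_path is a hypothetical helper, written this way so that it also runs on python 2.7); it walks the PATH entries and tries the extensions listed in PATHEXT, which is how windows finds gcc.exe when you type gcc.

```python
import os


def find_on_path(program, path=None):
    """Return the first full path to `program` found on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    # On windows, executables carry an extension such as .exe; try those too.
    extensions = [""] + os.environ.get("PATHEXT", "").split(os.pathsep)
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        for ext in extensions:
            candidate = os.path.join(directory, program + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


# Prints the location of gcc, or None if the MinGW bin directory is not on PATH.
print(find_on_path("gcc"))
```

Remember that a terminal opened before you edited PATH will still see the old value, so open a new one before running this.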

Getting the Python library to work

I didn’t originally realize this, but the 64 bit development files are available here. Theano should work after installing them.

When importing theano, if it gives errors amounting to “not recognizing libpython”, then unfortunately, you might have to generate your own libpython .a file. This amounts to using dlltool. A tutorial can be found here.

Things that make life easy

Now it is time to set up python for easy running. By default, double-clicking on a file ending with .py will run it in the python interpreter. Unfortunately, if your program crashes, the interpreter window closes with it, without letting you see the errors. That gets really annoying, especially since, in my experience, I spend more time debugging programs than actually running them!

The easiest solution involves downloading git, which by itself is super useful. More importantly, during installation you can choose to add Git Bash to the windows explorer context menu. This will allow you to open a command line in the correct directory by right clicking on the background of a folder.

From the “git bash”, you can run python programs simply by typing in python filename.py. You can also get the previous command by using the up arrow key. This will make the error messages easily visible.

Next, add C:/Python27/Scripts to the PATH environment variable. This is the same procedure that was used above to put the MinGW bin directory on the PATH.

You might want to log out then log back in for the changes to take effect. Then, open bash (right click on desktop, and the menu should contain Git Bash). Run the following command:

pip install pyreadline

If it works, you’re good to go; if not, you might not have the python Scripts directory in your PATH.
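If pip is not found, a quick way to see whether the Scripts directory made it into the PATH is to compare the entries directly. This is only a sketch: dir_in_path is a made-up helper, and C:/Python27/Scripts is the default location assumed above, so adjust it if you installed python elsewhere.

```python
import os


def dir_in_path(directory, path=None):
    """Return True if `directory` is one of the entries of PATH."""
    if path is None:
        path = os.environ.get("PATH", "")

    def norm(p):
        # Normalise slashes, trailing separators and (on windows) letter case.
        return os.path.normcase(os.path.normpath(p))

    target = norm(directory)
    return any(norm(entry) == target
               for entry in path.split(os.pathsep) if entry)


print(dir_in_path("C:/Python27/Scripts"))
```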

What text editor should I use?

I didn’t think this section was necessary, but I frequently find that people who are new to programming on windows want to use Notepad. No. Just… no.

You want to use Notepad++. In particular, right clicking on a python file gives you the option of editing it in Notepad++, which is very useful. You can also use IDLE, which comes with default python (right click on a python file, and edit with IDLE is an option). IDLE was my first python editor, but opening files was too slow for me.

If you are a student (students get it free!), or happen to be rich enough to afford Visual Studio, then I highly recommend Python Tools for Visual Studio. In fact, since I am still a student, Visual Studio with pytools is my preferred development environment for python on windows.

Does it work?

So you’ve installed all of the libraries. How do you make sure they work?

To start off, open bash, and type in ipython. You should get a python terminal. Just by importing everything you’ll be able to tell whether things are probably installed correctly:

In [1]: import pylab
In [2]: import theano
In [3]: import numba
In [4]:
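If you would rather check everything in one go, the same idea can be put in a small script. This is a rough sketch, not an official test: check_imports is my own helper, and the list of names is simply the set of libraries used in this post.

```python
import importlib


def check_imports(names):
    """Try to import each module; map name -> None (ok) or an error message."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = None
        except Exception as err:  # a broken install can raise more than ImportError
            status[name] = "%s: %s" % (type(err).__name__, err)
    return status


libraries = ["numpy", "scipy", "matplotlib", "numba", "theano"]
for name, problem in sorted(check_imports(libraries).items()):
    print("%-12s %s" % (name, "ok" if problem is None else problem))
```

Anything that does not say ok is worth fixing before moving on to the sample program.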

A sample program

Let’s try a program which will actually test if stuff works! Save this as test.py, and run it in bash using python test.py.

from pylab import *
from numba import jit

import theano.tensor as T
from theano import function

#The @jit decorator tells numba to compile the given function. Removing it will show how much numba can speed things up
@jit
def this_takes_forever_without_jit(a):
    z = zeros(1000)
    for i in range(a):
        for j in range(1000):
            z[j] += i + j
    return z
    
print(this_takes_forever_without_jit(50000))

inp = T.vector("input")

#Creates a theano function, which takes the elementwise sigmoid of the input vector
sample_theano_function = function([inp],T.nnet.sigmoid(inp))

#Creates a vector with values from -10 up to (but not including) 10, in steps of 0.01
x = arange(-10,10,0.01)

y = sample_theano_function(x)

#Plots the results on a pretty plot!
plot(x,y)
show()

Run it, and see what happens!
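You should see an S-shaped curve that rises from about 0 to about 1. As a rough cross-check of the numbers, here is the same sigmoid, 1 / (1 + exp(-x)), written in plain python with no libraries. The sigmoid function below is just a reference for comparison, not part of theano.

```python
import math


def sigmoid(v):
    """Logistic sigmoid of a single float: 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))


# A few points to compare against the curve plotted by test.py.
for v in (-10.0, -1.0, 0.0, 1.0, 9.99):
    print("%6.2f -> %.6f" % (v, sigmoid(v)))
```

The value at 0 is exactly 0.5, so the curve in the plot should pass through the middle of the y axis at x = 0.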

If there are no errors, you can get back to doing science!
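If you are curious how much @jit actually buys you, time the function with and without it. Below is a minimal timing sketch: time_call is a hypothetical helper, and slow_sum is a plain-python stand-in for the loop in test.py (a list instead of a numpy array, and a much smaller loop count) so that it runs without numba installed.

```python
import time


def time_call(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.time()
    result = func(*args)
    return result, time.time() - start


def slow_sum(a, n=1000):
    # Same double loop as in test.py, but without numba or numpy.
    z = [0.0] * n
    for i in range(a):
        for j in range(n):
            z[j] += i + j
    return z


result, seconds = time_call(slow_sum, 200)
print("plain python took %.3f s" % seconds)
```

To compare, pass this_takes_forever_without_jit to time_call in test.py. The first call to a @jit function includes the compilation time, so timing a second call gives a fairer picture of the speedup.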

Final Thoughts

With the given setup, I have found myself with less reason than ever to use any other programming language. I still use R for some statistical workloads, and I write some stuff in C, but it has become clear to me that no other language I know offers such an enormous boost in the amount I can get done in a small amount of time.