Installing Python and Pandas

How to set up Python and Pandas for data analysis.

Image credit: Analytics Vidhya

Python is commonly used for data analysis – not just in Computer Science, but across a large number of fields. This is partly because of the characteristics of the language: it’s a fairly easy language to learn, with broad support across platforms. However, the real strength of Python for us is the availability of top-notch scientific and math libraries that provide data manipulation (e.g. numpy, scipy) and graphing capabilities (e.g. matplotlib).

Below, I’ll describe my preferred way of installing Python with some common libraries (numpy, scipy, pandas) on a Mac. There are distributions that bundle everything together (e.g. Enthought), but I prefer to have control over what’s installed and how it’s packaged (partly to avoid having multiple conflicting versions of Python and its libraries installed, and partly because I’m a bit fussy about this stuff).

1. Install Homebrew

We’ll use Homebrew as the basis for our installation.

  • If you don’t already have Homebrew installed, install it using these instructions.
  • Run brew doctor to make sure your installation is set up properly.
  • Run brew update to update Homebrew itself to the latest version.
  • Run brew upgrade to upgrade any packages you have already installed.

2. Install Python

Depending on your version of macOS, you may already have an older Python (such as Python 2) preinstalled, but we’ll want Python 3.

  • brew install python3 to install the newest version of Python
  • brew postinstall python3 to set up pip3, the Python package manager
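
A quick way to confirm that python3 now points at a Python 3 interpreter is to run a short script with it. This is just a minimal sketch: the file name check_python.py and the helper is_python3 are my own names for illustration, not part of Homebrew or Python.

```python
# check_python.py (hypothetical file name): run with `python3 check_python.py`
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple is Python 3 or newer."""
    return version_info[0] >= 3


if __name__ == "__main__":
    # sys.executable shows which interpreter is actually being used
    print("Interpreter:", sys.executable)
    print("Version:", sys.version.split()[0])
    print("Python 3 or newer:", is_python3())
```

If the interpreter path printed is not the one Homebrew installed, your PATH is probably picking up a different Python first.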

3. Install Python Packages

Now we use pip to install the rest.

  • python3 -m pip install --upgrade pip
  • python3 -m pip install numpy
  • python3 -m pip install scipy
  • python3 -m pip install pandas
  • python3 -m pip install jupyter
  • python3 -m pip install matplotlib
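
To check that the packages above actually landed in the Python 3 environment, something like the following sketch can help. It uses the standard library's importlib.metadata (available from Python 3.8); the helper name installed_versions is my own and purely illustrative.

```python
from importlib import metadata

# The packages installed in the step above
PACKAGES = ["numpy", "scipy", "pandas", "jupyter", "matplotlib"]


def installed_versions(names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it with python3 so that it inspects the same interpreter that pip installed into.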

4. Test

So, how do you use this? You can write standard Python code and just pull in these libraries:

# Load libraries
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt

# Use pandas to load data into a dataframe
colnames = ['DT', 'Participant', 'Order', 'Block', 'TaskID', 'ElapsedTime']
# header=None tells pandas the file has no header row, so the names above are used
p01 = pd.read_csv('../data/p01/p01.log.20130220.130355.txt', sep=';', index_col='DT', header=None, names=colnames)

# Manipulate the data and so on
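
If you don't have a log file handy, here is a small self-contained sketch of the same idea. The four sample rows are made up for illustration (they are not real data from the file above), and the per-block average is just one example of the kind of manipulation you might do.

```python
import io

import pandas as pd

# Hypothetical rows in the same semicolon-separated layout (values are made up)
sample_log = io.StringIO(
    "2013-02-20 13:03:55;p01;1;1;T1;4.0\n"
    "2013-02-20 13:04:10;p01;1;1;T2;6.0\n"
    "2013-02-20 13:05:02;p01;1;2;T1;3.0\n"
    "2013-02-20 13:05:40;p01;1;2;T2;5.0\n"
)

colnames = ['DT', 'Participant', 'Order', 'Block', 'TaskID', 'ElapsedTime']

# No header row in the data, so supply the column names ourselves
log = pd.read_csv(sample_log, sep=';', index_col='DT', header=None, names=colnames)

# Average elapsed time for each block
mean_by_block = log.groupby('Block')['ElapsedTime'].mean()
print(mean_by_block)
```

With these sample rows, the mean elapsed time is 5.0 for block 1 and 4.0 for block 2.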

However, we also installed Jupyter above, which lets us run Python code interactively in a web browser.

From the command line, type jupyter notebook and a web browser will open showing the notebook dashboard, which lists the files in the current directory.

Choose New, then Python 3, and a new notebook will open. Type any Python code into a cell and press Shift-Enter to execute it. If everything worked, the code should run and its output will appear below the cell.

You’re done, at least for the setup. There are some great online resources for learning Python and Pandas: