Installing Python and Pandas
How to set up Python and Pandas for data analysis.
Python is commonly used for data analysis – not just in Computer Science, but across a large number of fields. This is partly because of the characteristics of the language: it's a fairly easy language to learn, with broad support across platforms. However, the real strength of Python for us is the availability of top-notch scientific and math libraries that provide data manipulation (e.g. numpy, scipy) and graphing capabilities (e.g. matplotlib).
Below, I'll describe my preferred way of installing Python with some common libraries (numpy, scipy, pandas) on Mac. There are distributions that bundle everything together (e.g. Enthought) but I prefer to have control over what's installed and how it's packaged (partly to avoid having multiple conflicting versions of python/libs installed, and partly because I'm a bit fussy about this stuff).
1. Install Homebrew
We'll use Homebrew as the basis for our installation.
- If you don't already have Homebrew installed, install it using these instructions.
- Run brew doctor to make sure your installation is set up properly.
- Run brew update to update brew itself to the latest version.
- Run brew upgrade to update existing brews/packages.
2. Install Python
You probably already have Python 2.6 preinstalled, but we'll want to upgrade to Python 3.
- Run brew install python3 to install the newest version of Python.
- Run brew postinstall python3 to install the Python pip3 package manager.
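Before installing any packages, it can help to confirm which interpreter the python3 command actually launches. The snippet below is a minimal sketch, not part of the original instructions: save it under any name (check_python.py is just a placeholder) and run it with python3. The helper name interpreter_info is made up for illustration.

```python
import shutil
import sys


def interpreter_info():
    """Collect basic facts about the interpreter running this script."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,
        # Where the shell would find `python3`, or None if it is not on PATH.
        "python3_on_path": shutil.which("python3"),
    }


info = interpreter_info()
for key, value in info.items():
    print(f"{key}: {value}")
```

If the reported executable is not the copy Homebrew installed, an older Python earlier on your PATH is probably taking precedence.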
3. Install Python Packages
Now we use pip to install the rest.
python3 -m pip install --upgrade pip
python3 -m pip install numpy
python3 -m pip install scipy
python3 -m pip install pandas
python3 -m pip install jupyter
python3 -m pip install matplotlib
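A quick way to confirm the installs worked is to try importing each library and print its version. This is a minimal sketch under a couple of assumptions: check_packages is a hypothetical helper written for this post, and it relies on each package exposing a __version__ attribute, which the four libraries listed here do. Jupyter is left out because the easiest check for it is simply launching it.

```python
import importlib


def check_packages(names):
    """Return {name: version string, or None if the import fails}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Most scientific packages expose __version__; fall back if not.
            found[name] = getattr(module, "__version__", "unknown")
    return found


report = check_packages(["numpy", "scipy", "pandas", "matplotlib"])
for name, version in report.items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

Anything reported as NOT INSTALLED can be retried with the matching python3 -m pip install command above.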
So, how do you use this? You can write standard Python code and just pull in these libraries:
# Load libraries
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt

# Use pandas to load data into a dataframe
colnames = ['DT', 'Participant', 'Order', 'Block', 'TaskID', 'ElapsedTime']
# header=None: the log file has no header row, so column names come from `names`
p01 = pd.read_csv('../data/p01/p01.log.20130220.130355.txt', sep=';', index_col='DT', header=None, names=colnames)

# Manipulate the data and so on
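To see what those read_csv arguments do without the original log file, here is a minimal sketch that parses a few rows from an in-memory string instead. The rows are invented purely for illustration; only the column names come from the example above. It passes header=None, which is how pandas expects you to say that the file has no header row.

```python
import io

import pandas as pd

colnames = ['DT', 'Participant', 'Order', 'Block', 'TaskID', 'ElapsedTime']

# Made-up sample rows in the same semicolon-separated layout (not real data).
sample = io.StringIO(
    "2013-02-20 13:03:55;p01;1;1;A;1.5\n"
    "2013-02-20 13:04:10;p01;1;1;B;2.5\n"
    "2013-02-20 13:04:31;p01;1;2;A;2.0\n"
)

# sep=';' splits on semicolons, names supplies the column labels,
# and index_col='DT' turns the timestamp column into the row index.
log = pd.read_csv(sample, sep=';', index_col='DT', header=None, names=colnames)

# A typical first manipulation: average elapsed time per task.
mean_by_task = log.groupby('TaskID')['ElapsedTime'].mean()
print(log)
print(mean_by_task)
```

Swapping the StringIO object for a file path gives the same call shape as the example above.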
However, we also installed Jupyter above, which lets us run python code dynamically in a web browser.
From the command line, type
jupyter notebook
and a web browser will pop up and prompt you for a script to run.
Choose New, then Python 3, and a notebook window will appear. Type any Python code and press Shift-Enter to execute it. If everything worked, the code should execute!
You're done, at least for the setup. There are some great online resources for learning Python and Pandas: