Introduction
Python is an attractive option for data analysis. It's open source and cross-platform, it's excellent for file and data manipulation, and it has many scientific and numerical computation libraries. The "big win" is that you can potentially use Python as your only language for data analysis: read in your data files, clean up the data, perform various statistical tests, and output graphs and other visualizations.
If you go this route, you'll want to be familiar with this stack:
- IPython, an interactive version of Python
- Notebook (optional), a web-based environment for IPython that combines Markdown, code, and output
- Numpy, the primary scientific computing library for Python
- Pandas, a data analysis library for Python
- Matplotlib, a 2D plotting library for graphs and visualizations
Typically, you'll use plain Python to clean up your data and format it into structures that Numpy and Pandas can work with directly. Matplotlib is used to generate graphs if needed.
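A minimal sketch of that workflow follows. It uses a small made-up CSV string instead of a real data file. Plain Python strips stray whitespace, Pandas converts the scores to numbers and drops the missing one, Numpy computes an overall mean, and Matplotlib draws a bar chart of per-course means into an in-memory PNG. The data, the column names, and the helper `load_scores` are all hypothetical. In practice you would usually let `pandas.read_csv` do the parsing.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical raw data: stray spaces and one missing score.
RAW = """name, course , score
 alice , CS100, 85
bob,CS100 ,
 carol, CS200, 92
dave ,CS200, 80
"""


def load_scores(text):
    """Clean the raw text with plain Python, then hand it to Pandas."""
    rows = [[field.strip() for field in line.split(",")]
            for line in text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    df = pd.DataFrame(body, columns=header)
    # Empty strings cannot be parsed, so "coerce" turns them into NaN.
    df["score"] = pd.to_numeric(df["score"], errors="coerce")
    return df.dropna(subset=["score"])


df = load_scores(RAW)
means = df.groupby("course")["score"].mean()        # Pandas: per-course mean
overall = float(np.mean(df["score"].to_numpy()))    # Numpy: overall mean

# Matplotlib: bar chart of the per-course means, saved to an in-memory PNG.
fig, ax = plt.subplots()
means.plot(kind="bar", ax=ax)
ax.set_ylabel("mean score")
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)

print(means)
print(overall)
```

To write the chart to disk, swap the `BytesIO` buffer for a filename in `savefig`. In IPython or Notebook you would normally just display the figure.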
Notebook is optional, but it's a great way to package documentation, code, and results together in a single file, which makes it well suited to sharing results.
Resources
In addition to the home page of each package listed above, I've found the following resources to be really helpful:
- Wes McKinney, Python for Data Analysis. Written by the author of Pandas, it includes a high-level introduction to Python and Numpy for data analysis. It's inexpensive, available electronically or in print, and very, very well written.
- YouTube video: Wes McKinney talking about Python for Data Analysis
- YouTube video: Alfred Essa, Introduction to Python for Data Analysis (very short)
Getting Started
Each of the packages mentioned above can be downloaded and installed separately, but it's much simpler to download a single distribution that contains everything.
- Enthought Canopy (formerly known as the Enthought Python Distribution, or EPD): what I use
- Anaconda: I have no experience with it
Canopy, for instance, is a single-file download that bundles the appropriate version of IPython and all of the libraries above, and configures your environment. It also includes a package manager to install other libraries and keep everything up to date.
Note that Canopy is a commercial product, but it has a free academic license. Make sure to sign up using your @uwaterloo.ca email address.
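After installing a distribution, you can confirm that the stack is in place by importing each package and printing its version. This is a sketch only. The helper `check_stack` is hypothetical, and it relies on the common (but not universal) convention that a package exposes a `__version__` attribute. Notebook is left out of the check because how it is packaged depends on the IPython version.

```python
import importlib

# Import names for the stack described above. Notebook is deliberately omitted.
PACKAGES = ("IPython", "numpy", "pandas", "matplotlib")


def check_stack(names=PACKAGES):
    """Return {name: version string, or None if the package can't be imported}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
            continue
        # Fall back to "unknown" for modules without a __version__ attribute.
        found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for name, version in check_stack().items():
        print(f"{name:12} {version or 'NOT INSTALLED'}")
```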
Samples
See Wes McKinney's blog.
-- JeffAvery - 2013-07-05