Assignment 1 - CS743
Fall, 2014


Overview

This assignment has two parts:

  1. Create a new database and load it with Canadian census data.
  2. Write and run SQL queries against this database to answer to questions about these data.
This is an individual assignment.

Creating and Loading the Database

To create and load the database, you will need to take the following steps.

  1. Install a relational DBMS of your choice on your machine. You may work with any relational DBMS you choose. If you are not sure, PostgreSQL is a good choices. It is open source and free to use, and it runs on multiple platforms.
  2. Create an empty database.
  3. Create (empty) tables to hold the data.
  4. Load data into the tables from the supplied data files.

There are two data files that you should load into your database. Both of the files have been published by Statistics Canada at data.gc.ca

To load each file, you will need to create a table, or tables, in the database to hold the data from the file, and then load the data from the file into the table(s). It is up to you how to define these tables, and how to use your database system's load utility to load the tables. You may also wish to "massage" the data files a bit before loading them, to make them easier to load into the tables you define.

If you decide to work with PostgreSQL, please see the appendix for some somewhat more specific information about installing and running PostgreSQL for this assignment.

Database Queries

Once the database has been created and the data are loaded, it should be possible for you to run queries against the database. Write SQL queries to answer each of the following questions.

You may wish to define some views to simplify these queries. If you do, be sure to submit your view definitions in addition to your queries.

  1. Produce a listing of the names and 2011 populations of the 10 most populous census divisions in Ontario, in decreasing order of population.
  2. Which census division has the largest number of french-speaking children per square kilometer? For the puposes of this question, consider anyone under the age of 15 to be a child.
  3. In which census divisions is the use of french growing? For the purposes of this question, consider french to be growing if the percentage of french speakers among senior citizens (age 65 and up) is smaller than the percentage of french speakers among children (under age 15).
  4. What is the total 2011 population of each Canadian province or territory?
  5. Which census divisions have the highest percentage of young people? For the purposes of this question, consider young people to be those under 25 years old. List the top ten divisions, showing their names and percentages.

What to Hand In

You should submit the following for this assignment:

Please collect this material into a single, readable document, in PDF format, for submission. Be sure that your document includes your name. Submit your document by e-mail to the instructor. Use the subject line "CS743 A1 submission - NAME", replacing NAME with your name.

Appendix

If you are not sure database system to use, PostgreSQL is a good choice. Information about downloading PostgreSQL for Linux, BSD, OS X, and Windows can be found here. PostgreSQL has extensive on-line documentation. In particular, you can use the on-line manual for PostgreSQL 9.3.

On Ubuntu Linux systems, PostgreSQL can also be installed easily using the Ubuntu's software package manager. Here are some more specific instructions for installing and running PostgreSQL on an Ubuntu system. These are adapted from the Ubuntu-specific PostgreSQL instruction page.

  1. Install the PostgreSQL client and server:
    sudo apt-get install postgresql postgresql-contrib
    
  2. Create a PostgreSQL user with your userid
    sudo -u postgres createuser --superuser $USER
    
  3. Create a PostgreSQL database with the same name as your userid
    sudo -u postgres createdb $USER
    
  4. Once the above steps have been completed, you should be able to run the interactive PostgreSQL command shell, called psql. Run the client like this:
    psql
    
    By default, psql will connect to the database with the same name as your userid (which you just created).
  5. The PostgreSQL manual has more information about psql. You can also type \? at the psql prompt for help.
  6. Once you have created tables in your database, you can use psql's \copy command to import data from the .csv files into your database tables.