Assignment 1 - CS743
Fall, 2014
Overview
This assignment has two parts:
- Create a new database and load it with Canadian census data.
- Write and run SQL queries against this database to answer
to questions about these data.
This is an individual assignment.
Creating and Loading the Database
To create and load the database, you will need to take the following
steps.
- Install a relational DBMS of your choice on your machine.
You may work with any relational DBMS you choose. If you are
not sure, PostgreSQL is a good choices.
It is open source and free to use, and it runs on multiple platforms.
- Create an empty database.
- Create (empty) tables to hold the data.
- Load data into the tables from the supplied data files.
There are two data files that you should load into your database.
Both of the files have been published by Statistics Canada at
data.gc.ca
To load each file, you will need to create a table, or tables, in the
database to hold the data from the file, and then load the data from
the file into the table(s).
It is up to you how to define these tables, and how to use your
database system's load utility to load the tables.
You may also wish to "massage" the data files a bit before loading
them, to make them easier to load into the tables you define.
If you decide to work with PostgreSQL,
please see the appendix for some somewhat more
specific information about installing and running PostgreSQL for this
assignment.
Database Queries
Once the database has been created and the data are loaded,
it should be possible for you to run queries against the database.
Write SQL queries to answer each of the following questions.
You may wish to define some views to simplify these queries.
If you do, be sure to submit your view definitions in addition
to your queries.
- Produce a listing of the names and 2011 populations of the
10 most populous census divisions in Ontario, in decreasing order
of population.
- Which census division has the largest number of french-speaking
children per square kilometer? For the puposes of this question, consider
anyone under the age of 15 to be a child.
- In which census divisions is the use of french growing?
For the purposes of this question, consider french to be growing
if the percentage of french speakers among senior citizens (age 65
and up) is smaller than the percentage of french speakers among
children (under age 15).
- What is the total 2011 population of each Canadian province or
territory?
- Which census divisions have the highest percentage of young
people? For the purposes of this question, consider young people
to be those under 25 years old. List the top ten divisions, showing
their names and percentages.
What to Hand In
You should submit the following for this assignment:
- The SQL CREATE TABLE commands you used to create the
table(s) into which you loaded the data files.
-
An answer to the following question:
Is the schema that you created to hold the census data in
Boyce-Codd Normal Form?
Briefly justify your answer.
-
The SQL queries you wrote to answer each of the database
questions.
-
The result you obtained for each query.
Please collect this material into a single, readable document,
in PDF format, for submission.
Be sure that your document includes your name.
Submit your document by e-mail
to the instructor. Use the subject line "CS743 A1 submission -
NAME", replacing NAME with your name.
Appendix
If you are not sure database system to use, PostgreSQL is a good choice.
Information about downloading PostgreSQL for Linux, BSD, OS X,
and Windows can be
found here.
PostgreSQL has extensive on-line documentation. In particular,
you can use the
on-line manual for PostgreSQL 9.3.
On Ubuntu Linux systems, PostgreSQL can also be installed easily
using the Ubuntu's software package manager.
Here are some more specific instructions for installing and
running PostgreSQL on an Ubuntu system.
These are adapted from
the Ubuntu-specific
PostgreSQL instruction page.
- Install the PostgreSQL client and server:
sudo apt-get install postgresql postgresql-contrib
- Create a PostgreSQL user with your userid
sudo -u postgres createuser --superuser $USER
- Create a PostgreSQL database with the same name as your userid
sudo -u postgres createdb $USER
- Once the above steps have been completed, you should be
able to run the interactive PostgreSQL command shell, called psql.
Run the client like this:
psql
By default, psql will connect to the database with the
same name as your userid (which you just created).
- The PostgreSQL manual
has
more information about psql. You can also
type \? at the psql prompt for help.
- Once you have created tables in your database, you can use
psql's \copy command to import data from the .csv files
into your database tables.