CS 448 Database Systems Implementation



Objectives

The objective of this course is to introduce students to the fundamentals of building a database management system (DBMS), in particular a relational one. It focuses on core database engine technology by studying topics such as storage systems (data layout, disk-based data structures), indexing, query processing algorithms, query optimization, transactional concurrency control, and logging and recovery. It complements CS 348 by looking at the internals of relational DBMSs.

Intended Audience

This is a second course on databases that focuses on DBMS internals. It is a project-oriented course that, upon successful completion, gives students an appreciation of the intricacies and complexities of a DBMS and enables them to design and implement its major components. The course objective is pursued through three fundamental sub-objectives:

  1. To understand the fundamentals of storage systems and disk-based data structures;
  2. To understand the process of query processing and optimization; and
  3. To learn the implementation of transactions.

Complementary to the above objectives, the course has a training component in which students gain experience, through a number of assignments, in building components of a DBMS and incorporating them into an open-source system such as MySQL or PostgreSQL. The lectures may be complemented by guest lectures on real-life DBMS implementation issues given by colleagues from industry (Sybase, IBM Canada, Microsoft and others).

CS 448 is a course for CS majors and is normally taken in a student's fourth year. It will be of interest to students whose area of expertise includes large software systems. CS 338 is available for students in other plans.

Related Courses

Prerequisites: CS 348 and (CS 350 or SE 350). Computer Science students only.

References

Database Management Systems, 3rd ed., by R. Ramakrishnan and J. Gehrke, McGraw-Hill, 2003. Course notes are required (may be made available via the web).

Schedule

3 hours of lectures per week. Normally available in Winter.

Outline

Review of relational database systems (3 hours)

Fundamentals of relational databases, relational calculus, relational algebra, integrity issues.

Storage Management (9 hours)

Data layout, buffer systems, file management, indexing techniques (tree-based and hashing).
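
As an illustration of the buffer systems topic, the following is a minimal sketch of a buffer pool with least-recently-used (LRU) replacement. It is not taken from the course materials: the class name BufferPool, the read_page callback and the miss counter are hypothetical, and a real buffer manager would also handle page pinning and writing back dirty pages.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool: caches up to `capacity` pages, evicting by LRU.

    `read_page` is a hypothetical callback standing in for a disk read.
    Pinning and dirty-page write-back are deliberately left out.
    """

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page
        self.frames = OrderedDict()  # page_id -> page contents, oldest first
        self.misses = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            # Hit: mark the page as most recently used.
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        # Miss: evict the least recently used page if the pool is full.
        self.misses += 1
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)
        data = self.read_page(page_id)
        self.frames[page_id] = data
        return data


pool = BufferPool(capacity=2, read_page=lambda pid: f"page-{pid}")
for pid in [1, 2, 1, 3, 2]:
    pool.get_page(pid)
```

With a capacity of two frames, the access sequence 1, 2, 1, 3, 2 hits only on the second request for page 1; the request for page 3 evicts page 2, which must then be read again.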

Query Processing and Optimization (13 hours)

Query processing methodology, view expansion, query translation, implementation of relational operators, external sorting, cost-based query optimization.
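
As an illustration of the external sorting topic, the following is a minimal sketch of a two-phase external merge sort: sorted runs of bounded size are written to temporary files and then merged. It is not taken from the course materials; the function name external_sort and the run_size parameter are hypothetical, and the result is returned as a list only to keep the example short, whereas a real system would stream the merged output.

```python
import heapq
import os
import tempfile


def external_sort(values, run_size):
    """Sort integers while holding at most `run_size` of them per run.

    Phase 1 writes sorted runs to temporary files; phase 2 performs a
    k-way merge of those runs with heapq.merge.
    """
    run_paths = []

    def flush(run):
        # Sort one in-memory run and spill it to its own temporary file.
        run.sort()
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in run)
        run_paths.append(path)

    run = []
    for v in values:
        run.append(v)
        if len(run) == run_size:
            flush(run)
            run = []
    if run:
        flush(run)

    files = [open(p) for p in run_paths]
    try:
        # Each reader lazily yields the integers of one sorted run.
        readers = [(int(line) for line in f) for f in files]
        return list(heapq.merge(*readers))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)


result = external_sort([5, 3, 9, 1, 7, 2, 8], run_size=3)
```

The merge phase reads one value at a time from each run, so its working memory grows with the number of runs rather than with the size of the input.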

Transaction Management (12 hours)

Transaction models, concurrency control algorithms, database recovery.

Meta-data Management (2 hours)

Implementation of catalogs and integrity constraints.