CS 240 | SCS | UW

Revised (December 11, 2013)

CS 240: Data Structures and Data Management

General description

This course introduces widely used and effective methods of data organization, focusing on data structures, their algorithms, and the performance of these algorithms. Specific topics include priority queues, sorting, dictionaries, and data structures for text processing.

Students will learn to

Analyze, apply, and use various data structures and data-management techniques in a variety of applications
Perform rigorous complexity analyses of simple algorithms and data structures, which includes writing correct mathematical proofs on inductively-defined structures and solving simple recurrence relations
Compare different data-structuring techniques from the point of view of time, memory requirements, etc.
Select a good data structure to solve a specific algorithmic problem
Apply data structures correctly and efficiently in one or more high-level programming languages, including C++

Logistics

Audience

2B Computer Science students

Normally available

Fall, Winter, and Spring

Related courses

Predecessors: CS 245 (logic) or SE 212; CS 246 or 247 (programming); any of STAT 206, 230, 240 (probability)
Successors: Most third-year CS major courses
Conflicts: Other courses that seriously consider efficiency and correctness of fundamental data structures and their algorithms

For official details, see the UW calendar.

Software/hardware used

UNIX, C++

Typical reference(s)

R. Sedgewick, Algorithms in C, 3rd ed, Parts 1-4. Addison Wesley

Required preparation

At the start of the course, students should be able to

Analytical:

Define and explain order notation; perform complexity analyses of simple algorithms
Define "abstract data type" or ADT; explain the utility of this concept
Perform basic computations of discrete probability and expectation
Use mathematical induction on recursively defined structures, including finding simple errors in inductive arguments
Prove simple properties of program fragments correct through the use of pre-conditions and post-conditions for loops and recursive calls

Computational and algorithmic:

Design, implement, test, and debug C++ programs to solve problems requiring hundreds of lines of code
Define the ADTs for stacks and queues; write efficient implementations in C/C++
Describe tree structures, including binary search trees and multi-way trees; use these structures in C/C++ programs
Describe basic sorting algorithms (including Quicksort) and implement them in C/C++
Explain the notion of a hash table (students don't have to describe the algorithms or their efficiency)

Learning objectives

At the end of the course, students should be able to

Perform rigorous asymptotic analyses of simple algorithms and express the result using order notation; compare algorithms based on their asymptotic complexity; and prove formal results involving order notation
Apply the priority-queue ADT to solve various application problems, implement a priority queue using heaps, and analyze the complexity of common implementations of heap operations
Develop best-, worst- and average-case analyses of sorting algorithms, including Quicksort, and explain the ramifications of these analyses in practice; explain the basic principles of randomized algorithms and their potential advantages, specifically in the case of Quicksort; explain the difference between comparison-based sorting and non-comparison-based sorting algorithms, and when and why the latter may run faster; and apply sorting-based techniques to algorithmic problems, such as elimination of duplicates
Develop bounded-height search trees that accommodate efficient (i.e., O(log n)) implementations of search, insert, and delete; evaluate which search tree techniques are best suited to various application scenarios (e.g., B-trees are useful for large-scale data structures stored in external memory)
Explain the advantages and disadvantages of various hashing techniques; identify the best hashing techniques to use in a particular application scenario; and recognize when hashing techniques are preferable to other dictionary implementations
Design data structures for real-world data (where keys are often inherently multidimensional) in such a way that common operations (including range search) can be implemented efficiently
Design special data structures that can efficiently store and process words and strings, and select and apply a suitable technique for data compression in a specific application scenario

In addition to the above language-independent skills, students should be able to apply (code, debug, and test) any of the above structures and algorithms in C++, using appropriate design methodologies and language features. Students should be prepared to transfer these abilities to other languages (once learned).

Typical syllabus

Introduction and review (3 hours)

Basic computer model: the random-access machine
Runtime of an algorithm: worst-case, best-case, and average-case
Asymptotic analysis, order notation, growth rates, and complexity

Stacks, queues, and priority queues (3 hours)

Review of stacks and queues
Priority queue ADT and simple implementations
Heaps and Heapsort
Using heaps to solve the selection problem

Sorting and analysis of randomized algorithms (5 hours)

Quicksort (non-randomized): worst-case, best-case, and average-case complexity
Randomized quicksort and its analysis; application to selection and its analysis
Lower bound on comparison-based sorting
Non-comparison-based sorting algorithms (e.g., Counting Sort and Radix Sort)

Search trees (5 hours)

Dictionary ADT and simple implementations
Binary search trees (insert and delete operations and analysis)
Balanced search trees (insert and delete operations and analysis; instructors will normally choose two or more AVL trees, 2-3 trees, red-black trees, etc.)
2-3-4 trees and B-trees (search, insert, and delete operations and analysis)

Hashing (5 hours)

Key-indexed search, simple hash functions
Collision resolution: chaining and open addressing
Complexity of search, insertion, and deletion
Extendible hashing

Range search and multidimensional dictionaries (5 hours)

Range search in a binary search tree
Data structures for orthogonal range search: quad trees, Kd-trees, range trees

Algorithms and data structures for text processing (8 hours)

Dictionaries for text strings: radix trees, tries, compressed tries, suffix tries
String matching algorithms: brute force, finite automata, the Knuth-Morris-Pratt algorithm
Text compression: Huffman codes, Lempel-Ziv B, Burrows-Wheeler Transform (BWT)