Revised (December 11, 2013)

CS 240: Data Structures and Data Management


General description

This course introduces widely used and effective methods of data organization, focusing on data structures, their algorithms, and the performance of these algorithms. Specific topics include priority queues, sorting, dictionaries, and data structures for text processing.

Students will learn to

  • Analyze, apply, and use various data structures and data-management techniques in a variety of applications
  • Perform rigorous complexity analyses of simple algorithms and data structures, which includes writing correct mathematical proofs on inductively defined structures and solving simple recurrence relations
  • Compare different data-structuring techniques from the point of view of time, memory requirements, etc.
  • Select a good data structure to solve a specific algorithmic problem
  • Apply data structures correctly and efficiently in one or more high-level programming languages, including C++

Logistics

Audience

  • 2B Computer Science students

Normally available

  • Fall, Winter, and Spring

Related courses

  • Predecessors: CS 245 (logic) or SE 212; CS 246 or 247 (programming); any of STAT 206, 230, 240 (probability)
  • Successors: Most third-year CS major courses
  • Conflicts: Other courses that seriously consider efficiency and correctness of fundamental data structures and their algorithms

For official details, see the UW calendar.

Software/hardware used

  • UNIX, C++

Typical reference(s)

  • R. Sedgewick, Algorithms in C, 3rd ed., Parts 1-4. Addison-Wesley

Required preparation

At the start of the course, students should be able to

Analytical:

  • Define and explain order notation; perform complexity analyses of simple algorithms
  • Define "abstract data type" or ADT; explain the utility of this concept
  • Perform basic computations of discrete probability and expectation
  • Use mathematical induction on recursively defined structures, including finding simple errors in inductive arguments
  • Prove simple correctness properties of program fragments using pre-conditions and post-conditions for loops and recursive calls

Computational and algorithmic:

  • Design, implement, test, and debug C++ programs to solve problems requiring hundreds of lines of code
  • Define the ADTs for stacks and queues; write efficient implementations in C/C++
  • Describe tree structures, including binary search trees and multi-way trees; use these structures in C/C++ programs
  • Describe basic sorting algorithms (including Quicksort) and implement them in C/C++
  • Explain the notion of a hash table (students don't have to describe the algorithms or their efficiency)

Learning objectives

At the end of the course, students should be able to

  • Perform rigorous asymptotic analyses of simple algorithms and express the result using order notation; compare algorithms based on their asymptotic complexity; and prove formal results involving order notation
  • Apply the priority-queue ADT to solve various application problems, implement a priority queue using heaps, and analyze the complexity of common implementations of heap operations
  • Develop best-, worst-, and average-case analyses of sorting algorithms, including Quicksort, and explain the ramifications of these analyses in practice; explain the basic principles of randomized algorithms and their potential advantages, specifically in the case of Quicksort; explain the difference between comparison-based and non-comparison-based sorting algorithms, and when and why the latter may run faster; and apply sorting-based techniques to algorithmic problems, such as elimination of duplicates
  • Develop bounded-height search trees that accommodate efficient (i.e., O(log n)) implementations of search, insert, and delete; evaluate which search tree techniques are best suited to various application scenarios (e.g., B-trees are useful for large-scale data structures stored in external memory)
  • Explain the advantages and disadvantages of various hashing techniques; identify the best hashing techniques to use in a particular application scenario; and recognize when hashing techniques are preferable to other dictionary implementations
  • Design data structures for real-world data (where keys are often inherently multidimensional) in such a way that common operations (including range search) can be implemented efficiently
  • Design special data structures that can efficiently store and process words and strings, and select and apply a suitable technique for data compression in a specific application scenario

In addition to the above language-independent skills, students should be able to apply (code, debug, and test) any of the above structures and algorithms in C++, using appropriate design methodologies and language features. Students should be prepared to transfer these abilities to other languages once they have learned them.

Typical syllabus

Introduction and review (3 hours)

  • Basic computer model: the random-access machine
  • Runtime of an algorithm: worst-case, best-case, and average-case
  • Asymptotic analysis, order notation, growth rates, and complexity

Stacks, queues, and priority queues (3 hours)

  • Review of stacks and queues
  • Priority queue ADT and simple implementations
  • Heaps and Heapsort
  • Using heaps to solve the selection problem
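
As an illustration of the heap material listed above, the following is a minimal sketch of in-place Heapsort on an array-based max-heap. The function names fixDown and heapSort and the choice of int keys are assumptions made for this example; they are not taken from the course materials.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Restore the max-heap property for the subtree rooted at index i,
// looking only at the first n elements of a.
void fixDown(std::vector<int>& a, std::size_t i, std::size_t n) {
    while (2 * i + 1 < n) {
        std::size_t child = 2 * i + 1;                 // left child
        if (child + 1 < n && a[child + 1] > a[child]) {
            ++child;                                   // right child is larger
        }
        if (a[i] >= a[child]) break;                   // heap order already holds
        std::swap(a[i], a[child]);
        i = child;
    }
}

// Sort a in non-decreasing order: build a max-heap bottom-up,
// then repeatedly move the current maximum to the end of the array.
void heapSort(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t i = n / 2; i-- > 0;) {
        fixDown(a, i, n);
    }
    for (std::size_t end = n; end > 1; --end) {
        std::swap(a[0], a[end - 1]);
        fixDown(a, 0, end - 1);
    }
}
```

Bottom-up construction takes O(n) time and each of the n extractions costs O(log n), so the sketch runs in O(n log n) time using O(1) extra space.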

Sorting and analysis of randomized algorithms (5 hours)

  • Quicksort (non-randomized): worst-case, best-case, and average-case complexity
  • Randomized quicksort and its analysis; application to selection and its analysis
  • Lower bound on comparison-based sorting
  • Non-comparison-based sorting algorithms (e.g., Counting Sort and Radix Sort)
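
The application of randomized Quicksort to selection, mentioned above, might look roughly like the sketch below. The name quickSelect, the Lomuto-style partition, and the use of std::mt19937 are assumptions chosen for illustration; other partition schemes and random sources would work as well.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Return the element that would sit at index k (0-based) if a were sorted.
// Precondition: k < a.size(). The vector is passed by value, so the
// caller's data is left unchanged.
int quickSelect(std::vector<int> a, std::size_t k) {
    std::mt19937 rng(std::random_device{}());
    std::size_t lo = 0, hi = a.size();        // current window is [lo, hi)
    while (true) {
        // Pick a pivot uniformly at random and park it at the end.
        std::uniform_int_distribution<std::size_t> pick(lo, hi - 1);
        std::swap(a[pick(rng)], a[hi - 1]);
        const int pivot = a[hi - 1];

        // Lomuto partition: elements smaller than the pivot move left.
        std::size_t store = lo;
        for (std::size_t i = lo; i + 1 < hi; ++i) {
            if (a[i] < pivot) std::swap(a[i], a[store++]);
        }
        std::swap(a[store], a[hi - 1]);       // pivot lands in its final position

        if (k == store) return a[store];
        if (k < store) hi = store;            // continue in the left part
        else lo = store + 1;                  // continue in the right part
    }
}
```

With a random pivot the expected running time is O(n) on any input with distinct keys, although the worst case remains O(n^2).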

Search trees (5 hours)

  • Dictionary ADT and simple implementations
  • Binary search trees (insert and delete operations and analysis)
  • Balanced search trees (insert and delete operations and analysis; instructors will normally choose two or more of AVL trees, 2-3 trees, red-black trees, etc.)
  • 2-3-4 trees and B-trees (search, insert, and delete operations and analysis)

Hashing (5 hours)

  • Key-indexed search, simple hash functions
  • Collision resolution: chaining and open addressing
  • Complexity of search, insertion, and deletion
  • Extendible hashing
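
To make collision resolution by chaining concrete, here is a small sketch of a hash table with integer keys and values. The class name ChainedHashTable, its member names, and the modular hash function are assumptions for this example only; resizing and load-factor management are deliberately omitted.

```cpp
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

class ChainedHashTable {
public:
    // Precondition: buckets > 0.
    explicit ChainedHashTable(std::size_t buckets = 11) : table_(buckets) {}

    // Insert key with value, overwriting the value if key is already present.
    void insert(int key, int value) {
        auto& chain = table_[slot(key)];
        for (auto& entry : chain) {
            if (entry.first == key) { entry.second = value; return; }
        }
        chain.emplace_back(key, value);
    }

    // Return a pointer to the value stored for key, or nullptr if absent.
    const int* find(int key) const {
        for (const auto& entry : table_[slot(key)]) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

    // Remove key; return true if it was present.
    bool remove(int key) {
        auto& chain = table_[slot(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (it->first == key) { chain.erase(it); return true; }
        }
        return false;
    }

private:
    // Simple modular hash that also maps negative keys to a valid slot.
    std::size_t slot(int key) const {
        const long long m = static_cast<long long>(table_.size());
        return static_cast<std::size_t>(((key % m) + m) % m);
    }

    std::vector<std::list<std::pair<int, int>>> table_;
};
```

Each operation scans a single chain, so its cost is proportional to that chain's length: constant on average when keys spread evenly and the load factor is bounded, but linear in the worst case.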

Range search and multidimensional dictionaries (5 hours)

  • Range search in a binary search tree
  • Data structures for orthogonal range search: quadtrees, kd-trees, range trees
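
One-dimensional range search in a binary search tree, the first topic above, can be sketched as follows. The Node layout and the names bstInsert and rangeSearch are hypothetical, and the tree is unbalanced to keep the example short.

```cpp
#include <memory>
#include <vector>

struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Plain (unbalanced) BST insertion; duplicate keys are ignored.
void bstInsert(std::unique_ptr<Node>& root, int key) {
    if (!root) { root.reset(new Node(key)); return; }
    if (key < root->key) bstInsert(root->left, key);
    else if (key > root->key) bstInsert(root->right, key);
}

// Append to out, in sorted order, every key k with lo <= k <= hi.
// Subtrees that cannot contain keys in the range are skipped.
void rangeSearch(const Node* node, int lo, int hi, std::vector<int>& out) {
    if (!node) return;
    if (lo < node->key) rangeSearch(node->left.get(), lo, hi, out);
    if (lo <= node->key && node->key <= hi) out.push_back(node->key);
    if (node->key < hi) rangeSearch(node->right.get(), lo, hi, out);
}
```

On a tree of height h the search visits O(h + s) nodes, where s is the number of keys reported, so a balanced tree gives O(log n + s).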

Algorithms and data structures for text processing (8 hours)

  • Dictionaries for text strings: radix trees, tries, compressed tries, suffix tries
  • String matching algorithms: brute force, finite automata, the Knuth-Morris-Pratt algorithm
  • Text compression: Huffman codes, Lempel-Ziv B, Burrows-Wheeler Transform (BWT)
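
For the string matching topic above, a compact sketch of the Knuth-Morris-Pratt algorithm is given below. The function names failureArray and kmpSearch are assumptions for this example, and reporting all match positions (rather than only the first) is a design choice, not something the syllabus prescribes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// f[i] is the length of the longest proper prefix of p[0..i]
// that is also a suffix of p[0..i].
std::vector<std::size_t> failureArray(const std::string& p) {
    std::vector<std::size_t> f(p.size(), 0);
    std::size_t k = 0;                         // length of the current border
    for (std::size_t i = 1; i < p.size(); ++i) {
        while (k > 0 && p[i] != p[k]) k = f[k - 1];
        if (p[i] == p[k]) ++k;
        f[i] = k;
    }
    return f;
}

// Return the start index of every occurrence of pattern in text,
// including overlapping ones. An empty pattern yields no matches.
std::vector<std::size_t> kmpSearch(const std::string& text,
                                   const std::string& pattern) {
    std::vector<std::size_t> matches;
    if (pattern.empty()) return matches;
    const std::vector<std::size_t> f = failureArray(pattern);
    std::size_t k = 0;                         // pattern characters matched so far
    for (std::size_t i = 0; i < text.size(); ++i) {
        while (k > 0 && text[i] != pattern[k]) k = f[k - 1];
        if (text[i] == pattern[k]) ++k;
        if (k == pattern.size()) {
            matches.push_back(i + 1 - k);      // match ends at position i
            k = f[k - 1];                      // keep going for overlaps
        }
    }
    return matches;
}
```

Because the text index never moves backwards, the search takes O(n + m) time for a text of length n and a pattern of length m, compared with O(nm) for the brute-force method.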