CS 742 Parallel and Distributed Database Systems


This course covers algorithms and architectures used in parallel and distributed database management systems. The focus is on relational systems, but the problems addressed arise in all large-scale database systems.


Principles of Distributed Database Systems, by M. T. Özsu and P. Valduriez, Prentice Hall, 1999.


3 hours of lectures per week.


Background and Introduction (3 hrs)

Why distribute a database management system? Architectural alternatives: shared-memory, shared-disk, shared-nothing architectures. Peer-to-peer and client/server models. The data integration problem, federated systems, and data warehouses. Underlying system model: operating systems and networks.

Parallel I/O (1 hr)

Disk striping and RAID systems, implications for database system design.
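
As a rough illustration of disk striping, the sketch below maps a logical block number to a disk and an offset under simple round-robin striping with no redundancy (RAID level 0). The function name `locate_block` and the `stripe_unit` parameter are assumptions made for this example and do not come from the course text.

```python
from __future__ import annotations


def locate_block(logical_block: int, num_disks: int, stripe_unit: int = 1) -> tuple[int, int]:
    """Return (disk index, block offset on that disk) for a logical block.

    Hypothetical helper: assumes round-robin striping in which each disk
    receives `stripe_unit` consecutive blocks before the next disk is used.
    """
    if logical_block < 0 or num_disks <= 0 or stripe_unit <= 0:
        raise ValueError("block must be >= 0; disks and stripe unit must be > 0")
    stripe_index = logical_block // stripe_unit   # which stripe unit overall
    within_unit = logical_block % stripe_unit     # position inside that unit
    disk = stripe_index % num_disks               # units are dealt round-robin
    offset = (stripe_index // num_disks) * stripe_unit + within_unit
    return disk, offset
```

With a stripe unit of one block, consecutive logical blocks land on different disks, so a sequential scan can keep all disks busy at once. A larger stripe unit keeps small requests on a single disk, which is one of the design trade-offs for a database system.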

Parallel Query Processing (6 hrs)

Data partitioning and placement. Parallel query evaluation. Intra-operator and inter-operator parallelism. Algorithms for parallel implementation of relational operations.
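
A minimal sketch of how data partitioning enables intra-operator parallelism: both inputs of an equijoin are hash-partitioned on the join attribute, so each partition pair can be joined independently. The function names and the dictionary-based row format are assumptions for illustration only, and the loop over partitions stands in for work that separate nodes would do in parallel.

```python
from collections import defaultdict


def hash_partition(rows, key, num_nodes):
    """Assign each row to one of num_nodes partitions by hashing its key value."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key]) % num_nodes].append(row)
    return parts


def local_hash_join(left, right, key):
    """Join two partitions on one node: build a hash table on left, probe with right."""
    table = defaultdict(list)
    for l_row in left:
        table[l_row[key]].append(l_row)
    return [{**l_row, **r_row} for r_row in right for l_row in table.get(r_row[key], [])]


def partitioned_join(left, right, key, num_nodes):
    """Equijoin computed partition by partition.

    Rows with equal key values hash to the same partition, so no matches
    are lost by joining only partition i of left with partition i of right.
    """
    left_parts = hash_partition(left, key, num_nodes)
    right_parts = hash_partition(right, key, num_nodes)
    result = []
    for i in range(num_nodes):  # each iteration models one node's independent work
        result.extend(local_hash_join(left_parts[i], right_parts[i], key))
    return result
```

The result does not depend on the number of partitions; only the division of work does.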

Distributed Query Processing (6 hrs)

Data fragmentation (partitioning) in distributed systems. Query decomposition and optimization.

Federated Database Systems (6 hrs)

Data integration, schema integration, and system integration. Describing and exploiting source capabilities. Middleware architectures.

Distributed Transactions (3 hrs)

Database updates and the transaction model. Distributed concurrency control. Distributed agreement protocols; two-phase commit. Generalized transaction models.
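
The basic message flow of two-phase commit can be sketched as below. This is a simplified, failure-free illustration: the `Participant` class and its method names are hypothetical, and a real protocol also needs logging, timeouts, and recovery, none of which are modeled here.

```python
class Participant:
    """Hypothetical participant; `will_commit` fixes its vote for the example."""

    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes (and enter the prepared state) or vote no.
        self.state = "prepared" if self.will_commit else "aborted"
        return self.will_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2 (decision): send the same global decision to every participant.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision
```

A single "no" vote forces a global abort, which is what makes the outcome atomic across sites.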

Data Replication (6 hrs)

Overview of replication. The synchronization problem and algorithms. Lazy replication. Caching and dynamic replication. Failures and fault tolerance in systems with replicated data.

Current Events (5 hrs)

Hot topics.