This course covers algorithms and architectures used in parallel and distributed
database management systems. The focus is on relational systems, but the problems
addressed in this course arise in all large-scale database systems.
Principles of Distributed Database Systems, 2nd edition, by M. T. Özsu and P. Valduriez,
Prentice Hall, 1999.
3 hours of lectures per week.
Why distribute a database management system? Architectural alternatives: shared-memory, shared-disk, and shared-nothing architectures. Peer-to-peer and client/server models. The data integration problem, federated systems, and data warehouses. Underlying system model: operating systems and networks.
Disk striping and RAID systems, and their implications for database system design.
Data partitioning and placement. Parallel query evaluation. Intra-operator and inter-operator parallelism. Algorithms for parallel implementation of relational operations.
Data fragmentation (partitioning) in distributed systems. Query decomposition and optimization.
Data integration, schema integration, and system integration. Describing and exploiting source capabilities. Middleware architectures.
Database updates and the transaction model. Distributed concurrency control. Distributed agreement protocols; two-phase commit. Generalized transaction models.
Overview of replication. The synchronization problem and algorithms for solving it. Lazy replication. Caching and dynamic replication. Failures and fault tolerance in systems with replicated data.