CS 448 Assignment #3

Assignments may be done individually or in teams of two.

Due by 10:00 am on Wednesday, March 14, 2001.

Consider the following relation scheme for compact discs

CD (Company, Date, CatalogNum, Composer, Track, Group, Artist, Title, Instrument, Duration)

together with the following functional dependencies:

{Artist} -> {Instrument, Group}

{Composer, Title, Company} -> {CatalogNum, Track, Duration}

{CatalogNum} -> {Company, Date}

{CatalogNum, Title} -> {Composer}

(a) Use Armstrong.s axioms to prove formally that this set of functional dependencies implies that {CatalogNum, Title} -> {Track}. Justify each step of your proof by listing the reason as given (i.e., the functional dependencies is one of the four given ones) or by naming which one of augmentation, transitivity, reflexivity, union, decomposition, or pseudotransitivity was applied.

(b) Give an example of a relation instance for CD having two rows and showing that the dependency {Title, Company} -> {CatalogNum} need not hold when the four given functional dependencies do.

(c) Calculate the closure of the attribute set {CatalogNum, Title} with respect to the given functional dependencies.

(d) Find two candidate keys for CD.

(e) In the presence of the given functional dependencies, determine whether the join of relations conforming to the schemes Recording (CatalogNum, Company, Composer, Title, Track, Duration, Group, Artist) and Performer (CatalogNum, Group, Artist, Instrument, Date) is lossless with respect to CD and explain why or why not.

Consider the relation scheme R (A, B, C, D, E, F, G) and the set of functional dependencies {AB.C, BC.D, DEF.G, DG.EF, DG.AB}, where AB is a shorthand notation representing the set {A, B} etc.

(a) Find a lossless join decomposition into relation schemes, all of which are in Boyce-Codd Normal Form.

(b) Is the resulting set of relation schemes dependency preserving with respect to the given functional dependencies? Explain why or why not.

Consider the relation scheme Cities (Name, Province, Population, KilometresFromWaterloo) and queries as follows:

Q1: select Name, Province

from Cities

where Population > 100000

Q2: select Name, Population, KilometresFromWaterloo

from Cities

where Province = .ON. and KilometresFromWaterloo < 100

Q3: select Province, count(city), sum(Population)

from Cities

group by Province

(a) Briefly explain how a B+-tree index on Population could be used during processing of Q1.

(b) Briefly explain how a clustering index on the pair of attributes (Province, KilometresFromWaterloo) could be used during processing of Q2.

(c) Briefly explain how a hash-table index could be used during processing of Q2.

(d) If separate B+-tree indexes were available on Province and on KilometresFromWaterloo, explain which would likely be more selective for processing Q2.

(e) Briefly explain why using a clustering index on Province would be more efficient than either a hash-table or B+-tree index during processing of Q3.