2016 technical reports

CS-2016-01
Title

Providing Serializability for Pregel-like Graph Processing Systems

Authors

Minyang Han and Khuzaima Daudjee

Date February 1, 2016
Report Providing Serializability for Pregel-like Graph Processing Systems (PDF)
CS-2016-02
Title

Distributed Data Deduplication

Authors

Xu Chu, Ihab Ilyas and Paraschos Koutris

Abstract

Data deduplication refers to the process of identifying tuples in a relation that refer to the same real-world entity. The complexity of the problem is inherently quadratic with respect to the number of tuples, since a similarity value must be computed for every pair of tuples. In order to avoid comparing tuple pairs that are obviously non-duplicates, matching algorithms use blocking techniques that divide the tuples into blocks and compare only tuples within the same block. However, even with the use of blocking, data deduplication remains a costly problem for large datasets. In this paper, we show how to further speed up data deduplication by leveraging parallelism in a shared-nothing computing environment. Our main contribution is a distribution strategy, called Dis-Dedup, that minimizes the maximum workload across all worker nodes and provides strong theoretical guarantees. We demonstrate the effectiveness of our proposed strategy by performing extensive experiments on both synthetic datasets with varying block size distributions and real-world datasets.

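As a rough illustration of the blocking idea mentioned in this abstract, the following minimal single-machine sketch groups tuples by a blocking key and generates candidate pairs only within each block. It is not the distribution strategy from the report; the function name `candidate_pairs`, the sample records, and the choice of the city field as blocking key are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(tuples, block_key):
    """Group tuples by a blocking key and yield pairs only within a block."""
    blocks = defaultdict(list)
    for t in tuples:
        blocks[block_key(t)].append(t)
    for members in blocks.values():
        # Only tuples sharing a block are ever compared.
        yield from combinations(members, 2)


# Hypothetical sample relation: (name, city).
records = [
    ("Ann Lee", "Waterloo"),
    ("Anne Lee", "Waterloo"),
    ("Bob Ray", "Toronto"),
    ("Rob Ray", "Toronto"),
    ("Cy Young", "Ottawa"),
]

# Hypothetical blocking key: the city field.
pairs = list(candidate_pairs(records, block_key=lambda r: r[1]))

# Number of comparisons without blocking: n * (n - 1) / 2.
all_pairs = len(records) * (len(records) - 1) // 2

print(len(pairs), "candidate pairs with blocking;", all_pairs, "without")
```

With a skewed block size distribution, a few large blocks would dominate the comparison work, which is the kind of imbalance a distribution strategy has to handle.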
Date February 1, 2016
Report Distributed Data Deduplication (PDF)
CS-2016-03
Title

On Referring Expressions in Information Systems derived from Conceptual Models

Authors

Alexander Borgida, David Toman and Grant Weddell

Abstract

We apply recent work on referring expression types to the issue of identification in conceptual modelling. In particular, we consider how such types yield a separation of concerns in a setting where an information system based on a conceptual schema is to be mapped to a relational schema plus SQL queries. We start from a simple object-centered representation (as in semantic data models), where naming is not an issue because everything is self-identified (possibly using surrogates). We then allow the analyst to attach to every class a preferred "referring expression type", and to specify uniqueness constraints in the form of generalized functional dependencies. We show (1) how a number of well-formedness conditions concerning an assignment of referring expressions can be efficiently diagnosed, and (2) how a concrete relational schema and SQL queries over this schema are derived from a combination of the conceptual schema and queries over it, once identification issues have been separately resolved as above.

Date April 28, 2016
Report On Referring Expressions in Information Systems derived from Conceptual Models (PDF)
CS-2016-04
Title

Feature-Oriented Modelling in BIP: A Case Study

Authors

Cecylia Bocovich and Joanne Atlee

Abstract

In this paper, we investigate the usage of Behaviour-Interaction-Priority version 2 (BIP2), a component-based modelling framework, for specifying feature-oriented systems. We evaluate BIP2 in the context of the Feature Interaction Problem and quantify the amount of work needed to add features to an existing system (i.e., in terms of rework to existing features, and work to identify and specify interactions). We present the results of a case study on a telephony system with five optional features where we found that the amount of work depends heavily on how features are interconnected. We identify a number of different strategies for interconnecting features, and propose one that reduces the amount of work and rework needed to add new features to an existing system.

Date September 20, 2016
Report Feature-Oriented Modelling in BIP: A Case Study (PDF)
CS-2016-05
Title

Improving Time-of-Use Electricity Pricing in Ontario

Authors

Adedamola Adepetu and Srinivasan Keshav

Abstract

Time-of-Use (ToU) electricity pricing is an electricity pricing scheme where consumers are charged at a rate that depends on the time of electricity consumption. This pricing scheme is often implemented to match the cost of generating and supplying electricity, and to make consumers defer appliance usage; this would reduce the daily electricity consumption peak, which can reduce both the cost of generation and carbon footprints. We first critique the current ToU scheme in Ontario and make recommendations to improve it. Subsequently, we create an Agent-Based Model (ABM) to study ToU pricing and its effectiveness in reducing peak loads, which allows us to evaluate the benefit of our recommendations. We find that while ToU is effective in incentivizing load deferral, improvements can be made in the Ontario ToU scheme.

Keywords: demand response, agent-based model, electricity pricing
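
To make the ToU idea concrete, here is a minimal sketch of how the same consumption costs less when deferred from a peak period to an off-peak period. The rates, the period boundaries, and the names `RATES`, `period`, and `daily_cost` are hypothetical and are not taken from the report or from Ontario's actual tariff.

```python
# Hypothetical rates in cents per kWh; NOT Ontario's actual prices.
RATES = {"off_peak": 8.0, "mid_peak": 12.0, "on_peak": 18.0}


def period(hour):
    """Map an hour of day (0-23) to a hypothetical pricing period."""
    if 7 <= hour < 11 or 17 <= hour < 19:
        return "on_peak"
    if 11 <= hour < 17:
        return "mid_peak"
    return "off_peak"


def daily_cost(usage_kwh_by_hour):
    """Total cost in cents for a mapping of hour -> kWh consumed."""
    return sum(kwh * RATES[period(h)] for h, kwh in usage_kwh_by_hour.items())


# The same 2 kWh appliance load, run at 18:00 versus deferred to 22:00.
before = {18: 2.0}
after = {22: 2.0}

print(daily_cost(before), "cents on-peak;", daily_cost(after), "cents off-peak")
```

The price gap between periods is the incentive for deferral; how strongly consumers respond to it is the kind of question an agent-based model can explore.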

Date September 20, 2016
Report Improving Time-of-Use Electricity Pricing in Ontario (PDF)