# 2007 Technical Reports

 <2006 2008>
CS-2007-01
Title Reputation-Oriented Reinforcement Learning Strategies for Economically-Motivated Agents in Electronic Market Environments
Authors Thomas Thanh Tran
Abstract In this thesis, we propose a market model and learning algorithms for buying and selling agents in electronic marketplaces. We take into account the fact that multi- ple selling agents may offer the same good with different qualities, and that selling agents may alter the quality of their goods. We also consider the possible exis- tence of dishonest selling agents in the market. In our approach, buying agents learn to maximize their expected value of goods using reinforcement learning. In addition, they model and exploit the reputation of selling agents to avoid inter- action with the disreputable ones, and therefore to reduce the risk of purchasing low value goods. Our selling agents learn to maximize their expected profits by using reinforcement learning to adjust product prices, and also by altering product quality to provide more customized value to their goods. We experimentally eval- uate our model on both microscopic and macroscopic levels. On the micro level, we examine the individual benefit of agents, in particular their level of satisfaction. Our experimental results confirm that in both modest and large-sized marketplaces, buying and selling agents following our proposed algorithms achieve better satis- faction than buying and selling agents who only use reinforcement learning. On the macro level, we study how a marketplace populated with our buying and selling agents would behave as a whole. Our results show that such a marketplace can reach an equilibrium state where the agent population remains stable and that this equilibrium is beneficial for the participant agents. The market model and learn- ing algorithms presented in this thesis can therefore be used in designing desirable market environments and effective economically-motivated agents for e-commerce applications.
Date January 19, 2007
CS-2007-02
Title Optimal Top-Down Join Enumeration (extended version)
Authors David DeHaan, dedehaan@uwaterloo.ca
Frank Wm. Tompa, fwtompa@uwaterloo.ca
Abstract Most contemporary database systems perform cost-based join enumeration using some variant of System-R's bottom-up dynamic programming method. The notable exceptions are systems based on the top-down transformational search of Volcano/Cascades. As recent work has demonstrated, bottom-up dynamic programming can attain optimality with respect to the shape of the join graph; no comparable results have been published for transformational search. However, transformational systems leverage benefits of top-down search not available to bottom-up methods.

In this paper we describe a top-down join enumeration algorithm that is optimal with respect to the join graph. We present performance results demonstrating that a combination of optimal enumeration with search strategies such as branch-and-bound yields an algorithm significantly faster than those previously described in the literature. Although our algorithm enumerates the search space top-down, it does not rely on transformations and thus retains much of the architecture of traditional dynamic programming. As such, this work provides a migration path for existing bottom-up optimizers to exploit top-down search without drastically changing to the transformational paradigm.

Date March 14, 2007
Comments This is an extended version of work published in SIGMOD'07, June 11-14, 2007, Beijing, China. That work is copyright ACM, 2007.
CS-2007-03
Title Predictable Semiautomata
Authors Janusz Brzozowski and Nicolae Santean
Abstract

We introduce a new class of nondeterministic semiautomata: A nondeterministic semiautomaton S is predictable if there exists a positive integer k such that, if S knows the present input a and the next k inputs, then the transition under a is deterministic. Nondeterminism may occur only when the length of the unread input is less than k+1. We develop a comprehensive theory of predictable semiautomata. Using a novel semiautomaton, called the core, we present a test for predictability. We then introduce the predictor semiautomaton, based on a look-ahead semiautomaton, that is essentially deterministic. We describe two ways of using the predictor to simulate a nondeterministic semiautomaton. The first simulation predicts the set of states reachable by every prefix of the input word as long as the prefix is in the language of the semiautomaton. The second simulation is similar, but it stops as soon as it infers that the input word is not in the language of the semiautomaton. Moreover, the membership of a word in the language of a semiautomaton can be decided completely deterministically. Finally, we show that, if a semiautomaton with n states over a one-letter alphabet is k-predictable, k being the smallest such integer, then k is at most n-1, and this bound can be reached. For semiautomata over arbitrary alphabets, k is at most (n2-n)/2, and this bound can be reached for a suitable input alphabet.

Keywords: Automaton, delegator, look-ahead, nondeterminism, predictor, selector, semiautomaton, simulation

Date May 9, 2007
CS-2007-04
Title Software System Generation from an Enterprise Service Model
Authors D.D. Cowan, P.S.C. Alencar, D.B. Brown, H.D. Covvey, I. Gimenes, C.J.P Lucena Bill Malyk, D.W. Mulholland, A. Robins, K. Young University of Waterloo
Abstract

This paper introduces an approach to constructing information systems based on enterprise service architectures (ESAs). An ESA model is specified using declarative business process modeling or workflow commands in conjunction with service framework instances where the business process model and the instances are captured in a data structure. A service framework is an outline of different types such as reports, databases, agents, diagrams and maps and an instance is produced by completing a framework's adaptation points. Adaptation points are defined by an adaptation interface specified in a declarative language and presented through forms. The business process or workflow is specified similarly. An automatic transformation is then applied to convert the model into an operational information system. This approach can be viewed as model-driven software development without the necessity of writing code to complete the implementation. Thus, we avoid the symptoms of architectural drift that occur when applications evolve independently from their model. The control structure is declarative and separate, clearly delineating the concerns, thus making it easier to address future anticipated, unanticipated and crosscutting concerns. We have created tools to support this approach, and although many of the tools are general, we have concentrated our efforts on the Web because of its pervasive nature. Thus, we have produced the Web-based Informatics Development Environment (WIDE) technology, a set of tools and processes used to transform a model based on ESA principles into a web-based information system. We have designed and implemented over 30 operational web-based systems to validate our approach.

Date May 1 , 2007
CS-2007-05
Title OOMatch: Pattern Matching as Dispatch in Java
Authors Adam Richard and Ondrej Lhotak
Abstract

We present a new language feature, specified as an extension to Java. The feature is a form of dispatch which includes and subsumes multimethods, but which is not as powerful as general predicate dispatch. It is, however, intended to be more practical and easier to use than the latter. The extension, dubbed OOMatch, allows method parameters to be specified as patterns, which are matched against the arguments to the method call. When matches occur, the method applies; if multiple methods apply, the method with the more specific pattern overrides the others.

The pattern matching is very similar to that found in the "case" constructs of many functional languages (ML, for example), with an important difference: functional languages normally allow pattern matching over variant types (and other primitives such as tuples), while OOMatch allows pattern matching on Java objects. Indeed, the wider goal here is the study of the combination of functional and object-oriented programming paradigms. Of special importance, we ensure that matching can occur while retaining the complete control of class designers to prevent implementation details (such as private variables) from being exposed to clients of the class.

We here present both an informal "tutorial" description of OOMatch, as well as a formal specification of the language, and a proof that the conditions specified guarantee run-time safety.

Date March 19, 2007
CS-2007-06
Title Efficient Search for the Top-k Probable Nearest Neighbors in Uncertain Databases
Authors George Beskales, Mohamed A. Soliman and Ihab F. Ilyas
Abstract

Uncertainty pervades many domains in our lives. Current real-life applications, e.g., location tracking using GPS devices or cell phones, multimedia feature extraction, and sensor data management, deal with different kinds of uncertainty. Finding the nearest neighbor objects to a given query point is an important query type in these applications.
In this paper, we study the problem of finding objects with the highest marginal probability of being the nearest neighbors to a query object. We adopt a general uncertainty model allowing for data and query uncertainty. Under this model, we define new query semantics, and provide several efficient evaluation algorithms. We analyze the cost factors involved in query evaluation, and present novel techniques to address the trade-offs among these factors. We give multiple extensions to our techniques including handling dependencies among data objects, and answering threshold queries. We conduct an extensive experimental study to evaluate our techniques on both real and synthetic data.

Date October 20, 2008
CS-2007-07
Title Bounded Model Checking with Description Logic Reasoning
Authors Shoham Ben-David, Richard Trefler, Grant Weddell
Abstract

Model checking is a technique for verifying that a finite-state concur- rent system is correct with respect to its specification. In bounded model check- ing (BMC), the system is unfolded until a given depth, and translated into a CNF formula. A SAT solver is then applied to the CNF formula, to find a satisfying assignment. Such a satisfying assignment, if found, demonstrates an error in the model of the concurrent system. Description Logic (DL) is a family of knowledge representation formalisms, for which reasoning is based on tableaux techniques. We show how Description Logic can serve as a natural setting for representing and solving a BMC prob- lem. We formulate a bounded model checking problem as a consistency problem in the DL dialect ALC I . Our formulation results in a compact representation of the model, one that is linear in the size of the model description, and does not involve any unfolding of the model. Experimental results, using the DL reasoner FaCT++, significantly improve on a previous approach that used DL reasoning for model checking.

Date March 26, 2007
CS-2007-08
Title Designing Succinct Structural Alphabets
Authors Shuai Cheng Li , Jinbo Xu, Xin Gao, Dongbo Bu, Ming Li
Abstract

Abstract: The 3D structure of protein sequence A can be assembled by the substructures corresponding to small segments of A. A sequence segment does not take on all the structural fragments and thus it is desirable to build a short customized structural candidate list for each sequence segment. For each sequence segment, these substructures are its "specific structural alphabet". The smaller these candidate lists are, the faster the protein structure can be constructed; the more accurate these candidate lists are, the more accurate the final protein structure will be. A major obstacle in protein structure prediction is to construct a small set of structural candidates for each segment such that the native structure can be rebuilt from these structural candidates accurately.

Based on integer linear programming and incorporating extra structural information, a software package FragShaver is developed. We have made significant progress in overcoming the abovementioned obstacle:

1. Comparing our package to the Rosetta's fragment selection method, at threshold 1A and structural fragment length 9, the position coverage is improved from 56.4% to 79.1% for $\beta$-sheet, and from 55.5% to 67.9% for loop, while reducing the candidate list size from 25 to 10 simultaneously. By using candidate list size 25, our method improves Rosetta's position coverage of $\beta$-sheet from 56.5% to 89.6% and the position coverage of loop from 55.5% to 78.1%. At 1.5A, we achieve position coverage 96.7%.

2 Applying our method to Kolodny's independent library, our experiment indicates that our method is capable of identifying a small subset of structural candidates for a given sequence segment to achieve the same accuracry as using the whole library, reducing Kolodny's library size from 200 to 25 at the same accuracy.

Date April 4, 2007
CS-2007-09
Title The FireCollaborator: a Collaborative Approach for Attack Detection
Authors Jerome Francois, Adel El-Atawy, Ehab Al-Shaer, and Raouf Boutaba
Abstract

The distributed denial of service attacks are a ma jor threat on Internet and detecting this kind of attack as far as possible from the victim in order to save resources is a real challenge. We propose a new framework to deal with this problem which is based on Intrusion Prevention System (IPS) deployed on a Internet Service Provider (ISP) level. The key point is to use compressed metrics based on the routing rules in order to extract suspect traffics thanks to a collaboration between routers on the same path and finally confirm an attack by using the IPS on different path but on the same level (number of hops before the potential victim). The main metric we use is the frequency of a rule but we decide to use the entropy too. We are able to detect a change in the traffic and define what rules changes a lot and have a too higher frequency. After the share of the different belief of each IPS is needed to determine the very potential attack before confirm it by computing the packets rate. There are three main advantages of our proposition. The first is that because you select the rules, you can analyze precisely these rules and be sure that there is an attack. The second is that we save a lot of resources (network, CPU and storage) thanks to the choice of the metrics and the selection of rule. Finally because we determine the attack as far as possible the final host can save resources too because the attack traffic is blocked early and congestion can be avoided.

Date April 16, 2007
CS-2007-10
Title Addressing an Open Problem on Regex
Authors Cezar Campeanu and Nicolae Santean
Abstract

In this paper we revisit the semantics of extended regular expressions (regex), defined succinctly in the 90's (A.V. Aho) and rigorously in 2003 by Campeanu, Salomaa and Yu, when the authors reported an open problem, namely whether regex languages are closed under the intersection with regular languages. We give a positive answer to this question; and for doing so, we propose a new class of machines - regex automata systems (RAS) - which are equivalent to regex. Among others, these machines provide a consistent and convenient method of implementing regex in practice. We also prove, as a consequence of this closure property, that several anthological languages, such as the mirror language, the language of palindromes and the language of balanced words are not regex languages.

Keywords: Extended Regular Expression, Regex Automata System, Regex

Date April 11, 2007
CS-2007-11
Title Succinct Representation of Labeled Graphs
Authors Jeremy Barbay, Luca Castelli Aleardi, Meng He and J. Ian Munro
Abstract In many applications, properties of an object being modeled are stored as labels on vertices or edges of a graph. In this paper, we consider succinct representation of labeled graphs. Our main results are the succinct representations of labeled and multi-labeled graphs (we consider vertex labeled planar triangulations, as well as edge labeled planar graphs and the more general $k$-book embedded graphs) to support various label queries efficiently. The additional space cost to store the labels is essentially the information-theoretic minimum. As far as we know, our representations are the first succinct representations of labeled graphs.

We also have two preliminary results to achieve the main results. First, we design a succinct representation of unlabeled planar triangulations to support the rank/select of edges in ccw (counter clockwise) order in addition to the other operations supported in previous work. Second, we design a succinct representation for a $k$-book graph when $k$ is large to support various navigational operations more efficiently. In particular, we can test the adjacency of two vertices in $O(\lg k\lg\lg k)$ time, while previous work uses $O(k)$ time.
Date May 2, 2007
CS-2007-12
Title Implementation of Parallel Set Intersection for Keyword Search using RapidMind
Authors Fady Samuel, Jeremy Barbay and Michael McCool
Abstract The intersection of large ordered sets is a common problem in the context of the evaluation of boolean queries to a search engine. In this paper we propose several parallel algorithms for computing the intersection of sorted arrays, taking advantage of the parallelism provided by the new generation of multicore processors and of the programming library Rapidmind. We perform an experimental comparison of performance on a excerpt of web index and queries provided by Google, and show an improvement of performance proportional to the parallelism when using up to four processors. Our results confirm the intuition that the intersection problem is implicitly parallelisable at the set level, and not only at the query level as previously considered.
Date May 8, 2007
CS-2007-13
Title Faster Set Intersection Algorithms for Text Searching
Authors Jeremy Barbay, Alejandro Lopez-Ortiz, Tyler Lu and Alejandro Salinger.
Abstract

The intersection of large ordered sets is a common problem in the context of the evaluation of boolean queries to a search engine. In this paper we propose several improved algorithms for computing the intersection of sorted arrays, and in particular for searching sorted arrays in the intersection context. We perform an experimental comparison with the algorithms from the previous studies from Demaine, L&oacute;pez-Ortiz and Munro [ALENEX 2001], and from Baeza-Yates and Salinger [SPIRE 2005]; in addition, we implement and test the intersection algorithm from Barbay and Kenyon [SODA 2002] and its randomized variant [SAGA 2003].

We consider both the random data-set from Baeza-Yates and Salinger, the Google queries used by Demaine et al., a corpus provided by Google and a larger corpus from the TREC Terabyte 2006 efficiency query stream, along with its own query log. We measure the performance both in terms of the number of comparisons and searches performed, and in terms of the CPU time on two different architectures. Our results confirm or improve the results from both previous studies in their respective context (comparison model on real data and CPU measures on random data), and extend them to new contexts. In particular we show that value-based search algorithms perform well in posting lists in terms of the number of comparisons performed.

Date April 30, 2007
CS-2007-14
Title An Approximation Algorithm for Shortest Descending Paths
Authors Mustaq Ahmed and Anna Lubiw
Abstract A path from s to t on a polyhedral terrain is descending if the height of a point p never increases while we move p along the path from s to t. No efficient algorithm is known to find a shortest descending path (SDP) from s to t in a polyhedral terrain. We give a simple approximation algorithm that solves the SDP problem on general terrains. Our algorithm discretizes the terrain with O(n2 X / e) Steiner points so that after an O(n2 X / e * log(n X /e))-time preprocessing phase for a given vertex s, we can determine a (1+e)-approximate SDP from s to any point v in O(n) time if v is either a vertex of the terrain or a Steiner point, and in O( n X /e ) time otherwise. Here n is the size of the terrain, and X is a parameter of the geometry of the terrain.

Date May 9, 2007
CS-2007-15
Title Reasoning about Interaction in Mixed-Initiative Artificial Intelligence Systems
Authors Michael Fleming
Abstract This thesis presents computational models for the design of mixed-initiative artificial intelligence systems that can make rational decisions about interaction with potentially helpful users. Mixed-initiative systems are ones in which either the system or the user can take the initiative to direct the dialogue or the problem solving. These systems have been designed for such diverse applications as robotics, military planning, intelligent tutoring and trip scheduling. One challenge in designing these systems is to specify when the system should take the initiative to interact with the user. The main contribution of the thesis is to provide designers of mixed-initiative systems with a systematic approach for constructing systems that can reason in a principled way about interaction with the user, regardless of the area of application. Our approach is to model the user, the task and the dialogue simultaneously. Specific factors are proposed that must be modeled, and methods are developed for how to combine these factors in order to make rational decisions about interaction, based on whether the perceived benefits of communication exceed the expected costs. Some examples and experiments are described, to demonstrate the value of the models and to justify decisions that were made in determining the role of each factor in the computational models. In particular, we emphasize the value of making decisions about interaction based on a careful evaluation of the needs, preferences and abilities of the user, leading to mixed-initiative systems that are user-specific and therefore result in greater overall user satisfaction.
Date December 18, 2007
CS-2007-16
Title On Ordering Descriptions in a Description Logic
Authors Jeffrey Pound, Lubomir Stanchev, David Toman, and Grant Weddell
Abstract We introduce a description language for specifying partial ordering relations over concept descriptions in description logics, and show how the language can be used in combination with binary trees to effiently search a database that corresponds to a finite set of concept descriptions. The language consists of a pair of ordering constructors that support a form of exogenous indexing in which search criteria is independent of data, and a form of endogenous indexing in which the data itself provides search criteria. Our language can be refined in the same way as a description logic in that greater expressiveness and conse- quent richer search capability is achieved by adding additional ordering constructors.
Date June 7, 2007
CS-2007-17
Title Geometric Streaming Algorithms with a Sorting Primitive
Author Eric Y. Chen
Abstract We solve several fundamental geometric problems under a new streaming model recently proposed by Ruhl et al. [2, 12]. In this model, in one pass the input stream can be scanned to generate an output stream or be sorted based on a user-defined comparator; all intermediate streams must be of size O(n). We obtain the following geometric results for any fixed constant
Date May 23, 2007
CS-2007-18
Title Comprehending Object-Oriented Software Frameworks API Through Dynamic Analysis
Authors Abbas Heydarnoori, Thiago Tonelli Bartolomei and Krzysztof Czarnecki
Abstract A common practice followed by many application developers is to use existing framework applications as a guide to understand how to implement a framework-provided concept of interest. Unfortunately, finding the code that implements the concept of interest might be very difficult since the code might be scattered across and tangled with code implementing other concepts. To address this issue, this report presents an approach called FUDA (Framework API Understanding through Dynamic Analysis). The main idea of this approach is to extract implementation recipes of a given concept from runtime information collected when that concept is invoked in a number of different sample applications. For this purpose, we introduce a novel dynamic slicing approach named 'concept trace slicing' and combine it with clustering and data mining techniques. The experimental evaluation of FUDA suggests that this approach is effective in producing useful implementation recipes for a given concept with few false positives and false negatives.
Date March 10, 2008
CS-2007-19
Title CIME: some protocols for Computer In the Middle Education
Authors Jeremy Barbay
Abstract We introduce Computer In the Middle Education (CIME), a family of software providing access to some educational data in exchange of further addition to this data. We present several prototypes: VocaCIME, focused on gathering and teaching foreign vocabulary associated to images and CIMEQuiz, focused on multi-choice questions on simple mathematical concepts. In particular, we describe techniques for the quality control of the data submitted by the user, and techniques to insure the growth of the database when it is used. While this work is partially inspired by the human computation games ESP, Peekaboom, Phetch and Verbosity from Luis von Ahn and his collaborators, the approach illustrates new directions of research, such as bootstrapping databases, where a small initial database is grown into a larger one by its mere use, and computer in the middle techniques, where the computer provides a communication interface between users (rather than a game), but still recovers useful data from it. This technical report describes prototypes under development or yet to be developed.
Date June 12, 2007
CS-2007-20
Title Adaptive Planar Convex Hull Algorithm for a Set of Convex Hulls
Authors Jeremy Barbay and Eric Y. Chen
Abstract An adaptive algorithm is one which performance can be expressed more precisely than as a mere function of the size of the input: output-sensitive algorithms are a special case of adaptive algorithms. We consider the computation of the convex hull of a set of convex hulls, for instance in the case where the set of points has been composed from simpler objects from a library, for each of which the convex hull has been precomputed. We show that in this context an adaptive algorithm performs better if it takes advantage of other features than the size of the output.
Date November 28, 2007
CS-2007-21
Title PSALM: Accurate Sampling for Cardinality Estimation in a Multi-user Environment
Authors Huaxin Zhang h7zhang@cs.uwaterloo.ca
Ihab F. Ilyas ilyas@cs.uwaterloo.ca
Kenneth Salem kmsalem@cs.uwaterloo.ca
Abstract In database systems that support fine-grained access controls, each user has access rights that determine which tuples are accessible and which are inaccessible. Queries are answered as if the inaccessible tuples are not present in the database. Thus, users with different access rights may get different answers to a given query. To process queries efficiently in the presence of fine-grained access controls, the database system needs accurate estimates of the number of tuples that are both accessible according to the access rights of the submitting user and relevant according to the selection predicates in the query. In this paper we present sampling-based cardinality estimation techniques for use in the presence of fine-grained access controls. These techniques exploit the fact that access rights are relatively static and are common to all queries that are evaluated on behalf of a particular user. We show that the proposed techniques provide more accurate estimates than simpler techniques that do not exploit knowledge of access rights. We quantify these improvements analytically and through simulations.
Date June 27, 2007
CS-2007-22
Title XML Index Recommendation with Tight Optimizer Coupling
Authors Iman Elghandour, Ashraf Aboulnaga, Daniel C. Zilio, Fei Chiang, Andrey Balmin, Kevin Beyer, Calisto Zuzarte
Abstract

XML database systems are expected to handle increasingly complex queries over increasingly large and highly structured XML databases. An important problem that needs to be solved for these systems is how to choose the best set of indexes for a given workload. In this paper, we present an XML Index Advisor that solves this XML index recommendation problem, and has the key characteristic of being tightly coupled with the query optimizer. We rely on the optimizer to enumerate candidate indexes and to estimate the benefit gained from potential index configurations. We expand the set of candidate indexes obtained from the query optimizer to include more general indexes that can be useful for queries other than those in the training workload. To recommended an index configuration, we introduce two new search algorithms. The first algorithm finds the best set of indexes for the specific training workload, and the second algorithm finds a general set of indexes that can benefit the training workload as well as other similar workloads. We have implemented our XML Index Advisor in a prototype version of IBM DB2 9, which supports both relational and XML data, and we experimentally demonstrate the effectiveness of our advisor using this implementation.

Date July 11, 2007
CS-2007-23
Title Small poly-line drawings of series-parallel graphs
Authors Therese Biedl
Abstract In this paper, we study small planar drawings of planar graphs. For arbitrary planar graphs, $\theta(n2)$ is the established upper and lower bound on the area. It is a long-standing open problem for what graphs smaller area can be achieved. We show here that series-parallel can be drawn in $O(n^{3/2})$ area and outer-planar graphs can be drawn in $O(n\log n)$ area, but partial 3-trees and 2-outer-planar graphs require $\Omega(n2)$ area. Our drawings are visibility representations, which can be converted to poly-line drawings of asymptotically the same area.
Date June 21, 2007
CS-2007-24
Title Improving Convergence Rates in Multiagent Learning Through Experts and Adaptive Consultation
Authors Greg Hines and Kate Larson
Abstract We present a multiagent learning algorithm with guaranteed convergence
to Nash equilibria for all games. Our approach is a regret-based learning algorithm which combines a greedy random sampling method with consultation of experts that suggest possible strategy profiles. More importantly, by consulting carefully chosen experts we can greatly improve the convergence rate to Nash equilibria, but in the case where the experts do not return useful advice, we still have guarantees that our algorithm will eventually converge. The goal of our work is to bridge the gap between theoretical and practical learning, and we argue that our approach, FRAME, can serve as a framework for a class of multiagent learning algorithms.
Date July 26, 2007
CS-2007-25
Title Stronger lower bounds for text searching and polynomial evaluation
Authors Alex Golinski
Abstract In this paper, we give two main technical results:

(i) we show a stronger lower bound for substring search problem via compression extending results of Demaine and L{\'o}pez-Ortiz (SODA '01);

(ii) improve the results of Gal and Miltersen (ICALP '03) by showing a bound on the redundancy needed by the polynomial evaluation problem that is linear in terms of the information-theoretic minimum storage required by a polynomial.

Date December 20, 2007
CS-2007-26
Title WILL NOT BE SUBMITTED
Authors Alex Golinski
Abstract

Date
CS-2007-27
Title WILL NOT BE SUBMITTED
Authors Alex Golinski
Abstract

Date
CS-2007-28
Title Reconstructing Orthogonal Polyhedra from Putative Vertex Sets
Authors Therese Biedl and Burkay Genc
Abstract

In this report we study the problem of reconstructing orthogonal polyhedra from a putative vertex set, i.e., we are given a set of points and want to find an orthogonal polyhedron for which this is the set of vertices. We are especially interested in testing whether such a polyhedron is unique. Since this is not the case in general, we focus on orthogonally convex polyhedra, and give an $O(n log n)$ algorithm to find the answer. We then consider the case where the given set of points may be rotated beforehand. For 2D, we prove uniqueness and provide an $O(n log n)$ algorithm to obtain the answer, which can then be used for an $O(n2 log n)$ algorithm in 3D.

Date August 8, 2007
CS-2007-29
Title Robustness in Automatic Physical Database Design
Authors Kareem El Gebaly
Abstract

Automatic physical database design tools rely on what-if'' interfaces to the query optimizer to estimate the execution time of the training query workload under different candidate physical designs. The tools use these what-if interfaces to recommend physical designs that minimize the estimated execution time of the input training workload. Minimizing estimated execution time alone can lead to designs that are not robust to query optimizer errors and workload changes. In particular, if the optimizer makes errors in estimating the execution time of the workload queries, then the recommended physical design may actually degrade the performance of these queries. In this sense, the physical design is risky. Furthermore, if the production queries are slightly different from the training queries, the recommended physical design may not benefit them at all. In this sense, the physical design is not general. We define Risk and Generality as two new measures aimed at evaluating the robustness of a proposed physical database design, and we show how to extend the objective function being optimized by a generic physical design tool to take these measures into account. We have implemented a physical design advisor in PostqreSQL, and we use it to experimentally demonstrate the usefulness of our approach. We show that our two new metrics result in physical designs that are more robust, which means that the user can implement them with a higher degree of confidence. This is particularly important as we move towards truly zero-administration database systems in which there is not the possibility for a DBA to vet the recommendations of the physical design tool before applying them.

Date August 15, 2007
CS-2007-30
Title Support for Collaborative Feature-Based Product Configuration in Software Product Lines
Authors Marcilio Mendonca and Donald Cowan
Abstract In Software Product Lines (SPLs), product configuration is a decision-making process in which a group of stakeholders indicate the features desired for a particular product (software). A feature model is normally used to represent the spectrum of available configuration decisions and thus works as a guide to the configuration process. Although in practice product configuration is seen as a collaborative activity that involves satisfying stakeholders with divergent interests and skills, current configuration technology is essentially single-user-based in which user requirements are interpreted and translated into configuration decisions by a single role referred to as the application engineer. As a consequence, product configuration becomes time-consuming and inaccurate especially in the case of large product lines. This technical report discusses a doctoral research proposal on Collaborative Product Configuration (CPC). The research aims at investigating the major challenges of realizing CPC in SPLs and, subsequently, at developing an approach that explicitly addresses the problems identified. CPC concepts, algorithms, and tool support are discussed and some preliminary experimental results are shown. The research is expected to bring contributions to the SPLs field by paving the way for a deeper understanding of collaborative product configuration, and ultimately by fostering the development of newer and better approaches in the future.
Date August 20, 2007
CS-2007-31
Title Categorization of Implicit Invocation Systems
Authors Rolando Blanco and Paulo Alencar
Abstract

Development and maintenance of implicit invocation systems is not as well understood and supported as the development of explicit invocation systems. The situation is aggravated in systems that allow the composition of functionality developed by different organizations, potentially using different programming languages and methodologies. In this document, we categorize the various ways in which implicit invocation systems can define, bind to, announce, subscribe, and deliver events. We also categorize the architectures and topologies of implicit invocation systems. The purpose of the categorization is to increase the understanding of the type of requirements that these systems impose on software development practices. As the categories are introduced, documented representative invocation systems are discussed in order to illustrate the types of systems that fall under each given category. Our categorization applies to systems as varied as active databases, aspect-oriented applications, and distributed heterogeneous event-based systems.

Date August 22, 2007
CS-2007-32
Title A Fast Unified Optimal Route Query Evaluation Algorithm
Authors Edward P.F. Chan and Jie Zhang
Abstract We investigate the problem of how to evaluate, fast and efficiently, classes of optimal route queries on a massive graph in a unified framework. To evaluate a route query effectively, a large network is partitioned into a collection of fragments, and distances of some optimal routes in the network are pre-computed. Under such a setting, we find a unified algorithm that can evaluate classes of optimal route queries. The classes that can be processed efficiently are called constraint preserving (CP) which include, among others, shortest path, forbidden edges/nodes, α-autonomy, and some of hypothetical weight changes optimal route query classes. We prove the correctness of the unified algorithm. We then turn our attention to the optimization of the proposed algorithm. Several pruning and optimization techniques are derived that minimize the search time and I/O accesses. We show empirically that these techniques are effective. The proposed optimal route query evaluation algorithm, with all these techniques incorporated, is compared with a main-memory and a disk-based brute-force CP algorithms. We show experimentally that the proposed unified algorithm outperforms the brute-force algorithms, both in term of CPU time and I/O cost, by a wide margin.
Date September 6, 2007
CS-2007-33
Title Succinct Data Structures, Adaptive (analysis of) Algorithms: Overview, Combination, and Perspective
Authors Jeremy Barbay
Abstract Succinct data structures replace static instances of pointer based data structures, improving performance in both time and space in the word RAM model (a restriction of the RAM model where the size of a word is restricted). The adaptive analysis of algorithms consider the complexity in a finer way than merely grouping the instances by size, yielding more precise lower and upper bound on the complexity of a problem. We give a quick overview of those two techniques, some brief examples of how they can be combined on various search problems to obtain near optimal solutions, and some general perspective on the application and development of those techniques to other problems and models.
Date September 13, 2007
CS-2007-34
Title Impress: Towards Next-Generation Ubiquitous and Pervasive Computing Systems
Authors James P. Black, Hao Chen, and Omar Zia Khan
Abstract Research in pervasive and ubiquitous computing has resulted in a number of prototype systems developed and implemented in isolation. If this research area is to have impact, the next generation of research prototypes must move beyond single laboratories and isolated, independent software frameworks. Future systems must not only inter-work and inter-operate, but should enable more rapid prototyping and incorporation of components and software artifacts developed by others. Developing and exchanging software, including the concepts embodied in it, requires an approach based on standards and widely available software infrastructure.

The main contribution of the paper is to show that an appropriate infrastructure for next-generation ubiquitous and pervasive computing systems can be found in the Jabber open-source community. Jabber, now standardized as XMPP, was originally conceived as an open-source alternative to proprietary instant-messaging systems. Its design, architecture, and implementation provide a rich, extensible, standardized selection of concepts, protocols, servers, components, and clients. We show how Jabber provides an excellent foundation for the implementation of smart spaces, and a robust platform on which to build a ubiquitous-computing ecology, while supporting ambitious research goals in second-generation smart spaces. We also discuss various proof-of-concept prototypes that we have developed to validate our argument. Finally, we demonstrate how Jabber has allowed us to pursue research in pervasive computing without requiring substantial effort in a start-up phase.

Date October 3 , 2007
CS-2007-35
Title Rapid and Accurate Protein Side Chain Prediction Using Local Backbone Information Only
Authors Jing Zhang, Xin Gao, Jinbo Xu, and Ming Li (The first two authors contribute equally to the paper.)
Abstract High-accuracy protein structure modeling demands on accurate and very
fast side chain prediction since such a procedure must be repeatedly called at each step of structure refinement. Many known side chain prediction programs, such as SCWRL and TreePack, depend on the philosophy that global information and pairwise energy function must be used to achieve high accuracy. These programs are too slow to be used in the case when side chain packing has to be used thousands of times, such as protein structure refinement and protein design. We present an unexpected study that local backbone information can determine side chain conformations accurately. LocalPack, our side chain packing program which is based on only local information, achieves equal accuracy as SCWRL and TreePack, yet runs many times faster, hence providing a key missing piece in our efforts to high-accuracy protein structure modeling.
Date October 3, 2007
CS-2007-36
Title Fast and Optimal Scheduling Over Multiple Network Interfaces
Authors Matei A. Zaharia and Srinivasan Keshav
Abstract Today's mobile phones already contain two or more network interfaces (NICs) and future devices are likely to have several more, each with different energy costs, dollar costs, and data transmission capacities. Given a mobile device that is running multiple data-oriented applications, such as email, instant messenger, and video download, some of which may be delay-tolerant, it becomes necessary to assign, to each unit of application data, the network interface and the time of transmission that maximizes user satisfaction. This scheduling algorithm must take into account not only user, application, and NIC characteristics, but also future opportunistic availability of NICs due to device mobility.

In this paper, we begin by showing that naive scheduling approaches can be far from optimal. We then show that although optimal schedules can be found using either linear programming or network flow, these classical approaches are infeasible in CPU- and memory-constrained mobile devices. This motivates our novel hill-climbing approach that exploits problem structure to optimally schedule application messages over multiple NICs assuming perfect knowledge of future connectivity. We prove mathematically that our algorithm is correct.

Detailed performance measurements show that our algorithm finds optimal solutions 10-100 times faster than the simplex method and network flow. We also describe an implementation of our solution in a J2ME-based scheduler that runs on laptops, smartphones and PDAs, demonstrating its effectiveness in a realistic setting.

Date October 23, 2007
CS-2007-37
Title Recommender Schemes for Peer-to-Peer Systems
Authors Loubna Mekouar, Youssef Iraqi, and Raouf Boutaba
Abstract In Peer-to-Peer (P2P) file sharing systems, peers spend a significant amount of time looking for relevant and interesting files. However, the files available for download represent on one hand a rich collection for different needs and preferences and on the other hand a struggle for the peers to find files that they like. In this paper, we propose new recommender schemes based on collaborative filtering. Peers collaborate to filter out irrelevant files. These schemes help peers find and discover new, interesting and relevant files. We propose recommender schemes based on Files' Popularity and/or Peers' Similarity. To overcome the problems of traditional collaborative filtering recommender systems, an implicit rating approach is used. Simulation results confirm the effectiveness of Peers' Similarity based Recommendation in providing accurate recommendations. In addition, the proposed recommender schemes are proactive. Recommendations are provided to peers to motivate them to download the recommended files.
Date November 20, 2007
CS-2007-38
Title Experimental Evaluation of List Update Algorithms for Data Compression
Authors Reza Dorrigiv, Alejandro Lopez-Ortiz, and J. Ian Munro
Abstract List update algorithms have been widely used as subroutines in compression schemas, specially after the introduction of the Burrows-Wheeler transform. The Burrows-Wheeler transform (BWT), which is the basis of most state-of-the-art general purpose compressors, defines a permutation of the text with increased apparent regularity. Then it applies a compression algorithm on the permuted version of the original text. List update algorithms are a common choice for this second stage of BWT-based compression. As well, list update algorithms have been shown to be a reasonable alternative to simple compression schemes such as Huffman coding \cite{bentley}. In this paper we perform an experimental comparison of various list update algorithms both as stand alone compression mechanisms and as a second stage of the BWT-based compression. Our experiments show that for straightforward compression competitive list update algorithms such as MTF while simpler, are an inferior choice to Huffman based compression on, in contrast they are a good choice as a second stage in BWT. Then we address the question of which list update algorithm is best for this second stage. We show that MTF outperforms other list update algorithms in practice after BWT. This is consistent with the intuition that BWT increases locality of reference and the predicted result from the locality of reference model of Angelopoulos et al.~\cite{bijective}. Lastly, we show theoretically that due to an often neglected difference in the cost models, good list update algorithms may be far from optimal for BWT compression.
Date October 22, 2007
CS-2007-39
Title The Cooperative Ratio of On-line Algorithms
Authors Reza Dorrigiv and Alejandro Lopez-Ortiz
Abstract On-line algorithms are usually analyzed using competitive analysis, in which the performance of an on-line algorithm on a sequence is normalized by the performance of the optimal off-line algorithm on that sequence. In this paper we introduce cooperative analysis as an alternative general framework for the analysis of on-line algorithms. The idea is to normalize the performance of an on-line algorithm by a measure other than the performance of the off-line optimal algorithm OPT. We show that in many instances the perform of OPT on a sequence is a coarse approximation of the difficulty or complexity of a given input. Using a finer, more natural measure we can separate paging and list update algorithms which were otherwise indistinguishable under the classical model. This creates a performance hierarchy of algorithms which better reflects the intuitive relative strengths between them. Lastly, we show that, surprisingly, certain randomized algorithms which are superior to MTF in the classical model are not so in the cooperative case, which matches experimental results. This confirms that the ability of the on-line cooperative algorithm to ignore pathological worst cases can lead to algorithms that are more efficient in practice.

Date October 22, 2007
CS-2007-40
Title Design and Implementation of the KioskNet System
(Extended Version)*
Authors S. Guo, M.H. Falaki, U. Ismail, E.A. Oliver, S. Ur Rahman, A. Seth, M.A. Zaharia, and S. Keshav David R. Cheriton School of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1 {sguo, mhfalaki, uismail, eaoliver, surrahman, a3seth, mazahari, keshav}@uwaterloo.ca
Abstract Rural Internet kiosks in developing regions can cost-effectively provide communication and information services to the poorest sections of society. Yet, a variety of technical and non-technical issues have caused most kiosk deployments to be economically unsustainable. KioskNet addresses the key technical problems underlying kiosk failure by using robust `mechanical backhaul' for connectivity, and by using low-cost and reliable kiosk-controllers to support services delivered from one or more recycled PCs. KioskNet also addresses related issues such as security, user management, and log collection. In this paper, we describe the KioskNet system, outlining its hardware, software, and security architecture. We describe a pilot deployment,and how we used lessons from this deployment to re-design our initial prototype.
Date November 23, 2007
CS-2007-41
Title Circular Arcs as Primitives for Vector Textures
Authors Zheng Qin, Craig S. Kaplan, and Michael D. McCool".
Abstract Because of the resolution independent nature of vector graphics images, it is useful to consider using them directly as textures in 3D rendering. Spline curves are used to represent boundaries in standard vector graphics representations. An- tialiased rendering of such content can be obtained by soft thresholding an implicit representation of these boundary features. The distance function is an especially useful implicit representation, and an accurate distance can also be used for special effects such as embossing and stroke texturing. Unfortunately, computation of the true distance to a spline curve is expensive. Therefore, normally either the distance is approximated or the spline curves are approximated with simpler primitives for which an accurate distance can be computed. Approximation with line segments gives only C 0 continuity. We approximate spline curves using circular arcs instead. This approximation has C 1 continuity, and computing the distance to a circular arc is nearly equivalent in expense to computing the distance to a line segment.
Date December 13, 2007
CS-2007-42
Title Design and Implementation of an SMS Control Channel for Mobile Systems
Authors Earl Oliver
Abstract The Short Message Service (SMS) is one of the most ubiquitous wireless technologies on Earth. Each year hundreds of billions of messages are sent, demand continues to grow, and competition between cellular providers is driving prices down. These trends create practical opportunities for the use of SMS in mobile systems. In this paper we present the design and implementation of an SMS-based data channel, or SMS-NIC, that is lightweight and runs on multiple mobile platforms. Through integration with an existing mobile platform, we show that the SMS-NIC has little operational overhead and provides efficient, reliable transport for large messages send over the cellular network, where message losses frequently occur.

We motivate the design of the SMS-NIC through a characterization of SMS using workloads consisting of bursts of messages between sender and receiver. This analysis differs from previous SMS studies by focusing on transmission patterns that differ from normal cell phone use. Through this characterization we show that message delay and message loss are affected by the transmission order, the time of day has minimal effect on transmission rate, delay, and loss, and that losses and high delays significantly outweigh the rate or message reordering.

Date February 18, 2008
CS-2007-43
Title Securing KioskNet: A Systems Approach
Authors Sumair Ur Rahman, Urs Hengartner, Usman Ismail and S. Keshav
Abstract Internet kiosks typically provide weak security guarantees and therefore cannot support secure web access or transaction-oriented applications such as banking and bill payment. We present the design and implementation of a practical, unobtrusive and easy-to-use security architecture for KioskNet, a system for low-cost rural Internet kiosks. Our system uses a combination of physical and cryptographic mechanisms to protect user data and infrastructure nodes despite frequent disconnections, untrusted administrators, and large end-to-end delays. Our contributions include (a) a detailed threat analysis of rural kiosks, (b) the first comprehensive security architecture for rural kiosk security, (c) a simple and easy to use API for securely sending and receiving files over KioskNet (d) protection for user data both on the file system and in transit, and (e) a usable public key infrastructure (PKI). Performance measurements show that our system imposes 100ms of delay during user login at kiosks and an additional 560ms and 750ms per megabyte of data sent and received by a kiosk user, respectively, suggesting that the system is both efficient and usable.
Date November 15, 2007
CS-2007-44
Title A Preliminary Report on Tool Support and Methodology for Feature Interaction Detection
Authors Alma L. Juarez-Dominguez, Nancy A. Day, Richard T. Fanson
Abstract In this report, we describe our effort to create in matlab's stateflow a set of non-proprietary advanced automotive feature design models, and to translate these design models into models that can be input to the model checker SMV. We are interested in verifying the absence of feature interactions in the integration of the automotive features, as well as the lack of errors in the design. In the automotive domain, a feature is a bundle of system functionality recognized by the driver and providing advanced functionality to the vehicle, for instance, Cruise Control (CC). Each feature is normally implemented in software and has a degree of control over the mechanical components that operate the dynamics of the vehicle. Examples of these mechanical components are brakes, throttle and steering. We have created our set of non-proprietary feature design models to assess different techniques and tools that analyze the integration of the features and their correctness. The translated models will allow us to verify properties of the automotive features at the same level of description as the design, so we can ensure that the findings of our analysis are applicable to the design of the automotive components.
Date December 18, 2007
CS-2007-45
Title Internalization on the Toronto Stock Exchange
Authors Michael Lerman and Kate Larson
Abstract This is a study of the impact of internalization on the Toronto Stock Exchange (TSX). We present an abstract model of the internalization process and discuss some observations and conclusions that can be drawn from the model, including results from a simulation study. We describe a system that we built in order to use actual order data from the Toronto Stock Exchange, and report our findings and experiences from working with this data.
Date November 15, 2007
CS-2007-46
Title On Stability and Convergence of Multi-Commodity Networks and Services
Authors Jin Xiao and Raouf Boutaba
Abstract The rise of distributed services and user-driven networking concepts in recent years poses the critical question of stability. Can a system operating under non-cooperation and self-interest converge to a stable state? and how fast? The answers to these questions readily lend themselves to game theory analysis, and to the study of congestion games in particular. In the past, much work have been done on establishing the existence of pure Nash equilibria in congestion games, and has shown that finding a pure Nash equilibrium is PLS-complete and hence convergence to a pure Nash equilibrium is very difficult (exponential time in worst case). Furthermore, much of the convergence analysis have been carried out on simple single-commodity game models. In this paper, we attempt to construct a more realistic multi-commodity congestion game model suited for distributed service and user-driven networking scenarios. We introduce the desirability of equilibrium concept that is helpful in determining whether a system state meets the quality requirements of the users and services. Desirability is an alternative concept to price of anarchy. In fact we show the desirability ratio is a special case of price of anarchy. We then define the $\alpha$-threshold congestion game whose minimum potential state corresponds to a desirable equilibrium (if the system permits one) and we bound its convergence to polynomial time through game transformation. Finally, we present a mechanism for partial simultaneous moves. To the best of our knowledge, there has been no prior establishment of the desirability concept and no bound given on the convergence of asymmetric multi-commodity congestion games with exponential cost function.
Date November 14, 2007
CS-2007-47
Title Understanding Participatory Media Using Social Networks
Abstract Given the rapid growth of participatory media content such as blogs, user created videos, and podcasts, there is a need to understand and answer the following kind of questions: What is participatory media good for? Does it improve the effectiveness of news media? If so, then why? What are the underlying processes of human communication that can explain how people perceive and interpret the information they gain from participatory media? How can this understanding be used to design better information search and recommendation systems for the Internet? We try to answer these questions by examining participatory media through models built using social networks of people. We first give a few examples of participatory media which indicate that participation by people in online public discourse through blogs and forums can indeed improve the effectiveness of news media: it helps people gain a better understanding of topics discussed in mass media, and presents people with diverse viewpoints to avoid media bias. We then use social networks to propose a model for the underlying processes of communication to answer why and how participatory media would improve the effectiveness of news media. An investigation methodology is then outlined for validation of the model, and validation results from user-surveys and measurements done on an online social networking website are presented. Our results are positive and indicate that a social network based framework offers a plausible approach to the design of useful applications for participatory media.
Date January 14, 2008
CS-2007-48
Title Optimal Speedup on a Low-Degree Multi-Core Parallel Architecture (LoPRAM)
Authors Reza Dorrigiv, Alejandro Lopez-Ortiz and Alejandro Salinger
Abstract Modern microprocessor architectures have gradually incorporated support for parallelism. In the past the degree of parallelism was rather small and as such it could be best modeled as a constant speedup over the traditional RAM model, however, as a consequence of continued growth this assumption might no longer hold. For example, with the introduction of 32- and 64-bit architectures, bit-level parallelism became significant. This led to the introduction of the transdichotomous RAM model, for which many algorithms which are faster in theory and practice have been developed. Similarly, over the last five years, ma jor microprocessor manu- facturers have released road maps for the next decade predicting a rapidly increasing number of cores, with upwards of 64 cores per microprocessor by 2015. In such a setting a sequential RAM computer no longer accurately reflects the architecture on which algorithms are being executed. In this paper we propose a model of low degree parallelism (LoPRAM) which builds upon the RAM and PRAM models yet better reflects recent advances in parallel (multi-core) architectures. The model has as a goal a combination of fidelity of the underlying hardware and ease of programming and analysis. When necessary we make tradeoffs between what is achievable in hardware and what is understandable and programmable by humans/compilers. The proposed model supports a high level abstraction that simplifies the design and analysis of parallel programs. Then we show that in many instances this model, though in many ways similar to the classic PRAM, it naturally leads to work-optimal parallel algorithms via simple modifications to sequential algorithms. This is in contrast to the PRAM model, in which the design and analysis and implementation of work-optimal algorithms proved to be one of the biggest challenges in practice for their adoption.
Date November 26, 2007
CS-2007-49
Title Extending Typestate Analysis to Multiple Interacting Objects
Authors Nomair A. Naeem and Ondrej Lhotak
Abstract This paper extends static typestate analysis to temporal specifications of groups of interacting objects, which are expressed using tracematches. Unlike typestate, a tracematch state may change due to operations on any of a set of objects bound by the tracematch. The paper proposes a lattice-based operational semantics which is proved equivalent to the original tracematch semantics but is better suited to static analysis. The static analysis is presented next, and is proved sound with respect to the semantics. The analysis computes precise local points-to sets and tracks the flow of individual objects, thereby enabling strong state updates. A fully context-sensitive version of the analysis has been implemented as instances of the IFDS and IDE algorithms. The analysis was evaluated on tracematches used in earlier work and found to be very precise. Remaining imprecisions could be eliminated with more precise modeling of references from the heap and of exceptional control flow.
Date December 20, 2007