# 2005 technical reports

CS-2005-01
Title Index-Trees for Descendant Tree Queries on XML documents
Authors Jérémy Barbay
Abstract We propose an index structure and an algorithm to answer Descendant Tree queries, the subset of XPath queries limited to the descendant axis, in computational models where elements can be accessed in non-sequential order. The indexing scheme uses classical labeling techniques, but structurally represents the ancestor-descendant relationships between nodes of same type, in order to allow faster searches. The algorithm solves XPath location path queries along the descendant axis in a holistic way, looking at the whole query rather than just concentrating on individual components, as opposed to “set-at-a-time” strategies. Using an adaptive analysis, we prove tight bounds on the worst case complexity in the comparison model, and a reasonable upper bound on the worst case complexity in the hierarchical memory model, which accounts for the number of I/O transfers at any fixed level. Keywords: XPath location paths; comparison model; holistic; cache-oblivious; adaptive.
Date January 2005
Report Index-Trees for Descendant Tree Queries on XML documents (PS) Index-Trees for Descendant Tree Queries on XML documents (PDF)
CS-2005-02
Title Stereo Rendering: An Overview
Authors Jamie Brettle amd Gladimir Baranoski
Abstract Humans visually perceive the world using a variety of depth cues. Stereopsis is an example of such a depth cue, and it is directly associated with the human binocular visual system. Different computer graphics approaches can be used to take advantage of stereopsis in order to enable a true 3D perception of virtual environments. In this report, we briefly review computer graphics hardware and software tools designed for this purpose as well as optimization techniques that can be applied to speed up the synthesis of stereo images.
Date December 2005
Report Stereo Rendering: An Overview (PDF)
CS-2005-03
Title A new formulation for haplotype inference by pure parsimony
Authors Daniel G. Brown, Ian M. Harrower
Date January 2005
Report A new formulation for haplotype inference by pure parsimony (PDF)
CS-2005-04
Title Building an Embedded Control Program Workload
Authors Heng Yu and Grant Weddell
Abstract A compiled database application has a repository style of architecture in which a collection of modules operate on a common database in main memory through a set of predefined transaction types. To conduct simulated experiments to evaluate the performance of locking techniques for compiled database application, we need to design a workload to approximate the data processing in main memory. We choose the open-source MINIX operating system as our study case, and select a set of system calls as predefined transaction types. We map the kernel data structures to relational tables, and the source code in C programming language to C code with embedded SQL statements (C/SQL code). Furthermore, the C/SQL code are transformed to the finite state machines of the transaction types, and the costs of states and transitions are extracted from the original C code in terms of numbers of machine instructions. Thus the workload is built up. We also discuss how to model the costs of locking/unlocking operations in our workload.
Date February 2005
Report Building an Embedded Control Program Workload (PDF)
CS-2005-05
Title Empirical Analysis on the Geriatric Care Data Set Using Rough Sets Theory
Authors Jiye Li and Nick Cercone
Abstract A well known problem for association rules generation is that too many rules are generated, and it is difficult to determine manually which rules are more useful, interesting and important. In our study of using rough sets theory to improve the utility of association rules, we propose a new rule importance measure to select the most appropriate rules. In order to explore the application of the proposed method to large data sets, we perform the experiments on a Geriatric Care data set.
Date February 2005
Report Empirical Analysis on the Geriatric Care Data Set Using Rough Sets Theory (PDF)
CS-2005-06
Title Query-Based Approach to Workflow Process Dependency Analysis
Authors Weizhen Dai and H. Doninic Covvey
Abstract

Dependency analysis is important in all of the stages of workflow processes. An appropriate analysis will help us to identify the potentially affected entities if changes occur. In this technical report we thoroughly analyze the critical entities of workflow processes and their dependency relationships, and propose a multi-dimensional dependency model, that includes routing dependency, data dependency and role dependency. We further build a knowledge base to represent the multi-dimensional dependency model using the logic programming language Prolog. We then develop a set of query rules that can be applied to the well-defined knowledge base at both activity and process levels to retrieve the potentially affected entities. Finally we use a case study of workflow processes in the healthcare domain to show how our dependency analysis methodology works.

Date October 2005
Report Query-Based Approach to Workflow Process Dependency Analysis (PDF)
CS-2005-09
Title The complexity of Domino Tiling
Authors Therese Biedl
Abstract In this paper, we study the problem of how to tile a layout with dominoes. For non-coloured dominoes, this can be determined easily by testing whether the layout graph has a perfect matching. We study here tiling with
coloured dominoes, where colours of adjacent dominoes must match. It was known that this problem is NP-hard when the layout graph is a tree. We first strengthen this NP-hardness result in two ways: (1) we can use a
path instead of a tree, or (2) we can force that exactly all given dominoes are used. However, both these reductions (as well as the existing one) use an unbounded numbers of colours, which is not realistic for domino tiling. As our main interest, we hence study domino tiling with a constant number of colours. We show that this is NP-hard even with three colours.
Date February 2005
Report The complexity of Domino Tiling (PS) The complexity of Domino Tiling (PDF)
CS-2005-10
Title Indexing the Results of Sliding Window Queries
Authors Lukasz Golab, Piyush Prahladkaz and M. Tamer Ozsu
Abstract A popular method of bounding the memory requirements of queries over data streams is to use a sliding window, where old data are continuously removed as new data arrive. One problem that has not been addressed previously concerns indexing the results of sliding window queries. This is a noteworthy problem because data stream systems often materialize common sub-expressions or final results of similar queries, therefore it is important to allow the queries efficient access into a relevant subset of the materialized result. In this paper, we design and evaluate indexing methods that take advantage of the temporal order in which old answers expire for efficient maintenance and querying of the results. Our techniques allow sharing of indexed results among similar queries, adapt to fluctuating stream arrival rates, and are experimentally shown to perform updates over twice as fast as the existing sliding window indices.
Date May 2005
Report Indexing the Results of Sliding Window Queries (PDF)
CS-2005-11
Title RDL: A Language for Framework Instantiation Representation
Authors Toacy C. Oliveira, Paulo S. C. Alencar, Carlos J.P. de Lucena, Donald D. Cowan.
Abstract Reusing software artifacts for system development is showing increasing promise as an approach to reducing the time and effort involved in building new systems, and to improving the software development process and the quality of its outcome. However, software reuse has an associated steep learning curve, since practitioners must become familiar with a third party rationale for representing and implementing reusable assets. For this reason, enabling a systematic approach to the reuse process by making software reuse tasks explicit, allowing software frameworks to be instantiated using pre- defined primitive and complex reuse operations, and supporting the reuse process in a (semi-)automated way become crucial goals. In this paper, we introduce a systematic reuse approach and Reuse Description Language (RDL), a language designed to specify object-oriented framework instantiation processes, and an RDL execution environment, which is the tool support for definition and execution of reuse processes and framework

Toacy C. Oliveira is with Faculty of Informatics at the Pontifical University Catholic of Rio Grande do Sul, Brazil . His e-mail address is toacy@inf.pucrs.br. Paulo S. C. Alencar and Donald D. Cowan are with the School of Computer Science at the University of Waterloo, Ontario, Canada. Their e-mail addresses are {palencar, dcowan} @csg.uwaterloo.ca respectively. Carlos J. P. de Lucena is with the Computer Science Department at the Pontifical University of Rio de Janeiro Brazil. His e-mail is lucena@inf.puc-rio.br. instantiations that lead to domain-specific applications. We illustrate our approach using JUnit, a framework for testing.
Date May 2005
Report RDL: A Language for Framework Instantiation Representation (PDF)
CS-2005-12
Title xTAO: Enabling a Declarative Approach to the Specification of Multi-Agent Systems
Authors Toacy C. Oliveira, Paulo Alencar, Don Cowan, Carlos Lucena
Abstract

Research on software agents has produced a diversity of conceptual models for high-level abstract descriptions of multi-agent systems (MASs). However, it is still difficult and costly for designers that need a unique set of agent modelling features to either develop a new agent modelling language from scratch or undertake the task of modifying an existing language. In addition to the modelling itself, in both cases a significant effort need to be expended in building or adapting tools to support the language. An extensible agent modelling language is crucial to experimenting with and building tools for novel modelling constructs that arise from evolving research. Existing approaches typically support a basic set of modelling constructs very well, but adapt to others poorly. A declarative language such as XML and its supporting tools provides an ideal platform upon which to develop and extensible modelling language for multi-agent systems. In this paper we describe xTAO, an extensible agent modelling language, and also demonstrate its value in the context of a real-world application.

Date 2005
Report xTAO: Enabling a Declarative Approach to the Specification of Multi-Agent Systems (PDF)
CS-2005-13
Title Complexity of Octagonal and Rectangular Cartograms
Authors T. Biedl and B. Genc ∗
Abstract In this paper, we study the complexity of cartograms, i.e., maps where every region should be deformed such that a given area requirement is satisfied. We consider the case where every region is an orthogonal octagon, and show that this problem is NP-hard. From our proof, it also follows that cartograms with rectangular faces are NP-hard if we allow for the existence of a "sea", i.e., a region of arbitrarily high complexity on the outside of the drawing.
Date December 2005
Report Complexity of Octagonal and Rectangular Cartograms (PDF)
CS-2005-14
Title Shortest Path Trees Computation in Dynamic Graphs
Authors Edward P. F. Chan & Yaya Yang
Abstract Let G = (V, E,w) be a simple digraph, in which all edge weights are non-negative real numbers. Let G¡¦ be obtained from G by the application of a set of edge weight updates to G. Let s „¡ V , and let Ts and T be a Shortest Path Tree (SPT) rooted at s in G and G , respectively. The Dynamic Shortest Path (DSP) problem is to compute T from Ts. For the DSP problem, we correct and extend a few existing SPT algorithms to handle multiple edge weight updates. We prove that these extended algorithms are correct. The complexity of these algorithms is also analyzed. To evaluate the proposed algorithms, we compare them with the well- known static Dijkstra algorithm. Extensive experiments are conducted with both real-life and arti&#64257;cial data sets. The real-life data are road system graphs obtained from the Connecticut road system and are relatively sparse. The arti&#64257;cial data are randomly generated graphs and are relatively dense. The experimental results suggest the most appropriate algorithms to be used under di&#64256;erent circumstances.
Date March 2005
Report Shortest Path Trees Computation in Dynamic Graphs (PDF)
CS-2005-15
Title Optimization and Evaluation of Shortest Path Queries
Authors Edward P. F. Chan & Heechul Lim
Abstract We investigate the problem of how to evaluate efficiently a collection of shortest path queries on massive graphs that are too big to fit in the main memory. To evaluate a shortest path query efficiently, we introduce two pruning algorithms. These algorithms differ on the extent of materialization of shortest path cost and on how the search space is pruned. By grouping shortest path queries properly, batch processing improves the performance of shortest path query evaluation. Extensive study is also done on fragment sizes, cache sizes and query types that we show that affect the performance of a disk-based shortest path algorithm. The performance and scalability of proposed techniques are evaluated with large road systems in the Eastern United States. To demonstrate that the proposed disk-based algorithms are viable, we show that their search times are significantly better than that of main-memory Dijkstra's algorithm.
Date September 2005
Report Optimization and Evaluation of Shortest Path Queries (PDF)
CS-2005-16
Authors Robert Warren, Dana Wilkinson, Alejandro L'opez-Ortiz
Abstract We study the refresh model required to keep an up to date copy of a web page. This has applications for more efficient and frequent crawling, in the case of search engines, and for higher hit rates for proxy servers with pre-fetching. Two main models have been proposed namely the uniform refresh model (Cho and Garcia-Molina, 2000) and the adaptive page refresh model (Edwards et al., 2001), with some debate as to the relative value of each model.

In this work we show that adaptive page refresh models can be further improved by careful selection of initial page-refresh rates of newly added links as indicated by our page evolution studies showing that page-change rates (and consequently page-refresh rates) are dependent on top-level domain, category and page depth.

Date May 2005
Report URL-Enhanced Adaptive Page-Refresh Models (PDF)
CS-2005-17
Title Vocabulary size and email authentication
Authors Robert Warren
Abstract This paper explores the performance of the method proposed by Efron and Thisted to predict vocabulary sizes based on sampled text. The objective of this research is to determine whether this simple and quick test can be used as a coarse indicator of authorship. Three sets of emails, as well as other texts are analyzed in order to collect performance data. The conclusion is that the test is at best a lower bound indicator within the $T<$1.0 region and is not sufficient as an authentication method.
Date July 2005
Report Vocabulary size and email authentication (PDF)
CS-2005-18
Title Title: Dynamic Histograms for Non-Stationary Updates
Authors Elizabeth Lam and Kenneth Salem
Abstract In this paper, we address the problem of incrementally maintaining a histogram in response to a non-stationary update process. In relational database systems, this problem can occur whenever relations model time-varying activities. We present a simple update model that is general enough to describe both stationary and non-stationary update processes, and we use it to show that existing histogram maintenance techniques can perform poorly when updates are non-stationary. We describe several techniques for solving this problem, and we use the update model to demonstrate that these techniques can effectively handle a broad range of update processes, including non-stationary ones.
Date June 2005
Report Title: Dynamic Histograms for Non-Stationary Updates (PDF)
CS-2005-19
Title Optimal Bsic Block Instruction Scheduling for Multiple-Issue Processors using Constraint Programming
Authors Abid M. Malik, Jim McInnes and Peter van Beek
Abstract Instruction scheduling is one of the most important steps for improving the performance of object code produced by a compiler. A fundamental problem that arises in instruction scheduling is to find a minimum length schedule for a basic block---a straight-line sequence of code with a single entry point and a single exit point---subject to precedence, latency, and resource constraints. Solving the problem exactly is NP-complete, and heuristic approaches are currently used in most compilers. In contrast, we present a scheduler that finds provably {\em optimal} schedules for basic blocks using techniques from constraint programming. In developing our optimal scheduler, the keys to scaling up to large, real
problems were improvements to the constraint model and to the constraint propagation phases.
We experimentally evaluated our optimal scheduler on the SPEC 2000 integer and floating point benchmarks. On this benchmark suite, the optimal scheduler was very robust and scaled to the largest basic blocks---all but a handful of the hundreds of thousands of basic blocks in our benchmark suite could not be solved optimally within a reasonable time limit, and the scheduler was able to routinely solve the largest basic blocks, including basic blocks with up to 2600 instructions. This compares favourably to the best previous exact approaches.
Date November 2005
Report Optimal Bsic Block Instruction Scheduling for Multiple-Issue Processors using Constraint Programming (PDF)
CS-2005-20
Title An Evaluation of Metro Express
Authors Shahram Esmaeilsabzali, Leonard Wong, and Nancy A. Day
Abstract We study Express, a system that uses template semantics to map specifications to SMV models. We investigate the efficiency of the generated SMV models. We consider two case studies and compare manually created SMV models with models generated by Express. The generated models are more complex and have larger state spaces, and consequently longer verification times. We also analyze the effect of using different optimization schemes in Cadence SMV. Interestingly, some of the optimizations are more applicable to the generated models and made verification of the generated models outperform the manually created models for some queries. However, considering the best case for each model, the manually created model always outperforms the generated model. We propose some optimizations that could be used to improve dramatically the efficiency of the models generated by Express. We estimate the magnitude of improvements that such optimizations can provide.
Date December 2005
Report An Evaluation of Metro Express (PDF)
CS-2005-21
Title An XML Routing Synopsis in Unstructured P2P Networks
Authors Qiang Wang, Abhay Kumar Jha, and Tamer Ozsu
Abstract Many emerging applications that use XML are distributed, usually over large peer-to-peer (P2P) networks on the Internet. The deployment of an XML query shipping system over P2P networks requires a specialized synopsis to capture XML data in routing tables. In this paper, we propose a novel graph-structured routing synopsis, called kd-synopsis, for deployment over unstructured super-peer based P2P networks. This synopsis is based on length-constrained FBsimulation relationship, which allows the balancing of the precision and size of the synopsis according to different space constraints on peers with heterogeneous capacity. We report comprehensive experiments to demonstrate the effectiveness of the kd-synopsis.
Date August 2006
Report An XML Routing Synopsis in Unstructured P2P Networks (PDF)
CS-2005-22
Title XSEED: Accurate and Fast Cardinality Estimation for XPath Queries
Authors Ning Zhang, M. Tamer Ozsu, Ashraf Aboulnaga, and Ihab F. Ilyas
Abstract Cardinality estimation is a crucial part of a cost-based optimizer. Many research efforts have been focused on XML synopsis structures of path queries for cardinality estimation in recent years. In ideal situations, a synopsis should provide accurate estimates for different types of queries over a wide variety of data sets, consume
a small amount of memory while being able to adjust as memory budgets change, and be easy to construct and update. None of the existing synopsis proposals satisfy all of the above requirements. In this paper, we propose a novel synopsis, XSEED, that is accurate, robust, efficient, and adaptive to memory budgets. We construct an XSEED structure starting from a very small kernel, then incrementally update information of the synopsis. With such an incremental construction, a synopsis structure can be dynamically
configured to accommodate different memory budgets. It can also handle updates to underlying XML documents, and be self-tuning by incorporating query feedback. Cardinality estimation based on XSEED can be performed very efficiently and accurately, with our techniques of small footprint and novel recursion handling. Extensive experiments on both synthetic and real data sets are conducted, and our results show that even with less memory, the accuracy of XSEED could achieve an order of magnitude better than that of other synopsis structures. The cardinality estimation time is under 2% of the actual querying time for a wide range of queries in all test cases.
Date June 2005
Report XSEED: Accurate and Fast Cardinality Estimation for XPath Queries (PDF)
CS-2005-23
Title Proceedings of the Doctoral Consortium at the 13th IEEE International Requirements Engineering Conference
Authors Edited by Nancy A. Day, University of Waterloo
Date August 2005
Report Proceedings of the Doctoral Consortium at the 13th IEEE International Requirements Engineering Conference (PDF)
CS-2005-24
Title Multi column matching for database schema translation
Authors Robert H. Warren, Dr. Frank Wm. Tompa
Abstract We describe a generalised method for discovering complex schema matches involving multiple database columns. The method does not require linked data and is capable of dealing with both fixed- and variable-length field columns.
This is done through an iterative algorithm which learns the correct sequence of concatenations of column substrings in order to translate from one database to another. We introduce the algorithm along with examples on common database data values and examine its performance on real-world and synthetic data sets.
Date August 2005
Report Multi column matching for database schema translation (PDF)
CS-2005-25
Title Assisting Framework Instantiation: Enhancements to Process-Language-based Approaches
Authors Marcillio Mendonca, Paulo Alencar, Toacy Oliveira, Donald Cowan
Abstract Application frameworks have been successfully used as valuable tools to improve software quality while reducing development efforts. Nevertheless, frameworks still face important challenges in order to be widely adopted. In particular, framework instantiation is still a painful task requiring application developers to understand the intricate details surrounding the framework design. Some approaches to alleviate this problem have already been proposed in the literature but they are usually either just a textual cross-referenced document of the instantiation activities or too tied to technology or specific application domains. In this paper, we present the results of our latest investigations to improving our approach to framework instantiation. In particular, we discuss a process language we have developed to guide framework instantiation explicitly, and the most recent updates we have made to improve the language expressiveness. Furthermore, we present a case study used to evaluate our approach and to identify current and future extensions.
Date August 2005
Report Assisting Framework Instantiation: Enhancements to Process-Language-based Approaches (PDF)
CS-2005-26
Title Interface Automata with Complex Actions - Extended Version
Abstract Many formalisms use interleaving to model concurrency. To describe some system behaviours appropriately, we need to limit interleaving. For example, in component-based systems, we wish to limit interleaving to force the inputs to a method to arrive together in order. In web services, the arrival of XML messages consisting of multiple simple parts should not be interleaved with the behaviour of another component. We introduce "interface automata with complex actions" (IACA), which add complex actions to de Alfaro and Henzinger's interface automata (IA). A complex action is a sequence of actions that may not be interleaved with actions from other components. The composition and refinement operations are more involved in IACA compared to IA, and we must sacrifice associativity of composition. However, we argue that the advantages of having complex actions make it a useful formalism. We provide proofs of various properties of IACA and discuss the use of IACA for modelling web services.
Date 2005, Revised May 15, 2006
Report Interface Automata with Complex Actions - Extended Version (PDF)
CS-2005-27
Title Toward an understanding of haplotype inference by pure parsimony
Authors Daniel G. Brown, Ian M. Harrower
Date January 2005
Report Toward an understanding of haplotype inference by pure parsimony (PDF)
CS-2005-28
Title On Concurrency Control in Sliding Window Queries over Data
Authors Lukasz Golab, Kumar Gaurav Bijay, M. Tamer Özsu
Abstract Data stream systems execute a dynamic workload of long-running and one-time queries, with the streaming inputs typically bounded by sliding windows. For efficiency, windows may be advanced periodically by replacing the oldest part of the window with a batch of newly arrived data. Existing work on stream processing assumes that a window cannot be advanced while it is being accessed by a query. In this paper, we argue that concurrent processing of queries (reads) and window-slides (writes) is required by data stream systems in order to allow prioritized query scheduling and improve the freshness of answers. We prove that the traditional notion of conflict serializability is insufficient in this context and define stronger isolation levels that restrict the allowed serialization orders. We also design and experimentally evaluate a transaction scheduler that efficiently enforces the new isolation levels by taking advantage of the access patterns of sliding window queries.
Date December 2005
Report On Concurrency Control in Sliding Window Queries over Data (PDF)
CS-2005-29
Title InterJoin: Exploiting Materialized Views in XML Query Processing
Authors Derek Phillips, Ning Zhang, Ihab F. Ilyas, and M. Tamer Özsu
Abstract Efficient processing of XPath expressions is an integral part of XML data query processing. Exploiting materialized views in query processing can significantly enhance query processing performance. We propose a novel view definition that allows for intermediate (structural) join results to be stored and reused in XML query evaluation. Unlike current XML view proposals, our views do not require navigation in the original document or path-based pattern matching. Hence, they can be evaluated significantly faster and can be more easily costed as part of a query plan.

In general, current structural joins cannot exploit views efficiently when the view definition is not a prefix (or a suffix) of the XPath query. To increase the applicability of our proposed view definition, we propose a novel physical structural join operator called InterJoin. The InterJoin operator allows for joining interleaving XPath expressions, e.g., joining //A//C with //B to evaluate //A//B//C. InterJoin treats structural joins as a logical operator, giving more join alternatives in XML query plan.

We propose several physical implementations for InterJoin, including a technique to exploit spatial indexes on the inputs. We give analytic cost models for the implementations so they can be costed in an existing join-based XML query optimizer. Experiments on real and synthetic XML data show significant speed-ups of up to 200% using InterJoin, as well as speed-ups of up to 400% using our materialized views.

Date October 2005
Report InterJoin: Exploiting Materialized Views in XML Query Processing (PDF)
CS-2005-31
Title Indexing Time vs. Query Time Trade-offs in Dynamic Information Retrieval Systems
Authors Stefan B�ttcher and Charles L. A. Clarke
Abstract We examine issues in the design of fully dynamic information retrieval systems with support for instantaneous document insertions and deletions. We present one such system and discuss some of the major design decisions. These decisions affect both the indexing and the query processing efficiency of our system and thus represent genuine trade-offs between indexing and query processing performance.

Two aspects of the retrieval system -- fast, incremental updates and garbage collection for delayed document deletions -- are discussed in detail, with a focus on the respective trade-offs. Depending on the relative number of queries and update operations, different strategies lead to optimal overall performance. Special attention is given to a particular case of dynamic search systems -- desktop and file system search. As one of the main results of this paper, we demonstrate how security mechanisms necessary for multi-user support can be extended to realize efficient document deletions.

Date October 2005
Report Indexing Time vs. Query Time Trade-offs in Dynamic Information Retrieval Systems (PDF)
CS-2005-32
Title Memory Management Strategies for Single-Pass Index Construction in Text Retrieval Systems
Authors Stefan Büttcher and Charles L. A. Clarke
Abstract Many text retrieval systems construct their index by accumulating postings in main memory until there is no more memory available and then creating an on-disk index from the in-memory data. When the entire text collection has been read, all on-disk indices are combined into one big index through a multiway merge process.

This paper discusses several ways to arrange postings in memory and studies the effects on memory requirements and indexing performance. Starting from the traditional approach that holds all postings for one term in a linked list, we examine strategies for combining consecutive postings into posting groups and arranging these groups in a linked list in order to reduce the number of pointers in the linked list. We then extend these techniques to compressed posting lists and finally evaluate the effects they have on overall indexing performance for both static and dynamic text collections. Substantial improvements are achieved over the initial approach.

Date October 2005
Report Memory Management Strategies for Single-Pass Index Construction in Text Retrieval Systems (PDF)
CS-2005-33
Title Adding variability management to UML-based software product lines
Authors Edson Alves de Oliveira Junior a, Itana M. S. Gimenes a,*, Elisa H. M. Huzita a, José C. Maldonado b, Paulo Alencar c a Departamento de Informática, Universidade Estadual de Maringá (UEM), Maringá, Paraná, Brazil b Departamento de Ciências de Computação e Estatística, Universidade de São Paulo (USP), São Carlos, São Paulo, Brazil c Computer System Group, School of Computer Science, University of Waterloo, Waterloo, Canada
Abstract The software product line (PL) approach promotes the generation of specific products from a set of core assets for a given domain. This approach is applicable to domains in which products have well-defined commonalities and variation points. Variability management is concerned with the management of the differences between products throughout the PL lifecycle. This paper presents a UML-based process for variability management that allows identification, representation and delimitation of variabilities; and, identification of mechanisms for variability implementation. The evaluation of the process was carried out as a case study within the context of an existing PL for the Workflow Management System (WfMS) domain. The results have shown that the proposed process includes well-defined control mechanisms that increase the possibility of identifying and tracing variabilities.
Date October 2005
CS-2005-34
Title Inferring a Serialization Order for Distributed Transactions
Authors Khuzaima Daudjee and Kenneth Salem
Abstract

Data partitioning is often used to scale-up a database system. In a centralized database system, the serialization order of commited update transactions can be inferred from the database log. To achieve this in a shared-nothing distributed database, the serialization order of update transactions must be inferred from multiple database logs. We describe a technique to generate a single stream of updates from logs of multiple database systems. This single stream represents a valid serialization order of update transactions at the sites over which the database is partitioned.

Date February 2007
Report Inferring a Serialization Order for Distributed Transactions (PDF)
CS-2005-35
Title Adaptive Searching in Succinctly Encoded Binary Relations and Tree-Structured Documents
Authors Jérémy Barbay, Alexander Golynski, J. Ian Munro, and S Srinavasa Rao
Abstract Abstract. This paper deals with succinct representations of data types motivated by applications in posting lists for search engines, in querying XML documents, and in the more general setting (which extends XML) of multi-labeled trees, where several labels can be assigned to each node of a tree. To find the set of references corresponding to a set of keywords, one typically intersects the list of references associated with each keyword. We view this instead as having a single list of objects [n] = {1, . . . , n} (the references), each of which has a subset of the labels [σ] = {1, . . . , σ} (the keywords) associated with it. We are able to find the objects associated with an arbitrary set of keywords in time O(δk lg lg σ) using a data structure requiring only t(lg σ+o(lg σ)) bits, where δ is the number of steps required by a non-deterministic algorithm to check the answer, k is the number of keywords in the query, σ is the size of the set from which the keywords are chosen, and t is the number of associations between references and keywords. The data structure is succinct in that it differs from the space needed to write down all t occurrences of keywords by only a lower order term. An XML document is, for our purpose, a labeled rooted tree. We deal primarily with “non-recursive labeled trees”, where no label occurs more than once on any root to leaf path. We find the set of nodes which path from the root include a set of keywords in the same time, O(δk lg lg σ), on a representation of the tree using essentially minimum space, 2n + n(lg σ +o(lg σ)) bits, where n is the number of nodes in the tree. If we permit nodes to have multiple labels, this space bound becomes 2n + t(lg σ + o(lg σ)) bits, that is the information theoretic lower bound for an ordinal tree (a node can have an arbitrary number of children ordered from left to right) plus that for the multiple labeling, where t is the total number of labels assigned. In proving those results, we consider two data-structures if independant interest: we give an encoding for σ by n boolean matrices, using optimal space and supporting in time O(lg lg σ) the operators access (the value at the intersection of a row and a column) rank (how many matches occur in this row to the left of this entry, or how many are in this column and above), and select (find the r-th match in this row, or in this column); and we give an encoding for labeled trees of n nodes and σ labels, using optimal space and supporting in time O(lg lg σ) the labeled based operator labeltree desc(a, x), which finds the first descendant of x labeled a. Keywords: succinct data-structures, labeled trees, multi-labeled trees, conjunctive queries, intersection problem, opt-threshold set.
Date November 2005