CS-2005-01 | ||||
---|---|---|---|---|
Title | Index-Trees for Descendant Tree Queries on XML documents | |||
Authors | Jérémy Barbay | |||
Abstract | We propose an index structure and an algorithm to answer Descendant Tree queries, the subset of XPath queries limited to the descendant axis, in computational models where elements can be accessed in non-sequential order. The indexing scheme uses classical labeling techniques, but structurally represents the ancestor-descendant relationships between nodes of the same type, in order to allow faster searches. The algorithm solves XPath location path queries along the descendant axis in a holistic way, looking at the whole query rather than just concentrating on individual components, as opposed to “set-at-a-time” strategies. Using an adaptive analysis, we prove tight bounds on the worst-case complexity in the comparison model, and a reasonable upper bound on the worst-case complexity in the hierarchical memory model, which accounts for the number of I/O transfers at any fixed level. Keywords: XPath location paths; comparison model; holistic; cache-oblivious; adaptive. | |||
Date | January 2005 | |||
Report | Index-Trees for Descendant Tree Queries on XML documents (PS) | Index-Trees for Descendant Tree Queries on XML documents (PDF) |
CS-2005-02 | ||||
---|---|---|---|---|
Title | Stereo Rendering: An Overview | |||
Authors | Jamie Brettle and Gladimir Baranoski | |||
Abstract | Humans visually perceive the world using a variety of depth cues. Stereopsis is an example of such a depth cue, and it is directly associated with the human binocular visual system. Different computer graphics approaches can be used to take advantage of stereopsis in order to enable a true 3D perception of virtual environments. In this report, we briefly review computer graphics hardware and software tools designed for this purpose as well as optimization techniques that can be applied to speed up the synthesis of stereo images. | |||
Date | December 2005 | |||
Report | Stereo Rendering: An Overview (PDF) |
CS-2005-03 | ||||
---|---|---|---|---|
Title | A new formulation for haplotype inference by pure parsimony | |||
Authors | Daniel G. Brown, Ian M. Harrower | |||
Date | January 2005 | |||
Report | A new formulation for haplotype inference by pure parsimony (PDF) |
CS-2005-04 | ||||
---|---|---|---|---|
Title | Building an Embedded Control Program Workload | |||
Authors | Heng Yu and Grant Weddell | |||
Abstract | A compiled database application has a repository style of architecture in which a collection of modules operate on a common database in main memory through a set of predefined transaction types. To conduct simulated experiments to evaluate the performance of locking techniques for compiled database applications, we need to design a workload that approximates the data processing in main memory. We choose the open-source MINIX operating system as our case study, and select a set of system calls as predefined transaction types. We map the kernel data structures to relational tables, and the source code in the C programming language to C code with embedded SQL statements (C/SQL code). Furthermore, the C/SQL code is transformed into finite state machines for the transaction types, and the costs of states and transitions are extracted from the original C code in terms of numbers of machine instructions. Thus the workload is built up. We also discuss how to model the costs of locking/unlocking operations in our workload. | |||
Date | February 2005 | |||
Report | Building an Embedded Control Program Workload (PDF) |
CS-2005-05 | ||||
---|---|---|---|---|
Title | Empirical Analysis on the Geriatric Care Data Set Using Rough Sets Theory | |||
Authors | Jiye Li and Nick Cercone | |||
Abstract | A well-known problem in association rule generation is that too many rules are generated, and it is difficult to determine manually which rules are more useful, interesting and important. In our study of using rough sets theory to improve the utility of association rules, we propose a new rule importance measure to select the most appropriate rules. In order to explore the application of the proposed method to large data sets, we perform the experiments on a Geriatric Care data set. | |||
Date | February 2005 | |||
Report | Empirical Analysis on the Geriatric Care Data Set Using Rough Sets Theory (PDF) |
CS-2005-06 | ||||
---|---|---|---|---|
Title | Query-Based Approach to Workflow Process Dependency Analysis | |||
Authors | Weizhen Dai and H. Dominic Covvey | |||
Abstract | Dependency analysis is important in all of the stages of workflow processes. An appropriate analysis will help us to identify the potentially affected entities if changes occur. In this technical report we thoroughly analyze the critical entities of workflow processes and their dependency relationships, and propose a multi-dimensional dependency model that includes routing dependency, data dependency and role dependency. We further build a knowledge base to represent the multi-dimensional dependency model using the logic programming language Prolog. We then develop a set of query rules that can be applied to the well-defined knowledge base at both the activity and process levels to retrieve the potentially affected entities. Finally we use a case study of workflow processes in the healthcare domain to show how our dependency analysis methodology works. | |||
Date | October 2005 | |||
Report | Query-Based Approach to Workflow Process Dependency Analysis (PDF) |
CS-2005-09 | ||||
---|---|---|---|---|
Title | The complexity of Domino Tiling | |||
Authors | Therese Biedl | |||
Abstract | In this paper, we study the problem of how to tile a layout with dominoes. For non-coloured dominoes, this can be determined easily by testing whether the layout graph has a perfect matching. We study here tiling with coloured dominoes, where colours of adjacent dominoes must match. It was known that this problem is NP-hard when the layout graph is a tree. We first strengthen this NP-hardness result in two ways: (1) we can use a path instead of a tree, or (2) we can force that exactly the given dominoes are used. However, both these reductions (as well as the existing one) use an unbounded number of colours, which is not realistic for domino tiling. As our main interest, we hence study domino tiling with a constant number of colours. We show that this is NP-hard even with three colours. | |||
Date | February 2005 | |||
Report | The complexity of Domino Tiling (PS) | The complexity of Domino Tiling (PDF) |
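The reduction mentioned in the abstract above (for non-coloured dominoes, tileability is equivalent to the layout graph having a perfect matching) is easy to make concrete. Below is a minimal Python sketch for grid layouts, where a checkerboard colouring makes the layout graph bipartite; the grid representation and helper names are ours, not taken from the report.

```python
# Sketch: tiling a grid layout with non-coloured 1x2 dominoes is possible
# exactly when the (bipartite) layout graph has a perfect matching.
# The grid-of-cells representation and helper names are illustrative.

def can_tile_with_dominoes(cells):
    """cells: set of (row, col) squares of the layout."""
    if len(cells) % 2:
        return False
    # Checkerboard colouring makes the layout graph bipartite.
    black = [c for c in cells if (c[0] + c[1]) % 2 == 0]
    white = {c for c in cells if (c[0] + c[1]) % 2 == 1}
    if len(black) != len(white):
        return False
    adj = {b: [n for n in ((b[0] + 1, b[1]), (b[0] - 1, b[1]),
                           (b[0], b[1] + 1), (b[0], b[1] - 1)) if n in white]
           for b in black}
    match = {}  # white cell -> matched black cell

    def augment(b, seen):
        # Kuhn's augmenting-path step for bipartite matching.
        for w in adj[b]:
            if w not in seen:
                seen.add(w)
                if w not in match or augment(match[w], seen):
                    match[w] = b
                    return True
        return False

    return all(augment(b, set()) for b in black)

full = {(r, c) for r in range(2) for c in range(3)}
print(can_tile_with_dominoes(full))                     # True: three dominoes
print(can_tile_with_dominoes(full - {(0, 0), (1, 1)}))  # False: colours unbalanced
```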
CS-2005-10 | ||||
---|---|---|---|---|
Title | Indexing the Results of Sliding Window Queries | |||
Authors | Lukasz Golab, Piyush Prahladka and M. Tamer Özsu | |||
Abstract | A popular method of bounding the memory requirements of queries over data streams is to use a sliding window, where old data are continuously removed as new data arrive. One problem that has not been addressed previously concerns indexing the results of sliding window queries. This is a noteworthy problem because data stream systems often materialize common sub-expressions or final results of similar queries, so it is important to allow the queries efficient access to a relevant subset of the materialized result. In this paper, we design and evaluate indexing methods that take advantage of the temporal order in which old answers expire for efficient maintenance and querying of the results. Our techniques allow sharing of indexed results among similar queries, adapt to fluctuating stream arrival rates, and are experimentally shown to perform updates over twice as fast as existing sliding window indices. | |||
Date | May 2005 | |||
Report | Indexing the Results of Sliding Window Queries (PDF) |
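A hedged sketch of the general idea (not the specific index structures of the report): materialized window results are grouped into partitions that expire in the same temporal order in which they were inserted, so maintenance drops whole partitions instead of searching for individual expired tuples. The class and batch interface below are illustrative assumptions.

```python
from collections import defaultdict, deque

class WindowedResultIndex:
    """Illustrative only: materialized sliding-window results kept in
    time-ordered partitions, so expiration removes whole partitions in
    insertion order rather than hunting for individual expired tuples."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.partitions = deque()        # (expiry_time, {key: [tuples]})

    def insert_batch(self, now, tuples, key_of):
        part = defaultdict(list)
        for t in tuples:
            part[key_of(t)].append(t)
        self.partitions.append((now + self.window_size, part))
        # Expiration follows the temporal order of insertion: cheap drops.
        while self.partitions and self.partitions[0][0] <= now:
            self.partitions.popleft()

    def probe(self, key):
        for _, part in self.partitions:  # only live partitions are probed
            yield from part.get(key, ())

idx = WindowedResultIndex(window_size=10)
idx.insert_batch(0, [("a", 1), ("b", 2)], key_of=lambda t: t[0])
idx.insert_batch(2, [("a", 3)], key_of=lambda t: t[0])
print(list(idx.probe("a")))              # [('a', 1), ('a', 3)]
```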
CS-2005-11 | ||||
---|---|---|---|---|
Title | RDL: A Language for Framework Instantiation Representation | |||
Authors | Toacy C. Oliveira, Paulo S. C. Alencar, Carlos J.P. de Lucena, Donald D. Cowan. | |||
Abstract | Reusing software artifacts for system development is showing increasing promise as an approach to reducing the time and effort involved in building new systems, and to improving the software development process and the quality of its outcome. However, software reuse has an associated steep learning curve, since practitioners must become familiar with a third-party rationale for representing and implementing reusable assets. For this reason, enabling a systematic approach to the reuse process by making software reuse tasks explicit, allowing software frameworks to be instantiated using pre-defined primitive and complex reuse operations, and supporting the reuse process in a (semi-)automated way become crucial goals. In this paper, we introduce a systematic reuse approach and the Reuse Description Language (RDL), a language designed to specify object-oriented framework instantiation processes, and an RDL execution environment, which is the tool support for the definition and execution of reuse processes and framework instantiations that lead to domain-specific applications. We illustrate our approach using JUnit, a framework for testing. | |||
Date | May 2005 | |||
Report | RDL: A Language for Framework Instantiation Representation (PDF) |
CS-2005-12 | ||||
---|---|---|---|---|
Title | xTAO: Enabling a Declarative Approach to the Specification of Multi-Agent Systems | |||
Authors | Toacy C. Oliveira, Paulo Alencar, Don Cowan, Carlos Lucena | |||
Abstract | Research on software agents has produced a diversity of conceptual models for high-level abstract descriptions of multi-agent systems (MASs). However, it is still difficult and costly for designers who need a unique set of agent modelling features to either develop a new agent modelling language from scratch or undertake the task of modifying an existing language. In addition to the modelling itself, in both cases a significant effort needs to be expended in building or adapting tools to support the language. An extensible agent modelling language is crucial to experimenting with and building tools for novel modelling constructs that arise from evolving research. Existing approaches typically support a basic set of modelling constructs very well, but adapt to others poorly. A declarative language such as XML and its supporting tools provide an ideal platform upon which to develop an extensible modelling language for multi-agent systems. In this paper we describe xTAO, an extensible agent modelling language, and also demonstrate its value in the context of a real-world application. | |||
Date | 2005 | |||
Report | xTAO: Enabling a Declarative Approach to the Specification of Multi-Agent Systems (PDF) |
CS-2005-13 | ||||
---|---|---|---|---|
Title | Complexity of Octagonal and Rectangular Cartograms | |||
Authors | T. Biedl and B. Genc | |||
Abstract | In this paper, we study the complexity of cartograms, i.e., maps where every region should be deformed such that a given area requirement is satisfied. We consider the case where every region is an orthogonal octagon, and show that this problem is NP-hard. From our proof, it also follows that cartograms with rectangular faces are NP-hard if we allow for the existence of a "sea", i.e., a region of arbitrarily high complexity on the outside of the drawing. | |||
Date | December 2005 | |||
Report | Complexity of Octagonal and Rectangular Cartograms (PDF) |
CS-2005-14 | ||||
---|---|---|---|---|
Title | Shortest Path Trees Computation in Dynamic Graphs | |||
Authors | Edward P. F. Chan & Yaya Yang | |||
Abstract | Let G = (V, E, w) be a simple digraph, in which all edge weights are non-negative real numbers. Let G′ be obtained from G by the application of a set of edge weight updates to G. Let s ∈ V, and let Ts and T′s be a Shortest Path Tree (SPT) rooted at s in G and G′, respectively. The Dynamic Shortest Path (DSP) problem is to compute T′s from Ts. For the DSP problem, we correct and extend a few existing SPT algorithms to handle multiple edge weight updates. We prove that these extended algorithms are correct. The complexity of these algorithms is also analyzed. To evaluate the proposed algorithms, we compare them with the well-known static Dijkstra algorithm. Extensive experiments are conducted with both real-life and artificial data sets. The real-life data are road system graphs obtained from the Connecticut road system and are relatively sparse. The artificial data are randomly generated graphs and are relatively dense. The experimental results suggest the most appropriate algorithms to be used under different circumstances. | |||
Date | March 2005 | |||
Report | Shortest Path Trees Computation in Dynamic Graphs (PDF) |
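For reference, the static baseline the report compares against is rerunning Dijkstra's algorithm from scratch after the edge-weight updates. A minimal Python sketch of that baseline follows; the tiny example graph is ours, and the dynamic algorithms in the report instead repair the existing tree Ts.

```python
import heapq

def shortest_path_tree(graph, s):
    """Static Dijkstra baseline. graph: u -> {v: non-negative weight}.
    Returns (dist, parent); parent encodes the SPT rooted at s."""
    dist, parent = {s: 0.0}, {s: None}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                     # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, parent

g = {"s": {"a": 2, "b": 5}, "a": {"b": 1}, "b": {}}
print(shortest_path_tree(g, "s"))
# After a batch of edge-weight updates, this baseline simply recomputes the
# whole tree; the dynamic algorithms studied in the report repair it instead.
```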
CS-2005-15 | ||||
---|---|---|---|---|
Title | Optimization and Evaluation of Shortest Path Queries | |||
Authors | Edward P. F. Chan & Heechul Lim | |||
Abstract | We investigate the problem of how to evaluate efficiently a collection of shortest path queries on massive graphs that are too big to fit in the main memory. To evaluate a shortest path query efficiently, we introduce two pruning algorithms. These algorithms differ in the extent of materialization of shortest path cost and in how the search space is pruned. By grouping shortest path queries properly, batch processing improves the performance of shortest path query evaluation. An extensive study is also done on the fragment sizes, cache sizes and query types that we show affect the performance of a disk-based shortest path algorithm. The performance and scalability of the proposed techniques are evaluated with large road systems in the Eastern United States. To demonstrate that the proposed disk-based algorithms are viable, we show that their search times are significantly better than that of the main-memory Dijkstra algorithm. | |||
Date | September 2005 | |||
Report | Optimization and Evaluation of Shortest Path Queries (PDF) |
CS-2005-16 | ||||
---|---|---|---|---|
Title | URL-Enhanced Adaptive Page-Refresh Models | |||
Authors | Robert Warren, Dana Wilkinson, Alejandro L'opez-Ortiz | |||
Abstract | We study the refresh model required to keep an up-to-date copy of a web page. This has applications for more efficient and frequent crawling, in the case of search engines, and for higher hit rates for proxy servers with pre-fetching. Two main models have been proposed, namely the uniform refresh model (Cho and Garcia-Molina, 2000) and the adaptive page refresh model (Edwards et al., 2001), with some debate as to the relative value of each model. In this work we show that adaptive page refresh models can be further improved by careful selection of initial page-refresh rates of newly added links, as indicated by our page evolution studies showing that page-change rates (and consequently page-refresh rates) are dependent on top-level domain, category and page depth. | |||
Date | May 2005 | |||
Report | URL-Enhanced Adaptive Page-Refresh Models (PDF) |
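To make the conclusion concrete, here is a hedged sketch of how URL features (top-level domain, category and page depth) could seed the initial refresh interval of a newly discovered link. The base intervals and factors are placeholders of our own; the report derives the actual dependence from its page-evolution measurements.

```python
from urllib.parse import urlparse

# Placeholder values: the report derives the real dependence of page-change
# rates on top-level domain, category and depth from measured data.
BASE_HOURS_BY_TLD = {"com": 24, "org": 48, "edu": 72}
DEPTH_FACTOR = 1.5      # assumption: deeper pages change less often

def initial_refresh_interval(url, category_factor=1.0):
    """Seed a refresh interval (hours) for a newly added link from URL features."""
    parsed = urlparse(url)
    tld = parsed.netloc.rsplit(".", 1)[-1]
    depth = len([p for p in parsed.path.split("/") if p])
    base = BASE_HOURS_BY_TLD.get(tld, 48)
    return base * (DEPTH_FACTOR ** depth) * category_factor

print(initial_refresh_interval("http://example.com/news/today.html"))  # 54.0
```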
CS-2005-17 | ||||
---|---|---|---|---|
Title | Vocabulary size and email authentication | |||
Authors | Robert Warren | |||
Abstract | This paper explores the performance of the method proposed by Efron and Thisted to predict vocabulary sizes based on sampled text. The objective of this research is to determine whether this simple and quick test can be used as a coarse indicator of authorship. Three sets of emails, as well as other texts, are analyzed in order to collect performance data. The conclusion is that the test is at best a lower bound indicator within the $T < 1.0$ region and is not sufficient as an authentication method. | |||
Date | July 2005 | |||
Report | Vocabulary size and email authentication (PDF) |
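For context, the series estimator associated with Efron and Thisted's work (building on Good and Toulmin) predicts how many previously unseen word types would appear in an additional sample of relative size $t$; it is well behaved only for roughly $t \le 1$, which matches the $T < 1.0$ region mentioned in the abstract. The form below is our paraphrase of the standard estimator, not an excerpt from the report.

```latex
% n_k = number of word types occurring exactly k times in the observed text;
% \Delta(t) = expected number of new types in an additional sample of
% relative size t (t = 1 doubles the sample).
\Delta(t) \;=\; \sum_{k \ge 1} (-1)^{k+1}\, t^{k}\, n_k
          \;=\; n_1 t - n_2 t^2 + n_3 t^3 - \cdots , \qquad 0 < t \le 1 .
```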
CS-2005-18 | ||||
---|---|---|---|---|
Title | Dynamic Histograms for Non-Stationary Updates | |||
Authors | Elizabeth Lam and Kenneth Salem | |||
Abstract | In this paper, we address the problem of incrementally maintaining a histogram in response to a non-stationary update process. In relational database systems, this problem can occur whenever relations model time-varying activities. We present a simple update model that is general enough to describe both stationary and non-stationary update processes, and we use it to show that existing histogram maintenance techniques can perform poorly when updates are non-stationary. We describe several techniques for solving this problem, and we use the update model to demonstrate that these techniques can effectively handle a broad range of update processes, including non-stationary ones. | |||
Date | June 2005 | |||
Report | Dynamic Histograms for Non-Stationary Updates (PDF) |
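As a hedged illustration of the problem setting (not the maintenance techniques proposed in the report), the sketch below keeps an equi-width histogram current under a drifting update stream by exponentially discounting old counts; the bucket layout and decay factor are our own choices.

```python
class DecayingHistogram:
    """Illustrative equi-width histogram that fades old counts so estimates
    can follow a non-stationary update process. Not the report's techniques."""

    def __init__(self, lo, hi, buckets, decay=0.99):
        self.lo, self.decay = lo, decay
        self.width = (hi - lo) / buckets
        self.counts = [0.0] * buckets

    def _bucket(self, value):
        b = int((value - self.lo) / self.width)
        return min(max(b, 0), len(self.counts) - 1)

    def insert(self, value):
        self.counts = [c * self.decay for c in self.counts]   # fade history
        self.counts[self._bucket(value)] += 1.0

    def selectivity(self, lo, hi):
        total = sum(self.counts) or 1.0
        b0, b1 = self._bucket(lo), self._bucket(hi)
        return sum(self.counts[b0:b1 + 1]) / total

h = DecayingHistogram(0, 100, buckets=10)
for v in [5, 7, 95, 96, 97]:
    h.insert(v)
print(round(h.selectivity(90, 100), 2))   # most recent mass sits in the top bucket
```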
CS-2005-19 | ||||
---|---|---|---|---|
Title | Optimal Basic Block Instruction Scheduling for Multiple-Issue Processors using Constraint Programming | |||
Authors | Abid M. Malik, Jim McInnes and Peter van Beek | |||
Abstract | Instruction scheduling is one of the most important steps for improving the performance of object code produced by a compiler. A fundamental problem that arises in instruction scheduling is to find a minimum length schedule for a basic block---a straight-line sequence of code with a single entry point and a single exit point---subject to precedence, latency, and resource constraints. Solving the problem exactly is NP-complete, and heuristic approaches are currently used in most compilers. In contrast, we present a scheduler that finds provably optimal schedules for basic blocks using techniques from constraint programming. In developing our optimal scheduler, the keys to scaling up to large, real problems were improvements to the constraint model and to the constraint propagation phases. We experimentally evaluated our optimal scheduler on the SPEC 2000 integer and floating point benchmarks. On this benchmark suite, the optimal scheduler was very robust and scaled to the largest basic blocks---all but a handful of the hundreds of thousands of basic blocks in our benchmark suite were solved optimally within a reasonable time limit, and the scheduler was able to routinely solve the largest basic blocks, including basic blocks with up to 2600 instructions. This compares favourably to the best previous exact approaches. | |||
Date | November 2005 | |||
Report | Optimal Basic Block Instruction Scheduling for Multiple-Issue Processors using Constraint Programming (PDF) |
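The heuristic schedulers mentioned in the abstract as the state of practice are typically list schedulers driven by a critical-path priority. The sketch below shows that baseline on a single-issue machine with a toy dependence DAG; it is not the report's constraint-programming scheduler, and the machine model, instructions and latencies are our own assumptions.

```python
from functools import lru_cache

def list_schedule(succ, latency):
    """Critical-path list scheduling of a basic block DAG on a single-issue
    machine. succ: instr -> dependent instrs; latency: (u, v) -> cycles."""
    instrs = set(succ) | {v for vs in succ.values() for v in vs}
    preds = {i: [u for u in succ if i in succ[u]] for i in instrs}

    @lru_cache(maxsize=None)
    def critical_path(i):
        return max((latency[(i, j)] + critical_path(j) for j in succ.get(i, [])),
                   default=0)

    ready_time = {i: 0 for i in instrs}
    unscheduled, schedule, cycle = set(instrs), [], 0
    while unscheduled:
        ready = [i for i in unscheduled
                 if ready_time[i] <= cycle
                 and all(p not in unscheduled for p in preds[i])]
        if ready:
            i = max(ready, key=critical_path)   # most critical instruction first
            schedule.append((cycle, i))
            unscheduled.remove(i)
            for j in succ.get(i, []):
                ready_time[j] = max(ready_time[j], cycle + latency[(i, j)])
        cycle += 1
    return schedule

succ = {"load": ["add"], "add": ["store"], "mul": ["store"], "store": []}
lat = {("load", "add"): 3, ("add", "store"): 1, ("mul", "store"): 2}
print(list_schedule(succ, lat))
# [(0, 'load'), (1, 'mul'), (3, 'add'), (4, 'store')]; the report's scheduler
# instead searches for a provably minimum-length schedule.
```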
CS-2005-20 | ||||
---|---|---|---|---|
Title | An Evaluation of Metro Express | |||
Authors | Shahram Esmaeilsabzali, Leonard Wong, and Nancy A. Day | |||
Abstract | We study Express, a system that uses template semantics to map specifications to SMV models. We investigate the efficiency of the generated SMV models. We consider two case studies and compare manually created SMV models with models generated by Express. The generated models are more complex and have larger state spaces, and consequently longer verification times. We also analyze the effect of using different optimization schemes in Cadence SMV. Interestingly, some of the optimizations are more applicable to the generated models and, for some queries, made verification of the generated models outperform the manually created models. However, considering the best case for each model, the manually created model always outperforms the generated model. We propose some optimizations that could be used to dramatically improve the efficiency of the models generated by Express. We estimate the magnitude of improvements that such optimizations can provide. | |||
Date | December 2005 | |||
Report | An Evaluation of Metro Express (PDF) |
CS-2005-21 | ||||
---|---|---|---|---|
Title | An XML Routing Synopsis in Unstructured P2P Networks | |||
Authors | Qiang Wang, Abhay Kumar Jha, and Tamer Ozsu | |||
Abstract | Many emerging applications that use XML are distributed, usually over large peer-to-peer (P2P) networks on the Internet. The deployment of an XML query shipping system over P2P networks requires a specialized synopsis to capture XML data in routing tables. In this paper, we propose a novel graph-structured routing synopsis, called kd-synopsis, for deployment over unstructured super-peer based P2P networks. This synopsis is based on a length-constrained FB-simulation relationship, which allows the balancing of the precision and size of the synopsis according to different space constraints on peers with heterogeneous capacity. We report comprehensive experiments to demonstrate the effectiveness of the kd-synopsis. | |||
Date | August 2006 | |||
Report | An XML Routing Synopsis in Unstructured P2P Networks (PDF) |
CS-2005-22 | ||||
---|---|---|---|---|
Title | XSEED: Accurate and Fast Cardinality Estimation for XPath Queries | |||
Authors | Ning Zhang, M. Tamer Ozsu, Ashraf Aboulnaga, and Ihab F. Ilyas | |||
Abstract | Cardinality estimation is a crucial part of a cost-based optimizer. In recent years, many research efforts have focused on XML synopsis structures for the cardinality estimation of path queries. In ideal situations, a synopsis should provide accurate estimates for different types of queries over a wide variety of data sets, consume a small amount of memory while being able to adjust as memory budgets change, and be easy to construct and update. None of the existing synopsis proposals satisfy all of the above requirements. In this paper, we propose a novel synopsis, XSEED, that is accurate, robust, efficient, and adaptive to memory budgets. We construct an XSEED structure starting from a very small kernel, then incrementally update the information of the synopsis. With such an incremental construction, a synopsis structure can be dynamically configured to accommodate different memory budgets. It can also handle updates to underlying XML documents, and be self-tuning by incorporating query feedback. Cardinality estimation based on XSEED can be performed very efficiently and accurately with our small-footprint techniques and novel recursion handling. Extensive experiments on both synthetic and real data sets are conducted, and our results show that even with less memory, XSEED can achieve accuracy an order of magnitude better than that of other synopsis structures. The cardinality estimation time is under 2% of the actual querying time for a wide range of queries in all test cases. | |||
Date | June 2005 | |||
Report | XSEED: Accurate and Fast Cardinality Estimation for XPath Queries (PDF) |
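As a hedged illustration of what a tiny path synopsis can do (this is not XSEED), the sketch below summarizes an XML document by per-tag counts and parent-child edge counts, then estimates the cardinality of a simple child-axis path from average fanouts. The names and the estimation rule are our own simplifications.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def build_synopsis(xml_text):
    """Tiny first-order synopsis: tag counts plus parent->child edge counts.
    In the spirit of tree synopses for cardinality estimation; not XSEED."""
    tags, edges = Counter(), Counter()

    def walk(node):
        tags[node.tag] += 1
        for child in node:
            edges[(node.tag, child.tag)] += 1
            walk(child)

    walk(ET.fromstring(xml_text))
    return tags, edges

def estimate_child_path(tags, edges, path):
    """Estimate |/p0/p1/.../pk| by multiplying average fanouts along the path."""
    est = float(tags.get(path[0], 0))
    for u, v in zip(path, path[1:]):
        if not tags.get(u):
            return 0.0
        est *= edges.get((u, v), 0) / tags[u]
    return est

doc = "<lib><book><author/><author/></book><book><author/></book></lib>"
tags, edges = build_synopsis(doc)
print(estimate_child_path(tags, edges, ["lib", "book", "author"]))  # 3.0
```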
CS-2005-23 | ||||
---|---|---|---|---|
Title | Proceedings of the Doctoral Consortium at the 13th IEEE International Requirements Engineering Conference | |||
Authors | Edited by Nancy A. Day, University of Waterloo | |||
Date | August 2005 | |||
Report | Proceedings of the Doctoral Consortium at the 13th IEEE International Requirements Engineering Conference (PDF) |
CS-2005-24 | ||||
---|---|---|---|---|
Title | Multi column matching for database schema translation | |||
Authors | Robert H. Warren, Dr. Frank Wm. Tompa | |||
Abstract | We describe a generalised method for discovering complex schema matches involving multiple database columns. The method does not require linked data and is capable of dealing with both fixed- and variable-length field columns. This is done through an iterative algorithm which learns the correct sequence of concatenations of column substrings in order to translate from one database to another. We introduce the algorithm along with examples on common database data values and examine its performance on real-world and synthetic data sets. | |||
Date | August 2005 | |||
Report | Multi column matching for database schema translation (PDF) |
CS-2005-25 | ||||
---|---|---|---|---|
Title | Assisting Framework Instantiation: Enhancements to Process-Language-based Approaches | |||
Authors | Marcillio Mendonca, Paulo Alencar, Toacy Oliveira, Donald Cowan | |||
Abstract | Application frameworks have been successfully used as valuable tools to improve software quality while reducing development effort. Nevertheless, frameworks still face important challenges to being widely adopted. In particular, framework instantiation is still a painful task requiring application developers to understand the intricate details surrounding the framework design. Some approaches to alleviate this problem have already been proposed in the literature, but they are usually either just a textual cross-referenced document of the instantiation activities or too tied to a technology or specific application domains. In this paper, we present the results of our latest investigations into improving our approach to framework instantiation. In particular, we discuss a process language we have developed to guide framework instantiation explicitly, and the most recent updates we have made to improve the language's expressiveness. Furthermore, we present a case study used to evaluate our approach and to identify current and future extensions. | |||
Date | August 2005 | |||
Report | Assisting Framework Instantiation: Enhancements to Process-Language-based Approaches (PDF) |
CS-2005-26 | ||||
---|---|---|---|---|
Title | Interface Automata with Complex Actions - Extended Version | |||
Authors | Shahram Esmaeilsabzali, Nancy A. Day, and Farhad Mavaddat | |||
Abstract | Many formalisms use interleaving to model concurrency. To describe some system behaviours appropriately, we need to limit interleaving. For example, in component-based systems, we wish to limit interleaving to force the inputs to a method to arrive together in order. In web services, the arrival of XML messages consisting of multiple simple parts should not be interleaved with the behaviour of another component. We introduce "interface automata with complex actions" (IACA), which add complex actions to de Alfaro and Henzinger's interface automata (IA). A complex action is a sequence of actions that may not be interleaved with actions from other components. The composition and refinement operations are more involved in IACA compared to IA, and we must sacrifice associativity of composition. However, we argue that the advantages of having complex actions make it a useful formalism. We provide proofs of various properties of IACA and discuss the use of IACA for modelling web services. | |||
Date | 2005, Revised May 15, 2006 | |||
Report | Interface Automata with Complex Actions - Extended Version (PDF) |
CS-2005-27 | ||||
---|---|---|---|---|
Title | Toward an understanding of haplotype inference by pure parsimony | |||
Authors | Daniel G. Brown, Ian M. Harrower | |||
Date | January 2005 | |||
Report | Toward an understanding of haplotype inference by pure parsimony (PDF) |
CS-2005-28 | ||||
---|---|---|---|---|
Title | On Concurrency Control in Sliding Window Queries over Data Streams | |||
Authors | Lukasz Golab, Kumar Gaurav Bijay, M. Tamer Özsu | |||
Abstract | Data stream systems execute a dynamic workload of long-running and one-time queries, with the streaming inputs typically bounded by sliding windows. For efficiency, windows may be advanced periodically by replacing the oldest part of the window with a batch of newly arrived data. Existing work on stream processing assumes that a window cannot be advanced while it is being accessed by a query. In this paper, we argue that concurrent processing of queries (reads) and window-slides (writes) is required by data stream systems in order to allow prioritized query scheduling and improve the freshness of answers. We prove that the traditional notion of conflict serializability is insufficient in this context and define stronger isolation levels that restrict the allowed serialization orders. We also design and experimentally evaluate a transaction scheduler that efficiently enforces the new isolation levels by taking advantage of the access patterns of sliding window queries. | |||
Date | December 2005 | |||
Report | On Concurrency Control in Sliding Window Queries over Data Streams (PDF) |
CS-2005-29 | ||||
---|---|---|---|---|
Title | InterJoin: Exploiting Materialized Views in XML Query Processing | |||
Authors | Derek Phillips, Ning Zhang, Ihab F. Ilyas, and M. Tamer Özsu | |||
Abstract | Efficient processing of XPath expressions is an integral part of XML data query processing. Exploiting materialized views in query processing can significantly enhance query processing performance. We propose a novel view definition that allows for intermediate (structural) join results to be stored and reused in XML query evaluation. Unlike current XML view proposals, our views do not require navigation in the original document or path-based pattern matching. Hence, they can be evaluated significantly faster and can be more easily costed as part of a query plan. In general, current structural joins cannot exploit views efficiently when the view definition is not a prefix (or a suffix) of the XPath query. To increase the applicability of our proposed view definition, we propose a novel physical structural join operator called InterJoin. The InterJoin operator allows for joining interleaving XPath expressions, e.g., joining //A//C with //B to evaluate //A//B//C. InterJoin treats structural joins as a logical operator, giving more join alternatives in the XML query plan. We propose several physical implementations for InterJoin, including a technique to exploit spatial indexes on the inputs. We give analytic cost models for the implementations so they can be costed in an existing join-based XML query optimizer. Experiments on real and synthetic XML data show significant speed-ups of up to 200% using InterJoin, as well as speed-ups of up to 400% using our materialized views. | |||
Date | October 2005 | |||
Report | InterJoin: Exploiting Materialized Views in XML Query Processing (PDF) |
CS-2005-31 | ||||
---|---|---|---|---|
Title | Indexing Time vs. Query Time Trade-offs in Dynamic Information Retrieval Systems | |||
Authors | Stefan Büttcher and Charles L. A. Clarke | |||
Abstract | We examine issues in the design of fully dynamic information retrieval systems with support for instantaneous document insertions and deletions. We present one such system and discuss some of the major design decisions. These decisions affect both the indexing and the query processing efficiency of our system and thus represent genuine trade-offs between indexing and query processing performance. Two aspects of the retrieval system -- fast, incremental updates and garbage collection for delayed document deletions -- are discussed in detail, with a focus on the respective trade-offs. Depending on the relative number of queries and update operations, different strategies lead to optimal overall performance. Special attention is given to a particular case of dynamic search systems -- desktop and file system search. As one of the main results of this paper, we demonstrate how security mechanisms necessary for multi-user support can be extended to realize efficient document deletions. | |||
Date | October 2005 | |||
Report | Indexing Time vs. Query Time Trade-offs in Dynamic Information Retrieval Systems (PDF) |
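One of the two aspects highlighted above, garbage collection for delayed document deletions, can be illustrated with a small sketch: deletions take effect immediately by filtering query results through a deleted-document set, and postings are physically reclaimed later. The structure below is our own simplification, not the system described in the report.

```python
class DynamicIndex:
    """Sketch of delayed deletions: a deleted-document set masks results
    immediately; a later garbage-collection pass rewrites the postings.
    Illustrative only; not the retrieval system described in the report."""

    def __init__(self):
        self.postings = {}       # term -> list of doc ids in insertion order
        self.deleted = set()

    def insert(self, doc_id, terms):
        for t in terms:
            self.postings.setdefault(t, []).append(doc_id)

    def delete(self, doc_id):
        self.deleted.add(doc_id)          # instantaneous, no index rewrite

    def search(self, term):
        return [d for d in self.postings.get(term, []) if d not in self.deleted]

    def garbage_collect(self):
        # Amortized cleanup, e.g. once the deleted set passes a threshold.
        for t, plist in self.postings.items():
            self.postings[t] = [d for d in plist if d not in self.deleted]
        self.deleted.clear()

idx = DynamicIndex()
idx.insert(1, ["cache", "index"])
idx.insert(2, ["index"])
idx.delete(1)
print(idx.search("index"))    # [2]: doc 1 is filtered out before GC ever runs
idx.garbage_collect()
```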
CS-2005-32 | ||||
---|---|---|---|---|
Title | Memory Management Strategies for Single-Pass Index Construction in Text Retrieval Systems | |||
Authors | Stefan Büttcher and Charles L. A. Clarke | |||
Abstract | Many text retrieval systems construct their index by accumulating postings in main memory until there is no more memory available and then creating an on-disk index from the in-memory data. When the entire text collection has been read, all on-disk indices are combined into one big index through a multiway merge process. This paper discusses several ways to arrange postings in memory and studies the effects on memory requirements and indexing performance. Starting from the traditional approach that holds all postings for one term in a linked list, we examine strategies for combining consecutive postings into posting groups and arranging these groups in a linked list in order to reduce the number of pointers in the linked list. We then extend these techniques to compressed posting lists and finally evaluate the effects they have on overall indexing performance for both static and dynamic text collections. Substantial improvements are achieved over the initial approach. | |||
Date | October 2005 | |||
Report | Memory Management Strategies for Single-Pass Index Construction in Text Retrieval Systems (PDF) |
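A minimal sketch of the grouping idea described above: instead of one linked-list node (and pointer) per posting, consecutive postings are accumulated into fixed-size groups that are chained together. The group size and names are illustrative assumptions, not the report's measured configuration.

```python
class GroupedPostings:
    """Sketch: in-memory postings for one term kept in fixed-size groups
    chained together, instead of one list node per posting, to cut the
    per-posting pointer overhead. The group size is an illustrative choice."""

    GROUP_SIZE = 128

    def __init__(self):
        self.groups = [[]]                   # chain of posting groups

    def append(self, posting):
        if len(self.groups[-1]) >= self.GROUP_SIZE:
            self.groups.append([])           # one new "pointer" per group
        self.groups[-1].append(posting)

    def __iter__(self):
        for group in self.groups:
            yield from group

index = {}                                   # term -> GroupedPostings
for doc_id, text in enumerate(["memory index memory", "index merge"]):
    for pos, term in enumerate(text.split()):
        index.setdefault(term, GroupedPostings()).append((doc_id, pos))

print(list(index["index"]))                  # [(0, 1), (1, 0)]
```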
CS-2005-33 | ||||
---|---|---|---|---|
Title | Adding variability management to UML-based software product lines | |||
Authors | Edson Alves de Oliveira Junior, Itana M. S. Gimenes and Elisa H. M. Huzita (Universidade Estadual de Maringá, Brazil); José C. Maldonado (Universidade de São Paulo, Brazil); Paulo Alencar (University of Waterloo, Canada) | |||
Abstract | The software product line (PL) approach promotes the generation of specific products from a set of core assets for a given domain. This approach is applicable to domains in which products have well-defined commonalities and variation points. Variability management is concerned with the management of the differences between products throughout the PL lifecycle. This paper presents a UML-based process for variability management that allows identification, representation and delimitation of variabilities; and, identification of mechanisms for variability implementation. The evaluation of the process was carried out as a case study within the context of an existing PL for the Workflow Management System (WfMS) domain. The results have shown that the proposed process includes well-defined control mechanisms that increase the possibility of identifying and tracing variabilities. | |||
Date | October 2005 | |||
Report | Adding variability management to UML-based software product lines (PDF) |
CS-2005-34 | ||||
---|---|---|---|---|
Title | Inferring a Serialization Order for Distributed Transactions | |||
Authors | Khuzaima Daudjee and Kenneth Salem | |||
Abstract | Data partitioning is often used to scale up a database system. In a centralized database system, the serialization order of committed update transactions can be inferred from the database log. To achieve this in a shared-nothing distributed database, the serialization order of update transactions must be inferred from multiple database logs. We describe a technique to generate a single stream of updates from the logs of multiple database systems. This single stream represents a valid serialization order of update transactions at the sites over which the database is partitioned. | |||
Date | February 2007 | |||
Report | Inferring a Serialization Order for Distributed Transactions (PDF) |
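A hedged sketch of the merging step: per-site logs of committed update transactions are combined into a single stream. For illustration we assume each record carries a globally comparable commit timestamp; that assumption is ours, and the report describes how a valid serialization order is actually inferred from the multiple logs.

```python
import heapq

def merge_commit_logs(logs):
    """Merge per-site logs (each already in local commit order) into one
    update stream. Assumes records carry globally comparable commit
    timestamps -- an illustrative assumption, not the report's technique."""
    return list(heapq.merge(*logs, key=lambda rec: rec[0]))

site_a = [(10, "T1: update accounts"), (40, "T4: update accounts")]
site_b = [(20, "T2: update orders"), (30, "T3: update orders")]
print(merge_commit_logs([site_a, site_b]))
# [(10, 'T1: update accounts'), (20, 'T2: update orders'),
#  (30, 'T3: update orders'), (40, 'T4: update accounts')]
```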
CS-2005-35 | ||||
---|---|---|---|---|
Title | Adaptive Searching in Succinctly Encoded Binary Relations and Tree-Structured Documents | |||
Authors | Jérémy Barbay, Alexander Golynski, J. Ian Munro, and S. Srinivasa Rao | |||
Abstract | This paper deals with succinct representations of data types motivated by applications in posting lists for search engines, in querying XML documents, and in the more general setting (which extends XML) of multi-labeled trees, where several labels can be assigned to each node of a tree. To find the set of references corresponding to a set of keywords, one typically intersects the list of references associated with each keyword. We view this instead as having a single list of objects [n] = {1, . . . , n} (the references), each of which has a subset of the labels [σ] = {1, . . . , σ} (the keywords) associated with it. We are able to find the objects associated with an arbitrary set of keywords in time O(δk lg lg σ) using a data structure requiring only t(lg σ + o(lg σ)) bits, where δ is the number of steps required by a non-deterministic algorithm to check the answer, k is the number of keywords in the query, σ is the size of the set from which the keywords are chosen, and t is the number of associations between references and keywords. The data structure is succinct in that it differs from the space needed to write down all t occurrences of keywords by only a lower order term. An XML document is, for our purpose, a labeled rooted tree. We deal primarily with “non-recursive labeled trees”, where no label occurs more than once on any root-to-leaf path. We find the set of nodes whose path from the root includes a given set of keywords in the same time, O(δk lg lg σ), on a representation of the tree using essentially minimum space, 2n + n(lg σ + o(lg σ)) bits, where n is the number of nodes in the tree. If we permit nodes to have multiple labels, this space bound becomes 2n + t(lg σ + o(lg σ)) bits, that is, the information-theoretic lower bound for an ordinal tree (a node can have an arbitrary number of children ordered from left to right) plus that for the multiple labeling, where t is the total number of labels assigned. In proving those results, we consider two data structures of independent interest: we give an encoding for σ by n boolean matrices, using optimal space and supporting in time O(lg lg σ) the operators access (the value at the intersection of a row and a column), rank (how many matches occur in this row to the left of this entry, or how many are in this column and above), and select (find the r-th match in this row, or in this column); and we give an encoding for labeled trees of n nodes and σ labels, using optimal space and supporting in time O(lg lg σ) the label-based operator labeltree desc(a, x), which finds the first descendant of x labeled a. Keywords: succinct data structures, labeled trees, multi-labeled trees, conjunctive queries, intersection problem, opt-threshold set. | |||
Date | November 2005 | |||
Report | Adaptive Searching in Succinctly Encoded Binary Relations and Tree-Structured Documents (PDF) |
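To make the queried operators concrete, here is a plain, non-succinct reference implementation of access, rank and select over a σ-by-n boolean relation between labels (rows) and objects (columns). It achieves neither the space nor the O(lg lg σ) time bounds of the report; it only pins down the interface, and the names are ours.

```python
class BooleanRelation:
    """Plain reference for the operators discussed in the report: access,
    rank and select over a sigma x n boolean matrix relating labels (rows)
    to objects (columns). Meets neither the space nor the O(lg lg sigma)
    time bounds; it only makes the interface concrete."""

    def __init__(self, pairs, sigma, n):
        self.matrix = [[False] * n for _ in range(sigma)]
        for label, obj in pairs:
            self.matrix[label][obj] = True

    def access(self, label, obj):
        return self.matrix[label][obj]

    def row_rank(self, label, obj):
        # Number of objects <= obj associated with this label.
        return sum(self.matrix[label][:obj + 1])

    def row_select(self, label, r):
        # Object holding the r-th (1-based) association for this label.
        count = 0
        for obj, bit in enumerate(self.matrix[label]):
            count += bit
            if count == r:
                return obj
        return None

rel = BooleanRelation([(0, 1), (0, 3), (2, 0)], sigma=3, n=4)
print(rel.access(0, 3), rel.row_rank(0, 3), rel.row_select(0, 2))  # True 2 3
```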
CS-2005-36 | ||||
---|---|---|---|---|
Title | Viscoelastic Registration of Medical Images | |||
Authors | Zhao Yi and Justin Wan | |||
Abstract | Since the physical behaviour of many tissues is shown to be viscoelastic, we propose a novel registration technique for medical images via physical principles of viscoelasticity. It allows the template and the target images to differ by large local deformations like viscous fluid and small local deformations like elastic solid. With an extra elastic term introduced, the proposed model constrains the transformation by coupled viscoelastic behaviour and therefore reduces the smearing effect caused by the fluid terms. We apply our approach to both artificial test data and MR brain slices yielding accurate registration results. | |||
Date | December 2005 | |||
Report | Viscoelastic Registration of Medical Images (PDF) |
CS-2005-37 | ||||
---|---|---|---|---|
Title | Image Registration via Particle Movement | |||
Authors | Zhao Yi and Justin Wan | |||
Abstract | Though the fluid model offers a good approach to nonrigid registration with large deformations, it suffers from the blurring artifacts introduced by the viscosity term. To overcome this drawback, we present an inviscid model expressed in a particle framework. Our idea is to simulate the template image as a set of particles moving towards the target positions. The proposed model can accommodate both small and large deformations, with sharp edges and reasonable displacement maps achieved. Because of its simplicity, the computational cost is much smaller than that of other nonrigid registration approaches. | |||
Date | December 2005 | |||
Report | Image Registration via Particle Movement (PDF) |