2003 Technical Reports | SCS | UW
CS-2003-01 |
Title |
Processing Sliding Window
Multi-Joins in Continuous Queries |
Authors |
Lukasz Golab and M. Tamer
Ozsu
|
Abstract |
We study sliding window
multi-join processing in continuous queries over data streams. Several algorithms
are reported for performing continuous, incremental joins, under the assumption
that all the sliding windows fit in main memory. The algorithms include
multi-way incremental nested loop joins (NLJs) and multi-way incremental
hash joins. We also propose join ordering heuristics to minimize the processing
cost per unit time. We test a possible implementation of these algorithms
and show that, as expected, hash joins are faster than NLJs for performing
equi-joins, and that the overall processing cost is influenced by the strategies
used to remove expired tuples from the sliding windows. |
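As a rough illustration of the incremental-join idea the abstract describes (a minimal sketch, not the paper's implementation; class and method names are ours, and only the binary hash-join case is shown):

```python
from collections import defaultdict

class SlidingWindowHashJoin:
    """Sketch of an incremental equi-join over two time-based sliding windows.

    Each window keeps a hash index on the join key; a newly arrived tuple
    probes the opposite window's index, then is inserted into its own.
    Tuples older than `window_size` time units are expired on every arrival,
    which is one of the expiration strategies the abstract alludes to.
    """

    def __init__(self, window_size):
        self.window_size = window_size
        # one (hash index, arrival-order buffer) pair per input stream
        self.index = [defaultdict(list), defaultdict(list)]
        self.buffer = [[], []]  # entries are (timestamp, key, tuple)

    def _expire(self, side, now):
        buf, idx = self.buffer[side], self.index[side]
        while buf and buf[0][0] <= now - self.window_size:
            ts, key, tup = buf.pop(0)
            idx[key].remove(tup)

    def insert(self, side, ts, key, tup):
        """Insert `tup` into window `side` (0 or 1); return new join results."""
        other = 1 - side
        self._expire(0, ts)
        self._expire(1, ts)
        results = [(tup, match) if side == 0 else (match, tup)
                   for match in self.index[other][key]]
        self.index[side][key].append(tup)
        self.buffer[side].append((ts, key, tup))
        return results
```

Extending this to a multi-way join adds the ordering problem the paper's heuristics address: each arrival must probe the remaining windows in some cost-minimizing order.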
Date |
February 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe
PDF |
Compressed
PostScript
|
CS-2003-04 |
Title |
A Roadmap to a Modeling
Language for Multi-Agent Systems Engineering |
Authors |
Viviane Silva, Carlos
Lucena, Paulo Alencar and Donald Cowan |
Abstract |
In this paper we present a conceptual framework called TAO that organizes
the core concepts related to multi-agent systems and assumes that agents
and objects co-exist as independent and unique abstractions. We then use
the TAO framework to create the multi-agent system modeling language MAS-ML,
which extends UML (Unified Modeling Language). MAS-ML is defined as a conservative
extension of the UML metamodel, and includes agent-related notions that
are part of the TAO conceptual framework while preserving all object-related
concepts, which constitute the UML metamodel. Creating a modeling language
for multi-agent systems is a complex task and we should consider following
a process or roadmap similar to the one that led to the development of the
UML and its various sub-models and supporting facilities.
|
Date |
February 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe
PDF |
Compressed
PostScript
|
CS-2003-06 |
Title |
Towards Identifying
Frequent Items in Sliding Windows |
Authors |
David DeHaan, Erik D.
Demaine, Lukasz Golab, Alejandro Lopez-Ortiz and J. Ian Munro |
Abstract |
Queries that return
a list of frequently occurring items are popular in the analysis of data
streams such as real-time Internet traffic logs. While several results exist
for computing frequent item queries using limited memory in the infinite
stream model, none
have been extended to the limited-memory sliding window model, which
considers only the
last N items that have arrived at any given time and forbids the storage
of the entire
window in memory. We present several algorithms for identifying frequent
items in sliding windows, both under arbitrary distributions and assuming
that each window conforms to a multinomial distribution. The former is
a straightforward extension of existing algorithms and is shown to work
well when tested on real-life TCP traffic logs. Our algorithms for the
multinomial distribution are shown to outperform classical inference
based on random sampling from the sliding window, but lose their accuracy
as predictors of item frequencies when the underlying distribution is
not multinomial.
|
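For context, the classic limited-memory frequent-items technique for the infinite stream model (which the abstract says had not yet been extended to sliding windows) is the Misra-Gries counter summary; a minimal sketch, with our naming:

```python
def misra_gries(stream, k):
    """Space-bounded candidate frequent items (Misra-Gries summary).

    Keeps at most k-1 counters.  Any item occurring more than len(stream)/k
    times is guaranteed to appear among the returned candidates, though the
    stored counts are only lower bounds.  The sliding-window adaptation
    studied in the report is not reproduced here.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # no free counter: decrement everything, dropping zeros
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```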
Date |
March 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe
PDF |
Compressed
PostScript
|
CS-2003-10 |
Title |
An Efficient Bounds Consistency
Algorithm for the Global Cardinality Constraint |
Authors |
Claude-Guy Quimper, Peter
van Beek, Alejandro Lopez-Ortiz, Alexander Golynski and Sayyed Bashir Sadjad |
Abstract |
Previous studies have
demonstrated that designing special purpose constraint propagators can
significantly
improve the efficiency of a constraint programming approach. In this paper
we present an efficient algorithm for bounds consistency propagation of
the global cardinality constraint (gcc). Using a variety
of benchmark and random problems, we show that our bounds consistency
algorithm is competitive
with and can dramatically outperform existing state-of-the-art commercial
implementations of constraint propagators for the gcc. We also present
a
new algorithm for domain consistency propagation of the gcc which improves
on the worst-case performance of the best previous algorithm for problems
that occur often in applications. |
Date |
February 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe
PDF |
Compressed
PostScript
|
CS-2003-21 |
Title |
A Geometric B-Spline Over
the Triangular Domain |
Authors |
Christopher K. Ingram |
Abstract |
For modelling curves,
B-splines [3] are among the most versatile control schemes. However, scaling
this technique to surface patches has proven to be a non-trivial endeavor.
While a suitable scheme exists for rectangular patches in the form of tensor
product B-splines, techniques involving the triangular domain are much less
spectacular.
The current cutting edge in triangular B-splines is the DMS-spline [2].
While the resulting surfaces possess high degrees of continuity, the control
scheme is awkward and the evaluation is computationally expensive. A more
fundamental problem is that the construction bears little resemblance to the
one used for the B-Spline. This deficiency leads to the central idea
of the thesis: what happens if the simple
blending functions found at the heart of the B-Spline construction are
used over higher dimension domains?
In this thesis I develop a geometric generalization of B-Spline curves
over the triangular domain. This construction mimics the control point
blending
that occurs with uniform B-Splines. The construction preserves the simple
control scheme and evaluation of B-Splines, without the immense computational
requirements of DMS-splines. The result is a new patch control scheme, the
G-Patch, possessing C0 continuity between adjacent patches. |
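The "simple blending functions at the heart of the B-Spline construction" that the thesis generalizes are the uniform basis functions; a minimal curve-evaluation sketch (our function name, one cubic segment, arbitrary point dimension):

```python
def cubic_bspline_point(p0, p1, p2, p3, t):
    """Evaluate one segment of a uniform cubic B-spline curve at t in [0, 1].

    b0..b3 are the standard uniform cubic blending functions; they are
    non-negative and sum to 1, so the point is a convex combination of the
    four control points.  The thesis's triangular-domain generalization
    (the G-Patch) is not reproduced here.
    """
    b0 = (1 - t) ** 3 / 6.0
    b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6.0
    b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6.0
    b3 = t ** 3 / 6.0
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))
```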
Date |
June 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe
PDF |
Compressed
PostScript
|
CS-2003-22 |
Title |
Offset Surface Light
Fields |
Authors |
Jason Ang |
Abstract |
For producing realistic
images, reflection is an important visual effect.
Reflections of the environment are important not only for highly
reflective objects, such as mirrors, but also for more common objects such
as brushed metals and glossy plastics. Generating these reflections
accurately at real-time rates for interactive applications, however, is
a
difficult problem. Previous works in this area have made assumptions that
sacrifice accuracy in order to preserve interactivity.
I will present an algorithm that tries to handle reflection accurately
in
the general case for real-time rendering. The algorithm uses a database
of prerendered environment maps to render both the original object itself
and an additional bidirectional reflection distribution function (BRDF).
The algorithm performs image-based rendering in reflection space in order
to achieve accurate results. It also uses graphics processing unit (GPU)
features to accelerate rendering.
|
Date |
August 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe
PDF |
Compressed
PostScript
|
CS-2003-24 |
Title |
Evaluation of DBMSs Using XBench Benchmark |
Authors |
M. Tamer Ozsu and Benjamin B. Yao |
Abstract |
XML support is being added to existing database management systems (DBMSs) and native XML
systems are being developed both in industry and in academia. The individual performance characteristics
of these approaches as well as the relative performance of various systems is an ongoing
concern.
XBench is a family of XML benchmarks which recognizes that the XML data that DBMSs manage
are quite varied and no one database schema and workload can properly capture this variety. Thus,
the members of this benchmark family have been defined for capturing diverse application domains.
In this report we briefly discuss the XBench XML benchmark and report on the relative performance
of three commercial DBMSs: X-Hive, DB2 and SQL Server. We also describe the potential
issues when storing XML documents into relational DBMSs. |
Date |
August 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe PDF |
|
CS-2003-25 |
Title |
Investigations in Tree
Locking for Compiled Database Applications |
Authors |
Heng YU and Grant Weddell |
Abstract |
We report on initial
experiments in tree locking schemes for compiled database applications.
Such applications have a repository style of architecture in which a
collection of software modules or subsystems operate on a common database
in terms of a
predefined set of transaction types, and are very often at the core of
embedded systems. Since the tree locking protocol is deadlock free, it
becomes possible to decouple recovery mechanisms from concurrency control,
a property that we believe is critical to the successful deployment of
database technology to this new application area. Our experiments show
that the performance of tree locking can compete with two-phase locking
for cases such as the above in which a great deal can be known at the time
of system generation about workloads.
|
Date |
September 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe
PDF |
Compressed
PostScript
|
CS-2003-26 |
Title |
A Study on Atmospheric
Halo Visualization |
Authors |
Sung Min Hong and Gladimir
Baranoski |
Abstract |
In this report, the
atmospheric halo phenomena are explained and current trends in
halo simulation are described. A ray tracer based on the
Monte Carlo method is then presented and demonstrated. Comparison
of the simulation results with photographs taken of the same phenomena
shows that the ray tracer gives reasonable results. This ray tracer
provides a very useful tool for simulating the halo phenomena and identifying
atmospheric data such as the proportions and orientations of ice crystals. With
some enhancements of the ray tracer, left as future work, the ray
tracer is expected to give physically correct, realistic halo images.
|
Date |
September 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe
PDF |
Compressed
PostScript
|
CS-2003-27 |
Title |
Supporting Set-at-a-time
Extensions for XML through DOM |
Authors |
Hai (Helena) Chen, Frank
Tompa |
Abstract |
With the rapid growth
of the web and e-commerce, the W3C produced a new standard, XML,
as a universal data representation format to facilitate information interchange
and integration across heterogeneous systems. In order to process XML,
documents need to be parsed so that users can access, manipulate, and retrieve
the XML data easily. DOM is one of the major application programming interfaces,
providing an abstract, logical tree structure for an XML document.
Though it is a very promising initiative in defining the standard interface
through which applications access XML documents, significant benefit
can result if some of its functions are extended.
In this thesis, we support set-at-a-time extensions for XML through DOM.
By adding powerful functions that gather summary information from different
groups of related document elements, and that filter, extract, and transform
this information as a sequence of nodes, the extended DOM can reduce the
communication overhead and response time between the client and the server
and therefore provide applications with more convenience. Our work explores
ideas for set-at-a-time processing, defines and implements suitable methods,
and codes some example query applications using the DOM and our extensions
to make comparisons through a test system. Finally, future work is discussed. |
Date |
September 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe
PDF |
Compressed
PostScript
|
CS-2003-28 |
Title |
On the Relevance of Self-Similarity
in Network Traffic Prediction |
Authors |
Majid Ghaderi |
Abstract |
Self-similarity is
an important characteristic of traffic in high
speed networks which cannot be captured by traditional traffic
models. Traffic predictors based on non-traditional long-memory
models are computationally more complex than traditional
predictors based on short-memory models. Even on-line estimation
of their parameters for actual traffic traces is not trivial.
Based on the observation that the Hurst parameter of real traffic
traces rarely exceeds 0.85, which means that real traffic does not
exhibit strong long-range dependence, and the fact that infinite
history is not possible in practice, we propose to use a simple
non-model-based minimum mean square error predictor.
In this paper, we look at the problem of traffic prediction in the
presence of self-similarity. We briefly describe a number of
short-memory and long-memory stochastic traffic models and talk
about non-model-based predictors, particularly minimum mean square
error and its normalized version. Numerical results of our
experimental comparison between the so-called fractional
predictors and the simple minimum mean square error predictor show
that this simple method can achieve accuracy within 5% of the
best fractional predictor while it is much simpler than any
model-based predictor and is easily used in an on-line fashion.
|
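The "simple non-model-based minimum mean square error predictor" can be illustrated at order one (a minimal sketch with our naming; the paper's predictor uses a longer finite history):

```python
def mmse_predict_next(series):
    """One-step linear MMSE prediction of order one.

    Estimates the lag-1 autocorrelation rho from the observed series and
    predicts the next value as mean + rho * (x_n - mean).  No long-memory
    traffic model is fitted, which is the point of the non-model-based
    approach; only sample statistics of the trace are used.
    """
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    r0 = sum(d * d for d in dev) / n                       # lag-0 autocovariance
    r1 = sum(dev[i] * dev[i + 1] for i in range(n - 1)) / n  # lag-1 autocovariance
    rho = r1 / r0 if r0 else 0.0
    return mean + rho * dev[-1]
```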
Date |
October 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe
PDF |
Compressed
PostScript |
CS-2003-29 |
Title |
On Indexing Sliding Windows
over On-line Data Streams |
Authors |
Lukasz Golab, Shaveen
Garg and M. Tamer Ozsu |
Abstract |
We consider indexing
sliding windows in main memory over on-line data streams. Our proposed
data structures and query semantics are based on a division of the sliding
window into sub-windows. When a new sub-window fills up with newly arrived
tuples, the oldest sub-window is evicted, indices are refreshed, and
continuous queries are re-evaluated to reflect the new state of the window.
By classifying relational operators according to their method of execution
in the windowed scenario, we show that many useful operators require
access to the entire window, motivating the need for two types of indices:
those which provide a list of attribute values and their counts for answering
set-valued queries, and those which provide direct access to tuples for
answering attribute-valued queries. For the former, we evaluate the performance
of linked lists, search trees, and hash tables as indexing structures,
showing that the high costs of maintaining such structures over rapidly
changing data are offset by the savings in query processing costs. For
the latter, we propose novel ways of maintaining windowed ring indices,
which we show to be much faster than conventional ring indices and more
efficient than executing windowed queries without an index.
|
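The sub-window scheme in the abstract (arrivals fill the newest sub-window; when it fills, the oldest is evicted and the index refreshed) can be sketched for the count-style indices used for set-valued queries. A minimal illustration with our naming, not the paper's data structures:

```python
from collections import Counter, deque

class SubWindowCountIndex:
    """Sketch of a count index over a sliding window split into sub-windows.

    The window holds `num_sub` completed sub-windows of `sub_size` tuples
    each.  Arriving values go into the current sub-window; when it fills,
    it is appended, the oldest sub-window is evicted, and the
    attribute-value counts are refreshed in one batch step.
    """

    def __init__(self, num_sub, sub_size):
        self.num_sub, self.sub_size = num_sub, sub_size
        self.subs = deque()      # completed sub-windows, oldest first
        self.current = []        # still-filling sub-window
        self.counts = Counter()  # counts over completed sub-windows

    def insert(self, value):
        self.current.append(value)
        if len(self.current) == self.sub_size:
            self.counts.update(self.current)
            self.subs.append(self.current)
            self.current = []
            if len(self.subs) > self.num_sub:
                expired = self.subs.popleft()
                self.counts.subtract(expired)
                self.counts += Counter()  # drop entries that fell to zero

    def count(self, value):
        """Count of `value` over completed sub-windows plus the current one."""
        return self.counts[value] + self.current.count(value)
```

Batching refreshes per sub-window, rather than per tuple, is what amortizes the index-maintenance cost that the abstract weighs against query-processing savings.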
Date |
September 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe
PDF |
Compressed
PostScript
|
CS-2003-30 |
Title |
Symbolic Representation
and Retrieval of Moving Object Trajectories |
Authors |
Lei Chen, Vincent Oria, and M. Tamer Ozsu |
Abstract |
Similarity-based retrieval of moving object trajectories is useful
in many applications, such as GPS systems and sports and surveillance
video analysis. However, due to sensor failures, errors in
detection techniques, or different sampling rates, noises, local
shifts and scales may appear in the trajectory records. Hence, it
is difficult to design a robust and fast similarity measure for
similarity-based retrieval in a large database. In this paper, we
propose a normalized edit distance (NED) to measure the similarity
between two trajectories. Compared to Euclidean distance, Dynamic
Time Warping (DTW), and Longest Common Subsequences (LCSS), NED is
more robust and accurate for trajectories containing noise and local
time shifting. In order to improve the retrieval efficiency, we
further convert the trajectories into a symbolic representation,
called movement pattern strings, which encode both the movement
direction and the movement distance information of the
trajectories. The distances that are computed in a symbolic space
are lower bounds of the distances of original trajectory data,
which guarantees that no false dismissals will be introduced using
movement pattern strings to retrieve trajectories. Furthermore, we
define a modified frequency distance for frequency vectors that
are obtained from movement pattern strings to reduce the
dimensionality of movement pattern strings and computation cost of
NED. The tests that we conducted indicate that the normalized edit
distance is effective and the matching techniques are efficient.
|
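Once trajectories are quantized into movement pattern strings, comparison happens in symbol space; the underlying string measure is the classic dynamic-programming edit distance. A minimal sketch (the quantization and normalization steps from the paper are not reproduced):

```python
def edit_distance(s, t):
    """Classic dynamic-programming edit distance between two symbol strings.

    Uses a rolling one-row table, so memory is O(len(t)) rather than
    O(len(s) * len(t)).  In the paper, a normalized variant of this
    distance on movement pattern strings lower-bounds the distance on the
    raw trajectories, which is what rules out false dismissals.
    """
    m, n = len(s), len(t)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[n]
```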
Date |
September 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe PDF |
Compressed PostScript
|
CS-2003-33 |
Title |
Generalized Infinite Undo and Speculative User Interfaces |
Authors |
Alejandro Lopez-Ortiz |
Abstract |
We study the potential benefits of a computation model supporting an unlimited number of undo-like operations. We argue that, when implemented to its full generality, this results in a novel mode of human-computer interaction in which the user session is not linear in time. The proposed model is also more resilient in the presence of user or computer error. This resiliency can be exploited by a speculative user interface (SUI), which guesses user intentions. If the prediction is incorrect, the generalized infinite undo facility provides the ability to extemporaneously roll back the misprediction. We observe that this model is only now becoming a realistic possibility due to the current surplus of available CPU cycles on a modern desktop.
|
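A session that is "not linear in time" is naturally represented as a tree of states rather than a stack; undoing and then editing forks a branch instead of discarding history. A minimal sketch (our naming; the abstract does not commit to a data structure):

```python
class UndoTree:
    """Sketch of a nonlinear ("infinite undo") edit history.

    Every state keeps a link to its parent and to all children, so no
    branch is ever discarded: a speculative interface could roll back a
    mispredicted branch and continue from any earlier state.
    """

    def __init__(self, initial):
        self.node = {"state": initial, "parent": None, "children": []}

    def apply(self, new_state):
        """Record a new state as a child of the current one."""
        child = {"state": new_state, "parent": self.node, "children": []}
        self.node["children"].append(child)
        self.node = child

    def undo(self):
        if self.node["parent"] is not None:
            self.node = self.node["parent"]

    def redo(self, branch=-1):
        """Redo along a chosen branch (default: the most recent one)."""
        if self.node["children"]:
            self.node = self.node["children"][branch]

    @property
    def state(self):
        return self.node["state"]
```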
Date |
September 2003
|
Comments |
|
Report |
Latex |
PostScript |
Adobe PDF |
|
CS-2003-34 |
Title |
Exploiting Statistics
of Web Traces to Improve Caching Algorithms |
Authors |
Alexander Golynski, Alejandro
López-Ortiz and Ray Sweidan |
Abstract |
Web caching plays
an important role in reducing network traffic, user delays, and server load.
One important factor in describing the stream of requests made to a given
server is the popularity of links between accessed pages. For example, the
probability of requesting page A increases if a user makes a request to a
page which contains a link to page A. Furthermore, this probability depends
on the amount of time elapsed since the request was made to the page
containing the link and on the popularity of the link. In this project we (1)
analyze web access logs and determine the frequency distribution of access
and mean life expectancy of correlations, (2) propose two new cache
replacement policies that use the above distribution, and (3) evaluate the
effectiveness of the new policies by comparing them to widely-used
algorithms such as GreedyDual-Size (GDS) and GreedyDual-Frequency (GDSF).
|
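For reference, the GreedyDual-Size baseline the new policies are compared against can be sketched as follows (a minimal illustration with our naming; the paper's link-popularity policies are not reproduced):

```python
class GreedyDualSize:
    """Minimal sketch of GreedyDual-Size (GDS) cache replacement.

    Each cached object gets a value H = L + cost/size; the object with the
    smallest H is evicted, and the global inflation term L rises to the
    evicted object's H, so objects that have not been re-referenced age
    out relative to fresh ones.  Assumes an object's size is stable
    across accesses.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.L = 0.0
        self.H = {}     # object -> current H value
        self.size = {}  # object -> size

    def access(self, obj, size, cost=1.0):
        """Return True on a cache hit, False on a miss (object is admitted)."""
        if obj in self.H:                       # hit: restore full value
            self.H[obj] = self.L + cost / size
            return True
        while self.used + size > self.capacity and self.H:
            victim = min(self.H, key=self.H.get)
            self.L = self.H.pop(victim)         # inflate the baseline
            self.used -= self.size.pop(victim)
        self.H[obj] = self.L + cost / size
        self.size[obj] = size
        self.used += size
        return False
```

GDSF extends this by folding an access-frequency count into the H value.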
Date |
October 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe PDF |
Compressed PostScript |
CS-2003-36 |
Title |
Using Word Position
in Documents for Topic Characterization |
Authors |
Reem K. Al-Halimi,
Frank W. Tompa |
Abstract |
In this report we
show how to use the position of words in a text for characterizing the
topic of the text, and compare our method to measures that use frequency
statistics that are independent of word position. We show that word position
information produces words that are more suited for characterizing topics
and at the same time relies on a vocabulary size that is as little as
10% of the size used by the other measures.
|
Date |
October 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe PDF |
Compressed PostScript
|
CS-2003-39 |
Title |
Enforcing Domain Consistency
on the extended Global Cardinality Constraint is NP-Hard |
Authors |
Claude-Guy Quimper |
Abstract |
In a global cardinality
constraint, there is a set of variables X = {x_1, ... x_n} and a set
of values D. Each variable x_i is associated with a domain dom(x_i) contained
in D and each value v in D is associated with a cardinality set K(v). An
assignment satisfies the extended global cardinality constraint (extended-GCC)
if each variable x_i is instantiated to a value in its domain dom(x_i)
and if each value v in D is assigned to k variables for some k in K(v).
Extended-GCC differs from normal GCC in that its cardinality sets
K(v) can be arbitrary sets of values; in normal GCC these cardinality sets are
restricted to intervals. We prove that enforcing domain consistency on
the extended-GCC is NP-hard.
|
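The definition in the abstract transcribes directly into a checker for a single assignment (our naming; checking one assignment is easy — it is enforcing domain consistency over all assignments that the report proves NP-hard):

```python
def satisfies_extended_gcc(assignment, dom, K):
    """Check one assignment against an extended-GCC, per the definition above.

    assignment: dict variable -> value
    dom:        dict variable -> set of allowed values (dom(x_i))
    K:          dict value -> set of allowed cardinalities (K(v));
                assumed to cover every value in D, with 0 in K(v) for
                values that may go unused.  K(v) can be any set, e.g.
                {0, 2} — the non-interval case that makes this "extended".
    """
    # each variable must take a value from its own domain
    if any(assignment[x] not in dom[x] for x in assignment):
        return False
    # each value must be used a number of times allowed by K(v)
    counts = {v: 0 for v in K}
    for value in assignment.values():
        counts[value] += 1
    return all(counts[v] in K[v] for v in K)
```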
Date |
November 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe
PDF |
Compressed PostScript
|
CS-2003-41 |
Title |
Call Admission Control
for Voice/Data Integration in Broadband Wireless Networks |
Authors |
Majid Ghaderi and Raouf
Boutaba |
Abstract |
This paper addresses bandwidth allocation for an integrated
voice/data broadband mobile wireless network. Specifically, we
propose a new admission control scheme called EFGC, which is an
extension of the well-known fractional guard channel scheme
proposed for cellular networks supporting voice traffic. The main
idea is to use two acceptance ratios, one for voice calls and the
other for data calls in order to maintain the proportional service
quality for voice and data traffic while guaranteeing a target
handoff failure probability for voice calls. We describe two
variations of the proposed scheme: EFGC-REST, a conservative
approach which aims at preserving the proportional service quality
by sacrificing the bandwidth utilization; and EFGC-UTIL, a greedy
approach which achieves higher bandwidth utilization at the
expense of increasing the handoff failure probability for voice
calls. Extensive simulation results show that our schemes satisfy
the hard constraints on handoff failure probability and service
differentiation while maintaining a high bandwidth utilization.
|
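The two-acceptance-ratio idea at the core of EFGC can be sketched as a single admission decision (our naming and simplifications; the paper's contribution is the control laws that adapt the two ratios, which are not reproduced here):

```python
import random

def admit(call_type, free_bandwidth, needed, a_voice, a_data,
          rng=random.random):
    """Fractional guard-channel style admission for new calls (sketch).

    A new voice call is admitted with probability a_voice and a new data
    call with probability a_data, provided enough bandwidth is free.
    (Handoff voice calls, which EFGC protects with a target failure
    probability, would bypass the ratio test whenever bandwidth allows.)
    """
    if free_bandwidth < needed:
        return False
    ratio = a_voice if call_type == "voice" else a_data
    return rng() < ratio
```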
Date |
November 2003 |
Comments |
|
Report |
Latex |
PostScript |
Adobe PDF |
Compressed PostScript |
CS-2003-45 |
Title |
Revisiting the Foundations of Subsurface Scattering |
Authors |
Gladimir V. G. Baranoski, Aravind Krishnaswamy and Bradley Kimmel |
Abstract |
Despite the significant advances in rendering, we are far from being able to automatically generate realistic and predictable images of organic materials such as plant and human tissues. Creating convincing pictures of these materials is usually accomplished by carefully adjusting rendering parameters. A key issue in this context is the simulation of subsurface scattering. Current algorithmic models usually rely on scattering approximations based on the use of phase functions, notably the Henyey-Greenstein phase function and its variations, which were not derived from biophysical principles of any organic material, and whose parameters have no biological meaning. In this report, we challenge the validity of this approach for organic materials. Initially, we present an original chronology of the use of these phase functions in tissue optics simulations, which highlights the pitfalls of this approach. We then demonstrate that a significant step toward predictive subsurface scattering simulations can be taken by replacing this approach with a more efficient and accurate data-oriented one. Our investigation is supported by comparisons involving the original measured data that motivated the application of phase functions in tissue subsurface scattering simulations. We hope that the results of our investigation will help strengthen the biophysical basis required for the predictive rendering of organic materials. |
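For reference, the Henyey-Greenstein phase function the report critiques has a simple closed form (standard formula; the function name is ours):

```python
import math

def henyey_greenstein(cos_theta, g):
    """Henyey-Greenstein phase function:

        p(theta) = (1 - g^2) / (4*pi * (1 + g^2 - 2*g*cos(theta))^(3/2))

    g in (-1, 1) is the asymmetry parameter: g = 0 gives isotropic
    scattering (1 / 4*pi everywhere), g > 0 favors forward scattering.
    As the abstract notes, g has no direct biological meaning for tissue.
    """
    denom = (1.0 + g * g - 2.0 * g * cos_theta) ** 1.5
    return (1.0 - g * g) / (4.0 * math.pi * denom)
```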
Date |
December 2003
|
Comments |
|
Report |
Latex |
PostScript |
Adobe PDF |
Compressed PostScript
|