2002 Technical Reports | SCS | UW

[Please remove <h1>]

CS-2002-43
Title	Optimal Spaced Seeds for Finding Homologous Coding Regions
Authors	Brona Brejova and Daniel G. Brown
Abstract	We study the problem of computing optimal spaced seeds for identifying homologous coding DNA sequences in large genomic data sets. We develop two models of DNA sequence alignment in coding regions, and using data sets from human/Drosophila and human/mouse comparisons, we compute optimal spaced seeds using a dynamic programming algorithm. The seeds we identify are more sensitive by far at identifying homologous regions than the seeds from BLAST or from PatternHunter, and also significantly improve on the sensitivity of WABA, which also uses a simple spaced seed. In particular, in human/Drosophila comparisons, we offer an 82% improvement in false negatives over BLAST and a 33% improvement over WABA. Our results offer the hope of improved gene finding due to fewer missed exons in DNA/DNA comparison, and more effective homology search in general.
Date	October 2002
Comments
Report		PostScript	Adobe PDF

CS-2002-42
Title	A High-level Specification Language for Structured Document Transformation
Authors	X. Tang and F. W. Tompa
Abstract	The purpose of this paper is to introduce and study the problem of automatic transformation of structured documents. We consider collections of documents where the instances in each collection share a common structure in the sense that they can all be characterized by grammar rules such as found in a context-free grammar (CFG) or forest-regular grammar (FRG). We extend the notation to a single XML (or SGML) document with accompanying DTD (document type definition) to say that it is structured. As long as documents do not conform to a single universal standard, the data transformation between them remains a problem. Thus in the absence of a universal tag set and schema, structured document transformation is important for XML to serve as the data interchange format for the Web. Recently, W3C proposed XSLT (Extensible Stylesheet Language Transformations) as a transformation language for XML data. This language has considerable computation power. However, it requires detailed and tedious programming to accomplish complex structure transformations. As alternatives, SDT (Syntax Directed Translation) and its extended form TT (Tree Transformation) grammar are widely used to specify transformations of source code in various programming languages, and they have been proposed as specification languages for structured document transformation. These languages are descriptive but have limited expressive power, which makes them unable to specify complex structure transformations. In this paper, we propose an approach based on syntax tree templates. We show that our language is both descriptive and expressive. We also provide algorithms to convert our specification to XSLT for executing the transformation. Based on the algorithms, we present a prototype implementation.
Date	October 2002
Comments
Report			Adobe PDF

CS-2002-41
Title	Wei Wei Zheng and Keith O. Geddes
Authors	Exploiting Fast Hardware Floating Point in High Precision Computation
Abstract	We present an iterative refinement method bases on a linear Newton iteration for solving a particular group of high precision computation problems. Our method generates an initial solution at hardware floating point precision using a traditional method and then repeatedly refines this solution to higher precision, exploiting hardware floating point computation in each iteration. This is in contrast to direct solution of the high precision problem completely in software floating point. Theoretical coast analysis, as well as experimental evidence, shows a significant reduction in computational cost is achieved by the iterative refinement method on this group of problems.
Date	December 2002
Comments
Report		PostScript	Adobe PDF

CS-2002-40
Title	An XQuery Canonical Form and its Translation to Extended Relational Algebra
Authors	H. Zhang and F. W. Tompa
Abstract	As XML becomes more widespread as a standard representation for data, XML-based query languages and their evaluations are increasingly important. For this purpose, several XML based query languages have been proposed, including W3C's XQuery. In this paper, we define a query canonical form which provides a conceptually uniform vision of path expressions, element constructors and FLWR expressions in XQuery. The power of this canonical form is shown by identifying an important subset of XQuery that can be translated to this canonical form. Moreover, this canonical form nicely separates different aspects of an XML query, i.e., structure, navigation, and condition. This property makes it easy to be extended, and a possible extension of the canonical form is presented. Having this canonical form, we present an algorithm to translate from it into an extended relational algebra that includes operators defined for the structured text datatype, and we prove its correctness. This algorithm can be used as the basis of a sound translation from XQuery to SQL, and the starting point for query optimization, which is required for XML to be supported by relational database technology.l
Date	October 2002
Comments	29 pp
Report			Adobe PDF

CS-2002-39
Title	XBench - A Family of Benchmarks for XML DBMSs
Authors	Benjamin Bin Yao, M. Tamer Ozsu and John Keenleyside
Abstract	XML is beginning to be extensively used in various application domains, and as a result, large amounts of XML documents are being generated. Researchers in both industry and academia have proposed a number of approaches to efficiently store, manipulate, and retrieve XML documents. The individual performance characteristics of these approaches as well as the relative performance of various systems is an ongoing concern. The range of XML application and the XML data that they manage are quite varied and no one database schema and workload can properly capture this variety. We propose a family of XML benchmarks, collectively call XBench, to measure and evaluate the performance of different approaches to deal with the management of XML documents. The family is defined according to a classification of applications, and each class has its own database and workload. We discuss the general requirements for an XML DBMS benchmark, followed by a detailed explanation of the XBench, including the methodology of database generation, the workload, and the setup of test environment. A brief discussion of other existing XML benchmarks and comparison among them will be given as well.
Date
Comments	163 pgs
Report		PostScript	Adobe PDF

CS-2002-37
Title	Fraction-free Row Reduction of Matrices of Ore Polynomials
Authors	Berhard Beckermann, Howard Cheng and George Labahn
Abstract	In this paper we give formulas for performing row reduction of a matrix of Ore polynomials in a fraction-free way. The reductions can be used for finding the rank and left nullspace of such matrices. When specialized to matrices of skew polynomials our reduction can be used for computing a weak Popov form of such matrices and for computing a GCRD and an LCLM of skew polynomials or matrices of skew polynomials. The algorithm is suitable for computation in exact arithmetic domains where the growth of coefficients in intermediate computations is a central concern. This coefficient growth is controlled by using fraction-free methods. The known factor can be predicted and removed efficiently.
Date	November 2002
Comments
Report		PostScript	Adobe PDF

CS-2002-36
Title	Optimizing Correlated Path Queries in XML Languages
Authors	Ning Zhang and Tamer Ozsu
Abstract	Path expressions are ubiquitous in XML processing languages such as XPath, XQuery, and XSLT. Expressions in these languages typically include multiple path expressions, some of them correlated. Existing approaches evaluate these path expressions one-at-a-time and miss the optimization opportunities that may be gained by exploiting the correlations among them. In this paper, we address the evaluation and optimization of correlatedpath expressions. In particular, we propose two types of optimization techniques: integrating correlated path expressions into a single pattern graph, and rewriting the pattern graph according to a set of rewriting rules. The first optimization technique allows the query optimizer to choose an execution plan that is impossible by using the existing approaches. The second optimization technique rewrites pattern graphs at a logical leveland produce a set of equivalent pattern graphs from which a physical optimizer can choose given an appropriate cost function. Under certain conditions that we identify, the graph pattern matching-based executionapproach that we propose may be more efficient than the join-based approaches.
Date	November 2002
Comments
Report			Adobe PDF

CS-2002-35
Title	A User Interest Model for Web Page Navigation
Authors	Sule Gunduz and M. Tamer Ozsu
Abstract	Making recommendation requires predicting what is of interest to a user at a specific time. Even the same user may have different desires at different times. This paper concentrates on the discovery and modelling of the user's aggregate interest in a session. This approach relies on the premise that the visiting time of a page is an indicator of the user's interest in that page. The proportion of times spent in a set of pages requested by the user within a single session forms the aggregate interest of that user in that session. We first partition user sessions into clusters such that only sessions which represent similar aggregate interest of users are placed in the same cluster. We employ a according to similar amount of time in similar pages. In particular, we cluster sessions by learning a mixture of Poisson models using Expectation Maximization algorithm. The resulting clusters are then used to recommend pages to a user that are most likely contain the information which is of interest to that user at that time. Although the approach does not use the sequential patterns of transactions, experimental evaluation shows that the approach is quite effective in capturing a Web user's access pattern. The model has an advantage over previous proposals in terms of speed and memory usage.
Date	October 2002
Comments
Report		PostScript	Adobe PDF

CS-2002-33
Title	Reducing the Dimensionality of Plant Spectral Databases
Authors	Ian Bell and Gladimir Baranoski
Abstract	This study investigates the application of Principal Components Analysis (PCA) in the storage and reconstruction of plant spectral data. A new Piecewise PCA approach (PPCA), which takes into account the biological factors that affect the interaction of solar radiation with plants, is also proposed. These techniques are examined through experiments involving the reconstruction of reflectance and transmittance curves for herbaceous and and woody specimens. The spectral data used in these experiments was obtained from the LOPEX (Leaf Optical Properties Experiment) database. The reconstructions were performed aiming at a root mean square error (rmse) lower than 1%. The results of these experiments indicate that PCA can effectively reduce the dimensionality of plant spectral databases from the visible to the infrared regions of the light spectrum, and that the PPCA approach can further maximize the accuracy/cost ratio of the storage and reconstruction of plant spectral reflectance and transmittance data.
Date
Comments
Report		PostScript

CS-2002-32
Title	Symbolic Summation in Maple
Authors	S.A. Abramov, K.O. Geddes, J.J. Carette, H.Q. Le
Abstract	We describe the design and implementation of the Maple toolbox SumTools, a package for computing closed forms of indefinite and definite sums.
Date	December 2002
Comments
Report		PostScript	Adobe PDF

CS-2002-31
Title	Lazy Database Replication with Freshness Guarantees
Authors	K. Daudjee and K. Salem.
Abstract	Lazy replication is a popular technique for improving the performance and availability of database systems. Although there are concurrency control techniques which guarantee serializability in lazy replication systems, these techniques do not provide freshness guarantees. Since transactions may see stale data, they may be serialized in an order different from the one in which they were submitted. Strong serializ- ability avoids such problems, but it is very costly to imple- ment. In this paper, we propose a generalized form of strong serializability that is suitable for use with lazy replication. It has many of the advantages of strong serializability, but can be implemented more efficiently. We show how gener- alized strong serializability can be implemented in a lazy replication system, and we present the results of a simula- tion study that quantifies the strengths and limitations of the approach.
Date	November 2002
Comments
Report			Adobe PDF

CS-2002-30
Title	Streaming MPEG-4 Audio-Visual Objects with Quality Adaptation
Authors	Toufik Ahmed (1),(2), Youssef Iraqi (1), Raouf Boutaba (1) and Ahmed Mehaoua (2)
Abstract	This paper presents an Object-based Quality Adaptation Mechanism (OQAM) for streaming unicast MPEG-4 Audio-Visual content over the Internet. This mechanism dynamically adds and drops MPEG-4 Audio-Visual Objects (AVOs) by using a TCP-Friendly Rate Control (TFRC) mechanism. TFRC adjusts the number of AVOs streamed to meet the need for rapid change in transmission rate caused by network congestion and the need for stable perceptual audio-visual quality. This end-to-end quality adaptation is combined with a Diffserv marking scheme to guarantee AVOs prioritization within the network. Performance evaluation shows that the quality of the received video adapts gracefully to network state and to heterogeneous clients capabilities.
Date	August 2002
Comments
Report			Adobe PDF

CS-2002-29
Title	An Efficient Algorithmic Approach for the Estimation of the Red Edge Position of Plant Leaf Reflectance
Authors	Gladimir V. G. Baranoski and Jon G. Rokne
Abstract	The point of maximum slope on the reflectance spectrum of vegetation between red and near-infrared wavelengths< is known as the red edge position (REP). The REP is stronglycorrelated with foliar chlorophyll content, and hence, it provides a very sensitive indicator for a variety of environmental factors such as vegetation stress, drought and senescence. Due to its importance for the application of inversion procedures, a number of techniques have been developed for determining the REP for foliar spectral reflectance. In this paper a new approach is proposed. It allows an unsupervised estimation of the REP. The accuracy of the new approach is evaluated by comparing REP estimates with values derived from measured spectral data for woody and herbaceous pecies.
Date	August 2002
Report			Adobe PDF	Compressed PostScript

CS-2002-26
Title	A Phase Velocity Analysis of Multigrid Methods for Hyperbolic Equations
Authors	Justin W.L. Wan, Tony F. Chan
Abstract	In this paper, we study the effects of coarse grid correction (CGC) on multigrid convergence for hyperbolic problems in one and two dimensions. We approach this from the perspective of phase velocity, which allows us to exploit the hyperbolic nature of the underlying PDE. In particular, we consider three combination of coarse grid operators and coarse grid solution approaches: (1) Runge-Kutta smoothing CGC, direct discretization, (2) exact coarse grid solve, direct discretization, and (3) Galerkin CGC. For all these approaches, we show that the convergence behavior of multigrid can be precisely described by the phase velocity analysis of the coarse grid correction matrix, and we verify our results by numerical examples in one and two dimensions.
Date	July 2002
Comments	Conference presentation: Householder Symposium XV
Report		PostScript	Adobe PDF

CS-2002-25
Title	Closed form solutions of linear odes having doubly periodic coefficients
Authors	Reinhold Burger, George Labahn, Mark van Hoeij
Abstract	We consider the problem of finding closed form solutions of linear differential equations having doubly periodic coefficients. We give a decision procedure for determining if such equations have doubly periodic solutions and study algorithms for solving such a decision procedure. In addition, in the case of a second order equation we show how to find the general solution to such an ode.
Date	July 2002
Report			Adobe PDF	Compressed PostScript

CS-2002-23
Title	On the Equivalence Between Prony's and Ben-Or's/Tiwari's Methods
Authors	Mark Giesbrecht, George Labahn, Wen-shin Lee.
Abstract	We show the equivalence between the exact \BenTi algorithm and numerical Prony's method. Taking advantage of the recent results in both exact and numerical approaches, we present new algorithms and outline possible developments. Key words:Prony's method, sparse polynomial interpolation, early termination, Ben-Or/Tiwari algorithm, exponential functions.
Date	July 2002
Report		PostScript	Adobe PDF

CS-2002-22
Title	SNAP User's Guide
Authors	Claude-Pierre Jeannerod, George Labahn,
Abstract	In this paper we describe the SNAP (Symbolic-Numeric Algorithms for Polynomials) package for computing with polynomials having inexact coefficients. This package is a first attempt to provide the standard functionalities for inexact polynomials that exist for exact polynomials, including the taking of quotients and remainders, determining if two polynomials are relatively prime and finding greatest common divisors (GCDs). The package is included in the coming release of the MAPLE computer algebra system.
Report			Adobe PDF	Compressed PostScript

CS-2002-21
Title	Virtual Goniophotometric Measurements Protocol
Authors	Aravind Krishnaswamy, Gladimir V. G. Baranoski and Jon G. Rokne
Abstract	Many scattering models have been proposed in the graphics literature. Few of them, however, have been evaluated through comparisons with real measured data. As the demand for plausible and predictable scattering models increases, more effort is focused on performing such comparisons, which require the use of measurement devices. Once the accuracy of a given model is determined, data can be extracted from this model in several dimensions. In this paper we examine the formulation of virtual goniophotometric devices used to evaluate and extract data from scattering models. We discuss implementation issues which affect the reliability of the readings provided by these devices. Our discussion of these issues is supported by experiments whose results are also presented in this paper.
Date
Comments
Report			Adobe PDF	Compressed PostScript

CS-2002-17
Title	State of the Art in the Realistic Simulation of Plant Leaf Venation Systems
Authors	Julia Taylor-Hell and Gladimir Baranoski
Abstract	In order to create a realistic portrayal of plant leaves in computer graphics, the veins on these leaves must be represented. In the computer graphics industry, an artist may draw a branching vein structure as a texture and paste it onto the leaf model. Because natural scenes are so frequently represented, it would be useful to devise an automatic and predictable technique for simulating leaf venation systems and embedding them to the leaf's geometric model. This technical report examines this problem and reviews the state of the art in the simulation of venation systems.
Date	April 2002
Report			Adobe PDF	Compressed PostScript

CS-2002-16
Title	Simulating the Dynamics of the Dancing Lights
Authors	Gladimir Baranoski, Justin Wan, Jon Rokne, Ian Bell
Abstract	The auroral displays, known as the Aurora Borealis and Aurora Australis, are geomagnetic phenomena of impressive visual characteristics and remarkable scientific value. Auroras present a complex behavior that arises from interactions between plasma (hot, ionized gases composed of ions, electrons and neutral atoms) and Earth's electromagnetic fields. In this paper we present a physically-based model to perform 3D visual simulations of auroral dynamics. This model takes into account the physical parameters and processes directly associated with plasma flow. The set of partial differential equations associated with these processes is solved using a practical multigrid algorithm, which can also be applied in the simulation of natural phenomena such as gas, smoke or water flow. In order to illustrate the applicability of our model we provide animation sequences rendered using a distributed forward mapping approach.
Date	May 2002
Report			Adobe PDF	Compressed PostScript

CS-2002-14
Title	A Formal Analysis of the Will-Retire Correctness Statement
Authors	Nancy A. Day, Mark D. Aagaard and Meng Lou
Abstract	We relate two microprocessor correctness statements and show that they are equivalent. The first correctness statement in question uses synchronization at retirement over a series of steps of the implementation and external equality as the required correspondence between states. The second correctness statement is the classic single-step commuting diagram with external equality as the match. We prove that if any microprocessor implementation and specification are shown to satisfy one of these correctness statements, they also satisfy the other correctness statement. This technical report is a continuation of Technical Report 2002-11 and includes little introductory material.
Date	March 2002
Comments
Report		PostScript	Adobe PDF

CS-2002-13
Title	A Collapsing Method for Efficient Recovery of Best Supported Edges in Phylogenetic Trees
Authors	Mike Hu, Paul Kearney
Abstract	We present a novel algorithm, HyperCleaning* for effectively inferring phylogenetic trees. The method is based on the quartet method paradigm and is guaranteed to recover the best supported edges of the underlying phylogeny based on the witness quartet set. This is performed efficiently using a collapsing mechanism that employs memory/time tradeoff to ensure no loss of information. This enables HyperCleaning* to solve the relaxed version of the Maximum-Quartet-Consistency problem feasibly, thus providing a valuable tool for inferring phylogenies using quartet based analysis.
Date	March 2002
Report			Adobe PDF	Compressed PostScript

CS-2002-11
Title	A Mechanized Theory of Microprocessor Correctness Statements
Authors	Nancy A. Day, Mark. D. Aagaard, and Meng Lou
Abstract	Microprocessor verification has become increasingly challenging with the use of optimizations such as out-of-order execution. Because of the complexity of the implementations, a wide variety of microprocessor correctness statements have been proposed and used in verification efforts. In this work, we have mechanized a previously proposed framework for classifying these correctness statements. We have verified the relationships between the different points in the framework, and developed a characterization of the commonly used flushing abstraction function. The relationships between points in the framework are general theorems that provide "verification highways" to connect different correctness statements and provide reusable verification strategies. We have used these highways to determine the precise relationships between top-level correctness statements used in verification efforts.
Date	February 2002
Report			Adobe PDF	Compressed PostScript

CS-2002-10
Title	High-Order Lifting
Authors	Arne Storjohann
Abstract	The well-known technique of adic-lifting for linear-system solution is studied. Some new methods are developed and applied to get algorithms for the following problems over the ring of univariate polynomials with coefficients from a field: rational system solving, integrality certification and determinant/ Smith-form computation. All algorithms are Las Vegas probabilistic.
Date	# issued in February
Comments	Submitted to: ISSAC '02
Report			Adobe PDF	Compressed PostScript

CS-2002-07
Title	Factoring Zero-dimensional Ideals of Linear Partial Differential Operators
Authors	Ziming Li, Fritz Schwarz, Serguei P Tsarev
Abstract	This paper presents an algorithm for factoring a zero-dimensional left ideal in the ring generated by two derivation operators over the field of bivariate rational functions, i.e., factoring a linear homogeneous partial differential system whose coefficients are rational functions, and whose solution space is finite-dimensional over the field of constants. The algorithm computes all the zero-dimensional left ideals containing the given ideal. It generalizes the Beke-Schlesinger algorithm for factoring linear ordinary differential operators, and uses an algorithm for finding hyperexponental solutions of such ideals. Keywords:differential operator, linear partial differential system, zero-dimensional left ideal, factorization.
Date	January 2002
Comments	Has been submitted to ISSAC 2002
Report			Adobe PDF	Compressed PostScript

CS-2002-06
Title	Simplification of Definite Sums of Rational Functions
Authors	H.Q. Le
Abstract	We propose an algorithm for simplification of definite sums of rational functions which, for a given input rational function F(n,k), constructs two rational functions G(n) and T(n,k) such that n / n \ --- \| --- \| \ \| \ \| ) F(n, k) = G(n) + \| ) T(n, k)\| / \| / \| --- \| --- \| k = 0 \ k = 0 / where the degree of the denominator w.r.t. k of T(n,k) is “small”.
Date	# issued in January
Report			Adobe PDF	Compressed PostScript

CS-2002-05
Title	Fraction-free Row Reduction of Matrices of Skew Polynomials
Authors	Bernhard Beckermann, Howard Cheng, and George Labahn
Abstract	We present a new algorithm for row reduction of a matrix of skew polynomials. The algorithm can be used for finding full rank decompositions and other rank revealing transformations of such matrices. In particular these reductions can be applied to problems such as the desingularization of linear recurrence systems and for computing rational solutions of a large class of linear functional systems. The algorithm is suitable for computation in exact arithmetic domains where the growth of coefficients in intermediate computations is a central concern. This coefficient growth is controlled by using fraction-free methods. At the same time the method is fast: for an $m \times s$ matrix of input skew polynomials of degree $N$ with coefficients bounded by $K$ the algorithm has a worst case complexity of $O(m^5 s^4 (N+1)^4 K^2)$ bit operations.
Date	January 2002
Comments	Submitted to: ISSAC 2002
Report		PostScript	Adobe PDF	Compressed PostScript

CS-2002-04
Title	A Modular Greatest Common Divisor Algorithm for Matrix Polynomials
Authors	Howard Cheng and George Labahn
Abstract	In this paper we give a modular algorithm to compute one-sided greatest common divisors for matrix polynomials, improving on the fraction-free algorithm by Beckermann and Labahn. We define lucky homomorphisms for the modular algorithm and give bounds on the coefficients in the results computed. In addition, the greatest common left (right) divisor computed by our algorithm is in column (row) reduced form. For computing a greatest common left divisor of two matrix polynomials of dimensions $m \times n_1$ and $m \times n_2$ having degree $N$ in which all the coefficients of the entries have sizes bounded by $K$, the bit complexity of our algorithm is $O(n(m^2N)^3 K^2)$ where $n = n_1+n_2$. This is a significant improvement over the fraction-free algorithm. Our algorithm also solves the extended one-sided GCD problem, and can be used to transform any matrix polynomial into column reduced form.
Date	January 2002
Comments	Submitted to: ISSAC 2002
Report			Adobe PDF	Compressed PostScript

CS-2002-03
Title	A reduced form for perturbed matrix polynomials
Authors	Claude-Pierre Jeannerod
Abstract	We show that every perturbation of a square matrix polynomial with zero eigenvalues only can be reduced by equivalence, under certain conditions, to a perturbed matrix polynomial whose leading matrix has maximal Smith form. This yields a reduced form for square perturbed matrix polynomials from which one can easily recover all the eigenvalue leading terms whose exponent is the inverse of a positive integer. Key words:matrix polynomials, perturbed eigenvalues, Smith form, Newton diagram.
Date	January 2002
Comments	Also available from the Proceedings of the 2002 International Symposium on Symbolic and Algebraic Computation (ISSAC'02), Lille, France, July 2002.
Report			Adobe PDF (pdf)	Compressed PostScript

CS-2002-02
Title	On integral representation and algorithmic approaches to the evaluation of combinatorial sums
Authors	G.P. Egorychev and E.V. Zima
Abstract	Integral representation is applied to various problems of algorithmic indefinite and definite summation and for generating combinatorial identities. An integral representation approach to rational summation is compared to known algorithmic approaches. It is shown that the integral representation can be used for practical improvements of known summation algorithms. A new solution to Riordan's problem of combinatorial identities classification is presented.
Date	January 2002
Report		PostScript (ps)	Adobe PDF (pdf)	Compressed (.Z)

CS-2002-01
Title	A Graph Unification Machine for N.L. Parsing
Authors	V. Keselj and N. Cercone
Abstract	A simple, novel, and efficient computational model for a graph unification method for NL parsing is presented. We rely the body of existing research on labeled graph unification for natural language parsing. This model offers several advantages including: simplicity, efficiency, and amenability to a low-level, efficient, and straight-forward implementation. A consequence of this is that some earlier considerations with respect to garbage collection and redundant node copying become obsolete. The model uses a novel feature of sub-node structure sharing.
Date	January 2002
Report	C and Javasource		Adobe PDF	Compressed PostScript