1998 technical reports

CS-98-31
Title	Quickly Excluding K(2,r) from Planar Graphs
Authors	D.M. Thilikos
Abstract	We prove that any planar graph that does not contain $K_{2,r}$ as a minor has treewidth $\leq r+2$.
Date	December 1998
Comments	Submitted to: Journal of Combinatorial Theory Series B.
Report	Quickly Excluding K(2,r) from Planar Graphs (PDF)					Compressed PostScript: Quickly Excluding K(2,r) from Planar Graphs (PS.Z)

CS-98-30
Title	Convergence of Lattice and PDE Methods for Pricing Asian Options
Authors	P.A. Forsyth, K.R. Vetzal and R. Zvan
Abstract	In a recent article, the authors stated that the lattice based Forward Shooting Grid (FSG) method was convergent for Asian options even if nearest lattice point interpolation is used, and independent of any relationship between the grid quantization parameter (for the spacing of the nodal averages), and the timestep size. However, a more detailed analysis of the propagation of interpolation error reveals a problem. A worst case error analysis shows that the error may be large as the number of timesteps becomes large. We demonstrate that the worst case error is indeed attained in some numerical examples. In contrast, the Hull and White method is convergent provided that the node spacing in the average direction is selected appropriately as $\Delta t \rightarrow 0$. It is a also a straightforward matter to show that partial differential equation (PDE) based methods are also convergent. Numerical examples comparing convergence for all three techniques are presented.
Date	December 1998
Comments	This paper has been submitted to mathematical finance.
Report	CS-98-30 (PDF)			Compressed PostScript: CS-98-30 (PS.Z)

CS-98-29
Title	On the Scalability of Monitoring and Debugging Distributed Computations: Vector-Clock Size
Authors	P.A.S. Ward
Abstract	The vector-clock size necessary to characterize causality in an arbitrary distributed computation is equal to the number of processes in that computation. However, in practice the vector-clock size necessary may be much smaller. The vector-clock size is not strictly bounded by the number of processes, but rather by the dimension of the partial order induced by the computation. While the dimension theoretically can be as large as the number of processes, in practice we have found it to be much smaller. In this paper we quantify exactly how small the dimension, and hence the vector-clock size required, is over a range of typical distributed computations. We have found that typical distributed computations, with as many as 300 processes, have dimension less than 10. In order to achieve this quantification we developed several novel theorems and algorithms which we also describe.
Date	December 1998
Comments	Published in ICPP'99 under the title: An Offline Algorithm for Dimension-Bound Analysis
Report	On the Scalability of Monitoring and Debugging Distributed Computations: Vector-Clock Size (PDF)			Compressed PostScript: On the Scalability of Monitoring and Debugging Distributed Computations: Vector-Clock Size (PS.Z)

CS-98-28
Title	Stratified Wavelength Clusters for Efficient Spectral
Authors	G.F. Evans and M.D. McCool
Abstract	Wavelength dependent Monte Carlo rendering can correctly and generally capture effects such as spectral caustics (rainbows) and chromatic abberation. It also improves the colour accuracy of reflectance models and of illumination effects such as colour bleeding and metamerism. The stratified wavelength clustering (SWC) strategy carries several wavelength stratified radiance samples along each light transport path. The cluster is split only if a specular refraction at the surface of a dispersive material is encountered along the path. The overall efficiency of this strategy is high since the fraction of clusters that need to be split in a typical scene is low, and also because specular dispersion tends to decrease the source colour variance, offseting the increased amortized cost of generating each path.
Date	November 1998
Comments	This report was typeset in LaTeX, but has been converted to PostScript at 600dpi. It contains colour images, but can be printed on a high-resolution black and white laser printer with halftoning. Obviously colour is important in this work, but only the last page need be printed in colour. There is a website at http://www.cgl.uwaterloo.ca/Projects/rendering/ which contains more information on this and other rendering projects in CGL. This report has been submitted to Graphics Interface '99.
Report	Stratified Wavelength Clusters for Efficient Spectral (PDF)			Compressed PostScript: Stratified Wavelength Clusters for Efficient Spectral (GZIP)

CS-98-27
Title	Two-Phase Clustering of Large Datasets
Authors	N.J.-K. Khazjvi and K. Salem
Abstract	This paper addresses the problem of single-pass clustering of large, multi-dimensional datasets. A general two-phase approach to this problem is defined. In the first phase, an in-memory summary of the data set is constructed using a single pass through the data. The second phase then uses any clustering algorithm to cluster the summary data. Most clustering algorithms for large datasets are two-phase. Two techniques for producing in-memory summaries are compared. The first is based on a previously-proposed data structure called a cluster-feature tree, or CF-tree. The second, simpler technique uses random sampling. Experiments with skewed artificial data sets are used to show that sampling produces clusters that are at least as accurate as those produced with CF-trees, and that sampling is much faster. An important issue when sampling is the sample size, which determines the amount of memory required during the first phase of two-phase clustering. It also controls a trade-off between clustering time and and clustering accuracy. This tradeoff is quantified, and guidelines for sample size selection are suggested.
Date	November 1998
Report	Two-Phase Clustering of Large Datasets (PDF)			Compressed PostScript: Two-Phase Clustering of Large Datasets (PS.Z)

CS-98-26
Title	A Computational Model of lexical cohesion analysis and its application to the evaluation of text coherence
Authors	Marzena H. Makuta
Abstract	In this thesis, we discuss how to apply the analysis of lexical cohesion in texts to the problem of evaluating text coherence. We have two objectives. The first one is to create a computational model to represent the lexical cohesion of a given text. In order to store this information we design a new data structure — the lexical graph — with lexical items as nodes and lexical relations between those items, such as synonymy, represented as arcs. This structure is particularly suitable for short texts. For longer texts, we propose a different but related data structure, the collapsed lexical graph, with paragraphs as nodes and lexical bonds as arcs. In addition, we discuss the role of a domain-specific thesaurus as a resource for determining the lexical relations for the representation. Next, we show how to apply our model for the representation of cohesion to the problem of evaluating text coherence, for texts of arbitrary length. We present hypotheses on how to detect the sites of possible coherence problems based on the cohesion information supplied by our model. We also describe an experiment which we conducted to confirm the validity of our model, comparing the predictions of the model with text evaluations performed by human judges. In addition, we discuss the areas of application for the model, commenting on how detecting sites of possible incoherence can be of value to problems such as text critiquing and second language learning and proposing new improvements to automated procedures such as natural language generation and machine translation. The thesis therefore provides important new research within the field of computational linguistics on how a representation of the cohesion of a text provides an understanding of the coherence of that text.
Report	A Computational Model of lexical cohesion analysis and its application to the evaluation of text coherence (PDF)			Compressed PostScript: A Computational Model of lexical cohesion analysis and its application to the evaluation of text coherence (PS.Z)

CS-98-25
Title	A Computational Model of Turn-taking in Discourse
Authors	T. Donaldson
Abstract	This thesis presents a computational model for turn-taking applicable to dialog-based interactive systems. A general, three-part model of the internal architecture of a turn-taking agent is presented, accounting for why an agent should take a turn, what it should contribute, and when it should act. For deciding what to say, the next-action constraint optimization problem is developed, which allows an agent to order multiple goals using an interruptible local search algorithm sensitive to the time-constraints imposed by dynamic dialog environments. For deciding when to speak, a formal account of turn-taking is given, based on the situation calculus, and accompanied by a state-based algorithm that tells an agent when it should take a turn based in part on the status of its processing. Two prototype implementations of course-advising systems are presented, to illustrate how interactive systems can be analyzed using a checklist for system designers derived from the turn-taking model. In developing a computational model for turn-taking, this thesis shows how to bring models for written discourse one step closer to spoken dialog, and thus advances the state of the art in natural language discourse theory and in natural language generation.
Date	December 1998
Comments	PhD thesis on computational discourse, turn-taking in particular
Report	A Computational Model of Turn-taking in Discourse (PDF)			Compressed PostScript: A Computational Model of Turn-taking in Discourse (PS.Z)

CS-98-24
Title	Multi-Agent Systems for Internet Information Retrieval using Natural Language Processing
Authors	V. Keselj
Abstract	In the context of the vast and still rapidly expanding Internet, the problem of Internet information retrieval becomes more and more important. Although the most popular at the moment, the keyword-based search engine is just one component in a complex software mosaic that needs to be further developed in order to provide a more efficient and scalable solution. This report will argue that the multi-agent approach is a viable methodology for this task. The main issue is how the natural language processing could be used in it, as well as why it should be used. Two implementations and their theoretical foundations are presented: One is the natural language parser generator NL PAGE, which produces parsers in C or Java; and, the other one is the communication part of the multi-agent framework MIN. The higher levels of the framework are also discussed and a simple implementation of a multi-agent system is presented.
Date	September 1998
Comments	This technical report is a reprint of my MMath thesis, finished in September 1997.
Report	Multi-Agent Systems for Internet Information Retrieval using Natural Language Processing (PDF)			Compressed PostScript: Multi-Agent Systems for Internet Information Retrieval using Natural Language Processing (PS.Z)

CS-98-23
Title	User's Manual Elem2 Rule Induction System
Authors	A. An and N. Cercone
Date	September 1998
Report	User's Manual Elem2 Rule Induction System (DOC)

CS-98-22
Title	The Role of Morphology in Machine Translation
Authors	B. Hui
Abstract	Although the study of machine translation has been around for a long time, it has not always taken advantage of the existing linguistic knowledge. This observation leads to the objective of this paper: to apply various aspects of morphology to machine translation (MT). We first review basic approaches in MT and common phenomena in morphological theory. Then we turn to a discussion on four major problems of MT and how morphological analysis can result in more accurate solutions. The problems we study in this paper are the applicability of stemming in MT, the effects of lexical ambiguity, the structure of the lexicon, and the process of lexical choice in generation. A new proposal using cross-language word hierarchies was made to tackle the problem of translational ambiguity in MT. We sampled data from Japanese and Cantonese and illustrated how our approach can be used in understanding and generation. Since the analysis was based on limited data, the results are preliminary and need to be fine tuned before a sound theory can be integrated into an MT system.
Date	July 1998
Report	The Role of Morphology in Machine Translation (PDF)			Compressed PostScript: The Role of Morphology in Machine Translation (PS.Z)

CS-98-21
Title	An efficient algorithm for computing the i'th letter of Phi^n(a)
Authors	J. Shallit and D. Swart
Abstract	Let Sigma be a finite alphabet, and let $Phi:Sigma* ->Sigma* be a homomorphism, i.e., a mapping satisfying Phi(xy) = Phi(x)Phi(y) for all x and y in Sigma*. Let a be a letter in Sigma, and let i <= 1, n <= 0 be integers. We give the first efficient algorithm for computing the i'th letter of Phi^n(a). Our algorithm runs in time polynomial in the size of the input, i.e., polynomial in log n, log i, and the description size of Phi. Our algorithm can be easily modified to give the distribution of letters in the prefix of length i of Phi^n(a). There are applications of our algorithm to computer graphics and biological modelling.
Date	July 1998
Comments	Submitted to: Symposium On Discrete Algorithms (SODA) 1998
Report	An efficient algorithm for computing the i'th letter of Phi^n(a) (PDF)			Compressed PostScript: An efficient algorithm for computing the i'th letter of Phi^n(a) (GZIP)

CS-98-19
Title	Animated Surface Pasting
Authors	C. Tsang
Abstract	Surface pasting is a composition method in which a feature surface is attached to a base surface to provide details on the underlying surface. In computer animation, a lot of time is spent modelling and surface pasting is a modelling method that can quickly and easily model faces and other objects with small details. Thus, surface pasting may be a useful animation technique because it reduces the modelling time. However, it is unclear whether pasted surfaces will have problems such as distortion in animation. In this thesis, surface pasting was combined into the animation process to see how pasted surfaces behave in animation. In general, while surface pasting behaved as desired in animation, there are some situations where it behaves poorly. Most of these bad situations can be avoided or corrected by the animator.
Date	July 1998
Report	Animated Surface Pasting (PDF)			Compressed PostScript: Animated Surface Pasting (PS.Z)

CS-98-17
Title	Planar Drawings of Origami Polyhedra
Authors	E.D. Demaine and M.L. Demaine
Abstract	We present a new infinite class of polyhedra based on a class of origami bases that we have developed. To understand these polyhedra and their underlying bases, we examine drawings of their graphs. We present an elegant linear-time algorithm to find a straight-line planar drawing. It displays a recursive structure in the polyhedra that may lead to interesting fractals. We introduce a ``zoom'' feature that allows one to interactively explore the details of the graph while displaying this recursive structure.
Date	August 1998
Comments	A two-page abstract of this paper appeared in the Proceedings of the 6th Symposium on Graph Drawing, Lecture Notes in Computer Science, volume 1547, Montreal, Quebec, Canada, August 13-15, 1998, pages 438-440. Feel free to contact us.
Report	Planar Drawings of Origami Polyhedra (PDF)			Compressed PostScript: Planar Drawings of Origami Polyhedra (PS.Z)

CS-98-15
Title	Cubic precision Clough-Tocher interpolation
Authors	S. Mann
Abstract	The standard Clough-Tocher split-domain scheme constructs a surface element with quadratic precision. In this paper, I will look at methods for improving the degrees of freedom in Clough-Tocher schemes. In particular, I will discuss modifications to the cross-boundary construction that improve the interpolant from quadratic precision to cubic precision.
Date	July 1998
Report	Cubic precision Clough-Tocher interpolation (PDF)			Compressed PostScript: Cubic precision Clough-Tocher interpolation (PS.Z)

CS-98-14
Title	Multiresolution Curve and Surface Representation by Reversing and Subdivision Rules
Authors	F.F. Samavati and R.H. Bartels
Date	June 1998
Report	Multiresolution Curve and Surface Representation by Reversing and Subdivision Rules (PDF)			Compressed PostScript: Multiresolution Curve and Surface Representation by Reversing and Subdivision Rules (GZIP)

CS-98-13
Title	Reasoning about Non-Key Uniqueness Constraints on Object Schema
Authors	V.L. Khizder and G. Weddell
Abstract	Uniqueness constraints such as keys and functional dependencies in the relational model are a core concept in information systems technology. In this technical report, we identify a boundary between efficient and intractable kinds of uniqueness constraints suitable for object relational data models. The efficient subclass of the constraints is a strict generalization of both keys and relational functional dependencies. We present a complete and efficient procedure that solves the membership problem for this subclass. To motivate our results, we review some applications of our procedure in query optimization and schema evaluation. In addition, we formulate the problem in terms of a description logic (DL). DLs are a family of languages for knowledge representation that have many applications in information systems technology. Thus, in addition to enhancing earlier work on uniqueness constraints, we integrate the results in the larger body of work on DLs.
Date	June 1998
Report	Reasoning about Non-Key Uniqueness Constraints on Object Schema (DOC)			Reasoning about Non-Key Uniqueness Constraints on Object Schema (ZIP)

CS-98-11
Title	Symmetries and First Order ODE Patterns
Authors	E.S. Cheb-Terrab, and A.D. Roche
Abstract	A scheme for determining symmetries for certain families of first order ODEs, without solving any differential equations, and based mainly in matching an ODE to patterns of invariant ODE families, is presented. The scheme was implemented in Maple, in the framework of the {\it ODEtools} package and its ODE-solver. A statistics of the performance of this approach in solving the first order ODE examples of Kamke's book is shown.
Date	April 1998
Comments	Submitted to: Computer Physics Communications
Report	Symmetries and First Order ODE Patterns (PDF)			Compressed PostScript: Symmetries and First Order ODE Patterns (PS.Z)

CS-98-10
Title	Integrating Factors and ODE Patterns
Authors	E.S. Cheb-Terrab and A.D. Roche
Abstract	A systematic algorithm for building integrating factors of the form mu(x,y') or mu(y,y') for non-linear second order ODEs is presented. When such an integrating factor exists, the scheme returns the integrating factor itself, without solving any differential equations. The scheme was implemented in Maple, in the framework of the ODEtools package and its ODE-solver. A comparison between this implementation and other computer algebra ODE-solvers in solving non-linear examples from Kamke's book is shown.
Date	April 1998
Comments	Submitted to: Journal of Symbolic Computation
Report	Integrating Factors and ODE Patterns (PDF)			Compressed PostScript: Integrating Factors and ODE Patterns (PS.Z)

CS-98-06
Title	Shadow Volume Reconstruction
Authors	M.D. McCool
Abstract	Current graphics hardware can be used to generate shadows using either shadow volumes or shadow maps. However, the shadow volume technique requires access to a represention of the scene as a polygonal model, and shadow maps currently require extensions to (and interfere with the use of) texture mapping. We present a hybrid of the shadow map and shadow volume approaches which does not have these difficulties. The scene is rendered from the point of view of the light source and a sampled depth map is recovered. Computer vision techniques are used to reconstruct a minimal shadow volume boundary, after which the shadows can be rendered using only a one bit stencil buffer.
Comments	This report was typeset in LaTeX, but has been converted to PostScript. It contains colour images, but can be printed on a high-resolution black and white laser printer with halftoning. This report has been submitted to the ACM Transactions on Graphics. There is a web site which contains more information on this and other rendering projects in CGL. The pages for this algorithm include slides from a talk and colour images.
Report	No report

CS-98-05
Title	Discrete Parisian and Delayed Barrier Options: A General Numerical Approach
Authors	K.R. Vetzal and P.A. Forsyth
Abstract	In this paper we present a numerical method for the valuation of derivative securities which have a payoff dependent upon the amount of time during the life of the contract that some underlying variable lies within a specified range. We concentrate in particular on the examples of Parisian options and delayed barrier options, but our approach is easily adapted to other cases such as switch options and step options. Available analytic pricing formula are based on the assumption that the underlying variable is monitored continuously. In contrast, we consider discrete (e.g.\ daily or weekly) sampling. Given that path-dependent option values are known to be generally very sensitive to sampling frequency, this is an important advantage of our numerical approach.
Date	March 1998
Report	Discrete Parisian and Delayed Barrier Options: A General Numerical Approach (PDF)			Compressed PostScript: Discrete Parisian and Delayed Barrier Options: A General Numerical Approach (PS.Z)

CS-98-04
Title	Lock-Free Distributed Objects: A Shared Spreadsheet
Authors	C.R. Palmer and G.V. Cormack
Abstract	A Lock-Free Distributed System is a purely-replicated environment (replication with no central authority) in which updates are performed with no inter-object communication. All replicated copies of an object take on a common state when there are no messages in transit. Such a system is often used for groupware applications and operation transformations are a popular technique in this field. We broaden the scope of operation transformations by using a lock-free concurrency system to specify a distributed spreadsheet. The resulting spreadsheet is highly fault tolerant and the interactive performance is independent of the speed and the quality of the underlying network. To specify the concurrent semantics of the spreadsheet, an abstract data type is defined and a general model of a subset of the operations is proposed.
Date	February 1998
Report	Lock-Free Distributed Objects: A Shared Spreadsheet (PDF)			Compressed PostScript: Lock-Free Distributed Objects: A Shared Spreadsheet (PS.Z)

CS-98-03
Title	A Formalization of Shestakov's Decomposition
Authors	J.J. Lou and J.A. Brzozowski
Abstract	We consider two methods for the decomposition of Boolean and multi-valued functions into functions with fewer arguments. Shestakov's method (which we call double decomposition) expresses a function f(x_1,... ,x_n) as a composite function h(g_1(u_1,...,u_r),g(v_1,...,v_s)), where the union of U={u_1,...,u_r} and V={v_1,...,v_s} is the set X={x_1,...,x_n}. The independently developed method of Luba and Selvaraj (which we call single decomposition expresses a function f(x_1,...,x_n) as a composite function h(u_1,...,u_r,g(v_1,...,v_s)). The latter method has been formalized by Brzozowski and Luba using "blankets", which are generalizations of set systems. In this paper, we use the same blanket approach to formalize Shestakov's decomposition method. We compare the two methods, and verify that double decomposition is equivalent to two steps of single decomposition. Recently, Brzozowski and Lou extended the single decomposition methods to multi-valued functions. Using the same approach, we also extend Shestakov's method to the multi-valued case.
Date	February 1998
Report	A Formalization of Shestakov's Decomposition (PDF)			Compressed PostScript: A Formalization of Shestakov's Decomposition (PS.Z)

CS-98-02
Title	Testing for Input Stuck-at Faults in Content-Addressable Memories
Authors	P.R. Sidorowicz and J.A. Brzozowski
Abstract	Results pertaining to testing for input stuck-at faults in a n-word by l-bit static CMOS content-addressable memory (CAM) array are presented. First, a formal approach to modelling CAM cells is developed. The basis of this approach is the mathematical framework proposed by Brzozowski and Jurgensen for testing and diagnosis of sequential machines. Next, an input stuck-at fault model for a CAM is defined and a test of length 7n+2l+5 with 100% fault coverage with respect to this fault model is constructed. This test also detects all the usual cell stuck-at and transition faults. Some design-for-testability (DFT) modifications facilitating a further reduction of this test's length are also proposed. Finally, two other CAM tests are verified with respect to our fault model. Keywords:CMOS, content-addressable, design for testability, fault modelling, stuck-at faults, testing.
Date	April 1998
Comments	This research was supported by the Natural Sciences and Engineering Research Council of Canada under grant No. OGP0000871.
Report	Testing for Input Stuck-at Faults in Content-Addressable Memories (PDF)			Compressed PostScript: Testing for Input Stuck-at Faults in Content-Addressable Memories (PS.Z)