1995 Technical Reports | SCS | UW

[Please remove <h1>]

CS-95-56
Title	Numerical Control Tool Path Generation using Space-Filling Curves and Pixel Models
Authors	D.P. Dragomatz
Abstract	Two methods for generating numerical control tool paths have ap- peared recently, one using pixel models, the other using Hilbert curves. The pixel model approach is able to represent many as- pects of path generation, but requires large amounts of storage for physically large objects represented at high accuracy. The Hilbert curve approach uses the local refinement property of Hil- bert space-filling curves to create an adaptive sampling approach that increases the density of path points only where necessary. In this thesis, I propose and implement a hybrid technique that uses elements of each method, retaining the major benefits of both without incurring the storage costs of the pixel model. The scheme supports the creation of both roughing paths and surface machining paths, but is non-optimal for the later. The addition of containment and exclusion zones to the method leads to a promising paradigm for partitioning object geometry into machin- able regions.
Date	December 1995
Comments	Chapter 2 is a 35-page discussion on issues in tool path generation mostly from a computational point of view. Chapter 3 is background material on space-filling curves and pixel models. Chapter 4 describes the approach and shows some sample paths, including photographs of machined parts. Appendix A is a 36-page classified bibliography of papers on various topics related to NC tool path generation.
Report			PostScript	Compressed PostScript

CS-95-54
Title	Spline Extensions for the MAPLE Plot System
Authors	Wolfgang Heidrich
Abstract	Traditionally computer algebra systems use lines and polygons to represent mathematical functions graphically. While these geometric primitives can easily be rendered on conventional raster graphics hardware, a smooth representation using splines would provide a wider range of tradeoffs between image quality and rendering performance. Since modern computer graphics hardware directly supports rendering of spline objects, their use becomes more and more interesting. In this thesis we examine the possibilities for replacing traditional representations of functions and graphs by spline representations. We describe the use of \BSplines\ for interpolation and approximation, and discuss several approaches for generating parameterizations for these tasks. Finally we present some novel results regarding the use of rational splines for curve and surface fitting.
Report			Adobe PDF	Compressed PostScript

CS-95-53
Title	A topological data structure for hierarchical planar subdivisions
Authors	W. Celes F., L.H. de Figuiredo, M. Gattass, P.C. Carvalho
Abstract	We introduce HPS, a new topological data structure that efficiently represents hierarchies of planar subdivisions, thus providing direct and efficient support for GIS concepts such as abstract generalizations and multi-scale partitions. Unlike previous ad hoc solutions, HPS provides efficient access to adjacency information for each level and across levels, while storing the complete hierarchy in a single data structure, without duplications. HPS also provides topological operators that ensure global consistency. Like all topological data structures, HPS can be used as a framework onto which geometric and attribute information is placed: HPS explicitly handles attributes consistently with modeling, and naturally supports both topological and geometrical multi-resolution representations. We also discuss how some typical applications in GIS, Digital Cartography, and Finite Element mesh generation can be improved with HPS. Keywords:topological data structures, hierarchical modeling, multi-resolution, multi-scale partitions, Geographic Information Systems
Date	December 1995
Comments	This paper has been submitted to Algorithmica (special issue on GIS)
Report			Adobe PDF	Compressed PostScript

CS-95-52
Title	Program Understanding: A Constraint Satisfaction Modeling Framework; Understanding as Plan Recognition
Authors	S. Woods, A. Quilici and Qiang Yang
Abstract	Different program understanding algorithms often use different representational frameworks and take advantage of numerous heuristic tricks. This situation makes it is difficult to compare these approaches and their performance. This paper addresses this problem by proposing constraint satisfaction as a general framework for describing program understanding algorithms, demonstrating how to tranform a relatively complex existing program understanding algorithm into an instance of a constraint satisfaction problem, and showing how this facilitates better understanding of its performance. Plan recognition is the task of interpreting the actions of agents in the environment, in the context of the knowledge we possess about how action occurs in the world, and why. The recognition task involves constructing a mapping, possibly partial, between an existing repository of plan and domain knowledge and a set of dynamic observations of a subset of the actions taken toward a goal. Program understanding can be viewed as a special case of plan recognition, where the task is to recognize the plans programmers have used in constructing a particular piece of legacy source code. However, program understanding differs from generalized plan recognition in that a complete set of action observations is the basis of goal determination. This paper discusses, in detail, how this difference leads to inadequacies in applying typical plan recognition algorithms to program understanding. Program understanding can instead be viewed as a special case of plan recognition which is particularly amenable to constraint satisfaction techniques.
Date	November 1995
Report			Adobe PDF	Compressed PostScript

CS-95-51
Title	Program Understanding as Constraint Satisfaction: Representation and Reasoning Techniques
Authors	S. Woods and Q. Yang
Abstract	The process of understanding a source code in a high-level programming language involves complex computation. Given a piece of legacy code and a library of program plan templates, understanding the code corresponds to building mappings from parts of the source code to particular program plans. These mappings could be used to assist an expert in reverse engineering legacy code, to facilitate software reuse, or to assist in the translation of the source into another programming language. In this paper we present a model of program understanding using constraint satisfaction. Within this model we intelligently compose a partial global picture of the source program code by transforming knowledge about the problem domain and the program itself into sets of constraints. We then systematically study different search algorithms and empirically evaluate their performance. One advantage of the constraint satisfaction model is its generality; many previous attempts in program understanding could now be cast under the same spectrum of heuristics, and thus be readily compared. Another advantage is the improvement in search efficiency using various heuristic techniques in constraint satisfaction.
Date	November 1995
Report			Adobe PDF	Compressed PostScript

CS-95-50
Title	Analysis of Hashing Algorithms and a New Mathematical Transform
Authors	A. Viola
Abstract	The main contribution of this report is the introduction of a new mathematical tool that we call the Diagonal Poisson Transform, and its application to the analysis of some linear probing hashing schemes. We also present what appears to be the first exact analysis of a linear probing hashing scheme with buckets of size b. First, we present the Diagonal Poisson Transform. We show its main properties and apply it to solve recurrences, find inversse relations and obtain several generalizations of Abel's summation formula. We follow with the analysis of LCFS hashing with linear probing. It is known that the Robin Hood linear probing algorith minimizes the variance of the cost of successful searches for all linear probing algorithms. We prove that the variance of the LCFS scheme is within lower order terms of this optimum. Finally, we present the first exact analysis of linear probing hashing with buckets of size b. From the generating function for the Robin Hood heuristic, we obtain exact expressions for the cost of successful searches when the table is full. Then, with the help of Singularity Analysis, we find the asymptotic expansion of this cost up to O((bm)-1), where m is the number of buckets. We also give upper and lower bound when the table is not full. We conclude with a new approach to study certain recurrences that involves truncated exponentials. A new family of numbers that satisfies a recurrence resembling that of the Bernoulli numbers is introduced. These numbers may prove helpful in studying recurrences involving truncated generating functions.
Date	December 1995
Report			Adobe PDF	Compressed PostScript

CS-95-48
Title	Process Spaces
Authors	R. Negulescu
Abstract	This paper introduces process spaces, a unified theory of interacting systems. The main new trait, abstract executions, leads to a simple and general set formalism. For concurrent systems (including digital circuits), process spaces apply to diverse correctness concerns and yield a new classification of liveness and progress faults. The resulting studies of different correctness concerns are decoupled and homogeneous (i.e., they do not interfere with each other and they have the same algebraic structure). Applications to other interacting systems, such as electrical networks and dynamical systems, are also possible. Process spaces have many meaningful properties; here, some results from concurrency theory are generalized and simplified, and some new results are found.
Date	December 1995
Report			Adobe PDF	Compressed PostScript

CS-95-47
Title	Surface intersection using affine arithmetic
Authors	L.H. de Figueiredo
Abstract	We describe a variant of a domain decomposition method proposed by Gleicher and Kass for intersecting and trimming parametric surfaces. Instead of using interval arithmetic to guide the decomposition, the variant described here uses affine arithmetic, a tool recently proposed for range analysis. Affine arithmetic is similar to standard interval arithmetic, but takes into account correlations between operands and sub-formulas, generally providing much tighter bounds for the computed quantities. As a consequence, the quadtree domain decompositions are much smaller and the intersection algorithm runs faster. Keywords: surface intersection, trimming surfaces, range analysis, interval analysis, CAGD
Date	October 1995
Comments	This paper has been submitted to Graphics Interface `96
Report			Adobe PDF	Compressed PostScript

CS-95-46
Title	Transformation of Structured Documents
Authors	E. Kuikka and M. Penttonen
Abstract	Structure definitions of documents have been used successfully for inputting and formatting in text processing systems. This report considers transformations between different representations of structured documents and studies possiblities to extend the use of structure definitions to document transformations and to discover algorithmic methods for carrying out transformations. Documents are presented as parse trees for context-free grammars and transformations are made from parse tree to parse tree. First, the report describes differences of manuscript styles demanded by various scientific journals and presents a declarative classification for structure differences between two parse trees. Second, a set of tree transformation methods are described and their suitability for transformations between documents having a structure difference in each defined class is analyzed. For each class several methods may or must be used and only certain kinds of differences can be managed automatically. Finally, instead of designing a system where a method accommodates for all kinds of differences or where different methods are used in various transformations, the report presents a model for a document transformation system that presents a possibility of using various methods according to differences in document representations. The system is divided two modules. In the first one transformations are made automatically and they do not change the hierarchical structure of a document. In the second one transformations are made semiautomatically or nonautomatically and the hierarchical structure changes. Differences between the existing and the required representation of a document are analyzed and methods selected according to the classified differences.
Date	October 1995
Comments	The later version of this report has been submitted to Eletronic Publishing journal.
Report			Adobe PDF	Compressed PostScript

CS-95-45
Title	Scenario-Based Analysis of Software Architecture
Authors	R.N. Kazman, Gregory Abowd, Len Bass and Paul Clements
Abstract	Software architecture is one of the most important tools for designing and understanding a system, whether that system is in preliminary design, active deployment, or maintenance. Scenarios are important tools for exercising an architecture in order to gain information about a system's fitness with respect to a set of desired quality attributes. This paper presents a set of experiential case studies illustrating the methodological use of scenarios to gain architecture-level understanding and predictive insight into large, real-world systems in various domains. A structured method for scenario-based architectural analysis is presented, using scenarios to analyze architectures with respect to achieving quality attributes. Finally, lessons and morals are presented, drawn from the growing body of experience in applying scenario-based architectural analysis techniques.
Date	October 1995
Report			Adobe PDF	Compressed PostScript

CS-95-44
Title	Depth from Shading as an Attentional Cue in User Surfaces
Authors	S.L. Loop
Abstract	When using a computer, we are engaging in a dialogue with the computer. In order to communicate effectively with the computer, we must be aware of the state of the computer. Therefore, we need our attention drawn to particular areas that may indicate the state. When these areas are not found easily, that is these areas are not conspicuous, then we may sacrifice speed and accuracy resulting in potentially disastrous consequences. It has been known for years that colour is very conspicuous: it is easily noticed and attracts attention. For this reason, colour has been used in the interface to group, discriminate and draw attention. Visual texture, normally studied for its segregating and discriminating properties, is also very conspicuous. Another visual attribute that research has shown to be conspicuous is depth conveyed through shape from shading perception: convex and concave objects are easily discriminated. The thee-dimensional look that shape from shading imparts is becoming more popular in interface design due to its concrete, realistic appeal. However, designers are not making use of the functional benefits, that is the conspicuousness, of depth from shading in the design of the interface. A windowing system is a good example of an interface that needs to present conspicuous information to the user so that the user knows the state of the computer. In this case, so it is known which window is active and can receive input. This thesis empirically supports the idea that depth from shading can be used as a conspicuous attribute in a windowing system to indicate the active window. However, as the depth from shading cue is obscured by othe overlapping windows, the conspicuousness of the cue decreases. Some guidelines for maintaining conspicuity of the depth from shading cue are provided.
Date	October 1995
Comments	The experimental results files before processing are in the 'results.' files. 'results.E1.' = experiment 1 'results.E2.' = experiment 2 'results.E3.' = experiment 3
Report	E1.sl, E2.ed, E2.fg, E2.gv, E2.ib, E2.jw	E2.sl, E2.sm, E2.wc, E3.sl, E3.wc	Adobe PDF	Compressed PostScript

CS-95-41
Title	Searching in Constant Time and Minimum Space
Authors	A. Brodnik
Abstract	This report deals with techniques for minimal space representation of a subset of elements from a bounded universe so that various types of searches can be performed in constant time. In particular, we introduce a data structure to represent a subset of N elements of [0, ..., M-1]$ in a number of bits close to the information- theoretic minimum and use the structure to answer membership queries in constant time. Next, we describe a representation of an arbitrary subset of points on an M x M grid such that closest neighbour queries (under L_1 and L_oo) can be performed in constant time. This structure requires M^2 + o(M^2) bits. Finally, under a byte overlap model of memory we present an M+o(M) bit, constant time solution to the dynamic one-dimensional closest neighbour problem (hence, also union-split-find and priority queue problems) on [0, ..., M-1].
Date	September 1995
Comments	This report is based on the author's PhD thesis. Many results are joint work with J. Ian Munro. Supported in part by the Natural Science and Engineering Research Council of Canada under grant number A-8237 and the Information Technology Research Centre of Ontario
Report			Adobe PDF	Compressed PostScript

CS-95-40
Title	Programming Support for Blossoming: The Blossom Classes
Authors	Wayne Liu
Abstract	A C++ library has been created to facilitate prototyping of curve and surface modeling techniques. The library provides general-purpose blossoming datatypes to support creation of modeling techniques based on blossoming analysis. The datatypes have efficient operations which are generalizations of important CAGD algorithms, and can be used to implement many algorithms. Most importantly, the library is able to inter-operate with user-supplied datatypes or routines to create complex modeling techniques.
Comments	- This Technical report is based on author's Master's thesis. - The code for the library described in this thesis is available by contacting the author.
Report			Adobe PDF	Compressed PostScript

CS-95-39
Title	A Study of Delays and De-Synchronisation in a Multiple-View Direct Manipulation Task
Authors	F. Jaubert
Abstract	Computer Graphics are used in increasingly complex situations, often involving multiple dynamic views. This poses the problem of maintaining proper synchronisation among the various views. The effect of de-synchronisation and delays in command-line and single-view graphical environments has been well studied; this thesis aims to do the same with multiple view environments. To that end, an experimental platform is developed, which requires subjects to complete a placement task in a multiple-view representation of a three-dimensional world. Delays are imposed on the different views, and the effect on the subject's performance (in terms of completion time and number of manipulation operations required) is noted. The results show that delays on the view under manipulation have a definite detrimental effect on all subjects; furthermore, certain subjects are affected by delays on other views as well. To conclude, the experiment reveals that the problem of de-synchronisation in a multiple-view environment is a complex one, requiring further research; several new experiments are suggested to help answer the questions posed in this study.
Date	September 1995
Comments	A Master of Mathematics Thesis
Report			Adobe PDF	Compressed PostScript

CS-95-37
Title	Editable Software Views
Authors	Michael Hardy
Abstract	Understanding software is a very expensive component of software development and maintenance. As a result, tools that assist in the process of software understanding play an important part in reducing software development and maintenance costs. As part of this research, the current literature was examined and it was determined, surprisingly, that very little work had been done in developing tools to assist programmers in understanding source code. Many of the tools presented in the literature suffer from one or more drawbacks that limit their usefulness for a software developer or maintainer. To address some of these limitations, an extensible prototype system that presents extrinsic information about a program in editable source code views was designed and developed. The views are created by embellishing the source code using changes in font family, size, and style, text colour, and text elision. This thesis presents the design and implementation of this system. This research represents a step in the development of tools to assist in software understanding, but it is clear that more effort must be directed towards this area of research.
Report			Adobe PDF	Compressed PostScript

CS-95-35
Title	Improving Depth Perception in 3D Interfaces with Sound
Authors	S. Mereu
Abstract	The ability for users to perceive depth in 3D computer interfaces is essential for these applications to be useful. Due to the 2D nature of the display screen, perceiving three dimensions is not always as easy as it is in the real world. Solutions to the lack of depth cues presented are numerous including stereo glasses and head tracking. These solutions, however, are often too expensive for the general user. Sound on the other hand is relatively inexpensive. Experiments have recently been completed in this area and have shown that sound is an effective depth cue. This report discusses the results and makes suggestions on implementing a sound based application.
Date	August 1995
Report			Adobe PDF	Compressed PostScript

CS-95-32
Title	Relative Liveness: From Intuition to Automated Verification
Authors	R. Negulescu and J.A. Brzozowski
Abstract	We point out deficiencies of previous treatments of liveness. We define a new liveness condition in two forms: one based on finite trace theory, and the other on automata. We prove the equivalence of these two definitions. We also introduce a safety condition and provide modular and hierarchical verification theorems for both safety and liveness. Finally, we present a verification algorithm for liveness.
Date	July 1995
Comments	An extended summary of this report was presented at the Second Working Conference on Asynchronous Design Methodologies, South Bank University, London, UK, May 1995.
Report			Adobe PDF	Compressed PostScript

CS-95-31
Title	Nonlinear Iteration Methods for High Speed Laminar Compressible Navier-Stokes Equations
Authors	P.A. Forsyth and H. Jiang
Abstract	Full Newton nonlinear iteration is compared to use of a defect correction approach (first order Jacobian, second order residual) for solving the steady state compressible flow equations. The Jacobian is constructed numerically, and solved using a PCG type method with block $ILU(k)$ preconditioning. Numerical tests are carried out using the NACA 0012 airfoil, at various free stream Mach numbers and Reynolds numbers. The full Newton approximation is generally more robust and takes less CPU time than the defect correction approach. No particular difficulty was observed in solving the full Newton Jacobian using an $ILU(2)$ ($\simeq 100,000$ unknowns) with CGSTAB acceleration.
Date	July 1995
Report			Adobe PDF	Compressed PostScript

CS-95-30
Title	Finding Largest Subtrees and Smallest Supertrees
Authors	A. Gupta and N. Nishimura
Abstract	As trees are used in a wide variety of application areas, the comparison of trees arises in many guises. Here we consider two generalizations of classical tree pattern matching, which consists of determining if one tree is isomorphic to a subgraph of another. For the embedding problems of subgraph isomorphism and topological embedding, we present algorithms for determining the largest tree embeddable in two trees T and T' (or a largest subtree) and for constructing the smallest tree in which each of T and T' can be embedded (or a smallest supertree). Both subtrees and supertrees can be used in a variety of different applications. For example, when each of the two trees contains partial information about a data set, such as the evolution of a set of species, the subtree or supertree corresponds to a structuring of the data in a manner consistent with both original trees. The size of a subtree or supertree of two trees can also be used to measure the similarity between two arrangements of data, whether images, documents, or RNA secondary structures. In this paper, we present a general paradigm for sequential and parallel subtree and supertree algorithms for subgraph isomorphism and topological embedding. Our sequential algorithms run in time O(n^{2.5} log n) and our parallel algorithms in time O(log^3 n) on a randomized CREW PRAM using a polynomial number of processors. In addition, we produce better algorithms for these problems when the underlying trees are ordered, that is, when the children of each node have a left-to-right ordering associated with them. In particular, we obtain O(n^2) time sequential algorithms and O(log^3 n) time deterministic parallel algorithms on CREW PRAMs for both embeddings.
Date	July 1995
Report			Adobe PDF	Compressed PostScript

CS-95-25
Title	Text/Relational Database Management Systems: Overview and Proposed SQL Extensions
Authors	G.E. Blake,M.P. Consens, I.J. Davis, P. Kilpelainen, E. Kuikka, P.-A. Larson, T. Snider and F.W. Tompa
Abstract	Combined text and relational database support is increasingly recognized as an emerging need of industry, spanning applications requiring text fields as parts of their data (e.g., for customer sup- port) to those augmenting primary text resources by conventional rela- tional data (e.g., for publication control). In this paper, we propose extensions to SQL2 that provide flexible and efficient access to struc- tured text described by SGML or other encodings. We also propose an architecture to support a text/relational database management system as a federated database environment, where component databases are accessed via ``agents'': SQL agents that translate standard or extended SQL2 queries into vendor-specific dialects, and text agents that pro- cess text sub-queries on full-text search engines.
Date	June 1995
Comments	Supercedes earlier version published in Proceedings of the ADB'94 Conference.
Report			Adobe PDF	Compressed PostScript

CS-95-24
Title	An Analysis of Polynomial Composition Algorithms
Authors	S. Mann and W. Liu
Abstract	An analysis is made of the runtime of a previously published algorithm for polynomial composition. Two new, more efficient algorithms are presented. One of these algorithms is optimal, while the other algorithm is numerically more stable than the optimal one. Additionally, as a generalization of polynomial composition, we show how to compose a multiaffine function with a set of polynomials as an extension to an earlier algorithm for composing two polynomial functions. With this extension, we are able to perform degree raising with composition.
Date	June 1995
Report			Adobe PDF	Compressed PostScript

CS-95-22
Title	Evaluating Tensor Product and Triangular Bezier Surfaces
Authors	J. Carriere
Abstract	Many papers describe techniques for evaluating spline curves and surfaces. While each paper provides some theoretical or empirical evidence with which to compare techniques, there exist few global comparisons. Also, papers describing particular algorithms often provide few details, making implementation of the technique presented difficult or impossible. This report attempts to illuminate the performance relationships between, and implementations of, various methods for rendering spline surfaces. Empirical results are given for bicubic tensor product Bezier surfaces, and for cubic and quartic triangular Bezier surfaces.
Date	May 1995
Report			Adobe PDF	Compressed PostScript

CS-95-20
Title	Walking Streets Faster
Authors	A. Lopez-Ortiz and S. Schuierer
Abstract	A fundamental problem in robotics is to compute a path for a robot from its current location to a given goal. In this paper we consider the problem of a robot equipped with an on-board vision system searching for a goal g in an unknown environment. We assume that the robot is originally located at a point $s$ on the boundary of a street polygon. A street is a simple polygon with two distinguished points s and g, which are located on the polygon boundary and the part of the polygon boundary from s to g is weakly visible to the part from g to s and vice versa. Our aim is to minimise the ratio of the length of the path traveled by the robot to the length of the shortest path from s to g. In analogy to on-line algorithms this value is called the competitive ratio. We present a series of strategies all of which follow the same general high level strategy. In the first part we present a class of strategies any of which can be shown to have a competitive ratio of Pi + 1. These strategies are robust under small navigational errors and their analysis is very simple. In the second part we present the strategy Continuous Lad which is based on the heuristic optimality criterion of minimising the Local Absolute Detour. We show an upper bound on the competitive ratio of Continuous Lad of ~2.03. Finally, we also present a hybrid strategy consisting of Continuous Lad and the strategy Move-in-Quadrant. We show that this combination of strategies achieves a competitive ratio of 1.73. This about halves the gap between the known sqrt(2) lower bound for this problem and the previously best known competitive ratio of ~2.05.
Date	October 1995
Report			Adobe PDF	Compressed PostScript

CS-95-17
Title	From Data Representation to Data Model: Meta-Semantic Issues in the Evolution of SGML
Authors	D. Raymond, F.W. Tompa and D. Wood
Abstract	SGML provides standard representations for documents, but as documents become more fluid, we will need standard semantics for them as well. The ability to manage change is a fundamental capability of any system that supports document semantics. We look at three areas important in change management: equivalence, redundancy, and operators. We show how these areas are implicitly addressed in SGML and SGML-based standards, and argue that more explicit consideration would be useful both for evaluating current standards, and for developing new systems for document semantics. Keywords:document semantics, document update, document data modelling
Date	April 1995
Report			Adobe PDF	Compressed PostScript

CS-95-14
Title	The Complexity of Subgraph Isomorphism: Duality Results for Graphs of Bounded Path-and Tree-Width
Authors	A. Gupta and N. Nishimura
Abstract	We present a clear demarcation between classes of bounded tree-width graphs for which the subgraph isomorphism problem is NP-complete and those for which it can be solved in polynomial time. In previous work, it has been shown that this problem is solvable in polynomial time if the source graph has either bounded degree or is k-connected, for k the tree-width of the two graphs. As well, it has also been shown that for certain specific connectivity or degree conditions, the problem becomes NP-complete. Here we give a complete characterization of the complexity of this problem on bounded tree-width graphs for all possible connectivity conditions of the two input graphs. Specifically, we show that when the source graph is not k-connected or has more than k vertices of unbounded degree the problem is NP-complete, thus answering an open question of Matousek and Thomas. Many of our reductions are restricted to using a subset of graphs of bounded tree-width, namely graphs of bounded path-width. As a direct result of our constructions, we also show that when the source and target graphs have connectivity less than k or at least one has k vertices of unbounded degree, the subgraph isomorphism problem for bounded path-width graphs is NP-complete.
Date	February 1995
Report			Adobe PDF	Compressed PostScript

CS-95-12
Title	A Global Search Architecture
Authors	F.J. Burkowski, G.V. Cormack, C.L.A. Clarke and R.C. Good
Abstract	Recent advances in communication and storage technology make available vast quantities of on-line information. But this information is of limited use unless it can be searched effectively. Huge scale and heterogeneity of data raise a unique combination of architectural issues that must be addressed to support effective search. These issues occasion the use of multi-user distributed search databases with the following capabilities: efficient structured searching of the contents of files having various schema; continuous availability in spite of failures and maintenance; high-throughput incorporation of a continuous stream of updates, especially the arrival new data and removal of obsolete data. We present an architecture that embodies solutions to specific technical problems that arise in the provision of these capabilities.
Date	March 1995
Report			Adobe PDF	Compressed PostScript

CS-95-10
Title	A Robust Storage System Architecture
Authors	R.C. Good, G.V. Cormack, D. Taylor and C.L.A. Clarke
Abstract	Error-correcting codes allow either incorrect data to be corrected or missing data to be rebuilt. They are frequently used with communications channels to recover data lost through line noise and thus provide a `noise free' bit pipe. Data can also be lost through hardware failure; for instance a disk crash. In case of hardware failure, we want a storage system that has the robustness and tunability of error-correcting codes in order to provide recovery of the lost data. This is especially so when dealing with systems involving a large number of disks as, when taken as a group, they are more error prone than single disks but are a vary practical way of building large data stores. At present, the most common way to provide data recovery is straight duplication (mirroring) or a code able to detect single failures within a tightly coupled array of disks. A new prototype system has been designed and implemented which uses linear error-correcting codes to provide data storage over a loosely coupled distributed storage system. The fault-tolerance of this system can be varied by choosing the amount of redundant information stored.
Date	February 1995
Report			Adobe PDF	Compressed PostScript

CS-95-09
Title	Interchanging the Order of Grouping and Join
Authors	W.P. Yan and P.A. Larson
Abstract	Assume that we have an SQL query containing joins, a GROUP-BY and possibly a HAVING predicate. The standard way of evaluating this type of query is to first perform the group-by early, that is to push the group-by operation past one or more joins. This may reduce the query processing cost by reducing the amount of data partocipating in joins. The reverse transformation, ie.e, performing joing before group-by, can also be beneficial because the join may greatly reduce the input to the group-by. We formally define the problem, adhering strictly to the semantics of SQL2, and prove necessary and sufficient conditions for deciding when the transformation is valid. In practice, it may be expensive or even impossible to test whether the conditions are satisfied. Therfore, we also present a more practical algorith that tests a simpler, sufficient condition. This algorithm is fast and detects a large subclass of transformable queries.
Date	February 1995
Report			Adobe PDF	Compressed PostScript

CS-95-08
Title	A Calculus for Concurrent Update
Authors	G.V. Cormack
Abstract	The distributed operation transform (dOPT) is proposed by Ellis and Gibbs (SIGMOD Record 18:2, 1989) as a lock-free algorithm to ensure the consistency of concurrently updatable distributed objects. dOPT is shown by counterexample to be incorrect. A corrected algorithm is given for a restricted environment based on point-to-point communication. There appears to be no simple correction to dOPT for a general environment based on broadcast communication.
Date	January 1995
Report			Adobe PDF	Compressed PostScript

CS-95-07
Title	On the use of Regular Expressions for Searching Text
Authors	C.L.A. Clarke and G.V. Cormack
Abstract	The use of regular expressions to search text is well known and understood as a useful technique. It is then surprising that the standard techniques and tools prove to be of limited use for searching text formatted with SGML or other similar markup languages. Experience with structured text search has caused us to carefully re-examine the current practice. The generally accepted rule of "left-most longest match" is an unfortunate choice and is at the root of the difficulties. We instead propose a rule which is semantically cleaner and is incidentally more simple and efficient to implement. This rule is generally applicable to any text search application.
Date	February 1995
Report			Adobe PDF	Compressed PostScript

CS-95-06
Title	A Calculus for Concurrent Update
Authors	G.V. Cormack
Abstract	This paper introduces a calculus for concurrent update (CCU) that is used to specify distributed objects. The calculus permits updates to be effected immediately at each site - no central server, locking, token passing, rollback, or other form of serialization is enforced. Notice of each update at each site is transmitted to every other site, where a corresponding update is effected. Unless special provisions are taken, network transmission delay may cause corresponding updates to be effected at different sites in different orders, potentially rendering them meaningless or inconsistent. CCU avoids this eventuality: transformations are applied to corresponding updates as necessary to preserve overall meaning and consistency. CCU derives from the Distributed Operational Transform (dOPT) algorithm proposed by Ellis and Gibbs (ACM SIGMOD Record 18:2, 1989) as a concurrency control mechanism for groupware systems. dOPT introduces the notion of consistency-preserving transformations, and embeds exactly the lightweight CBCAST algorithm published later by Birman, Schiper and Stephenson (ACM TOCS 9:3, 1991). However, Ellis and Gibbs present no method for reasoning about either the overall consistency or meaningfulness of the transforms as applied by dOPT. Indeed, dOPT is incorrect, as demonstrated by a counterexample in which it fails to maintain consistency.
Date	January 1995
Report			Adobe PDF	Compressed PostScript

CS-95-01
Title	The Grail Papers: Version 2.3
Authors	D. Raymond and D. Wood
Abstract	Grail is a package for computing with finite-state machines and regular expressions, written in C++. Grail supports input and output of textual descriptions of automata and regular expres- sions, conversions between machines and regular expressions, and other operations. Grail can be used as a set of shell-callable processes, a library of functions, or as individual C++ classes. Version 2.3 of Grail supports parameterizable machines and expressions. This collection of papers includes a description of the the his- tory and design of Grail, a user's guide, a programmer's guide.
Date	January 1995
Report	Intro: PDF, PS.Z	Prog: PDF, PS.Z	Title: PDF, PS.Z	User: PDF, PS.Z