Professor Taylor's research concerns two major areas: distributed systems and fault tolerance. In the distributed-systems area, he has studied several issues, including replication and the debugging and monitoring of distributed applications. The work on replication produced several new replication-control protocols that appear to have very good performance relative to previous protocols. Recent work has concentrated on debugging and monitoring of distributed applications. A major theme of this work is that the fundamental “happened before” partial-order relationship should be used as a basis for understanding distributed executions rather than the real time of event occurrence.
Much of the work has concerned problems of scale. One scaling concern is how to present information about large execution histories so that the user is not overwhelmed with detail, yet the presentation does not distort the full execution history. Another concern is the representation of partial orders involving very large numbers of processes, since the usual vector-clock techniques do not scale well. Recent work has focused on techniques for locating patterns in large partially ordered event histories, both after a distributed application has completed and “on line,” that is, as the events occur and are reported.
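For readers unfamiliar with the terms, the following is a minimal sketch of the textbook vector-clock scheme for tracking the “happened before” relation. It is not code from Professor Taylor's tools; the names `Process`, `happened_before`, and `concurrent` are hypothetical and chosen only for illustration. Each timestamp carries one counter per process, which suggests why the representation grows expensive as the number of processes becomes very large.

```python
from typing import List, Tuple

Clock = Tuple[int, ...]  # one counter per process (illustrative representation)


def happened_before(a: Clock, b: Clock) -> bool:
    """True if the event stamped a precedes the event stamped b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a: Clock, b: Clock) -> bool:
    """True if neither event precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


class Process:
    """Hypothetical process that stamps its events with a vector clock."""

    def __init__(self, pid: int, n_processes: int) -> None:
        self.pid = pid
        self.clock: List[int] = [0] * n_processes

    def local_event(self) -> Clock:
        # Advance only this process's own counter.
        self.clock[self.pid] += 1
        return tuple(self.clock)

    def send(self) -> Clock:
        # A send is an event; its timestamp travels with the message.
        return self.local_event()

    def receive(self, message_clock: Clock) -> Clock:
        # Merge the sender's knowledge, then count the receive event itself.
        self.clock = [max(x, y) for x, y in zip(self.clock, message_clock)]
        self.clock[self.pid] += 1
        return tuple(self.clock)
```

With two processes, an event on one process and an unrelated event on the other compare as concurrent, while any event that precedes a message send is ordered before the matching receive, regardless of the real time at which each occurred.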
In addition to theoretical study, a tool for displaying event histories, the Partial-Order Event Tracer, has been developed to investigate the practical implications of the theoretical techniques in a wide variety of environments, including TCP sockets, PVM, MPI, and uC++.
In the fault-tolerance area, he has primarily studied the design of storage structures (implementations of data structures) for fault-tolerant systems. This study has included development of a general theoretical framework and basic results within that framework, design of specific structures with good fault-tolerance properties, and empirical study of the efficiency and fault tolerance of some specific structures.
Degrees and Awards
BSc (Saskatchewan), M.Math, PhD (Waterloo)
Paul G. Sorenson Distinguished Graduate Lecture, University of Saskatchewan (1992)
Industrial and Sabbatical Experience
In 1983-1984, Professor Taylor spent a sabbatical at the Computing Laboratory, University of Newcastle-upon-Tyne. During that sabbatical, he studied the use of forward error recovery in the context of atomic actions, which normally use backward-error-recovery techniques.
In 1990-1991, he spent a sabbatical at the Centre for Advanced Studies, IBM Toronto Laboratory. During that sabbatical, he studied debugging of distributed applications and produced a prototype of a partial-order event-display tool, later incorporated into an IBM product.
In 1999-2000, he spent a sabbatical at the IBM T. J. Watson Research Center, Hawthorne, New York. During that sabbatical, he examined issues in distributed-systems management. A major project concerned techniques for exploratory examination of large quantities of monitoring data, as found in an event-data warehouse.
In 2008-2009, he spent a combined sabbatical and administrative leave at the Centre for Advanced Studies, IBM Toronto Laboratory. During that time, he developed a novel technique for reducing the space required for vector-clock representation of partial orders.
Selected Publications
M. J. Nichols and D. J. Taylor. A faster closure algorithm for pattern matching in partial-order event data. Proceedings of the 13th International Conference on Parallel and Distributed Systems (ICPADS), 2007.
M. J. Nichols and D. J. Taylor. Pattern rewriting for efficient search in partial-order event data. Proceedings of CASCON, pp. 58-70, 2007.
D. J. Taylor. Scrolling partially ordered event displays. Journal of Parallel and Distributed Computing, 65:643-653, 2005.
I.-L. Yen, F. B. Bastani, and D. J. Taylor. Design of multi-invariant data structures for robust shared accesses in multiprocessor systems. IEEE Transactions on Software Engineering, 27:193-207, 2001.