Charles L.A. Clarke

Research interests

Professor Clarke's work addresses problems in information retrieval, question answering, database systems, and software engineering. His work is primarily of an applied nature, with the results validated through implementation, experimentation, and performance measurement. His research has touched on many topics across the broad area of information retrieval: ranking, efficiency, evaluation, question answering, user behaviour, clustering, filtering, document structure and XML.

His current work on focused retrieval addresses the size and scope of results returned from IR systems. In many IR systems, the basic unit of retrieval is a document, which in practice might be a web page, a news article, or an email message. The goal of focused retrieval is to tailor the result to fit the information need, rather than returning the same pre-defined units under all circumstances. For example, given the query “text compression” over a large collection of academic books and journals, an ideal ranked result list might include a mixture of articles, sections, pages, journals and books. A special issue of a journal might be treated as a single result. A result taken from a textbook devoted to data structures might be expressed as a range of pages, covering part of a single chapter. On the other hand, a textbook entirely devoted to the subject could be returned as a single result, but with key definitions and concepts identified and highlighted as an aid to the searcher.

In addition, Professor Clarke has a longstanding interest in evaluation methodologies for information retrieval systems. Over the past several years, his efforts have been directed toward the creation of evaluation measures and test collections that are both realistic and reusable. Most recently, he has been working to develop and validate evaluation measures that account for novelty and diversity in search results.

Another current area of research is the implementation of filesystem search. In contrast to many desktop search applications, filesystem search aims to make the search facilities an integral component of the operating system. All filesystem changes are tracked. As files are added, deleted and modified, these changes are immediately reflected in an inverted index, through the use of indexing components specialized to each file type.
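As an illustration only, the idea of keeping an inverted index in step with filesystem changes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Professor Clarke's implementation: the class name LiveIndex, the handlers on_write and on_delete, and the TOKENIZERS table are all hypothetical, and a real system would receive its change notifications from the operating system rather than from direct calls.

```python
import os
import re
from collections import defaultdict


def tokenize_text(data):
    """Split plain text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", data.lower())


def tokenize_csv(data):
    """Treat commas as separators, then tokenize as plain text."""
    return tokenize_text(data.replace(",", " "))


# Hypothetical table of indexing components, one per file type.
TOKENIZERS = {".txt": tokenize_text, ".csv": tokenize_csv}


class LiveIndex:
    """Toy inverted index updated on every file change (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of file paths
        self.terms_by_path = {}           # path -> set of terms it contributed

    def on_delete(self, path):
        """Remove every posting contributed by the deleted file."""
        for term in self.terms_by_path.pop(path, set()):
            self.postings[term].discard(path)
            if not self.postings[term]:
                del self.postings[term]

    def on_write(self, path, data):
        """Handle an added or modified file: drop old postings, add new ones."""
        self.on_delete(path)
        extension = os.path.splitext(path)[1].lower()
        tokenizer = TOKENIZERS.get(extension)
        if tokenizer is None:
            return  # no indexing component for this file type
        terms = set(tokenizer(data))
        self.terms_by_path[path] = terms
        for term in terms:
            self.postings[term].add(path)

    def search(self, term):
        """Return the sorted list of paths whose content contains the term."""
        return sorted(self.postings.get(term.lower(), set()))


index = LiveIndex()
index.on_write("/notes/a.txt", "Text compression basics")
index.on_write("/data/b.csv", "compression,ratio")
print(index.search("compression"))  # both files match

index.on_write("/notes/a.txt", "Inverted index notes")  # file modified
print(index.search("compression"))  # only the CSV file still matches
```

The sketch keeps a per-file record of contributed terms so that a modification or deletion can be undone without scanning the whole index. That bookkeeping is one simple design choice among several and is assumed here purely for illustration.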

Degrees and awards

BSc (Memorial), MMath, PhD (Waterloo)

David R. Cheriton Faculty Fellowship, University of Waterloo (2008-2011); IBM CAS Fellow: Corporation Faculty Award (2007); IBM CAS Fellow: Corporation Faculty Award (2006); Professional Engineer of Ontario (2006)

Industrial and sabbatical experience

In addition to his research background, he has worked as a consultant and software developer in both commercial and engineering environments. From 1985 to 1998 he worked for an ocean engineering company developing remote sensing, DSP and image processing applications. From 1992 to 1993 he worked for a systems integration firm developing embedded point-of-sale systems and network software. In 1998, he co-founded isagn Inc., a privately held company that created and supported digital library systems. During his sabbatical leave in 2006 he visited Microsoft, where he was involved in the development of what became the Bing Web search engine.

Representative publications

Stefan Büttcher, Charles L. A. Clarke, and Gordon V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines. MIT Press, 2010.

S. Büttcher and C. L. A. Clarke. Hybrid Index Maintenance for Contiguous Inverted Lists. Information Retrieval, 11(3):175-207, 2008.

C. L. A. Clarke, E. Agichtein, S. Dumais, and R. W. White. The influence of caption features on clickthrough patterns in Web search. 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 135-142, 2007.

C. Latulipe, S. Mann, C. L. A. Clarke, and C. S. Kaplan. symSpline: Bimanual Symmetric Spline Manipulation. Conference on Human Factors in Computing Systems (CHI 2006), 2006.

C. L. A. Clarke. Controlling Overlap in Content-Oriented XML Retrieval. 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2005.

University of Waterloo