Requirements for Requirements Engineering Tools that Require Understanding Requirement Semantics --- Why such tools should be clerical and not NLP-based

Daniel M. Berry

Cheriton School of Computer Science
University of Waterloo
Waterloo, ON, Canada

Abstract:

This talk notes the advanced state of the natural language (NL) processing art and considers four broad categories of tools for processing NL requirements documents. These tools are used in a variety of scenarios. The strength of a tool for an NL processing task is traditionally measured by its recall, precision, and their simple harmonic mean, the F-measure.
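As a minimal sketch of these three measures, the following Python computes them from raw counts. The function name and the example counts are hypothetical, chosen only for illustration; they do not come from the talk.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from raw counts.

    tp: correct items the tool reported (true positives)
    fp: incorrect items the tool reported (false positives)
    fn: correct items the tool missed (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # The F-measure is the simple (unweighted) harmonic mean of the two.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: the tool reports 50 items, 40 of them correct,
# and misses 10 correct items.
p, r, f = precision_recall_f(tp=40, fp=10, fn=10)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Because the harmonic mean treats precision and recall symmetrically, a tool that misses answers is penalized exactly as much as one that reports spurious ones.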

A hairy requirements or software engineering task involving NL documents is one that is not inherently difficult for NL-understanding humans on a small scale but becomes unmanageable on a large scale, such as occurs in industrial software development projects. A hairy task demands tool assistance. Because humans need far more help in carrying out a hairy task completely than they do in making the local yes-or-no decisions, a tool for a hairy task should have as close to 100% recall as possible, even at the cost of low precision. A tool that falls short of 100% recall may even be useless, e.g., when the software involved has high-dependability requirements, because to find the missing information, a human has to do the entire task manually anyway. Any such tool based on NL processing techniques inherently fails to achieve 100% recall, because even the best parsers are no more than 91% correct. Therefore, a tool for a hairy task that is to achieve 100% recall needs to be based on something other than traditional NLP. Perhaps a dumb, clerical tool doing an identifiable part of such a task may be better than an intelligent tool trying but failing in unidentifiable ways to do the entire task.

The reality is that a tool's achieving exactly 100% recall, which may be impossible anyway, may not be necessary. It suffices for a human working with the tool on a task to achieve better recall than a human working on the task entirely manually.

This talk describes research whose goal is to discover and test a variety of non-traditional approaches to building tools for hairy tasks to see which, if any, allows a human working with the tool to achieve better recall than a human working entirely manually. Among the early results are (1) some advice about the correct balance between recall and precision and the resulting weighted F-measure to use to evaluate tools for hairy tasks and (2) the introduction of a new measure, summarization.
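To illustrate how a weighted F-measure can change the evaluation of a tool, here is a small Python sketch using the standard F-beta formula, in which beta > 1 weights recall more heavily than precision. The two tools, their scores, and the choice of beta = 5 are hypothetical assumptions for illustration; the abstract does not state which weight the talk recommends.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 favors recall, beta < 1 favors precision,
    and beta == 1 gives the ordinary F-measure.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical tools: a clerical tool with full recall but low precision,
# and an NLP-based tool with high precision but incomplete recall.
clerical = (0.2, 1.0)   # (precision, recall), assumed values
nlp_based = (0.9, 0.6)  # (precision, recall), assumed values

for beta in (1.0, 5.0):  # beta = 5 is an assumed recall-favoring weight
    print(f"beta={beta}: clerical={f_beta(*clerical, beta):.3f} "
          f"nlp_based={f_beta(*nlp_based, beta):.3f}")
```

Under the unweighted measure the high-precision tool scores higher, but once recall is weighted heavily the full-recall tool comes out ahead, which is the kind of reversal the argument for recall-oriented evaluation relies on.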

Joint work with Ricardo Gacitua, Pete Sawyer, and Sri Fatimah Tjong