Spoken dialogue systems (SDS) are increasingly being deployed in a variety of commercial applications, ranging from traditional call centre automation (e.g. travel information) to new ``troubleshooting'' or customer self-service lines (e.g. help fixing broken internet connections). SDS are notoriously fragile (especially in the face of speech recognition errors), do not offer natural ease of use, and do not adapt to different users. One of the main problems for SDS is to maintain an accurate view of the user's goals in the conversation (e.g. find a good Indian restaurant nearby, or repair a broadband connection) under uncertainty, and thereby to compute the optimal next system dialogue action (e.g. offer a restaurant, ask for clarification). Recent research in statistical spoken dialogue systems (SSDS) has successfully addressed aspects of these problems but, as we shall show, it is currently hamstrung by an impoverished representation of user goals, which has been adopted to enable tractable learning with standard techniques. In the field as a whole, only small and unrealistic dialogue problems (usually fewer than 100 searchable entities) are currently tackled with statistical learning methods, for reasons of computational tractability.
In addition, current user goal state approximations in SSDS make it impossible to represent some plausible user goals, e.g. those of someone who wants to know about nearby cheap restaurants and high-quality ones further away. This renders dialogue management sub-optimal and makes it impossible to deal adequately with user utterances such as ``I'm looking for French or Italian food'' and ``Not Italian, unless it's expensive''. User utterances with negations and disjunctions of various sorts are very natural, and exploit the full power of natural language input, but current SSDS are unable to process them adequately. Moreover, much work in dialogue system evaluation shows that real user goals are generally sets of items with different features, rather than a single item: people like to explore possible trade-offs between features of items.
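To make the contrast with single-item goals concrete, the following is a minimal sketch of a set-valued goal representation in which disjunction and negation are first-class. It is an illustration only, not the representation developed in this work: the toy restaurant database, the slot names, and the helper functions (`has`, `either`, `negate`, `matching`) are all assumptions introduced for the example.

```python
from typing import Callable, Dict, List

# Hypothetical types: an entity is a slot -> value map, and a goal is a
# predicate that picks out the SET of entities the user would accept.
Entity = Dict[str, str]
Goal = Callable[[Entity], bool]


def has(slot: str, value: str) -> Goal:
    """Atomic constraint: the entity's slot has the given value."""
    return lambda entity: entity.get(slot) == value


def either(*goals: Goal) -> Goal:
    """Disjunction: at least one sub-goal is satisfied."""
    return lambda entity: any(goal(entity) for goal in goals)


def negate(goal: Goal) -> Goal:
    """Negation: the sub-goal is not satisfied."""
    return lambda entity: not goal(entity)


def matching(goal: Goal, entities: List[Entity]) -> List[str]:
    """Names of all entities in the database that satisfy the goal."""
    return [entity["name"] for entity in entities if goal(entity)]


# Toy database (invented for illustration).
RESTAURANTS: List[Entity] = [
    {"name": "Chez Nous", "food": "french", "price": "expensive"},
    {"name": "Trattoria", "food": "italian", "price": "cheap"},
    {"name": "Il Lusso", "food": "italian", "price": "expensive"},
    {"name": "Curry Leaf", "food": "indian", "price": "cheap"},
]

# "I'm looking for French or Italian food"
french_or_italian = either(has("food", "french"), has("food", "italian"))

# "Not Italian, unless it's expensive" read as: (not Italian) or expensive
not_italian_unless_expensive = either(
    negate(has("food", "italian")), has("price", "expensive")
)

print(matching(french_or_italian, RESTAURANTS))
print(matching(not_italian_unless_expensive, RESTAURANTS))
```

Under these assumptions the first goal selects Chez Nous, Trattoria and Il Lusso, and the second selects Chez Nous, Il Lusso and Curry Leaf. Neither set can be written as a single slot-value assignment, which is the limitation described above.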
Our main proposal is therefore to:
- develop realistic large-scale SSDS with an accurate, extended representation of user goals, and
- use new Automatic Belief Compression (ABC) techniques to plan over the large state spaces thus generated. Techniques such as Value-Directed Compression demonstrate that compressible structure can be found automatically in the SSDS domain (for example, compressing a test problem of 433 states to 31 basis functions).
These techniques have their roots in methods for handling the large state spaces required for robust robot navigation in real environments, and may lead to breakthroughs in the development of robust, efficient, and natural human-computer dialogue systems, with the potential to radically improve the state of the art in dialogue management.
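As an illustration of how compressible structure can be found automatically, the sketch below follows the lossless, Krylov-style subspace construction that underlies value-directed compression: start from the reward vector and repeatedly apply the dynamics matrices until no new linearly independent vector appears. It is a simplified sketch under stated assumptions, not the procedure used in this work. The four-state model, the single dynamics matrix `T`, and the function names are invented for the example, and exact rational arithmetic is used only to keep the independence test simple.

```python
from fractions import Fraction


def mat_vec(matrix, vec):
    """Matrix-vector product over plain Python lists."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def reduce_against(vec, basis):
    """Subtract from vec its components along the (pivot, vector) basis."""
    vec = list(vec)
    for pivot, b in basis:
        coeff = vec[pivot]
        if coeff != 0:
            vec = [x - coeff * y for x, y in zip(vec, b)]
    return vec


def krylov_basis(reward_vectors, dynamics_matrices):
    """Basis of the smallest subspace that contains the reward vectors
    and is closed under every dynamics matrix."""
    basis = []
    queue = [list(r) for r in reward_vectors]
    while queue:
        rest = reduce_against(queue.pop(), basis)
        pivot = next((i for i, x in enumerate(rest) if x != 0), None)
        if pivot is None:
            continue  # already in the span of the current basis
        normalised = [x / rest[pivot] for x in rest]
        basis.append((pivot, normalised))
        # Images of each new basis vector must also lie in the subspace.
        queue.extend(mat_vec(m, normalised) for m in dynamics_matrices)
    return [b for _, b in basis]


def compress(belief, basis):
    """Compressed belief: one coordinate per basis function (b~ = b F)."""
    return [sum(p * f for p, f in zip(belief, column)) for column in basis]


# Hypothetical model: states {0, 1} and {2, 3} behave identically with
# respect to reward, so two basis functions suffice for four states.
H, Q = Fraction(1, 2), Fraction(1, 4)
T = [
    [H, 0, H, 0],
    [0, H, 0, H],
    [Q, 0, 0, 3 * Q],
    [0, Q, 3 * Q, 0],
]
R = [Fraction(1), Fraction(1), Fraction(0), Fraction(0)]

basis = krylov_basis([R], [T])
belief = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
print(len(basis), compress(belief, basis))
```

In this toy model the construction stops after two basis functions, and the four-dimensional belief is summarised by the pair (3/10, 7/10). The first coordinate equals the expected reward under the original belief, so nothing needed to evaluate that reward is lost in the compression.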