R. Mann and A. Jepson, Towards the Computational Perception of Action, Proc. Computer Vision and Pattern Recognition, June 23--25, 1998, Santa Barbara, CA, pp. 794--799.
Abstract: Understanding observations of interacting objects requires one to reason about qualitative scene dynamics. For example, on observing a hand lifting a can, we may infer that an `active' hand is applying an upwards force (by grasping) to lift a `passive' can. In previous work~\cite{MannJS97} we presented a system that infers qualitative scene dynamics from the instantaneous motion of objects. However, since that analysis only considered single frames in isolation, there were often multiple interpretations for each frame. In this work we show how the dynamic information inferred at each frame can be integrated over time to reduce ambiguity. Our approach to integrating information is to extend our representation to describe objects by a set of {\sl properties} or {\sl capabilities} that are assumed to persist over time. Given this extended representation we find interpretations that require the smallest set(s) of properties over the whole image sequence.