We present a framework for tracking rigid objects based on an adaptive Bayesian recognition technique that incorporates dependencies between object features. At each frame we compute a maximum a posteriori (MAP) estimate of the object parameters, which include the position and the configuration of non-occluded features; this estimate may be rejected if its quality is insufficient. Our careful selection of data points in each frame allows temporal fusion via Kalman filtering. Despite the "unimodality" of our tracking scheme, we demonstrate fairly robust results in highly cluttered aerial scenes. Our technique forms a natural feedback loop between the recognition method and the filter, which helps to explain this robustness. We study this loop and derive a number of interesting properties. First, the effective recognition threshold in each frame is adaptive: it depends on the current level of noise in the system. The system can therefore identify partially occluded or distorted objects as long as the predicted locations are accurate, but it requires a very good match when the object location is uncertain. Second, the search area for the recognition method is automatically pruned based on the current system uncertainty, yielding an efficient overall method.
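The interplay between filter uncertainty and an adaptive acceptance region can be sketched with a standard construction: a Kalman filter whose validation gate is a Mahalanobis-distance test against the innovation covariance. This is an illustrative sketch, not the paper's exact formulation; the class, parameter names, and 1-D constant-velocity model are assumptions made for clarity.

```python
import numpy as np

class GatedKalman1D:
    """1-D constant-velocity Kalman filter with a chi-square validation gate.

    When the state covariance P is small (accurate predictions), only
    measurements very close to the prediction pass the gate; when P is
    large, the accepted region widens with the innovation covariance S.
    """

    def __init__(self, dt=1.0, q=1e-2, r=1.0, gate=6.63):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition
        self.H = np.array([[1.0, 0.0]])             # observe position only
        self.Q = q * np.eye(2)                      # process noise
        self.R = np.array([[r]])                    # measurement noise
        self.x = np.zeros(2)                        # state: [position, velocity]
        self.P = np.eye(2)                          # state covariance
        self.gate = gate                            # chi^2(1 dof) at ~99%

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.H @ self.x                      # predicted measurement

    def update(self, z):
        """Fuse measurement z only if it falls inside the validation gate."""
        y = z - self.H @ self.x                     # innovation
        S = self.H @ self.P @ self.H.T + self.R     # innovation covariance
        d2 = float(y @ np.linalg.solve(S, y))       # squared Mahalanobis distance
        if d2 > self.gate:                          # outside gate: reject measurement
            return False
        K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return True
```

The gate implements the pruning property described above: the search region for the recognition step need only cover the ellipsoid defined by `S` and `gate`, so it shrinks automatically as the filter converges.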