The Impact of Classifier Configuration and Classifier Combination on Bug Localization
Authors -
Stephen W. Thomas;
Meiyappan Nagappan;
Dorothea Blostein and
Ahmed E. Hassan
Venue -
IEEE Transactions on Software Engineering, October 2013
Related Tags -
Abstract -
Bug localization is the task of determining which source code entities are relevant to a bug report. Manual bug localization is
labor intensive, since developers must consider thousands of source code entities. Current research builds bug localization classifiers, based on information retrieval models, to locate entities that are textually similar to the bug report. Current research, however, does
not consider the effect of classifier configuration, i.e., all the parameter values that specify the behavior of a classifier. As such, the effect of
each parameter, and which parameter values lead to the best performance, remain unknown. In this paper, we empirically investigate
the effectiveness of a large space of classifier configurations, 3,172 in total. Further, we introduce a framework for combining the results
of multiple classifier configurations, since classifier combination has shown promise in other domains. Through a detailed case study
on over 8,000 bug reports from three large-scale projects, we make two main contributions. First, we show that the parameters of a
classifier have a significant impact on its performance. Second, we show that combining multiple classifiers, whether those classifiers
are hand-picked or randomly chosen relative to intelligently-defined subspaces of classifiers, improves the performance of even the
best individual classifiers.
Preprint -
PDF
BibTex -
@article{Thomas2013,
author = {Stephen W. Thomas and Meiyappan Nagappan and Dorothea Blostein and Ahmed E. Hassan},
keyword = {Bugs, Topic Models},
title = {The Impact of Classifier Configuration and Classifier Combination on Bug Localization},
type = {journal},
venue = {IEEE Transactions on Software Engineering, October 2013}
}