Explaining Software Defects Using Topic Models
Authors -
Tse-Hsun Chen;
Stephen W. Thomas;
Meiyappan Nagappan and
Ahmed E. Hassan
Venue -
In Proceedings of the 9th Working Conference on Mining Software Repositories (MSR 2012). Zurich, Switzerland. June 2-3, 2012
Abstract -
Researchers have proposed various metrics based
on measurable aspects of the source code entities (e.g., methods, classes, files, or modules) and the social structure of a software
project in an effort to explain the relationships between software
development and software defects. However, these metrics
largely ignore the actual functionality, i.e., the conceptual
concerns, of a software system, which are the main technical
concepts that reflect the business logic or domain of the system.
For instance, while lines of code may be a good general measure
for defects, a large entity responsible for simple I/O tasks is
likely to have fewer defects than a small entity responsible for
complicated compiler implementation details. In this paper, we
study the effect of conceptual concerns on code quality. We use
a statistical topic modeling technique to approximate software
concerns as topics; we then propose various metrics on these
topics to help explain the defect-proneness (i.e., quality) of the
entities. Paramount to our proposed metrics is that they take
into account the defect history of each topic. Case studies on
multiple versions of Mozilla Firefox, Eclipse, and Mylyn show
that (i) some topics are much more defect-prone than others, (ii) defect-prone topics tend to remain so over time, and (iii)
defect-prone topics provide additional explanatory power for
code quality over existing structural and historical metrics.
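The abstract does not define the proposed topic metrics, so the following is only a hedged illustration of the general idea of attaching defect history to topics. It assumes an LDA-style topic model has already assigned each file a vector of topic memberships (called `THETA` here, a hypothetical name) and that defect counts per file from an earlier release are available (`DEFECTS`, also hypothetical). From these it computes a membership-weighted defect density per topic, then scores a file by weighting those densities with the file's own memberships. The function names, the example data, and the weighting scheme are assumptions for illustration, not the metrics published in the paper.

```python
from typing import Dict, List

# Hypothetical example data (not from the paper): per-file topic memberships
# from an LDA-style model, and defect counts linked to each file in a prior release.
THETA: Dict[str, List[float]] = {
    "Parser.java": [0.9, 0.1],
    "FileIO.java": [0.2, 0.8],
    "Lexer.java": [0.7, 0.3],
}
DEFECTS: Dict[str, int] = {"Parser.java": 6, "FileIO.java": 1, "Lexer.java": 4}


def topic_defect_density(theta: Dict[str, List[float]], defects: Dict[str, int]) -> List[float]:
    """Membership-weighted mean defect count for each topic (assumed metric)."""
    if not theta:
        return []
    n_topics = len(next(iter(theta.values())))
    weighted = [0.0] * n_topics  # sum over files of membership * defect count
    mass = [0.0] * n_topics      # total membership assigned to each topic
    for f, memberships in theta.items():
        for t, m in enumerate(memberships):
            weighted[t] += m * defects.get(f, 0)
            mass[t] += m
    # Topics with no membership mass get density 0 to avoid division by zero.
    return [w / s if s > 0 else 0.0 for w, s in zip(weighted, mass)]


def topic_defect_score(memberships: List[float], density: List[float]) -> float:
    """Hypothetical per-file score: topic densities weighted by the file's memberships."""
    return sum(m * d for m, d in zip(memberships, density))


if __name__ == "__main__":
    density = topic_defect_density(THETA, DEFECTS)
    print("topic densities:", [round(d, 2) for d in density])
    # Score a new, hypothetical file whose code is mostly about topic 0.
    print("new file score:", round(topic_defect_score([0.6, 0.4], density), 2))
```

With this made-up data, the parser-like topic 0 receives the higher density (roughly 4.67 versus 2.17 defects per unit of membership), and a file's score grows with its share of such topics. Because the densities come from an earlier release, a score of this kind could be computed for files in a later release before any of their defects are known.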
BibTeX -
@inproceedings{Chen2012_2,
author = {Chen, Tse-Hsun and Thomas, Stephen W. and Nagappan, Meiyappan and Hassan, Ahmed E.},
keywords = {Defect Prediction, Topic Models},
title = {Explaining Software Defects Using Topic Models},
type = {conference},
booktitle = {Proceedings of the 9th Working Conference on Mining Software Repositories (MSR 2012)},
address = {Zurich, Switzerland},
month = jun,
year = {2012}
}