Week 1 (Sept 13): Introduction and Course Overview

Admin details
Intro, a primer on statistics and model building, and a study on the importance of data in SE. Slides, R Scripts

Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results
Jelte M. Wicherts, Marjan Bakker, Dylan Molenaar
Reading
Predicting fault incidence using software change history
Todd L. Graves, Alan F. Karr, J. S. Marron, and Harvey P. Siy
Analysis Techniques: Basic linear regression, GLM, R², model error, exponential decay
Reading
Week 2 (Sept 20): Infrastructure
No Class (Spend time on selecting a paper and preparing the proposal)

Intro to Azure.

Using Pig as a data preparation language for large-scale mining software repositories studies: An experience report
Weiyi Shang, Bram Adams, and Ahmed E. Hassan
Analysis Techniques: Pig Queries
Reading
MapReduce as a General Framework to Support Research in Mining Software Repositories (MSR)
Weiyi Shang, Zhen Ming Jiang, Bram Adams, Ahmed E. Hassan
Analysis Techniques: MapReduce
Reading
Week 3 (Sept 27): Collecting Large Datasets (The process used to collect the dataset)
Proposal Paper due (2 pages IEEE Format)
Amassing and indexing a large sample of version control systems: Towards the census of public source code history
Audris Mockus
Analysis Techniques: Indexing techniques
Rahul
GHTorrent: Github's data from a firehose
Georgios Gousios and Diomidis Spinellis
Camilo
The Ultimate Debian Database: Consolidating bazaar metadata for Quality Assurance and data mining
Lucas Nussbaum and Stefano Zacchiroli
Jia Wu
The Qualitas Corpus: A Curated Collection of Java Code for Empirical Studies
Ewan Tempero, Craig Anslow, Jens Dietrich, Ted Han, Jing Li, Markus Lumpe, Hayden Melton, and James Noble
Reading
Week 4 (Oct 4): Source code similarity
A study of the uniqueness of source code
Mark Gabel and Zhendong Su
Analysis Techniques: Lexical representation of code, syntactic redundancy, sequence matching
Sushant
SourcererCC: Scaling Code Clone Detection to Big Code
Hitesh Sajnani, Vaibhav Saini, Jeffrey Svajlenko, Chanchal K Roy, Cristina V Lopes
Analysis Techniques: Clone detection
Brad
A Study of Repetitiveness of Code Changes in Software Evolution
Hoan Anh Nguyen, Anh Tuan Nguyen, Tung Thanh Nguyen, Tien N. Nguyen, and Hridesh Rajan
Analysis Techniques: AST matching
Mike
Week 5 (Oct 13): Class on Friday instead of Wednesday - API Mining
Proposal Presentation (5 min presentation)
API Code Recommendation Using Statistical Learning from Fine-grained Changes
Anh Tuan Nguyen, Michael Hilton, Mihai Codoban, Hoan Nguyen, Lily Mast, Eli Rademacher, Tien N. Nguyen, Danny Dig
Analysis Techniques:
Chen
Large-scale, AST-based API-usage analysis of open-source Java projects
Ralf Lammel, Ekaterina Pek, and Jurgen Starek
Analysis Techniques: AST mining
Shivam
Software Bertillonage: Finding the Provenance of an Entity
Julius Davies, Daniel M. German, Michael W. Godfrey, and Abram Hindle
Analysis Techniques:
Reading
Week 6 (Oct 18): Testing
Coverage Is Not Strongly Correlated With Test Suite Effectiveness
Laura Inozemtseva and Reid Holmes
Analysis Techniques: Correlation
Noah
Techniques for improving regression testing in continuous integration development environments
Sebastian Elbaum, Gregg Rothermel, and John Penix
Analysis Techniques:
Cassiano
Automatic Identification of Load Testing Problems
Zhen Ming Jiang, Ahmed E. Hassan, Parminder Flora, and Gilbert Hamann
Analysis Techniques: Log Decomposition, Dominant Behavior Identification, Anomaly Detection, z-stats
Dishant
Week 7 (Oct 25): Bugs
Debugging in the (very) large: ten years of implementation and experience
Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt
Analysis Techniques: Bucketing Algorithms
An Entropy Evaluation Approach for Triaging Field Crashes: A Case Study of Mozilla Firefox
Foutse Khomh, Brian Chan, Ying Zou, and Ahmed E. Hassan
Analysis Techniques: Entropy Analysis
Yuqing
Statistical Debugging: A Hypothesis Testing-Based Approach
Chao Liu, Long Fei, Xifeng Yan, Jiawei Han, and Samuel P. Midkiff
Analysis Techniques: Predicate Ranking Models, Statistical Debugging
Adam
Scalable statistical bug isolation
Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan
Reading
Week 8 (Nov 1): Mobile
Feature Lifecycles as They Spread, Migrate, Remain and Die in App Stores
Federica Sarro, Afnan Al-Subaihin, Mark Harman, Yue Jia, William Martin, and Yuanyuan Zhang
Analysis Techniques:
Kapil
API Change and Fault Proneness: A Threat to the Success of Android Apps
Mario Linares-Vasquez, Gabriele Bavota, Carlos Bernal-Cardenas, Massimiliano Di Penta, Rocco Oliveto, and Denys Poshyvanyk
Analysis Techniques:
Nikhita
IccTA: Detecting Inter-Component Privacy Leaks in Android Apps
Li Li, Alexandre Bartel, Tegawende F. Bissyande, Jacques Klein, Yves Le Traon, Steven Arzt, Siegfried Rasthofer, Eric Bodden, Damien Octeau, and Patrick McDaniel
Analysis Techniques:
Daniel
Week 9 (Nov 8): Project Progress Prep
No Class
Week 10 (Nov 15): Programming Languages
Progress report due (2 pages IEEE format)
On the naturalness of software
Abram Hindle, Earl T. Barr, Zhendong Su, Mark Gabel, and Premkumar Devanbu
Analysis Techniques: Statistical language models
Zucheng
A Large Scale Study of Programming Languages and Code Quality in Github
Baishakhi Ray, Daryl Posnett, Vladimir Filkov, Premkumar Devanbu
Analysis Techniques:
Jichen
An Empirical Study of Goto in C Code from GitHub Repositories
Meiyappan Nagappan, Romain Robbes, Yasutaka Kamei, Eric Tanter, Shane McIntosh, Audris Mockus, and Ahmed E. Hassan
Analysis Techniques:
Week 11 (Nov 22): CI
Quality and Productivity Outcomes Relating to Continuous Integration in GitHub
Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar Devanbu, and Vladimir Filkov
Analysis Techniques: Statistical language models
Jeremy
Usage, Costs, and Benefits of Continuous Integration in Open-Source Projects
Michael Hilton, Timothy Tunnell, Kai Huang, Darko Marinov, and Danny Dig
Analysis Techniques:
Arman
A Large Scale Empirical Study of the Relationship Between Build Technology and Build Maintenance
Shane McIntosh, Meiyappan Nagappan, Bram Adams, Audris Mockus, and Ahmed E. Hassan
Analysis Techniques:
Week 12 (Nov 29): Project Presentations
Project Report due Dec 20 (10 pages IEEE format)