Week 1 (Jan 9): Introduction and Course Overview

Admin details
Intro lecture based on app store mining by Ruiz et al. A primer on R and building models. Slides, R scripts

Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results
Jelte M. Wicherts, Marjan Bakker, Dylan Molenaar
Reading
Predicting fault incidence using software change history
Todd L. Graves, Alan F. Karr, J. S. Marron, and Harvey P. Siy
Analysis Techniques: Basic linear regression, GLM, R², model error, exponential decay
Reading
Week 2 (Jan 16): Infrastructure

Intro to Azure - Guest Lecture by Sage Franch. Slides

Using Pig as a data preparation language for large-scale mining software repositories studies: An experience report
Weiyi Shang, Bram Adams, and Ahmed E. Hassan
Analysis Techniques: Pig Queries
Reading
MapReduce as a General Framework to Support Research in Mining Software Repositories (MSR)
Weiyi Shang, Zhen Ming Jiang, Bram Adams, Ahmed E. Hassan
Analysis Techniques: MapReduce
Reading
Week 3 (Jan 23): Collecting Large Datasets (The process used to collect the dataset)
Amassing and indexing a large sample of version control systems: Towards the census of public source code history
Audris Mockus
Analysis Techniques: Indexing techniques
Aaron Sarson
GHTorrent: Github's data from a firehose
Georgios Gousios and Diomidis Spinellis
Junyu Lai
The Ultimate Debian Database: Consolidating bazaar metadata for Quality Assurance and data mining
Lucas Nussbaum and Stefano Zacchiroli
Yuefei Liu
The Qualitas Corpus: A Curated Collection of Java Code for Empirical Studies
Ewan Tempero, Craig Anslow, Jens Dietrich, Ted Han, Jing Li, Markus Lumpe, Hayden Melton, and James Noble
Reading/Ying Yu
Week 4 (Jan 30): Source code similarity
A study of the uniqueness of source code
Mark Gabel and Zhendong Su
Analysis Techniques: Lexical representation of code, syntactic redundancy, sequence matching
Yuan Xi
SourcererCC: Scaling Code Clone Detection to Big Code
Hitesh Sajnani, Vaibhav Saini, Jeffrey Svajlenko, Chanchal K Roy, Cristina V Lopes
Analysis Techniques: Clone detection
Ten Bradley
A Study of Repetitiveness of Code Changes in Software Evolution
Hoan Anh Nguyen, Anh Tuan Nguyen, Tung Thanh Nguyen, Tien N. Nguyen, and Hridesh Rajan
Analysis Techniques: AST matching
Andrew Palmer
Week 5 (Feb 6): API Mining
Proposal Presentation (5 min presentation)
API Code Recommendation Using Statistical Learning from Fine-grained Changes
Anh Tuan Nguyen, Michael Hilton, Mihai Codoban, Hoan Nguyen, Lily Mast, Eli Rademacher, Tien N. Nguyen, Danny Dig
Analysis Techniques:
Deepak Rishi
Large-scale, AST-based API-usage analysis of open-source Java projects
Ralf Lammel, Ekaterina Pek, and Jurgen Starek
Analysis Techniques: AST mining
Sweta Barman
Software Bertillonage: Finding the Provenance of an Entity
Julius Davies, Daniel M. German, Michael W. Godfrey, and Abram Hindle
Analysis Techniques:
Reading/Diya Burman
Week 6 (Feb 13): Testing
Coverage Is Not Strongly Correlated With Test Suite Effectiveness
Laura Inozemtseva and Reid Holmes
Analysis Techniques: Correlation
Junyi Shen
Techniques for improving regression testing in continuous integration development environments
Sebastian Elbaum, Gregg Rothermel, and John Penix
Analysis Techniques:
Nikita Volodin
Automatic Identification of Load Testing Problems
Zhen Ming Jiang, Ahmed E. Hassan, Parminder Flora, and Gilbert Hamann
Analysis Techniques: Log Decomposition, Dominant Behavior Identification, Anomaly Detection, z-stats
Atinderdeep Saini
Week 7 (Feb 20): No Class
Study Break
Week 8 (Feb 27): Bugs
Debugging in the (very) large: ten years of implementation and experience
Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt
Analysis Techniques: Bucketing Algorithms
Shahin Rahbariasl
An Entropy Evaluation Approach for Triaging Field Crashes: A Case Study of Mozilla Firefox
Foutse Khomh, Brian Chan, Ying Zou, and Ahmed E. Hassan
Analysis Techniques: Entropy Analysis
Akshay Chopra
Statistical Debugging: A Hypothesis Testing-Based Approach
Chao Liu, Long Fei, Xifeng Yan, Jiawei Han, and Samuel P. Midkiff
Analysis Techniques: Predicate Ranking Models, Statistical Debugging
Angshuman Ghosh
Scalable statistical bug isolation
Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan
Reading/Yongi An
Week 9 (Mar 6): Project Progress Presentations
Progress report due (2 pages IEEE format)
Week 10 (Mar 13): Mobile
Feature Lifecycles as They Spread, Migrate, Remain and Die in App Stores
Federica Sarro, Afnan Al-Subaihin, Mark Harman, Yue Jia, William Martin, and Yuanyuan Zhang
Analysis Techniques:
Naga Malleswara Rao
API Change and Fault Proneness: A Threat to the Success of Android Apps
Mario Linares-Vasquez, Gabriele Bavota, Carlos Bernal-Cardenas, Massimiliano Di Penta, Rocco Oliveto, and Denys Poshyvanyk
Analysis Techniques:
Chang Ge
IccTA: Detecting Inter-Component Privacy Leaks in Android Apps
Li Li, Alexandre Bartel, Tegawende F. Bissyande, Jacques Klein, Yves Le Traon, Steven Arzt, Siegfried Rasthofer, Eric Bodden, Damien Octeau, and Patrick McDaniel
Analysis Techniques:
Jumyung Chang
Week 11 (Mar 20): Programming Languages
On the naturalness of software
Abram Hindle, Earl T. Barr, Zhendong Su, Mark Gabel, and Premkumar Devanbu
Analysis Techniques: Statistical language models
Rizwan Aarif
A Large Scale Study of Programming Languages and Code Quality in Github
Baishakhi Ray, Daryl Posnett, Vladimir Filkov, Premkumar Devanbu
Analysis Techniques:
Ripul Jain
An Empirical Study of Goto in C Code from GitHub Repositories
Meiyappan Nagappan, Romain Robbes, Yasutaka Kamei, Eric Tanter, Shane McIntosh, Audris Mockus, and Ahmed E. Hassan
Analysis Techniques:
Reading
Week 12 (Mar 27): CI
Quality and Productivity Outcomes Relating to Continuous Integration in GitHub
Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar Devanbu, and Vladimir Filkov
Analysis Techniques: Statistical language models
Aimal Khan
Usage, Costs, and Benefits of Continuous Integration in Open-Source Projects
Michael Hilton, Timothy Tunnell, Kai Huang, Darko Marinov, and Danny Dig
Analysis Techniques:
Weicong Ma
A Large Scale Empirical Study of the Relationship Between Build Technology and Build Maintenance
Shane McIntosh, Meiyappan Nagappan, Bram Adams, Audris Mockus, and Ahmed E. Hassan
Analysis Techniques:
Reading/Xinyi Zhang
Week 13 (Apr 3): Project Presentations
Project Report DUE -- Apr 24 (10 page IEEE report)