Week 1 (Sept 06): Introduction and Course Overview

Admin details
Intro, a primer in stats and building models, and a study on the importance of data in SE. Slides, R Scripts

Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results
Jelte M. Wicherts, Marjan Bakker, Dylan Molenaar
Predicting fault incidence using software change history
Todd L. Graves, Alan F. Karr, J. S. Marron, and Harvey P. Siy
Analysis Techniques: Basic linear regression, GLM, R^2, model error, exponential decay
Week 2 (Sept 13): Infrastructure

Intro to Azure.

Paper 1: Using Pig as a data preparation language for large-scale mining software repositories studies: An experience report
Weiyi Shang, Bram Adams, and Ahmed E. Hassan
Analysis Techniques: Pig Queries
Cosmos Zhu
Paper 2: MapReduce as a General Framework to Support Research in Mining Software Repositories (MSR)
Weiyi Shang, Zhen Ming Jiang, Bram Adams, Ahmed E. Hassan
Analysis Techniques: MapReduce
Week 3 (Sept 20): Collecting Large Datasets (The process used to collect the dataset)
Paper 3: Amassing and indexing a large sample of version control systems: Towards the census of public source code history
Audris Mockus
Analysis Techniques: Indexing techniques
Amine Mehdhbi
Paper 4: GHTorrent: Github's data from a firehose
Georgios Gousios and Diomidis Spinellis
Paper 5: The Ultimate Debian Database: Consolidating bazaar metadata for Quality Assurance and data mining
Lucas Nussbaum and Stefano Zacchiroli
Zheng Kun Chen
The Qualitas Corpus: A Curated Collection of Java Code for Empirical Studies
Ewan Tempero, Craig Anslow, Jens Dietrich, Ted Han, Jing Li, Markus Lumpe, Hayden Melton, and James Noble
Week 4 (Sept 27): Project Proposal Prep
No Class (spend time on selecting a paper and preparing the proposal)
Week 5 (Oct 4): Source code similarity
2-page proposal paper due
Paper 6: A study of the uniqueness of source code
Mark Gabel and Zhendong Su
Analysis Techniques: Lexical representation of code, syntactic redundancy, sequence matching
Aarti Malhotra
Paper 7: SourcererCC: Scaling Code Clone Detection to Big Code
Hitesh Sajnani, Vaibhav Saini, Jeffrey Svajlenko, Chanchal K. Roy, and Cristina V. Lopes
Analysis Techniques: Clone detection
Paper 8: A Study of Repetitiveness of Code Changes in Software Evolution
Hoan Anh Nguyen, Anh Tuan Nguyen, Tung Thanh Nguyen, Tien N. Nguyen, and Hridesh Rajan
Analysis Techniques: AST matching
Chu Feng
Week 6 (Oct 11)
No Class (Thanksgiving break)
Week 7 (Oct 18): API Mining
Paper 9: API Code Recommendation Using Statistical Learning from Fine-grained Changes
Anh Tuan Nguyen, Michael Hilton, Mihai Codoban, Hoan Nguyen, Lily Mast, Eli Rademacher, Tien N. Nguyen, Danny Dig
Analysis Techniques:
Paper 10: Large-scale, AST-based API-usage analysis of open-source Java projects
Ralf Lammel, Ekaterina Pek, and Jurgen Starek
Analysis Techniques: AST mining
Joshua Boluwatife
Paper 11: Software Bertillonage: Finding the Provenance of an Entity
Julius Davies, Daniel M. German, Michael W. Godfrey, and Abram Hindle
Analysis Techniques:
Week 8 (Oct 25): Testing
Paper 12: Coverage Is Not Strongly Correlated With Test Suite Effectiveness
Laura Inozemtseva and Reid Holmes
Analysis Techniques: Correlation
Amin Bandali
Paper 13: Techniques for improving regression testing in continuous integration development environments
Sebastian Elbaum, Gregg Rothermel, and John Penix
Analysis Techniques:
Paper 14: Automatic Identification of Load Testing Problems
Zhen Ming Jiang, Ahmed E. Hassan, Parminder Flora, and Gilbert Hamann
Analysis Techniques: Log Decomposition, Dominant Behavior Identification, Anomaly Detection, z-stats
Achyudh Ram
Week 9 (Nov 1): Mobile
Paper 15: Feature Lifecycles as They Spread, Migrate, Remain and Die in App Stores
Federica Sarro, Afnan Al-Subaihin, Mark Harman, Yue Jia, William Martin, and Yuanyuan Zhang
Analysis Techniques:
Reza Nadri
Paper 16: API Change and Fault Proneness: A Threat to the Success of Android Apps
Mario Linares-Vasquez, Gabriele Bavota, Carlos Bernal-Cardenas, Massimiliano Di Penta, Rocco Oliveto, and Denys Poshyvanyk
Analysis Techniques:
Margaret Foley
Paper 17: IccTA: Detecting Inter-Component Privacy Leaks in Android Apps
Li Li, Alexandre Bartel, Tegawende F. Bissyande, Jacques Klein, Yves Le Traon, Steven Arzt, Siegfried Rasthofer, Eric Bodden, Damien Octeau, and Patrick McDaniel
Analysis Techniques:
Hung Viet Pham
Week 10 (Nov 8): Project Progress Prep
No Class
Week 11 (Nov 15): Programming Languages
Progress report due (2 pages, IEEE format)
Paper 18: On the naturalness of software
Abram Hindle, Earl T. Barr, Zhendong Su, Mark Gabel, and Premkumar Devanbu
Analysis Techniques: Statistical language models
Edmund Wong
Paper 19: A Large Scale Study of Programming Languages and Code Quality in Github
Baishakhi Ray, Daryl Posnett, Vladimir Filkov, Premkumar Devanbu
Analysis Techniques:
Kilby Baron
Paper 20: An Empirical Study of Goto in C Code from GitHub Repositories
Meiyappan Nagappan, Romain Robbes, Yasutaka Kamei, Eric Tanter, Shane McIntosh, Audris Mockus, and Ahmed E. Hassan
Analysis Techniques:
Week 12 (Nov 22): CI
Paper 21: Quality and Productivity Outcomes Relating to Continuous Integration in GitHub
Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar Devanbu, and Vladimir Filkov
Analysis Techniques: Statistical language models
Yao Lei
Paper 22: Usage, Costs, and Benefits of Continuous Integration in Open-Source Projects
Michael Hilton, Timothy Tunnell, Kai Huang, Darko Marinov, and Danny Dig
Analysis Techniques:
Spandan Chowdhury
Paper 23: A Large Scale Empirical Study of the Relationship Between Build Technology and Build Maintenance
Shane McIntosh, Meiyappan Nagappan, Bram Adams, Audris Mockus, and Ahmed E. Hassan
Analysis Techniques:
Week 13 (Nov 29): Project Presentations
Project report due Dec 7 (10-page IEEE report)