COURSE OVERVIEW
Empirical Software Engineering is a well-established research area in Software Engineering. However, most of this research has focused on examining a small set of projects. More recently, there has been increased interest in mining data from ultra-large-scale software repositories. For example, there have been efforts to extract all the software projects in SourceForge, GitHub, etc. The main reasons for studying such large repositories are to (a) understand the effects of the ecosystem, (b) discover the underlying similarities between the different projects in the repository, and (c) arrive at more generalizable conclusions.

COURSE SCHEDULE

There will be no physical classes this term as you already know.
We will also not have a single time at which we will meet online.
Classes start May 11, 2020.
There are no office hours. Email me when you have a question.

Contact Instructor: Mei Nagappan. Please prefix the subject line of your email with the code [CS846] for a timely reply.

COURSE OBJECTIVES
This seminar course explores leading research in Empirical Software Engineering using Ultra Large Repositories, discusses challenges associated with mining such repositories, highlights success stories, and outlines future research directions. The course includes studies on a variety of repositories, such as projects on GitHub, apps in the Android Market, discussions on Stack Overflow, and libraries in npm. Students will acquire the knowledge needed to perform research or conduct practice in the field. Once completed, students should be able to integrate Empirical Software Engineering in their own research or practice. We have two main objectives in this course:

  • Be able to critique a paper
  • Replicate the results of a paper on new data

COURSE DELIVERY
The entire class will be done asynchronously and virtually. We will discuss each paper like a virtual program committee of a conference. All class interactions on each paper will be on HotCRP. All times below are in EST.

For Weekly Paper Discussions

  • Every week: We will discuss the 2 papers assigned to that week. A detailed schedule is available here.
  • Every Monday by 5 PM: Each student will upload a critique (details below on what is expected in a critique) as a review for the paper on HotCRP.
  • Monday 5 PM - Wednesday 5 PM: Discussion of the paper led by one student (who does not submit a critique). More details on what is expected in a discussion are below.
  • Every Friday by 5 PM: One student who is assigned to the paper will submit a summary (details below on what is expected in a summary).
For Project
  • There will be no presentations. Each project will be reviewed like a conference paper by three other students. Reviewing will be double-blind: authors will not know who reviewed their paper, and reviewers will not know whose paper they are reviewing.
  • This shall be done 3 times: proposal, mid-term report and final paper. (Expectations for each are below).
Important Note: While students peer-review other students' work, only I am responsible for grades. So please provide detailed feedback; each student is graded on the feedback they give.

Announcements: Any and all announcements will be made at the bottom of this page. Please bookmark it and keep track.

COURSE REQUIREMENTS
Students are expected to have some background in software development and software engineering. Knowledge of machine learning or data mining techniques will be beneficial.

Students will be evaluated using the following breakdown:
1. Weekly critique (15% - 1% for each paper):
Each week, each student should critique all the papers for that week and submit one critique of at most 500 words per paper (YOU CAN RESUBMIT, BUT BOTH REVIEWS WILL BE VISIBLE) via HotCRP by Monday 5 PM. You will be assigned as a reviewer for all the papers in the system (except the one that you are summarizing).

You do not need to submit a critique for the paper you are summarizing, but you do need to submit critiques for the other papers that week. There are 16 papers in total, so you will be graded on 15 papers.

The critique is limited to 500 words. Do not upload a PDF. Upload your review for the corresponding paper at HotCRP. THIS IS SERIOUS AND STRICTLY ENFORCED!
TEMPLATE FOR CRITIQUE

  1. Problem: What is the problem being solved? (* Note this is not asking for the solution but the PROBLEM *)
  2. New Idea: What is the New Idea(s) -- [or why was this paper accepted and published]
  3. Positive points [what I liked about this paper, at least 3 bullet items]
  4. Negative points [what I don't like about this paper, at least 3 bullet items]
  5. Future Work: If you were forced to do some follow-on work related to this paper, what two things do you think would be worth working on? Try to think of at least one that YOU could actually do (e.g., one that does not require access to data/hardware you can't possibly get access to).

NOTE THAT YOU CANNOT SEE OTHER REVIEWS TILL YOU SUBMIT THE REVIEW.
2. Weekly Discussion (15% - 1% for each paper):
Discussions will happen every week from Monday 5 PM to Wednesday 5 PM. Students are expected to read every critique that has been submitted. The goal is to identify points that you missed in your critique and points where your critique contradicts another critique. Engage the other students to understand where in the paper they inferred the information from. Each student needs to be able to justify what they have written in the critique with text directly from the paper or from another published paper.
  • Missed: If there is a point that another student has written but you did not, then engage that student to find out where they inferred that point from.
  • Contradictory point: Engage that student in discussion to see how they inferred what they inferred and present your reasoning as well.
Within the template of the critique above, no student will be able to write up every aspect of a paper. Hence, every student is expected to participate in the discussion.

Note: You do not lose grade points for missing something or for misunderstanding something. So please question and discuss your reviews thoroughly. I will be asking some students questions here as well.

3. Paper Summary (10%):
Due each Friday at 5 PM. Each paper will be assigned to one student who will act as a summarizer. For that paper, they will be assigned as a meta-reviewer. In this role, the student will do two things:
  1. Lead the Monday-to-Wednesday discussion described in the previous step. Identify and call out students with differing opinions.
  2. Summarize all the critiques and the discussion, using the same template as the critique above (without the word count or item count limitations), and submit this summary as a meta-review in HotCRP by Friday 5 PM.


4. Project (50%):
One project (5 pages, IEEE format) done alone. The project will be a replication of one of the papers covered in the course or something similar.
Milestone 1: You need to submit a project proposal (1 page, IEEE format) by week 5. The proposal is 10% of the grade. It should:
  1. Provide a brief description of the paper you want to replicate.
  2. Give a motivation for why you want to replicate this paper.
  3. Include a detailed discussion of the data that will be used in the project: where you will get the data originally used in the paper, and what new data you want to replicate the experiment on.

Milestone 2: About 1 month before the end of term, you will update the proposal and add 2 pages about the replication of the paper with the original data. This report is 10% of the grade: 5% for completeness and 5% for the written report. In this report, you present the data from the replication. If the results differ from those in the original paper, explain why you think they differ, and report any changes you made to the methodology or the data.

Milestone 3: By the end of the term, you will complete the replication of the paper on new data. This report is 30% of the grade: 15% for the completeness of the project and 15% for the written report. You will update the mid-term report to add two more pages. Thus there will be 5 pages (1 from proposal + 2 from mid-term report + 2 new pages). In the final report, you are expected not only to present the methodology you followed to get the new data and apply the original experiment, but also to present the results and discuss what they mean for the conclusions of the original study. Did the results hold or not? If not, why do you think they did not? If they did, why do you think the results hold on a different dataset? What insights do we get from the replication?

Each milestone will be graded according to the depth of your work, the correctness of your analysis, and the presentation quality of your written report.

If you choose to do a new project instead of a replication, the same grade breakdown will be used: 10% for the proposal, 10% for the mid-term progress report, and 30% for the final paper. The GitHub, SourceForge, World of Code, Stack Overflow, Android, and Apache datasets are possible sources of data to use for your project. Advice on writing a project report is here.

5. Project Review (10%):
Each student will get three reviews for the final paper that they submit. This means that each student will have to review three other papers. Template for Review:
  1. Paper summary
  2. Strengths and weaknesses - at least 3 each (summary of comments below)
  3. Detailed comments (to justify each point above)

You as a reviewer are looking for the following aspects: importance and quality of contribution, study methodology, depth of the discussion on the implications of the replicated results, amount of useful and actionable insights, and clarity of the presentation.
You will get 5% of the grade for the reviews. You will be given 1 week from the final paper submission date for submitting reviews.

There will be a 1-week discussion phase in which all three students who reviewed a paper will discuss the merits of the paper based on their reviews and come to a conclusion on whether the paper can be accepted or not.
You will get 5% of the grade for the discussion of the three papers that you are reviewing.

Note: Neither the reviews nor the decision affects the grade of the student whose paper is reviewed. I alone will determine that grade. So please be critical, since you are evaluated on how well you peer-review a paper.
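
As a quick sanity check on the breakdown above, the component weights can be tallied in a few lines of Python. This is only an illustrative sketch: the dictionary keys and function names are hypothetical labels chosen here, and only the percentages come from this page.

```python
# Illustrative tally of the grade weights listed in COURSE REQUIREMENTS.
# The key names are hypothetical labels; only the percentages are from the page.
GRADE_WEIGHTS = {
    "weekly_critique": 15,    # 1% for each paper
    "weekly_discussion": 15,  # 1% for each paper
    "paper_summary": 10,
    "project_proposal": 10,   # Milestone 1
    "project_midterm": 10,    # Milestone 2
    "project_final": 30,      # Milestone 3
    "project_review": 10,     # 5% for reviews + 5% for discussion
}

# The three milestones that together make up the "Project (50%)" component.
PROJECT_MILESTONES = ("project_proposal", "project_midterm", "project_final")


def total_weight(weights):
    """Return the sum of all component weights, in percent."""
    return sum(weights.values())


def project_weight(weights):
    """Return the combined weight of the three project milestones, in percent."""
    return sum(weights[name] for name in PROJECT_MILESTONES)


if __name__ == "__main__":
    print("Project total:", project_weight(GRADE_WEIGHTS))
    print("Course total:", total_weight(GRADE_WEIGHTS))
```

The milestone weights (10% + 10% + 30%) add up to the 50% stated for the project, and all components together add up to 100%.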

ACADEMIC INTEGRITY
UWaterloo policy on academic integrity will be strictly enforced.