Please note: This PhD defence will take place in DC 2564 and virtually over Zoom.
Wenhan (Cosmos) Zhu, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Michael Godfrey
Software development is a complex process. To serve the final software product to the end user, developers need to rely on a variety of software artifacts throughout the development process. The term software repository used to denote only containers of source code such as version control systems; more recent usage has generalized the concept to include a plethora of software development artifact kinds and their related meta-data.
Broadly speaking, software repositories include version control systems, technical documentation, issue trackers, question and answer sites, distribution information, etc. The software repositories can be based on a specific project (e.g., bug tracker for Firefox), or be crowd-sourced (e.g., questions and answers on technical Q&A websites). Crowd-based software artifacts are created as by-products of developer-user interactions which are sometimes referred to as communication channels. In this thesis, we investigate three distinct crowd-based software repositories that follow different models of developer-user interactions. We believe through a better understanding of the crowd-based software repositories, we can identify challenges in software development and provide insights to improve the software development process.
In our first study, we investigate Stack Overflow. It is the largest collection of programming related questions and answers. On Stack Overflow, developers interact with other developers to create crowd-sourced knowledge in the form of questions and answers. The results of the interactions (i.e., the question threads) become valuable information to the entire developer community. Prior research on Stack Overflow tacitly assume that questions receive answers directly on the platform and no need of interaction is required during the process. Meanwhile, the platform allows attaching comments to questions which forms discussions of the question. Our study found that question discussions occur for 59.2% of questions on Stack Overflow. For discussed and solved questions on Stack Overflow, 80.6% of the questions have the discussion begin before the accepted answer is submitted. The results of our study show the importance and nuances of interactions in technical Q&A.
We then study dotfiles, a set of publicly shared user-specific configuration files for software tools. There is a culture of sharing dotfiles, where the idea is to learn from other developers’ dotfiles and share your variants. The interaction of dotfiles sharing can be viewed as developers sources information from other developers, adapt the information to their own needs, and share their adaptations back to the community. Our study on dotfiles suggests that is a common practice among developers to share dotfiles where 25.8% of the most stared users on GitHub have a dotfiles repository We provide a taxonomy of the commonly tracked dotfiles and a qualitative study on the commits in dotfiles repositories. We also leveraged the state-of-the-art time-series clustering technique (K-shape) to identify code churn pattern for dotfile edits. This study is the first step towards understanding the practices of maintaining and sharing dotfiles.
Finally, we study app stores, the platforms that distribute software products and contain many non-technical attributes (e.g., ratings and reviews) of software products. Three major stakeholders interact with each other in app stores: the app store owner who governs the operation of the app store; developers who publish applications on the app store; and users who browse and download applications in the app store. App stores often provide means of interaction between all three actors (e.g., app reviews, store policy) and sometimes interactions within the same actor (e.g., developer forum). We surveyed existing app stores to extract key features from app store operation. We then labelled a representative set of app store collected by web queries. K-means is applied to the labelled app stores to detect natural groupings of app stores. We observed a diverse set of app stores through the process. Instead of a single model that describes all app stores, fundamentally, our observations show that app stores operate differently. This study provides insights in understanding how app stores can affect software development.
In summary, we investigated software repositories containing software artifacts created from different developer-user interactions. These software repositories are essential for software development in providing referencing information (i.e., Stack Overflow), improving development productivity (i.e., dotfiles) and help distributing the software products to end users (i.e., app stores).
To attend this PhD defence in person, please go to DC 2564. You can also attend virtually using Zoom at https://uwaterloo.zoom.us/j/91346051827.