Please note: This PhD seminar will take place in DC 2564.
Wenhan (Cosmos) Zhu, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Michael Godfrey
Storing user-specific configuration files in a “dotfiles” repository is a common practice among software developers. Hundreds of thousands of developers have chosen to host their own dotfile repositories publicly on GitHub. Storing these configuration files gives developers a backup of their important configurations, and sharing them publicly lets developers learn from one another’s configurations. Templates for user-configuration files, such as .gitignore for Git, are very popular among developers and rank among the top 10 most-forked projects on GitHub.
Currently, we have only a limited and anecdotal understanding of the practices of storing and sharing user-specific configuration files. In this study, we aim to gain a better understanding of this phenomenon. We start by collecting a set of publicly hosted dotfile repositories from GitHub. We then investigate how prevalent dotfile ownership is among developers by checking whether popular GitHub accounts have a dotfile repository. We further trace the public profiles of dotfile repository owners to confirm that they are actual developers. Based on the most common dotfiles in these repositories, we construct a taxonomy of dotfiles. We examine commits in dotfile repositories to understand why developers update their dotfiles. We then apply a state-of-the-art time-series clustering method to understand the code churn patterns of highly edited dotfiles.
We find that maintaining dotfiles is a common practice among developers. For example, 25.8% of the top 500 most-starred users on GitHub own a variant of a dotfile repository. Configurations for text editors (e.g., Vim) and shells (e.g., bash, zsh) are the most commonly tracked dotfiles. We find that adjusting configurations (63.3%) and dotfiles project-meta management (25.4%) are the most common reasons for updating a dotfile repository. We also find no significant difference in the types of dotfiles observed across the code churn history patterns, suggesting that how often dotfiles are modified depends more on the developer than on properties of the specific dotfile. Finally, we discuss the challenges in managing dotfiles, such as the need for a reliable and effective deployment mechanism, and how the information contained in dotfiles can benefit tool designers by providing real-world usage information.