Please note: This PhD defence will take place in DC 2314 and online.
Mengxiao (Max) Zhang, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Chengnian Sun
Given a program P and a property ψ it preserves, the goal of program reduction is to minimize this program while ensuring that the minimized version P𝑚𝑖𝑛 still preserves ψ. Program reduction is a widely used technique to facilitate compiler debugging, as most compiler bugs are triggered by program inputs, which often contain a significant amount of code unrelated to the bug. Although program reduction automates eliminating such bug-irrelevant components from a program, it can take hours to complete due to its trial-and-error strategy. Furthermore, even if the reduction process produces a smaller program after a long wait, the resulting program is not guaranteed to be optimal, as program reduction is an NP-complete problem. As a result, compiler developers consistently seek program reduction approaches that offer faster reduction speeds or produce smaller outputs. It is critical to design superior program reduction approaches to further explore the potential of this field.
This thesis aims to help enhance program reduction approaches in the following three aspects.
The first study aims to enhance the versatility of program reduction approaches. While existing techniques, such as C-Reduce and Perses, can effectively reduce a bug-triggering program as a whole, they fail to consider the varied degrees of relevance each remaining element has to the bug. To address this limitation, this study proposes PPR, a new program reduction technique designed to minimize a pair of programs w.r.t. certain properties. Given a seed program Ps and its variant Pv with differing properties (e.g., Pv crashes a compiler while Ps does not), PPR produces a pair of minimized programs that preserve these properties separately, with reduced differences highlighting bug-related elements. Evaluation results demonstrate that PPR effectively reduces pairs of programs, producing outputs of comparably small size to those generated by Perses and C-Reduce. In addition, PPR significantly outperforms the Delta Debugging algorithm in isolating bug-related differences.
The second study concentrates on the simplicity of a state-of-the-art program reduction approach, i.e., ProbDD. ProbDD, a variant of ddmin, employs a probabilistic model to estimate the likelihood of each element being relevant to ψ, achieving superior performance compared to the Delta Debugging algorithm. However, the theoretical probabilistic model underlying ProbDD is intricate and not fully understood. To address this, the study conducts the first in-depth theoretical analysis of ProbDD, comprehending and demystifying this state-of-the-art approach from multiple perspectives. Building upon these insights, the study further proposes CDD, a simplified counter-based model that retains the core strengths of ProbDD. The evaluation demonstrates that CDD can achieve the same performance as ProbDD, despite being much simplified.
The third study integrates large language models (LLMs) to improve the efficacy of program reduction. Existing program reduction techniques typically fall into two categories: generic approaches applicable to a wide range of languages but lacking domain-specific knowledge, and language-specific approaches that exploit detailed language knowledge but fail to generalize across multiple languages. However, effectively combining both language generality and language-specific expertise in program reduction is yet to be explored. To this end, this paper proposes LPR, the first LLM-aided technique leveraging LLMs to perform language-specific program reduction for multiple languages. The key insight is to synergize the language generality of existing program reducers, such as Perses, with the language-specific semantics learned by LLMs. The evaluation shows that LPR surpasses Vulcan by producing 24.93%, 4.47%, and 11.71% smaller programs, while reducing the reduction time by 10.77%, 34.88%, 36.96% on benchmarks in C, Rust and JavaScript, separately.
To attend this PhD defence in person, please go to DC 2314. You can attend virtually on Zoom.