Master’s Thesis Presentation • Software Engineering • Code Generation and Testing in the Era of AI-Native Software Engineering

Monday, January 13, 2025 11:30 am - 12:30 pm EST (GMT -05:00)

Please note: This master’s thesis presentation will take place in DC 3317 and online.

Noble Saji Mathews, Master’s candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Mei Nagappan

Large Language Models (LLMs) like GPT-4 and Llama 3 are transforming software development by automating code generation and test case creation. This thesis investigates two pivotal aspects of LLM-assisted development: the integration of Test-Driven Development (TDD) principles into code generation workflows and the limitations of LLM-based test-generation tools in detecting bugs.

LLMs have demonstrated significant capabilities in generating code snippets directly from problem statements. This increasingly automated process mirrors traditional human-led software development, where code is often written in response to a requirement. Historically, TDD has proven its merit by requiring developers to write tests before the functional code, ensuring alignment with the initial problem statements. We investigate whether and how TDD can be incorporated into AI-assisted code-generation workflows and experimentally evaluate the hypothesis that providing LLMs with tests in addition to the problem statements improves code-generation outcomes. Our results consistently demonstrate that including test cases leads to higher success rates in solving programming challenges.
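
As a rough illustration of this TDD-style prompting setup (the function names and prompt wording below are hypothetical sketches, not the exact prompts or code used in the thesis):

    # Hypothetical sketch: the model is given the problem statement together
    # with the tests its solution must pass, rather than the statement alone.
    def build_prompt(problem_statement, tests):
        return (
            "Solve the following problem:\n"
            + problem_statement
            + "\n\nYour solution must pass these tests:\n"
            + "\n".join(tests)
        )

    prompt = build_prompt(
        "Write a function add(a, b) that returns the sum of two integers.",
        ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"],
    )
    # call_llm is a placeholder for whatever model API is used (e.g., GPT-4 or Llama 3):
    # generated_code = call_llm(prompt)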

As we progress toward AI-native software engineering, a logical follow-up question arises: Why not allow LLMs to generate these tests as well? An increasing amount of research and commercial tools now focus on automated test case generation using LLMs. However, a concerning trend is that these tools often generate tests by inferring requirements from code. Using real human-written buggy code as input, we evaluate these tools, showing how LLM-generated tests can fail to detect bugs and, more alarmingly, how their design can worsen the situation by validating bugs in the generated test suite and rejecting bug-revealing tests.
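
To make this failure mode concrete, consider a hypothetical example (not drawn from the study's dataset): a tool that infers expected behavior from the code itself tends to generate tests that encode the bug as the requirement, while a genuinely bug-revealing test fails and risks being discarded.

    # Hypothetical buggy implementation: the upper bound should be inclusive,
    # but the loop stops one element early.
    def sum_up_to(n):
        return sum(range(n))  # bug: should be range(n + 1)

    # A test inferred from the code validates the buggy behavior:
    def test_sum_up_to_generated():
        assert sum_up_to(3) == 3  # passes, but the intended answer is 6

    # A requirement-based, bug-revealing test fails and may be rejected:
    def test_sum_up_to_intended():
        assert sum_up_to(3) == 6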

Together, these studies provide critical insights into the promise and pitfalls of integrating LLMs into software development processes, offering guidelines for improving their reliability and impact on software quality.


To attend this master’s thesis presentation in person, please go to DC 3317. You can also attend virtually on Zoom.