CS846 — Weekly Papers and Topics (Jan 6, 2026 onwards)

Week 1 — Intro to Class and LLMs for SE

2026-01-06

Week 2 — Vibe Coding

2026-01-13

Week 3 — GitHub Copilot tutorial + Code Comprehension without LLMs

2026-01-20

Week 4 — Requirements

2026-01-27

Advancing Requirements Engineering through Generative AI: Assessing the Role of LLMs — PDF
Investigating ChatGPT’s Potential to Assist in Requirements Elicitation Processes — arXiv
SpecGen: Automated Generation of Formal Program Specifications via Large Language Models — arXiv
Requirements Elicitation Follow-Up Question Generation — arXiv
Requirements Engineering using Generative AI: Prompts and Prompting Patterns — arXiv
Requirements Satisfiability with In-Context Learning — arXiv
An Automated Model of Software Requirement Engineering Using GPT-3.5 — IEEE

Presenters: Gavin, Artemiy, Savira

Week 5 — Code Summarization / Comprehension

2026-02-03

Few-shot training LLMs for project-specific code-summarization — arXiv
Automatic Semantic Augmentation of Language Model Prompts (for Code Summarization) — arXiv
Automatic Code Summarization via ChatGPT: How Far Are We? — arXiv
Can Large Language Models Serve as Evaluators for Code Summarization? — arXiv
Icing on the Cake: Automatic Code Summarization at Ericsson — arXiv
Is Multi-Agent Debate (MAD) the Silver Bullet? An Empirical Analysis of MAD in Code Summarization and Translation — arXiv
What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant — arXiv

Presenters: Yiran, Carter, Basit

Week 6 — CodeGen / Planning

2026-02-10

Leveraging Print Debugging To Improve Code Generation In Large Language Models — PDF
LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation — DOI
Self-Collaboration Code Generation via ChatGPT — ACM
SOEN-101: Code Generation by Emulating Software Process Models Using LLM Agents — arXiv
Self-Organized Agents: A LLM Multi-Agent Framework — arXiv
Empowering Agile-Based Generative Software Development through Human-AI Teamwork — ACM
Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization — arXiv
How AI-assisted coding will change software engineering: hard truths — Blog
OpenSpec: A Spec-Driven Workflow for AI Coding Assistants — Medium

Presenters: Xavier, Greg, Max

Week 7 — Reading Week

2026-02-17

Week 8 — DevReview / Debug

2026-02-24

Presenters: Henry, Anudeep, Mariia

Week 9 — Testing / QA

2026-03-03

Software Testing With Large Language Models: Survey, Landscape, and Vision — IEEE
On the Evaluation of Large Language Models in Unit Test Generation — arXiv
Software Testing with Large Language Models: An Interview Study with Practitioners — PDF
Using Large Language Models to Generate JUnit Tests: An Empirical Study — PDF
Evaluating and Improving ChatGPT for Unit Test Generation — ACM
Large Language Models for Software Testing: A Research Roadmap — arXiv

Presenters: Liliana, Alina, Ethan, Sofiia

Week 10 — CodeReview / PR

2026-03-10

Accountability in Code Review: The Role of Intrinsic Drivers and the Impact of LLMs — ACM
Prompting and Fine-tuning Large Language Models for Automated Code Review Comment Generation — arXiv
Rethinking Code Review Workflows with LLM Assistance: An Empirical Study — arXiv
The Impact of Large Language Models (LLMs) on Code Review Process — arXiv
LLMs as Code Review Agents: A Rapid Review — Springer
Evaluating Large Language Models for Code Review — arXiv
Automated Code Review In Practice — arXiv
GitHub blog: Code review in the age of AI — Blog
Unlocking the full power of Copilot code review: Master your instructions files — Blog
Using GitHub Copilot code review — Docs
uReview: Scalable, Trustworthy GenAI for Code Review at Uber — Blog
Detecting malicious pull requests at scale with LLMs — Blog

Presenters: Felix, Neel, Ibrahim

Week 11 — Performance

2026-03-17

More Than Just Functional: LLM-as-a-Critique for Efficient Code Generation — PDF
PerfCodeGen: Improving Performance of LLM-Generated Code with Execution Feedback — arXiv
EFFI-LEARNER: Enhancing Efficiency of Generated Code via Self-Optimization — PDF
Learning Performance-improving Code Edits — PDF
Optimizing Code Runtime Performance through Context-Aware RAG — arXiv

Presenters: Maksym, Taha, Omar

Week 12 — Logging

2026-03-24

AL-Bench: A Benchmark for Automatic Logging — arXiv
Automated File-Level Logging Generation for Machine Learning Applications using LLMs: A Case Study using GPT-4o Mini — arXiv
Go Static: Contextualized Logging Statement Generation — arXiv
PDLogger: Automated Logging Framework for Practical Software Development — arXiv
Studying and Benchmarking Large Language Models For Log Level Suggestion — arXiv
Larger Is Not Always Better: Exploring Small Open-source Language Models in Logging Statement Generation — arXiv
Automated Proactive Logging Quality Improvement for Large-Scale Codebases — PDF

Presenters: Yifan, Disen, Xiangrui

Week 13 — Vibe Coding

2026-03-31