CS846 — Weekly Papers and Topics (Jan 6, 2026 onwards)

Week 1 — Intro to Class and LLMs for SE
2026-01-06
Week 2 — Vibe Coding
2026-01-13
Week 3 — GitHub Copilot tutorial + Code Comprehension without LLMs
2026-01-20
Week 4 — Requirements
2026-01-27
  • Advancing Requirements Engineering through Generative AI: Assessing the Role of LLMs — PDF
  • Investigating ChatGPT’s Potential to Assist in Requirements Elicitation Processes — arXiv
  • SpecGen: Automated Generation of Formal Program Specifications via Large Language Models — arXiv
  • Requirements Elicitation Follow-Up Question Generation — arXiv
  • Requirements Engineering using Generative AI: Prompts and Prompting Patterns — arXiv
  • Requirements Satisfiability with In-Context Learning — arXiv
  • An Automated Model of Software Requirement Engineering Using GPT-3.5 — IEEE
Presenters: Gavin, Artemiy, Savira
Week 5 — Code Summarization / Comprehension
2026-02-03
  • Few-shot training LLMs for project-specific code-summarization — arXiv
  • Automatic Semantic Augmentation of Language Model Prompts (for Code Summarization) — arXiv
  • Automatic Code Summarization via ChatGPT: How Far Are We? — arXiv
  • Can Large Language Models Serve as Evaluators for Code Summarization? — arXiv
  • Icing on the Cake: Automatic Code Summarization at Ericsson — arXiv
  • Is Multi‑Agent Debate (MAD) the Silver Bullet? An Empirical Analysis of MAD in Code Summarization and Translation — arXiv
  • What You Need is What You Get: Theory of Mind for an LLM‑Based Code Understanding Assistant — arXiv
Presenters: Yiran, Carter, Basit
Week 6 — CodeGen / Planning
2026-02-10
  • Leveraging Print Debugging To Improve Code Generation In Large Language Models — PDF
  • LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation — DOI
  • Self-Collaboration Code Generation via ChatGPT — ACM
  • SOEN-101: Code Generation by Emulating Software Process Models Using LLM Agents — arXiv
  • Self-Organized Agents: A LLM Multi-Agent Framework — arXiv
  • Empowering Agile-Based Generative Software Development through Human-AI Teamwork — ACM
  • Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization — arXiv
  • How AI-assisted coding will change software engineering: hard truths — Blog
  • OpenSpec: A Spec-Driven Workflow for AI Coding Assistants — Medium
Presenters: Xavier, Greg, Max
Week 7 — Reading Week
2026-02-17
Week 8 — DevReview / Debug
2026-02-24
  • Let’s Fix this Together: Conversational Debugging with GitHub Copilot — IEEE
  • Teaching Large Language Models to Self-Debug — arXiv
  • Explainable automated debugging via LLM-driven scientific debugging — Springer
  • If you are good at code review, you will be good at using AI agents — Blog
  • The Essential Guide to Reviewing AI-Generated Code — Blog
  • Developer and AI Code Reviewer: Reviewing AI-Generated Code in .NET — Blog
Presenters: Henry, Anudeep, Mariia
Week 9 — Testing / QA
2026-03-03
  • Software Testing With Large Language Models: Survey, Landscape, and Vision — IEEE
  • On the Evaluation of Large Language Models in Unit Test Generation — arXiv
  • Software Testing with Large Language Models: An Interview Study with Practitioners — PDF
  • Using Large Language Models to Generate JUnit Tests: An Empirical Study — PDF
  • Evaluating and Improving ChatGPT for Unit Test Generation — ACM
  • Large Language Models for Software Testing: A Research Roadmap — arXiv
Presenters: Liliana, Alina, Ethan, Sofiia
Week 10 — CodeReview / PR
2026-03-10
  • Accountability in Code Review: The Role of Intrinsic Drivers and the Impact of LLMs — ACM
  • Prompting and Fine-tuning Large Language Models for Automated Code Review Comment Generation — arXiv
  • Rethinking Code Review Workflows with LLM Assistance: An Empirical Study — arXiv
  • The Impact of Large Language Models (LLMs) on Code Review Process — arXiv
  • LLMs as Code Review Agents: A Rapid Review — Springer
  • Evaluating Large Language Models for Code Review — arXiv
  • Automated Code Review In Practice — arXiv
  • GitHub blog: Code review in the age of AI — Blog
  • Unlocking the full power of Copilot code review: Master your instructions files — Blog
  • Using GitHub Copilot code review — Docs
  • uReview: Scalable, Trustworthy GenAI for Code Review at Uber — Blog
  • Detecting malicious pull requests at scale with LLMs — Blog
Presenters: Felix, Neel, Ibrahim
Week 11 — Performance
2026-03-17
  • More Than Just Functional: LLM-as-a-Critique for Efficient Code Generation — PDF
  • PerfCodeGen: Improving Performance of LLM-Generated Code with Execution Feedback — arXiv
  • EFFI-LEARNER: Enhancing Efficiency of Generated Code via Self-Optimization — PDF
  • Learning Performance-improving Code Edits — PDF
  • Optimizing Code Runtime Performance through Context-Aware RAG — arXiv
Presenters: Maksym, Taha, Omar
Week 12 — Logging
2026-03-24
  • AL-Bench: A Benchmark for Automatic Logging — arXiv
  • Automated File-Level Logging Generation for Machine Learning Applications using LLMs: A Case Study using GPT-4o Mini — arXiv
  • Go Static: Contextualized Logging Statement Generation — arXiv
  • PDLogger: Automated Logging Framework for Practical Software Development — arXiv
  • Studying and Benchmarking Large Language Models For Log Level Suggestion — arXiv
  • Larger Is Not Always Better: Exploring Small Open-source Language Models in Logging Statement Generation — arXiv
  • Automated Proactive Logging Quality Improvement for Large-Scale Codebases — PDF
Presenters: Yifan, Disen, Xiangrui
Week 13 — Vibe Coding
2026-03-31