Key Research Papers on AI Coding
A curated bibliography of landmark research papers that shaped AI-assisted software development.
Foundation Models
Codex (2021) — Chen et al.
Citation: "Evaluating Large Language Models Trained on Code" — OpenAI. This paper introduced Codex, the model behind GitHub Copilot, along with the HumanEval benchmark. Key finding: a 12-billion-parameter model fine-tuned on code solves 28.8% of HumanEval problems on the first attempt (pass@1), rising to 70.2% when 100 samples are drawn per problem and any passing sample counts. This established that code generation was practically useful, not just a research curiosity.
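The pass@1 and repeated-sampling numbers above are computed with the paper's unbiased pass@k estimator: given n samples per problem of which c pass, pass@k = 1 - C(n-c, k)/C(n, k). A minimal sketch (the function name is ours):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper.

    Estimates the probability that at least one of k samples is
    correct, given n total samples of which c passed the tests.
    """
    if n - c < k:
        # Fewer incorrect samples than k: some draw must include a pass.
        return 1.0
    # 1 - C(n-c, k)/C(n, k), expanded as a numerically stable product.
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))
```

For example, with 100 samples of which 30 pass, pass@1 is 0.30, and pass@k grows toward 1.0 as k increases.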
AlphaCode (2022) — Li et al.
Citation: "Competition-Level Code Generation with AlphaCode" — DeepMind. AlphaCode competed in simulated Codeforces programming contests, achieving an average ranking in the top 54.3% of human participants. Key insight: generating up to millions of candidate solutions, filtering them against the problem's example tests, and clustering the survivors produces competitive results, establishing "sample and filter" as a viable code generation strategy.
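The sample-and-filter loop can be sketched in a few lines. This is a simplified illustration, not AlphaCode's pipeline: `generate` stands in for sampling a candidate program from a code model, and `tests` stands in for the problem's visible example tests.

```python
import random

def sample_and_filter(generate, tests, n_samples=1000, seed=0):
    """Sample-and-filter sketch: draw many candidate programs,
    keep only those that pass every visible example test."""
    rng = random.Random(seed)
    survivors = []
    for _ in range(n_samples):
        candidate = generate(rng)
        if all(test(candidate) for test in tests):
            survivors.append(candidate)
    return survivors
```

AlphaCode additionally clusters the surviving candidates by runtime behavior and submits one representative per cluster, since only a handful of submissions are allowed per problem.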
Developer Productivity Studies
GitHub Copilot Productivity Study (2023)
Citation: "The Impact of AI on Developer Productivity: Evidence from GitHub Copilot" — Peng et al. A controlled experiment with 95 professional developers found that Copilot users completed a standardized coding task 55.8% faster than a control group. Critically, task success rates were statistically indistinguishable between the two groups — speed increased without a measurable loss of correctness.
Google's AI-Assisted Code Review (2024)
Citation: Internal Google study on ML-assisted code review. Found that AI-powered code suggestions during review were accepted 45% of the time, reducing review cycles by 30% and improving code quality scores.
Security Research
Asleep at the Keyboard (2022) — Pearce et al.
Citation: "Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions" — IEEE S&P 2022. The authors generated 1,689 programs from scenarios based on MITRE's CWE Top 25 and found that approximately 40% contained security weaknesses, such as injection flaws and improper input validation. This paper established that AI-generated code needs the same security review as human-written code.
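To make the weakness class concrete, here is an illustrative example (ours, not from the paper) of the injection pattern the study's scenarios probe for — SQL built by string interpolation versus a parameterized query:

```python
import sqlite3

def find_user_unsafe(conn, username):
    # CWE-89 pattern: user input interpolated directly into SQL.
    # This is the kind of snippet the study flags for review.
    return conn.execute(
        f"SELECT id FROM users WHERE name = '{username}'"
    ).fetchall()

def find_user_safe(conn, username):
    # Parameterized query: the driver binds the value safely.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()
```

With input like `x' OR '1'='1`, the unsafe version returns every row in the table, while the parameterized version matches nothing.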