Summarizing Source Code using a Neural Attention Model

Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Luke Zettlemoyer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

514 Citations (Scopus)
324 Downloads (Pure)


High-quality source code is often paired with high-level summaries of the computation it performs, for example in code documentation or in descriptions posted in online forums. Such summaries are extremely useful for applications such as code search, but they are expensive to author manually and are therefore written for only a small fraction of all code that is produced. In this paper, we present the first completely data-driven approach for generating high-level summaries of source code. Our model, CODE-NN, uses Long Short-Term Memory (LSTM) networks with attention to produce sentences that describe C# code snippets and SQL queries. CODE-NN is trained on a new corpus that is automatically collected from StackOverflow, which we release. Experiments demonstrate strong performance on two tasks: (1) code summarization, where we establish the first end-to-end learning results and outperform strong baselines, and (2) code retrieval, where our learned model improves the state of the art on a recently introduced C# benchmark by a large margin.
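As a rough illustration of what "attention" over a code snippet means, the sketch below computes generic dot-product attention weights over a handful of token embeddings. It is not taken from the paper and does not reproduce the CODE-NN formulation: the function names (`softmax`, `attend`), the toy SQL tokens, the two-dimensional embeddings, and the decoder state are all made-up assumptions chosen only to keep the example small.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, token_embeddings):
    """Generic dot-product attention (an assumption, not the paper's exact model).

    Scores each code token by its dot product with the current decoder
    state, normalizes the scores with softmax, and returns the weights
    together with the weighted average of the token embeddings.
    """
    scores = [
        sum(h * e for h, e in zip(decoder_state, emb))
        for emb in token_embeddings
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [
        sum(w * emb[i] for w, emb in zip(weights, token_embeddings))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical toy input: tokens of a SQL fragment with invented embeddings.
tokens = ["SELECT", "max", "(", "salary", ")"]
embeddings = [[0.1, 0.0], [0.9, 0.2], [0.0, 0.0], [0.7, 0.8], [0.0, 0.1]]
state = [1.0, 1.0]  # stand-in for an LSTM hidden state

weights, context = attend(state, embeddings)
for token, weight in zip(tokens, weights):
    print(f"{token:>8}  {weight:.3f}")
```

In a summarization model of this general kind, the context vector would be combined with the LSTM state to predict the next word of the summary, so the words produced can depend on different parts of the snippet at each step.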
Original language: English
Title of host publication: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics
Subtitle of host publication: Long Papers
Publisher: Association for Computational Linguistics
Number of pages: 11
ISBN (Print): 9781510827585
Publication status: Published - 1 Aug 2016
Event: 54th Annual Meeting of the Association for Computational Linguistics 2016 - Berlin, Germany
Duration: 7 Aug 2016 to 12 Aug 2016


Conference: 54th Annual Meeting of the Association for Computational Linguistics 2016
Abbreviated title: ACL 2016

