Abstract
Few machine learning applications in the domain of programming languages make use of transfer learning. It has been shown in other domains, such as natural language processing, that transfer learning improves performance on various tasks and leads to faster convergence. This paper investigates the use of transfer learning on machine learning models for programming languages, focusing on two tasks: method name prediction and code retrieval. We find that, for these tasks, transfer learning provides improved performance, as it does for natural languages. We also find that these models can be pre-trained on programming languages that are different from the downstream task language, and that even pre-training on English language data is sufficient to provide performance similar to pre-training on programming languages. We believe this is because these models ignore syntax and instead look for semantic similarity between the named variables in source code.
| Original language | English |
| --- | --- |
| Publication status | Published - 27 Jul 2020 |
| Event | 16th International Conference on Data Science 2020 - Las Vegas, United States<br>Duration: 27 Jul 2020 → 30 Jul 2020<br>Conference number: 16<br>https://icdatascience.org/ |
Conference
| Conference | 16th International Conference on Data Science 2020 |
| --- | --- |
| Abbreviated title | ICDATA'20 |
| Country/Territory | United States |
| City | Las Vegas |
| Period | 27/07/20 → 30/07/20 |
| Internet address | https://icdatascience.org/ |