
Code Generation from Supervised Code Embeddings

  • Conference paper
Neural Information Processing (ICONIP 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1142)


Abstract

Code generation, which produces source code from natural language, is beneficial for building smarter Integrated Development Environments (IDEs), retrieving code more effectively, and so on. Traditional approaches are based on matching similar code snippets; more recently, researchers have turned to machine learning, especially the encoder-decoder framework. Applied to code generation, most encoder-decoder frameworks suffer from two drawbacks: (a) a code snippet is usually much longer than its corresponding natural-language description, which makes the two hard to align, especially for word-level encoders; (b) code snippets with the same functionality can be implemented in various ways, which may be completely different at the word level. For drawback (a), we propose a new Supervised Code Embedding (SCE) model that promotes the alignment between natural language and code. For drawback (b), with the help of the Abstract Syntax Tree (AST), we propose a new distributed representation of code snippets that overcomes it. To evaluate our approaches, we build a variant of the encoder-decoder model that generates code with the help of pre-trained code embeddings. We perform experiments on several open-source datasets, and the results indicate that our approaches are effective and outperform the state of the art.
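To make the two ideas in the abstract concrete, the following is a minimal, hypothetical sketch (not the authors' published code) of an SCE-style model: the code side represents a snippet as a bag of AST path-contexts rather than a token sequence, addressing drawback (b), while a supervised alignment loss pulls each snippet's embedding toward the embedding of its natural-language description, addressing drawback (a). The class name, dimensions, attention pooling, and cosine objective are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupervisedCodeEmbedding(nn.Module):
    """Hypothetical sketch of an SCE-style model (names and loss assumed)."""

    def __init__(self, nl_vocab_size, path_vocab_size, dim=128):
        super().__init__()
        # Natural-language side: token embeddings + GRU encoder.
        self.nl_emb = nn.Embedding(nl_vocab_size, dim)
        self.nl_enc = nn.GRU(dim, dim, batch_first=True)
        # Code side: each AST path-context (start token, path, end token)
        # is looked up as a single id here for brevity; a snippet becomes
        # an attention-weighted sum of its path embeddings, so snippets
        # that differ at the word level but share structure stay close.
        self.path_emb = nn.Embedding(path_vocab_size, dim)
        self.attn = nn.Linear(dim, 1)

    def embed_nl(self, nl_tokens):                  # nl_tokens: (B, T_nl)
        _, h = self.nl_enc(self.nl_emb(nl_tokens))  # h: (1, B, dim)
        return h.squeeze(0)                         # (B, dim)

    def embed_code(self, path_contexts):            # path_contexts: (B, T_p)
        p = self.path_emb(path_contexts)            # (B, T_p, dim)
        a = torch.softmax(self.attn(p), dim=1)      # attention over paths
        return (a * p).sum(dim=1)                   # (B, dim)

    def alignment_loss(self, nl_tokens, path_contexts):
        # Supervision: pull the NL vector and the code vector of a matched
        # (description, snippet) pair together. Cosine distance is one
        # plausible objective; the paper's exact loss may differ.
        nl = F.normalize(self.embed_nl(nl_tokens), dim=-1)
        code = F.normalize(self.embed_code(path_contexts), dim=-1)
        return (1.0 - (nl * code).sum(dim=-1)).mean()

# Toy usage with random ids standing in for real tokens and AST paths.
model = SupervisedCodeEmbedding(nl_vocab_size=1000, path_vocab_size=5000)
nl = torch.randint(0, 1000, (4, 12))     # 4 descriptions, 12 tokens each
paths = torch.randint(0, 5000, (4, 30))  # 4 snippets, 30 AST paths each
loss = model.alignment_loss(nl, paths)
loss.backward()
```

Once trained, the code-side vectors could serve as the pre-trained code embeddings that the paper's encoder-decoder variant consumes, for example as an extra decoder input or initial decoder state when generating code from a natural-language query; the paper's exact wiring may differ.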


Notes

  1. For more details about the tokenization phase, please refer to Sect. 3.2.

  2. https://drive.google.com/open?id=1nOuZjSS9lUqWfQptUOhfX9kNKd_FeCkn.

  3. https://github.com/akullpp/awesome-java.

  4. https://drive.google.com/drive/folders/1kC6fe7JgOmEHhVFaXjzOmKeatTJy1I1W.

  5. https://github.com/clonebench/BigCloneBench/blob/master/README.md.

  6. https://github.com/javaparser/javaparser.

  7. http://help.eclipse.org/mars/index.jsp.

  8. https://www.oracle.com/technetwork/articles/java/index-137868.html.


Author information

Contributions

Han Hu, Qiuyuan Chen, and Zhaoyi Liu

Corresponding author

Correspondence to Qiuyuan Chen.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Hu, H., Chen, Q., Liu, Z. (2019). Code Generation from Supervised Code Embeddings. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Communications in Computer and Information Science, vol 1142. Springer, Cham. https://doi.org/10.1007/978-3-030-36808-1_42


  • DOI: https://doi.org/10.1007/978-3-030-36808-1_42


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36807-4

  • Online ISBN: 978-3-030-36808-1

  • eBook Packages: Computer Science, Computer Science (R0)
