
LSTM-based argument recommendation for non-API methods

Abstract

Automatic code completion is one of the most useful features provided by advanced IDEs, and argument recommendation is a widely used special case of it. While existing approaches focus on argument recommendation for popular APIs, a large number of non-API invocations also call for accurate argument recommendation. To this end, we propose an LSTM-based approach that recommends non-API arguments instantly as method calls are typed in. With data collected from a large corpus of open-source applications, we train an LSTM neural network to recommend actual arguments based on identifiers of the invoked method, the corresponding formal parameter, and a list of syntactically correct candidate arguments. To feed these identifiers into the LSTM network, we convert them into fixed-length vectors with Paragraph Vector, an unsupervised neural-network-based learning algorithm. With the resulting network trained on sample applications, we can predict, for a given call site, which of the candidate arguments is most likely the correct one. We evaluate the proposed approach with tenfold cross-validation on 85 open-source C applications. Results suggest that it outperforms state-of-the-art approaches in recommending non-API arguments, improving precision significantly from 71.46% to 83.37%.
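The recommendation interface the abstract describes (rank syntactically valid candidate arguments for a call site, given the formal parameter's identifier) can be sketched as follows. This is a toy stand-in, not the authors' method: it replaces the trained LSTM and Paragraph Vector embeddings with a simple sub-token Jaccard similarity between each candidate argument's name and the formal parameter's name, purely to illustrate the ranking setup.

```python
import re


def subtokens(identifier):
    # Split a camelCase / snake_case identifier into lowercase sub-tokens,
    # e.g. "maxLen" -> ["max", "len"], "file_name" -> ["file", "name"].
    parts = re.split(r'[_\s]+|(?<=[a-z0-9])(?=[A-Z])', identifier)
    return [p.lower() for p in parts if p]


def score(candidate, parameter):
    # Jaccard similarity between sub-token sets: a crude lexical stand-in
    # for the trained model's learned score over identifier embeddings.
    a, b = set(subtokens(candidate)), set(subtokens(parameter))
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(candidates, parameter):
    # Rank the syntactically correct candidate arguments, best first.
    return sorted(candidates, key=lambda c: score(c, parameter), reverse=True)
```

For example, for a formal parameter named `name`, `recommend(["file_name", "max_len", "idx"], "name")` ranks `file_name` first. In the paper's approach the lexical score above would be replaced by the output of the LSTM over Paragraph Vector embeddings of the method, parameter, and candidate identifiers.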



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61772071, 61690205, 61832009) and the National Key R&D Program (Grant No. 2018YFB1003904).

Author information


Corresponding author

Correspondence to Hui Liu.


Cite this article

Li, G., Liu, H., Li, G. et al. LSTM-based argument recommendation for non-API methods. Sci. China Inf. Sci. 63, 190101 (2020). https://doi.org/10.1007/s11432-019-2830-8


Keywords

  • argument recommendation
  • LSTM
  • deep learning
  • non-API