TBCNN for Programs’ Abstract Syntax Trees

  • Lili Mou
  • Zhi Jin
Chapter
Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)

Abstract

In this chapter, we apply the tree-based convolutional neural network (TBCNN) to the source code of programming languages, a task we call programming language processing. Programming language processing is an active research topic in software engineering, and it has also attracted growing interest in the artificial intelligence community. A distinctive characteristic of a program is that it contains rich, explicit, and complicated structural information, which calls for more intensive modeling of structure. In this chapter, we propose a TBCNN variant for programming language processing, in which a convolution kernel is designed for programs’ abstract syntax trees. We demonstrate the effectiveness of TBCNN on two program analysis tasks: classifying programs by functionality, and detecting code snippets of certain patterns. TBCNN outperforms baseline methods, including several neural models for NLP.
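To make the idea concrete, the sketch below illustrates one plausible reading of a tree-based convolution over an AST: a convolution window covers a node and its direct children, the children's weight matrices are interpolated by child position (the "continuous binary tree" trick used in TBCNN), and max-pooling over all windows yields a fixed-size program vector. This is a minimal toy sketch, not the chapter's implementation: the tree, the embedding size `D`, and the names `W_top`, `W_left`, `W_right` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # embedding dimensionality (illustrative)

# A toy AST: each node is a pair (embedding vector, list of children).
def node(vec, children=()):
    return (vec, list(children))

def leaf():
    return node(rng.normal(size=D))

ast = node(rng.normal(size=D),
           [leaf(), node(rng.normal(size=D), [leaf(), leaf()])])

# Convolution parameters: one matrix for the window's root node, and two
# matrices that are interpolated by each child's left-to-right position.
W_top = 0.1 * rng.normal(size=(D, D))
W_left = 0.1 * rng.normal(size=(D, D))
W_right = 0.1 * rng.normal(size=(D, D))
b = np.zeros(D)

def conv_window(root):
    """Apply the convolution kernel to one window: a node plus its children."""
    vec, children = root
    y = W_top @ vec + b
    n = len(children)
    for i, (child_vec, _) in enumerate(children):
        # alpha is 0 for the leftmost child, 1 for the rightmost.
        alpha = 0.5 if n == 1 else i / (n - 1)
        W = (1 - alpha) * W_left + alpha * W_right
        y += W @ child_vec
    return np.tanh(y)

def all_nodes(root):
    """Enumerate every node in the tree (each one anchors a window)."""
    yield root
    for child in root[1]:
        yield from all_nodes(child)

# Slide the kernel over every window, then max-pool into one feature vector.
features = np.max([conv_window(n) for n in all_nodes(ast)], axis=0)
print(features.shape)  # a fixed-size vector, regardless of tree size
```

Because pooling is over all nodes, programs of different shapes and sizes map to vectors of the same dimensionality, which can then feed a classifier for tasks such as functionality classification or pattern detection.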

Keywords

Tree-based convolution · Representation learning · Programming language processing · Program analysis

Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. AdeptMind Research, Toronto, Canada
  2. Institute of Software, Peking University, Beijing, China
