Abstract
Identifier names in readable, maintainable source code are descriptive. Experienced programmers choose these names based on implicit knowledge. In this paper, we propose a structural pattern mining method for source code based on support vector machines (SVM). We extract 1,000 method names from object-oriented source code collected from online software repositories and create 1,000 datasets whose examples are labeled as positive or negative. The structural features that form the input feature vectors for SVM learning are designed to represent partial characteristics of the abstract syntax tree (AST) parsed from the source code. Applying this method with our structural features, we compiled a list of F1 scores for the 1,000 method names, which indicates the degree to which each name is associated with a pattern. From the list, we confirmed that structural patterns are strongly associated with specific method names. We also evaluated method names qualitatively by mapping the structural feature vector of each program example onto a two-dimensional plane, following a major previous study. From this evaluation, we confirmed that contrasting structures among programs correspond to the names given to them. Furthermore, we show example visualizations of structural patterns built from structural features extracted by feature selection.
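To make the pipeline in the abstract more concrete, the following is a minimal sketch of two of its steps: turning an AST into a bag of structural features, and scoring a classifier's predictions with F1. It rests on several assumptions that are not taken from the paper: it uses Python's standard `ast` module rather than the object-oriented sources studied, the feature design (counts of parent-to-child node-type pairs) is a hypothetical stand-in for the paper's structural features, and the names `structural_features` and `f1_score` are illustrative. The SVM training step itself is omitted.

```python
import ast
from collections import Counter


def structural_features(source: str) -> Counter:
    """Count parent>child node-type pairs in the AST of `source`.

    Hypothetical feature design: each pair describes one small,
    partial piece of the tree's structure.
    """
    tree = ast.parse(source)
    feats = Counter()
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            feats[f"{type(parent).__name__}>{type(child).__name__}"] += 1
    return feats


def f1_score(y_true, y_pred) -> float:
    """F1 for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two toy methods whose names hint at contrasting structures.
getter = "def get_name(self):\n    return self.name\n"
setter = "def set_name(self, name):\n    self.name = name\n"

getter_feats = structural_features(getter)
setter_feats = structural_features(setter)
```

In this toy case the getter contains a `FunctionDef>Return` pair and the setter a `FunctionDef>Assign` pair, which is the kind of structural contrast a per-name binary classifier could pick up. A high F1 for a given name would then suggest that the name is tied to a recognizable structure.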
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Mashima, Y., Hirokawa, S., Takeuchi, K. (2019). Ties Between Mined Structural Patterns in Program and Their Identifier Names. In: Seki, H., Nguyen, C., Huynh, VN., Inuiguchi, M. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2019. Lecture Notes in Computer Science, vol 11471. Springer, Cham. https://doi.org/10.1007/978-3-030-14815-7_28
DOI: https://doi.org/10.1007/978-3-030-14815-7_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14814-0
Online ISBN: 978-3-030-14815-7
eBook Packages: Computer Science, Computer Science (R0)