Ties Between Mined Structural Patterns in Program and Their Identifier Names

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11471)

Abstract

Identifier names in readable, maintainable source code are descriptive. Such names are chosen on the basis of the implicit knowledge of experienced programmers. In this paper, we propose a method for mining structural patterns in source code using support vector machines (SVMs). We extract 1,000 method names from object-oriented source code collected from online software repositories and create 1,000 datasets, one per name, whose examples are labeled as belonging to the positive or negative class. The structural features that make up the SVM input vectors are designed to represent partial characteristics of the abstract syntax tree (AST) parsed from the source code. Applying this method with our structural features, we produced a list of F1 scores for the 1,000 method names; each score indicates the degree to which the name is associated with a structural pattern. The list confirms that structural patterns are strongly associated with specific method names. We also conducted a qualitative evaluation of method names by mapping the structural feature vector of each program example onto a two-dimensional plane, following the approach of a previous major study. This evaluation confirmed that contrasting structures among the programs correspond to the names given to them. Finally, we show example visualizations of structural patterns based on structural features chosen by feature selection.
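The abstract does not spell out the exact feature definitions or SVM settings, so the following is only a minimal sketch of the pipeline it describes, under stated assumptions. It uses Python's standard `ast` module instead of a parser for the object-oriented code the paper mined; it treats each parent-to-child node-type edge in a method's AST as a hypothetical stand-in for the paper's structural features; and it trains a linear SVM with the Pegasos sub-gradient method rather than a library solver. All function names (`structural_features`, `f1_for_name`, and so on) and the toy corpus are illustrative, not taken from the paper.

```python
from __future__ import annotations

import ast
import random


def structural_features(node: ast.AST) -> dict[str, float]:
    """Hypothetical features: one binary feature per (parent type > child type) AST edge."""
    feats = {"__bias__": 1.0}  # constant feature acts as the SVM bias term
    for parent in ast.walk(node):
        for child in ast.iter_child_nodes(parent):
            feats[f"{type(parent).__name__}>{type(child).__name__}"] = 1.0
    return feats


def method_examples(source: str) -> list[tuple[str, dict[str, float]]]:
    """Return (method name, features) for every function definition in `source`."""
    return [
        (node.name, structural_features(node))
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.FunctionDef)
    ]


def dot(w: dict[str, float], x: dict[str, float]) -> float:
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0) -> dict[str, float]:
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w: dict[str, float] = {}
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1.0  # margin test uses w before the update
            for k in w:  # regularization step: scale by (1 - eta * lam)
                w[k] *= 1.0 - eta * lam
            if violated:  # hinge-loss sub-gradient step
                for k, v in X[i].items():
                    w[k] = w.get(k, 0.0) + eta * y[i] * v
    return w


def f1_score(y_true, y_pred) -> float:
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == -1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == -1 for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def f1_for_name(examples, target: str) -> float:
    """Binary dataset for one name: methods called `target` are positive, others negative."""
    X = [feats for _, feats in examples]
    y = [1 if name == target else -1 for name, _ in examples]
    w = train_linear_svm(X, y)
    y_pred = [1 if dot(w, x) >= 0 else -1 for x in X]
    return f1_score(y, y_pred)  # training-set F1 only, not a held-out estimate


CORPUS = """
class IntBox:
    def size(self):
        return self.n

    def reset(self):
        self.n = 0


class Stack:
    def size(self):
        return len(self.items)

    def reset(self):
        self.items = []
"""

if __name__ == "__main__":
    examples = method_examples(CORPUS)
    names = sorted({name for name, _ in examples})
    # F1 list: one score per method name, highest first (ties broken by name)
    f1_list = sorted(((n, f1_for_name(examples, n)) for n in names), key=lambda p: (-p[1], p[0]))
    for name, score in f1_list:
        print(f"{name}\t{score:.2f}")
```

Each target name gets its own positive/negative dataset, mirroring the 1,000 per-name datasets the abstract describes, and sorting the scores yields an F1 list. In this toy corpus every `size` method contains a `FunctionDef>Return` edge and every `reset` method a `FunctionDef>Assign` edge, so both names separate cleanly. Because the sketch scores the SVM on its own training data, its F1 values overstate how well a pattern generalizes; a meaningful score would need held-out data or cross-validation.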

Author information

Corresponding author: Yoshiki Mashima

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Mashima, Y., Hirokawa, S., Takeuchi, K. (2019). Ties Between Mined Structural Patterns in Program and Their Identifier Names. In: Seki, H., Nguyen, C., Huynh, VN., Inuiguchi, M. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2019. Lecture Notes in Computer Science, vol 11471. Springer, Cham. https://doi.org/10.1007/978-3-030-14815-7_28

  • DOI: https://doi.org/10.1007/978-3-030-14815-7_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14814-0

  • Online ISBN: 978-3-030-14815-7

  • eBook Packages: Computer Science, Computer Science (R0)