Mining Patterns in Source Code Using Tree Mining Algorithms

  • Hoang Son Pham
  • Siegfried Nijssen
  • Kim Mens
  • Dario Di Nucci
  • Tim Molderez
  • Coen De Roover
  • Johan Fabry
  • Vadim Zaytsev
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11828)

Abstract

Discovering regularities in source code is of great interest to software engineers, both in academia and in industry, as regularities can provide useful information to help in a variety of tasks such as code comprehension, code refactoring, and fault localisation. However, traditional pattern mining algorithms often find too many patterns of little use and hence are not suitable for discovering useful regularities. In this paper we propose FREQTALS, a new algorithm for mining patterns in source code based on the FREQT tree mining algorithm. First, we introduce several constraints that effectively enable us to find more useful patterns; then, we show how to efficiently include them in FREQT. To illustrate the usefulness of the constraints we carried out a case study in collaboration with software engineers, where we identified a number of interesting patterns in a repository of Java code.
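The abstract does not spell out how a FREQT-style miner decides whether a tree pattern is frequent. As an illustration only, the Python sketch below shows the basic support check for ordered tree patterns over ASTs. It is not FREQTALS: it does not implement the paper's constraints, and it omits FREQT's candidate enumeration by rightmost extension. It assumes a hypothetical `Node` type, treats a pattern as occurring when its labels, parent-child edges and left-to-right sibling order are preserved in the data tree (an ordered induced subtree), and defines support as the number of trees that contain the pattern; other support definitions exist. Labels such as `MethodDeclaration` are purely illustrative.

```python
# Illustrative sketch (NOT FREQTALS): support counting for ordered tree patterns.
# Assumption: support = number of trees in the forest that contain the pattern.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node in an ordered, labelled tree (hypothetical stand-in for an AST node)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def matches_at(pattern: Node, data: Node) -> bool:
    """True if `pattern` occurs rooted at `data`: labels agree, parent-child
    edges are preserved, and pattern children map to distinct data children
    in the same left-to-right order."""
    if pattern.label != data.label:
        return False
    j = 0
    for p_child in pattern.children:
        # Take the leftmost remaining data child that matches; greedy choice
        # is safe because sibling subtrees are disjoint.
        while j < len(data.children) and not matches_at(p_child, data.children[j]):
            j += 1
        if j == len(data.children):
            return False
        j += 1  # each data child may be used for at most one pattern child
    return True


def occurs_in(pattern: Node, tree: Node) -> bool:
    """True if `pattern` matches rooted at any node of `tree`."""
    stack = [tree]
    while stack:
        node = stack.pop()
        if matches_at(pattern, node):
            return True
        stack.extend(node.children)
    return False


def support(pattern: Node, forest: List[Node]) -> int:
    """Number of trees in `forest` that contain `pattern` at least once."""
    return sum(1 for tree in forest if occurs_in(pattern, tree))


def is_frequent(pattern: Node, forest: List[Node], min_support: int) -> bool:
    """Frequent means support reaches the user-chosen threshold."""
    return support(pattern, forest) >= min_support
```

This version re-scans every tree for each pattern to keep the matching logic visible; miners in the FREQT family instead grow patterns one node at a time and keep occurrence lists, so a candidate's support can be updated without matching it from scratch.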

Keywords

Pattern mining, Frequent tree mining, Source code regularities

Notes

Acknowledgments

This work was conducted in the context of an industry-university research project between UCLouvain, Vrije Universiteit Brussel and Raincode Labs, funded by the Belgian Innoviris TeamUp project INTiMALS (2017-TEAM-UP-7).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Hoang Son Pham (1)
  • Siegfried Nijssen (1)
  • Kim Mens (1)
  • Dario Di Nucci (2)
  • Tim Molderez (2)
  • Coen De Roover (2)
  • Johan Fabry (3)
  • Vadim Zaytsev (3)

  1. ICTEAM, UCLouvain, Louvain-la-Neuve, Belgium
  2. Software Languages Lab, Vrije Universiteit Brussel, Brussels, Belgium
  3. Raincode Labs, Brussels, Belgium