Mining Patterns in Source Code Using Tree Mining Algorithms

Pham, Hoang Son; Nijssen, Siegfried; Mens, Kim; Di Nucci, Dario; Molderez, Tim; De Roover, Coen; Fabry, Johan; Zaytsev, Vadim

doi:10.1007/978-3-030-33778-0_35

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11828)

Included in the following conference series:

International Conference on Discovery Science

Abstract

Discovering regularities in source code is of great interest to software engineers, both in academia and in industry, as regularities can provide useful information to help in a variety of tasks such as code comprehension, code refactoring, and fault localisation. However, traditional pattern mining algorithms often find too many patterns of little use and hence are not suitable for discovering useful regularities. In this paper we propose FREQTALS, a new algorithm for mining patterns in source code based on the FREQT tree mining algorithm. First, we introduce several constraints that effectively enable us to find more useful patterns; then, we show how to efficiently include them in FREQT. To illustrate the usefulness of the constraints we carried out a case study in collaboration with software engineers, where we identified a number of interesting patterns in a repository of Java code.
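
To make the notion of a constrained frequent tree pattern concrete, the following is a minimal sketch in Python. It is not the FREQTALS or FREQT algorithm: it only checks, by brute force, whether one given pattern occurs as an ordered subtree in enough trees and satisfies a size constraint. The `Node` class, the toy AST labels, and the `min_size` constraint are all hypothetical choices made for illustration; the constraints actually used in the paper are not described in this abstract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A rooted, ordered, labelled tree node (e.g. a simplified AST node)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def _match_children(pats: List[Node], kids: List[Node]) -> bool:
    """Match pattern children, in order, against a subsequence of `kids`."""
    if not pats:
        return True
    for i, kid in enumerate(kids):
        if matches_at(pats[0], kid) and _match_children(pats[1:], kids[i + 1:]):
            return True
    return False


def matches_at(pattern: Node, node: Node) -> bool:
    """True if `pattern` occurs as an ordered subtree rooted exactly at `node`."""
    return pattern.label == node.label and _match_children(
        pattern.children, node.children
    )


def occurs_in(pattern: Node, tree: Node) -> bool:
    """True if `pattern` matches at any node of `tree`."""
    return matches_at(pattern, tree) or any(
        occurs_in(pattern, child) for child in tree.children
    )


def support(pattern: Node, trees: List[Node]) -> int:
    """Number of trees in which the pattern occurs at least once."""
    return sum(occurs_in(pattern, tree) for tree in trees)


def size(pattern: Node) -> int:
    """Number of nodes in the pattern."""
    return 1 + sum(size(child) for child in pattern.children)


def is_reported(pattern: Node, trees: List[Node],
                min_support: int, min_size: int) -> bool:
    """Hypothetical filter: frequent enough AND large enough to be of interest."""
    return size(pattern) >= min_size and support(pattern, trees) >= min_support


# Toy "ASTs" with made-up labels (assumption, not taken from the paper).
trees = [
    Node("If", [Node("Cond"), Node("Block", [Node("Call"), Node("Return")])]),
    Node("If", [Node("Cond"), Node("Block", [Node("Return")])]),
    Node("While", [Node("Cond"), Node("Block", [Node("Call")])]),
]
guarded_return = Node("If", [Node("Block", [Node("Return")])])
trivial = Node("Cond")
```

In this toy data, `trivial` occurs in every tree but carries little information, while `guarded_return` occurs in fewer trees yet describes a recognisable structure. A size constraint such as `min_size=2` discards the former and keeps the latter, which is the general idea behind using constraints to reduce the number of uninteresting patterns.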

Acknowledgments

This work was conducted in the context of an industry-university research project between UCLouvain, Vrije Universiteit Brussel and Raincode Labs, funded by the Belgian Innoviris TeamUp project INTiMALS (2017-TEAM-UP-7).

Author information

Authors and Affiliations

ICTEAM, UCLouvain, Louvain-la-Neuve, Belgium
Hoang Son Pham, Siegfried Nijssen & Kim Mens
Software Languages Lab, Vrije Universiteit Brussel, Brussels, Belgium
Dario Di Nucci, Tim Molderez & Coen De Roover
Raincode Labs, Brussels, Belgium
Johan Fabry & Vadim Zaytsev

Corresponding author

Correspondence to Hoang Son Pham.

Editor information

Editors and Affiliations

Jožef Stefan Institute, Ljubljana, Slovenia
Petra Kralj Novak
Rudjer Bošković Institute, Zagreb, Croatia
Tomislav Šmuc
Jožef Stefan Institute, Ljubljana, Slovenia
Sašo Džeroski

About this paper

Cite this paper

Pham, H.S. et al. (2019). Mining Patterns in Source Code Using Tree Mining Algorithms. In: Kralj Novak, P., Šmuc, T., Džeroski, S. (eds) Discovery Science. DS 2019. Lecture Notes in Computer Science, vol 11828. Springer, Cham. https://doi.org/10.1007/978-3-030-33778-0_35

DOI: https://doi.org/10.1007/978-3-030-33778-0_35
Published: 16 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33777-3
Online ISBN: 978-3-030-33778-0
eBook Packages: Computer Science, Computer Science (R0)
