
Tag that issue: applying API-domain labels in issue tracking systems

Published in: Empirical Software Engineering

Abstract

Labeling issues with the skills required to complete them can help contributors choose tasks in Open Source Software projects. However, manually labeling issues is time-consuming and error-prone, and current automated approaches are mostly limited to classifying issues as bugs/non-bugs. We investigate the feasibility and relevance of automatically labeling issues with what we call “API-domains,” which are high-level categories of APIs. We posit that the APIs used in the source code affected by an issue can be a proxy for the type of skills (e.g., DB, security, UI) needed to work on the issue. We ran a user study (n=74) to assess the relevance of API-domain labels to potential contributors, leveraged the issues’ descriptions and the project history to build prediction models, and validated the predictions with contributors (n=20) of the projects. Our results show that (i) newcomers to the project consider API-domain labels useful in choosing tasks, (ii) labels can be predicted with a precision of 84% and a recall of 78.6% on average, (iii) predictions reached up to 71.3% precision and 52.5% recall when training on one project and testing on another (transfer learning), and (iv) project contributors consider most of the predictions helpful in identifying needed skills. These findings suggest our approach can be applied in practice to automatically label issues, assisting developers in finding tasks that better match their skills.
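The pipeline summarized above — text features extracted from issue descriptions, fed to a multi-label classifier that assigns zero or more API-domain labels per issue — can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes scikit-learn, and the example issues and labels are made up.

```python
# Illustrative sketch of multi-label issue labeling: TF-IDF features from
# issue text, one binary classifier per API-domain label (one-vs-rest).
# Not the paper's pipeline; the toy issues and labels below are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

issues = [
    "Crash when saving preferences to the database",
    "Login token expires too early, users are logged out",
    "Dialog window renders off-screen on a second monitor",
    "SQL query for search results times out on large tables",
]
labels = [{"DB"}, {"Security"}, {"UI"}, {"DB"}]

# Encode label sets as a binary indicator matrix, one column per domain.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(issues, y)

# Each prediction is a binary vector; inverse_transform recovers label sets.
pred = clf.predict(["Search page query is slow against the database"])
print(mlb.inverse_transform(pred))
```

With a realistic corpus, the same structure supports the precision/recall evaluation reported in the abstract (e.g., via `sklearn.metrics.precision_score` with `average="samples"`), and the transfer-learning setup corresponds to fitting on one project's issues and predicting on another's.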



Data Availability Statement

The datasets generated and/or analyzed during the current study are available in the Zenodo repository.

Notes

  1. http://bit.ly/NewToOSS

  2. In this study, the words “tasks” and “issues” are used interchangeably.

  3. https://wiki.documentfoundation.org/Development/EasyHacks

  4. https://community.kde.org/KDE/Junior_Jobs

  5. https://wiki.mozilla.org/Good_first_bug

  6. https://doi.org/10.5281/zenodo.6869246

  7. https://doi.org/10.5281/zenodo.6869246

  8. http://bit.ly/NewToOSS

  9. https://doi.org/10.5281/zenodo.6869246


Acknowledgements

This work is partially supported by the National Science Foundation under grant numbers 1815486, 1815503, 1900903, and 1901031; CNPq grant #313067/2020-1; CNPq/MCTI/FNDCT grant #408812/2021-4; and MCTIC/CGI/FAPESP grant #2021/06662-1. We also thank the developers who spent their time answering our questionnaire.


Corresponding author

Correspondence to Fabio Santos.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by: Gabriele Bavota.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A


This appendix provides additional data for the RQ2 results. Some of these data were presented as box plots in Section 5.2; the tables below present the same experiments in more detail.

Table 17 Overall performance from models created to evaluate the corpus
Table 18 Overall performance from models created to evaluate the number of grams
Table 19 Overall performance from models created to evaluate the algorithms
Table 20 Overall performance from models created using the dataset with all projects merged to evaluate the algorithms

We also include the confusion matrices and performance figures for all projects trained and tested individually (Tables 21, 22, 23, 24, 25 and 26). The confusion matrix for the RTTS project is in Table 14 in Section 7.

Table 21 Overall performance from the selected model - JabRef project
Table 22 Overall performance from the selected model - Powertoys project
Table 23 Overall performance from the selected model - Audacity project
Table 24 Overall performance from the selected model - Cronos/MTT project
Table 25 Confusion matrix and performance: Project JabRef - transfer learning
Table 26 Confusion matrix and performance: Project Audacity - transfer learning

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Santos, F., Vargovich, J., Trinkenreich, B. et al. Tag that issue: applying API-domain labels in issue tracking systems. Empir Software Eng 28, 116 (2023). https://doi.org/10.1007/s10664-023-10329-4

