
Predicting the objective and priority of issue reports in software repositories


Software repositories such as GitHub host a large number of software entities. Developers collaboratively discuss, implement, use, and share these entities. Proper documentation plays an important role in successful software management and maintenance. Users rely on Issue Tracking Systems, a facility of software repositories, to keep track of issue reports, to manage the workload and processes, and, finally, to document the highlights of their team's efforts. An issue report is a rich source of collaboratively curated software knowledge, and can contain a reported problem, a request for a new feature, or merely a question about the software product. As the number of these issues grows, managing them manually becomes harder. GitHub provides labels for tagging issues as a means of issue management. However, about half of the issues in GitHub's top 1000 repositories do not have any labels. In this work, we aim to automate the process of managing issue reports for software teams. We propose a two-stage approach to predict both the objective behind opening an issue and its priority level, using feature engineering methods and state-of-the-art text classifiers. To the best of our knowledge, we are the first to fine-tune a Transformer for issue classification. We train and evaluate our models in both project-based and cross-project settings. The latter provides a generic prediction model applicable to any unseen software project or to projects with little historical data. Our proposed approach successfully predicts the objective and priority level of issue reports with 82% (fine-tuned RoBERTa) and 75% (Random Forest) accuracy, respectively. Moreover, we conducted human labeling and evaluation on unlabeled issues from six unseen GitHub projects to assess the performance of the cross-project model on new data. The model achieves 90% accuracy on this sample set.
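The priority-prediction stage pairs text features with a Random Forest classifier. The following is a minimal, hypothetical sketch of that idea only: the toy issue texts, the "high"/"low" labels, and the TF-IDF word features below are illustrative assumptions, not the paper's actual pipeline, feature set, or data.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical issue texts with made-up priority labels; the paper's
# real training corpus and engineered features are not reproduced here.
issues = [
    "crash on startup when config file is missing",
    "app crashes with null pointer on login",
    "typo in README installation section",
    "minor: update badge color in docs",
]
priority = ["high", "high", "low", "low"]

# TF-IDF word uni/bi-grams feeding a Random Forest classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(issues, priority)

prediction = model.predict(["crash on startup when config is missing"])[0]
print(prediction)
```

A fine-tuned Transformer such as RoBERTa would replace this sketch for the objective-classification stage; the two stages share the same fit/predict shape.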
We measure inter-rater reliability and obtain an average Percent Agreement of 85.3% and a Randolph's free-marginal Kappa of 0.71, which indicate substantial agreement among the labelers.
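For reference, Randolph's free-marginal kappa is derived from the overall Percent Agreement P_o and the number of available categories k as kappa_free = (P_o - 1/k) / (1 - 1/k). The sketch below computes both metrics on hypothetical ratings (the study's actual labels are not shown here):

```python
from collections import Counter

def free_marginal_kappa(ratings, k):
    """Percent Agreement and Randolph's free-marginal kappa.

    ratings: one list of category labels per item (one label per rater);
    k: number of categories the raters could choose from.
    """
    p_o = 0.0
    for item in ratings:
        n = len(item)
        # pairs of raters agreeing on this item, out of n*(n-1) ordered pairs
        pairs_agreeing = sum(c * (c - 1) for c in Counter(item).values())
        p_o += pairs_agreeing / (n * (n - 1))
    p_o /= len(ratings)
    kappa = (p_o - 1.0 / k) / (1.0 - 1.0 / k)
    return p_o, kappa

# Hypothetical: 3 raters label 2 issues, choosing among k=3 objectives.
p_o, kappa = free_marginal_kappa(
    [["bug", "bug", "bug"], ["bug", "feature", "bug"]], k=3
)
print(round(p_o, 3), round(kappa, 3))  # 0.667 0.5
```

Unlike Fleiss' fixed-marginal kappa, the free-marginal variant assumes raters are not constrained to match a known category distribution, which fits open labeling tasks like issue classification.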













  10. A complete list of these 66 clusters is available in our repository.



References

  • Aghamohammadi A, Izadi M, Heydarnoori A (2020) Generating summaries for methods of event-driven programs: an android case study. J Syst Softw 170:110800

  • Al Shalabi L, Shaaban Z, Kasasbeh B (2006) Data mining: a preprocessing engine. J Comp Sci 2(9):735–739


  • Alenezi M, Banitaan S (2013) Bug reports prioritization: which features and classifier to use? In: 2013 12th international conference on machine learning and applications. IEEE, Miami, FL, USA, pp 112–116.

  • Alonso O, Marshall C, Najork M (2014) Crowdsourcing a subjective labeling task: a human-centered framework to ensure reliable results. Microsoft Res., Redmond, WA, USA, Tech. Rep. MSR-TR-2014–91

  • Antoniol G, Ayari K, Di Penta M, Khomh F, Guéhéneuc YG (2008) Is it a bug or an enhancement? A text-based approach to classify change requests. In: Proceedings of the 2008 conference of the center for advanced studies on collaborative research meeting of minds - CASCON ’08. ACM Press, Ontario, Canada, pp. 304.

  • Baltes S, Diehl S (2019) Usage and attribution of stack overflow code snippets in github projects. Emp Softw Eng 24(3):1259–1295


  • Baltes S, Treude C, Diehl S (2019) Sotorrent: studying the origin, evolution, and usage of stack overflow code snippets. In: 2019 IEEE/ACM 16th international conference on mining software repositories (MSR). IEEE, pp 191–194

  • Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(1):281–305


  • Bissyandé TF, Lo D, Jiang L, Réveillere L, Klein J, Le Traon Y (2013) Got issues? Who cares about it? a large scale investigation of issue trackers from github. In: 2013 IEEE 24th international symposium on software reliability engineering (ISSRE). IEEE, pp 188–197

  • Brennan RL, Prediger DJ (1981) Coefficient kappa: some uses, misuses, and alternatives. Educ Psychol Measure 41(3):687–699


  • Cabot J, Izquierdo JLC, Cosentino V, Rolandi B (2015) Exploring the use of labels to categorize issues in open-source software projects. In: 2015 IEEE 22nd international conference on software analysis, evolution, and reengineering (SANER). IEEE, pp 550–554

  • Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357


  • Chen AR, Chen THP, Wang S (2021) Demystifying the challenges and benefits of analyzing user-reported logs in bug reports. Emp Softw Eng 26(1):1–30


  • da Costa DA, McIntosh S, Treude C, Kulesza U, Hassan AE (2018) The impact of rapid release cycles on the integration delay of fixed issues. Emp Softw Eng 23(2):835–904


  • Dash M, Liu H (1997) Feature selection for classification. Intell Data Anal 1(3):131–156

  • Devlin J, Chang MW, Lee K, Toutanova K (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805

  • Dhasade AB, Venigalla ASM, Chimalakonda S (2020) Towards prioritizing github issues. In: Proceedings of the 13th innovations in software engineering conference on formerly known as India software engineering conference, pp 1–5

  • Di Sorbo A, Grano G, Aaron Visaggio C, Panichella S (2020) Investigating the criticality of user-reported issues through their relations with app rating. J Softw Evol Process, pp e2316

  • Fan Q, Yu Y, Yin G, Wang T, Wang H (2017) Where is the road for issue reports classification based on text mining? In: 2017 ACM/IEEE international symposium on empirical software engineering and measurement (ESEM). IEEE, pp 121–130

  • Fleiss JL (1971) Measuring nominal scale agreement among many raters. Psychological Bulletin 76(5):378


  • Fleiss JL, Cohen J (1973) The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Measure 33(3):613–619


  • Gao C, Wang B, He P, Zhu J, Zhou Y, Lyu MR (2015) PAID: prioritizing app issues for developers by tracking user reviews over versions. In: 2015 IEEE 26th international symposium on software reliability engineering (ISSRE). IEEE, Gaithersbury, MD, USA, pp 35–45.

  • Gousios G, Zaidman A, Storey MA, Van Deursen A (2015) Work practices and challenges in pull-based development: the integrator’s perspective. In: 2015 IEEE/ACM 37th IEEE international conference on software engineering. IEEE, vol 1, pp 358–368

  • Gwet KL (2008) Computing inter-rater reliability and its variance in the presence of high agreement. Brit J Mathemat Stat Psychol 61(1):29–48


  • Herzig K, Just S, Zeller A (2013) It’s not a bug, it’s a feature: how misclassification impacts bug prediction. In: 2013 35th international conference on software engineering (ICSE). IEEE, pp 392–401

  • Hu H, Wang S, Bezemer CP, Hassan AE (2019) Studying the consistency of star ratings and reviews of popular free hybrid android and ios apps. Emp Softw Eng 24(1):7–32


  • Huang Q, Xia X, Lo D, Murphy GC (2018) Automating intention mining. IEEE Trans Softw Eng, pp 1–1.

  • Izadi M, Heydarnoori A, Gousios G (2021) Topic recommendation for software repositories using multi-label classification algorithms. Emp Softw Eng 26(5):1–33


  • Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014) The promises and perils of mining github. In: Proceedings of the 11th working conference on mining software repositories, pp 92–101

  • Kallis R, Di Sorbo A, Canfora G, Panichella S (2019) Ticket tagger: machine learning driven issue classification. In: 2019 IEEE international conference on software maintenance and evolution (ICSME). IEEE, Cleveland, OH, USA, pp 406–409.

  • Kanwal J, Maqbool O (2012) Bug prioritization to facilitate bug report triage. J Comput Sci Technol 27(2):397–412

  • Khandkar SH (2009) Open coding. University of Calgary 23:2009


  • Kikas R, Dumas M, Pfahl D (2016) Using dynamic and contextual features to predict issue lifetime in github projects. In: 2016 IEEE/ACM 13th working conference on mining software repositories (MSR). IEEE, pp 291–302

  • Kitchenham BA, Mendes E, Travassos GH (2007) Cross versus within-company cost estimation studies: a systematic review. IEEE Trans Softw Eng 33(5):316–329


  • Landis JR, Koch GG (1977) An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics, pp 363–374

  • Li C, Xu L, Yan M, He J, Zhang Z (2019) Tagdeeprec: tag recommendation for software information sites using attention-based bi-lstm. In: International conference on knowledge science, engineering and management. Springer, pp 11–24

  • Liao Z, He D, Chen Z, Fan X, Zhang Y, Liu S (2018) Exploring the characteristics of issue-related behaviors in github using visualization techniques. IEEE Access 6:24003–24015


  • Limsettho N, Hata H, Monden A, Matsumoto K (2016) Unsupervised bug report categorization using clustering and labeling algorithm. Int J Softw Eng Knowledge Eng 26(07):1027–1053


  • Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) Roberta: a robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692

  • McHugh ML (2012) Interrater reliability: the kappa statistic. Biochemia medica 22(3):276–282


  • Merten T, Falis M, Hübner P, Quirchmayr T, Bürsner S, Paech B (2016) Software feature request detection in issue tracking systems. In: 2016 IEEE 24th international requirements engineering conference (RE). IEEE, pp 166–175

  • Noei E, Zhang F, Wang S, Zou Y (2019) Towards prioritizing user-related issue reports of mobile applications. Empir Software Eng 24(4):1964–1996.

  • Noei E, Zhang F, Zou Y (2019) Too many user-reviews, what should app developers look at first? IEEE Trans Softw Eng, pp 1–1.

  • Pandey N, Sanyal DK, Hudait A, Sen A (2017) Automated classification of software issue reports using machine learning techniques: an empirical study. Innov Syst Softw Eng 13(4):279–297


  • Peters F, Menzies T, Marcus A (2013) Better cross company defect prediction. In: 2013 10th working conference on mining software repositories (MSR). IEEE, pp 409–418

  • Pingclasai N, Hata H, Matsumoto KI (2013) Classifying bug reports to bugs and other requests using topic modeling. In: 2013 20Th asia-pacific software engineering conference (APSEC). IEEE, vol 2, pp 13–18

  • Randolph JJ (2005) Free-marginal multirater kappa (multirater k [free]): An alternative to fleiss’ fixed-marginal multirater kappa. Online submission

  • Sharma M, Bedi P, Chaturvedi K, Singh V (2012) Predicting the priority of a reported bug using machine learning techniques and cross project validation. In: 2012 12th International Conference on Intelligent Systems Design and Applications (ISDA). IEEE, pp 539–545

  • Sohrawardi SJ, Azam I, Hosain S (2014) A comparative study of text classification algorithms on user submitted bug reports. In: Ninth International Conference on Digital Information Management (ICDIM 2014). IEEE, Phitsanulok, Thailand, pp 242–247.

  • Song Y, Chaparro O (2020) Bee: a tool for structuring and analyzing bug reports. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp 1551–1555

  • Svyatkovskiy A, Deng SK, Fu S, Sundaresan N (2020) Intellicode compose: code generation using transformer. arXiv preprint arXiv:2005.08025

  • Tavakoli M, Izadi M, Heydarnoori A (2020) Improving quality of a post’s set of answers in stack overflow. In: 46th Euromicro conference on software engineering and advanced applications, SEAA 2020, Portoroz, Slovenia, August 26-28, 2020. IEEE, pp 504–512.

  • Terdchanakul P, Hata H, Phannachitta P, Matsumoto K (2017) Bug or not? Bug report classification using n-gram idf. In: 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, pp 534–538

  • Tian Y, Lo D, Sun C (2013) DRONE: predicting priority of reported bugs by multi-factor analysis. In: 2013 IEEE International Conference on Software Maintenance. IEEE, Eindhoven, Netherlands, pp 200–209.

  • Uddin J, Ghazali R, Deris MM, Naseem R, Shah H (2017) A survey on bug prioritization. Artif Intell Rev 47(2):145–180.

  • Vasilescu B, Filkov V, Serebrenik A (2013) Stackoverflow and github: associations between software development and crowdsourced knowledge. In: 2013 international conference on social computing. IEEE, pp 188–195

  • Vasilescu B, Serebrenik A, Devanbu P, Filkov V (2014) How social q&a sites are changing knowledge sharing in open source software communities. In: Proceedings of the 17th ACM conference on computer supported cooperative work & social computing, pp 342–354

  • van der Veen E, Gousios G, Zaidman A (2015) Automatically prioritizing pull requests. In: 2015 IEEE/ACM 12th working conference on mining software repositories. IEEE, Florence, Italy, pp 357–361.

  • Wan Y, Zhao Z, Yang M, Xu G, Ying H, Wu J, Yu PS (2018) Improving automatic source code summarization via deep reinforcement learning. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering, pp 397–407

  • Wang S, Lo D, Vasilescu B, Serebrenik A (2018) Entagrec++: an enhanced tag recommendation system for software information sites. Emp Softw Eng 23(2):800–832


  • Weiss GM, Provost F (2001) The effect of class distribution on classifier learning: an empirical study

  • Wu Y, Wang S, Bezemer CP, Inoue K (2019) How do developers utilize source code from stack overflow? Emp Softw Eng 24(2):637–673


  • Yu Y, Wang H, Filkov V, Devanbu P, Vasilescu B (2015) Wait for it: determinants of pull request evaluation latency on github. In: 2015 IEEE/ACM 12th working conference on mining software repositories. IEEE, pp 367–371

  • Yu Y, Zeng Y, Fan Q, Wang H (2018) Transferring well-trained models for cross-project issue classification: a large-scale empirical study. In: Proceedings of the Tenth Asia-Pacific Symposium on Internetware, pp 1–6

  • Zeng Y, Chen J, Shang W, Chen THP (2019) Studying the characteristics of logging practices in mobile apps: a case study on f-droid. Emp Softw Eng 24(6):3394–3434


  • Zhang J, Wang X, Hao D, Xie B, Zhang L, Mei H (2015) A survey on bug-report analysis. Sci China Inform Sci 58(2):1–24


  • Zhou J, Wang S, Bezemer CP, Zou Y, Hassan AE (2020) Studying the association between bountysource bounties and the issue-addressing likelihood of github issue reports. IEEE Trans Softw Eng

  • Zhou Y, Tong Y, Gu R, Gall H (2016) Combining text mining and data mining for bug report classification. J Softw Evolut Process 28(3):150–176



Author information

Authors and Affiliations


Corresponding author

Correspondence to Maliheh Izadi.

Additional information

Dr. Abbas Heydarnoori is also a corresponding author for this work.

Communicated by: Shaowei Wang, Tse-Hsun (Peter) Chen, Sebastian Baltes, Ivano Malavolta, Christoph Treude, and Alexander Serebrenik

Appendix: Priority Labels


Table 8 presents the list of labels manually extracted from the most-starred GitHub repositories for the high-priority and low-priority issue categories.

Table 8 Selected labels for each category of issue priority


About this article


Cite this article

Izadi, M., Akbari, K. & Heydarnoori, A. Predicting the objective and priority of issue reports in software repositories. Empir Software Eng 27, 50 (2022).



Keywords

  • Software evolution and maintenance
  • Mining software repositories
  • Issue reports
  • Classification
  • Prioritization
  • Machine learning
  • Natural language processing