Using knowledge units of programming languages to recommend reviewers for pull requests: an empirical study

Abstract

Determining the right code reviewer for a given code change requires understanding the characteristics of the changed code, identifying the skills of each potential reviewer (expertise profile), and finding a good match between the two. To facilitate this task, we design a code reviewer recommender that operates on the knowledge units (KUs) of a programming language. We define a KU as a cohesive set of key capabilities that are offered by one or more building blocks of a given programming language. We operationalize our KUs using certification exams for the Java programming language. We detect KUs from 10 actively maintained Java projects from GitHub, spanning 290K commits and 65K pull requests (PRs). We generate developer expertise profiles based on the detected KUs and use these KU-based expertise profiles to build a code reviewer recommender (KUREC). We compare KUREC’s performance to that of seven baseline recommenders. KUREC ranked first along with the top-performing baseline recommender (RF) in a Scott-Knott ESD analysis of recommendation accuracy (median top-5 accuracy of 0.84 and median MAP@5 of 0.51). From a practical standpoint, we highlight that KUREC’s performance is more stable (lower interquartile range) than that of RF, thus making it more consistent and potentially more trustworthy. We also design three new recommenders by combining KUREC with our baseline recommenders. These new combined recommenders outperform both KUREC and the individual baselines. Finally, we evaluate how reasonable the recommendations from KUREC and the combined recommenders are when they deviate from the ground truth. We observe that KUREC is the recommender with the highest percentage of reasonable recommendations (63.4%). Overall, we conclude that KUREC and one of the combined recommenders (e.g., AD_HYBRID) are superior to the baseline recommenders that we studied. Future work in the area should thus (i) consider KU-based recommenders as baselines and (ii) experiment with combined recommenders.

Data Availability Statement (DAS)

A supplementary material package is provided online at the following link: http://www.bit.ly/3bhSFux. The contents will be made available on a public GitHub repository once the paper is accepted.

Notes

  1. http://www.bit.ly/3bhSFux. The contents will be made available on a public GitHub repository once the paper is accepted.

  2. The grammar employed by JDT can be seen at https://github.com/eclipse/eclipse.jdt.core/blob/master/org.eclipse.jdt.core/grammar/java.g

  3. https://docs.github.com/en/rest/reference

  4. https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/prcomp

  5. https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/kmeans

References

  • Adomavicius G, Zhang J (2014) Improving stability of recommender systems: a meta-algorithmic approach. IEEE Trans Knowl Data Eng 27(6):1573–1587

  • Al-Subaihin AA, Sarro F, Black S, Capra L, Harman M, Jia Y, Zhang Y (2016) Clustering Mobile Apps Based on Mined Textual Features. In: Proceedings of the 10th ACM/IEEE international symposium on empirical software engineering and measurement, ESEM’16

  • Al-Zubaidi WHA, Thongtanunam P, Dam HK, Tantithamthavorn C, Ghose A (2020) Workload-aware reviewer recommendation using a multi-objective search-based approach. In: Proceedings of the 16th ACM international conference on predictive models and data analytics in software engineering, pp 21–30

  • Anagnostopoulos A, Kumar R, Mahdian M (2008) Influence and correlation in social networks. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 7–15

  • Anvik J, Murphy GC (2007) Determining Implementation Expertise from Bug Reports. In: Proceedings of the 4th international workshop on mining software repositories, pp 1–8

  • Asthana S, Kumar R, Bhagwan R, Bird C, Bansal C, Maddila C, Mehta S, Ashok B (2019) Whodo: automating reviewer suggestions at scale. In: Proceedings of the 27th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, pp 937–945

  • Avelino G, Passos L, Hora A, Valente MT (2016) A novel approach for estimating truck factors. In: Proceedings of the 24th international conference on program comprehension (ICPC), IEEE, pp 1–1

  • Bacchelli A, Bird C (2013) Expectations, outcomes, and challenges of modern code review. In: Proceedings of the 2013 international conference on software engineering, IEEE Press, ICSE’13, p 712–721

  • Balachandran V (2013) Reducing human effort and improving quality in peer code reviews using automatic static analysis and reviewer recommendation. In: Proceedings of the 35th international conference on software engineering (ICSE), pp 931–940

  • Bishnu PS, Bhattacherjee V (2012) Software Fault Prediction Using Quad Tree-Based K-Means Clustering Algorithm. IEEE Trans Knowl Data Eng 24(6):1146–1150

  • Bishop M, Burley D, Buck S, Ekstrom JJ, Futcher L, Gibson D, Hawthorne EK, Kaza S, Levy Y, Mattord H et al. (2017) Cybersecurity curricular guidelines. In: IFIP world conference on information security education, pp 3–13

  • Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3(Jan):993–1022

  • Campos PG, Díez F, Sánchez-Montañés M (2011) Towards a more realistic evaluation: testing the ability to predict future tastes of matrix factorization-based recommenders. In: Proceedings of the 5th ACM conference on recommender systems, pp 309–312

  • Chouchen M, Ouni A, Mkaouer MW, Kula RG, Inoue K (2021) Whoreview: A multi-objective search-based approach for code reviewers recommendation in modern code review. Appl Soft Comput 100:106908

  • Cogo FR, Xia X, Hassan AE (2022) Assessing the alignment between the information needs of developers and the documentation of programming languages: A case study on rust. arXiv:2202.04431

  • de Lima Júnior ML, Soares DM, Plastino A, Murta L (2015) Developers assignment for analyzing pull requests. In: Proceedings of the 30th annual ACM symposium on applied computing, pp 1567–1572

  • Dey T, Karnauch A, Mockus A (2021) Representation of Developer Expertise in Open Source Software. In: Proceedings of the 43rd international conference on software engineering, p 995–1007

  • Ding C, He X (2004) K-Means Clustering via Principal Component Analysis. In: Proceedings of the 21st international conference on machine learning, pp 29–3

  • Eclipse (2020) Eclipse Java development tools (JDT). http://www.eclipse.org/jdt/, (Last accessed: April 2023)

  • Fejzer M, Przymus P, Stencel K (2018) Profile based recommendation of code reviewers. J Intell Inf Syst 50:597–619

  • Ferreira M, Mombach T, Valente MT, Ferreira K (2019) Algorithms for estimating truck factors: a comparative study. Softw Qual J 27(4):1583–1617

  • Fritz T, Murphy GC, Hill E (2007) Does a Programmer’s Activity Indicate Knowledge of Code? In: Proceedings of the 6th joint meeting of the european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, p 341–350

  • Fritz T, Murphy GC, Murphy-Hill E, Ou J, Hill E (2014) Degree-of-Knowledge: Modeling a Developer’s Knowledge of Code. ACM Trans Softw Eng Methodol 23(2)

  • Fritz T, Ou J, Murphy GC, Murphy-Hill E (2010) A Degree-of-Knowledge Model to Capture Source Code Familiarity. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, p 385–394

  • Gauthier IX, Lamothe M, Mussbacher G, McIntosh S (2021) Is historical data an appropriate benchmark for reviewer recommendation systems?: A case study of the gerrit community. In: Proceedings of the 36th IEEE/ACM international conference on automated software engineering (ASE), IEEE, pp 30–41

  • Ghotra B, McIntosh S, Hassan AE (2015) Revisiting the Impact of Classification Techniques on the Performance of Defect Prediction Models. In: Proceedings of the 37th IEEE international conference on software engineering, pp 789–800

  • Ghotra B, McIntosh S, Hassan AE (2017) A large-scale study of the impact of feature selection techniques on defect classification models. In: Proceedings of the 14th international conference on mining software repositories, p 146–157

  • Girba T, Kuhn A, Seeberger M, Ducasse S (2005) How developers drive software evolution. In: Proceedings of the 8th international workshop on principles of software evolution, pp 113–122

  • Goebl S, He X, Plant C, Böhm C (2014) Finding the optimal subspace for clustering. In: Proceedings of the 2014 IEEE international conference on data mining, pp 130–139

  • Google (2020) k-Means Advantages and Disadvantages. https://developers.google.com/machine-learning/clustering/algorithm/advantages-disadvantages, (Accessed April 2023)

  • Greene GJ, Fischer B (2016) Cvexplorer: Identifying candidate developers by mining and exploring their open source contributions. In: Proceedings of the 31st IEEE/ACM international conference on automated software engineering, pp 804–809

  • Hannebauer C, Patalas M, Stünkel S, Gruhn V (2016) Automatically recommending code reviewers based on their expertise: an empirical comparison. In: Proceedings of the 31st IEEE/ACM international conference on automated software engineering, p 99–110

  • Hauff C, Gousios G (2015) Matching GitHub developer profiles to job advertisements. In: 2015 IEEE/ACM 12th working conference on mining software repositories, pp 362–366

  • He Q, Li B, Chen F, Grundy J, Xia X, Yang Y (2020) Diversified third-party library prediction for mobile app development. IEEE Trans Software Eng 48(1):150–165

  • Jiang J, He JH, Chen XY (2015) Coredevrec: automatic core member recommendation for contribution evaluation. J Comput Sci Technol 30:998–1016

  • Jiang J, Yang Y, He J, Blanc X, Zhang L (2017) Who should comment on this pull request? analyzing attributes for more accurate commenter recommendation in pull based development. Inf Softw Technol 84:48–62

  • Jiang J, Lo D, Zheng J, Xia X, Yang Y, Zhang L (2019) Who should make decision on this pull request? analyzing time-decaying relationships and file similarities for integrator prediction. J Syst Softw 154:196–210

  • Jureczko M, Madeyski L (2010) Towards identifying software project clusters with regard to defect prediction. In: Proceedings of the 6th international conference on predictive models in software engineering, PROMISE’10

  • Kagdi H, Hammad M, Maletic JI (2008) Who can help me with this source code change? In: 2008 IEEE international conference on software maintenance, pp 157–166. https://doi.org/10.1109/ICSM.2008.4658064

  • Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014) The promises and perils of mining GitHub. In: Proceedings of the 11th working conference on mining software repositories, pp 92–101

  • Kassambara A, Mundt F (2017) Package ‘factoextra’. Extract and visualize the results of multivariate data analyses 76

  • Kleinberg JM (1999) Authoritative sources in a hyperlinked environment. J ACM (JACM) 46(5):604–632

  • Kondo M, Bezemer CP, Kamei Y, Hassan AE, Mizuno O (2019) The impact of feature reduction techniques on defect prediction models. Empirical Softw Eng 24(4):1925–1963

  • Lewis C, Lin Z, Sadowski C, Zhu X, Ou R, Whitehead Jr EJ (2013) Does bug prediction support human developers? Findings from a Google Case Study. In: Proceedings of the 35th international conference on software engineering, p 372–381

  • Li T, Zhu S, Ogihara M (2003) Algorithms for clustering high dimensional and distributed data. Intell Data Anal 7(4):305–326

  • Liang JT, Zimmermann T, Ford D (2022) Towards mining oss skills from github activity. In: Proceedings of the 44th international conference on software engineering - new ideas and emerging results (NIER track)

  • Liao Z, Wu Z, Wu J, Zhang Y, Liu J, Long J (2019) Tirr: a code reviewer recommendation algorithm with topic model and reviewer influence. In: Proceedings of the 2019 IEEE global communications conference, pp 1–6

  • Li X, Peng S, Du J (2021) Towards medical knowmetrics: representing and computing medical knowledge using semantic predications as the knowledge unit and the uncertainty as the knowledge context. Scientometrics pp 1–27

  • Malik H, Hassan AE (2008) Supporting software evolution using adaptive change propagation heuristics. In: Proceedings of the 24th IEEE international conference on software maintenance, pp 177–18

  • Ma D, Schuler D, Zimmermann T, Sillito J (2009) Expert recommendation with usage expertise. In: Proceedings of the 25th IEEE international conference on software maintenance, pp 535–538

  • McDonald DW, Ackerman MS (2000) Expertise recommender: a flexible recommendation system and Architecture. In: Proceedings of the 2000 ACM conference on computer supported cooperative work, pp 231–240

  • McDonald DW (2001) Evaluating expertise recommendations. In: Proceedings of the 2001 International ACM SIGGROUP conference on supporting group work, p 214–223

  • Mirsaeedi E, Rigby PC (2020) Mitigating turnover with code review recommendation: balancing expertise, workload, and knowledge distribution. In: Proceedings of the 42nd international conference on software engineering, pp 1183–1195

  • Mittas N, Angelis L (2013) Ranking and clustering software cost estimation models through a multiple comparisons algorithm. IEEE Trans Software Eng 39(4):537–551

  • Mockus A, Herbsleb JD (2002) Expertise browser: a quantitative approach to identifying expertise. In: Proceedings of the 24th international conference on software engineering, pp 503–512

  • Montandon JE, Silva LL, Valente MT (2019) Identifying experts in software libraries and frameworks among GitHub users. In: Proceedings of the 2019 16th international conference on mining software repositories (MSR), pp 276–287

  • Moradi Dakhel A, Desmarais MC, Khomh F (2021) Assessing developer expertise from the statistical distribution of programming syntax patterns. In: Proceedings of the 25th international conference on evaluation and assessment in software engineering, pp 90–99

  • Munaiah N, Kroh S, Cabrey C, Nagappan M (2017) Curating GitHub for engineered software projects. Empir Softw Eng 22(6):3219–3253

  • Nidheesh N, Nazeer K, Ameer P (2020) A Hierarchical Clustering algorithm based on Silhouette Index for cancer subtype discovery from genomic data. Neural Comput Appl 32(15):11459–11476

  • Oracle (2022) Oracle Certified Associate, Java SE 8 Programmer. https://education.oracle.com/oracle-certified-associate-java-se-8-programmer/trackp_333, (Accessed April 2023)

  • Oracle (2022) Oracle Certified Professional, Java EE 7 Application Developer. https://education.oracle.com/oracle-certified-professional-java-ee-7-application-developer/pexam_1Z0-900, (Accessed April 2023)

  • Oracle (2022) Oracle Certified Professional, Java SE 8 Programmer. https://education.oracle.com/oracle-certified-professional-java-se-8-programmer/trackp_357, (Accessed April 2023)

  • Ouni A, Kula RG, Inoue K (2016) Search-based peer reviewers recommendation in modern code review. In: Proceedings of the 32nd IEEE international conference on software maintenance and evolution, pp 367–377

  • Pandya P, Tiwari S (2022) Corms: a github and gerrit based hybrid code reviewer recommendation approach for modern code review. In: Proceedings of the 30th ACM joint European software engineering conference and symposium on the foundations of software engineering, pp 546–557

  • Panichella A, Dit B, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2013) How to effectively use topic models for software engineering tasks? an approach based on genetic algorithms. In: Proceedings of the 35th international conference on software engineering, pp 522–531

  • Parsons L, Haque E, Liu H (2004) Subspace Clustering for High Dimensional Data: A Review. SIGKDD Explorations Newsletter 6(1):90–105

  • Patel A, Jain S, Shandilya SK (2018) Data of semantic web as unit of knowledge. J Web Eng

  • Peták M, Brožová H, Houška M (2020) Modelling of knowledge via fuzzy knowledge unit in a case of the ERP systems upgrade. Autom Control Comput Sci 54(6):529–540

  • Peták M, Houška M (2018) Fuzzy knowledge unit. In: Proceedings of 12th international scientific conference on distance learning in applied informatics, pp 491–502

  • Rahman MM, Roy CK, Redl J, Collins JA (2016) Correct: code reviewer recommendation at github for vendasta technologies. In: Proceedings of the 31st IEEE/ACM international conference on automated software engineering, pp 792–797

  • Rigby PC, Bird C (2013) Convergent contemporary software peer review practices. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering, Association for Computing Machinery, New York, NY, USA, ESEC/FSE 2013, pp 202–212. https://doi.org/10.1145/2491411.2491444

  • Robbes R, Röthlisberger D (2013) Using developer interaction data to compare expertise metrics. In: Proceedings of the 10th working conference on mining software repositories, p 297–300

  • Rong G, Zhang Y, Yang L, Zhang F, Kuang H, Zhang H (2022) Modeling review history for reviewer recommendation: a hypergraph approach. In: Proceedings of the 44th international conference on software engineering, pp 1381–1392

  • Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65

  • Sklearn (2022) sklearn.metrics.silhouette_samples. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_samples.html, (Accessed April 2023)

  • Spadini D, Aniche M, Bacchelli A (2018) PyDriller: Python framework for mining software repositories. In: Proceedings of the 2018 26th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, p 908–911

  • Strand A, Gunnarson M, Britto R, Usman M (2020) Using a context-aware approach to recommend code reviewers: findings from an industrial case study. In: Proceedings of the 42nd international conference on software engineering: software engineering in practice, pp 1–10

  • Sülün E, Tüzün E, Doğrusöz U (2019) Reviewer recommendation using software artifact traceability graphs. In: Proceedings of the 15th international conference on predictive models and data analytics in software engineering, pp 66–75

  • Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2017) An empirical comparison of model validation techniques for defect prediction models. IEEE Trans Software Eng 43(1):1–18

  • Tecimer KA, Tüzün E, Dibeklioglu H, Erdogmus H (2021) Detection and elimination of systematic labeling bias in code reviewer recommendation systems. In: Evaluation and assessment in software engineering, pp 181–190

  • Thongtanunam P, Tantithamthavorn C, Kula RG, Yoshida N, Iida H, Matsumoto Ki (2015) Who should review my code? a file location-based code-reviewer recommendation approach for modern code review. In: Proceedings of the 22nd international conference on software analysis, evolution, and reengineering (SANER), pp 141–150

  • Tsantalis N, Chatzigeorgiou A, Stephanides G, Halkidis ST (2006) Design pattern detection using similarity scoring. IEEE Trans Software Eng 32(11):896–909

  • Vekariya P (2018) Top 7 Programming Language Certifications for Web Developers. https://medium.com/@pdvekariya1/top-7-programming-language-certifications-for-web-developers-a29fce9508e4, (Accessed April 2023)

  • Vivacqua A, Lieberman H (2000) Agents to assist in finding help. In: Proceedings of the SIGCHI conference on human factors in computing systems, pp 65–72

  • Von Solms S, Futcher L (2018) Identifying the cybersecurity body of knowledge for a postgraduate module in systems engineering. In: IFIP world conference on information security education, pp 121–132

  • Wan Y, Chen L, Xu G, Zhao Z, Tang J, Wu J (2018) SCSMiner: mining social coding sites for software developer recommendation with relevance propagation. World Wide Web 21(6):1523–1543

  • Xia X, Lo D, Wang X, Yang X (2015) Who should review this change?: Putting text and file location analyses together for more accurate recommendations. In: Proceedings of the 31st IEEE international conference on software maintenance and evolution, pp 261–270

  • Xia Z, Sun H, Jiang J, Wang X, Liu X (2017) A hybrid approach to code reviewer recommendation with collaborative filtering. In: Proceedings of the 6th international workshop on software mining, pp 24–31

  • Xibilia MG, Latino M, Marinković Z, Atanasković A, Donato N (2020) Soft sensors based on deep neural networks for applications in security and safety. IEEE Trans Instrum Meas 69(10):7869–7876

  • Xie X, Yang X, Wang B, He Q (2021) DevRec: multi-relationship embedded software developer recommendation. IEEE Trans Softw Eng 1–1

  • Xiong R, Li B (2019) Accurate design pattern detection based on idiomatic implementation matching in java language context. In: 2019 IEEE 26th international conference on software analysis, evolution and reengineering (SANER), pp 163–174

  • Ying H, Chen L, Liang T, Wu J (2016) Earec: leveraging expertise and authority for pull-request reviewer recommendation in github. In: Proceedings of the 3rd international workshop on crowdsourcing in software engineering, pp 29–35

  • Yitzhaki S (1979) Relative Deprivation and the Gini Coefficient. Q J Econ 93(2):321–324

  • Yoon KA, Kwon OS, Bae DH (2007) An approach to outlier detection of software measurement data using the k-means clustering method. In: First international symposium on empirical software engineering and measurement, pp 443–445

  • Yu Y, Wang H, Yin G, Wang T (2016) Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment? Inf Softw Technol 74:204–218

  • Yu Y, Wang H, Yin G, Ling CX (2014) Who should review this pull-request: Reviewer recommendation to expedite crowd collaboration. In: Proceedings of the 21st Asia-Pacific software engineering conference, pp 335–342

  • Zanjani MB, Kagdi H, Bird C (2016) Automatically recommending peer reviewers in modern code review. IEEE Trans Software Eng 42(6):530–543

  • Zhang F, Zheng Q, Zou Y, Hassan AE (2016) Cross-project defect prediction using a connectivity-based unsupervised classifier. In: Proceedings of the 38th international conference on software engineering, p 309–320

Author information

Corresponding author

Correspondence to Md Ahasanuzzaman.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by: Gabriele Bavota.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Java Certification Exams and Knowledge Units

A.1 Java SE 8 Programmer I Exam

Table 6 lists the topics and subtopics covered in the Java SE 8 Programmer I Exam as per Oracle’s official webpage.

Table 6 Topics and subtopics from the Java SE 8 Programmer I Exam

A.2 Java SE 8 Programmer II Exam

Table 7 lists the topics and subtopics covered in the Java SE 8 Programmer II Exam as per Oracle’s official webpage.

Table 7 Topics and subtopics from the Java SE 8 Programmer II Exam

A.3 Java EE Developer Exam

Table 8 lists the topics and subtopics covered in the Java EE Developer Exam as per Oracle’s official webpage.

Table 8 Topics and subtopics from the Java EE Developer Exam

A.4 Mapping Process From Certification Exams to Knowledge Units (KUs)

To minimize subjectivity bias, the first two authors conducted the entire mapping process collaboratively. They jointly reviewed the topics and subtopics of the certification exams, mapped the topics to KUs, and discussed how to handle corner cases, maintaining ongoing communication to ensure a consistent and coherent approach to the data analysis. We note that we did not calculate interrater agreement for this mapping process: because the mapping was performed jointly rather than independently, our collaborative approach aimed to achieve a shared and consistent understanding of the data (i.e., the topics and subtopics of the Java certification exams).

To map from certification exams to KUs, we follow three steps. First, we exclude subtopics (from any exam) that are unsuitable for our study. Next, we rearrange subtopics (within and across exams) so that each subtopic lies under the most specific topic possible. Finally, we convert each topic into a KU (one-to-one mapping) and interpret its subtopics as the key capabilities of that KU. Tables 6 and 7 show the topics and subtopics of the Java SE 8 certification exam I (E1) and the Java SE 8 certification exam II (E2), respectively, and Table 8 shows those of the Java EE Developer Exam (E3). In the following, we explain the mapping process in detail (we use the notation [Ei, Tj, Sk] to refer to subtopic Sk of topic Tj from exam Ei).

Exclusion of Subtopics We only consider subtopics that can be automatically detected from source code using static analysis. For instance, the subtopic “Use autoclose resources with a try-with-resources statement” can be detected by searching for try-with-resources code blocks using an appropriate Java parser. Our rationale is to ensure that KUs are computable for any Java system, irrespective of how it is built or used by end users. In this vein, subtopics that are inherently conceptual or that are very challenging to detect in practice (e.g., because they involve problems that are still under investigation by the SE research community) are excluded from our operationalization. We discuss all individual cases below.
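
For illustration, the following minimal sketch shows how such a check (here, counting try-with-resources blocks) could be implemented with the Eclipse JDT parser (see note 2). The class name and the way the source code is supplied are simplifying assumptions; this is a sketch rather than the exact implementation used in our study.

    // Illustrative sketch: counting try-with-resources blocks in one Java source
    // file with the Eclipse JDT core DOM (not the study's exact implementation).
    import org.eclipse.jdt.core.dom.AST;
    import org.eclipse.jdt.core.dom.ASTParser;
    import org.eclipse.jdt.core.dom.ASTVisitor;
    import org.eclipse.jdt.core.dom.CompilationUnit;
    import org.eclipse.jdt.core.dom.TryStatement;

    public class TryWithResourcesDetector {

        // Returns the number of try-with-resources statements in the given source.
        public static int count(String javaSource) {
            ASTParser parser = ASTParser.newParser(AST.JLS8);
            parser.setKind(ASTParser.K_COMPILATION_UNIT);
            parser.setSource(javaSource.toCharArray());
            CompilationUnit unit = (CompilationUnit) parser.createAST(null);

            final int[] hits = {0};
            unit.accept(new ASTVisitor() {
                @Override
                public boolean visit(TryStatement node) {
                    // A try statement with a non-empty resource list is a try-with-resources block.
                    if (!node.resources().isEmpty()) {
                        hits[0]++;
                    }
                    return true; // continue visiting nested statements
                }
            });
            return hits[0];
        }
    }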

The subtopic [E1, T1, S5] under “[E1, T1] Java Basics” is about comparing and contrasting the features and components of Java (e.g., platform independence, object orientation, and encapsulation). This subtopic is thus conceptual and cannot be operationalized and extracted automatically. After discussion, the first two authors concluded that none of the subtopics of “[E1, T1] Java Basics” can be extracted automatically. As a consequence, we did not include “[E1, T1] Java Basics” in our study.

Some subtopics concern the understanding of a concept, which cannot be operationalized and extracted automatically. For example, the subtopic “[E1, T7, S1] Describe inheritance and its benefits” is about understanding the concept of inheritance, and the subtopic “[E1, T8, S3] Describe the advantages of exception handling” is about understanding the concept of exception handling. We did not include such subtopics as key capabilities of KUs. We highlight all these cases in Table 9.

We also excluded the topic “[E2, T19] Object-Oriented Design Principles”, which concerns the implementation and detection of design patterns (e.g., the singleton and factory design patterns). Detecting such design patterns is a non-trivial task and is itself an active research area in the software engineering community (Tsantalis et al. 2006; Xiong and Li 2019).

Finally, we excluded the topic “[E3, T30] Create Java Web Application using JSPs”. Building a Java web application with JSPs requires creating JSP pages (i.e., files with the “.jsp” extension) containing JSP tags, which differ from Java files. Since this study focuses only on Java files (i.e., files with the “.java” extension), we did not include topic [E3, T30].

Rearrangement of Subtopics Certain subtopics from exam E1 are conceptually related to topics of exam E2 and vice versa. In those cases, we prioritize the topic that is more specific and move the subtopic in question to that topic. For example, “[E2, T10] Java Class Design” contains the subtopic “[E2, T10, S2] Implement inheritance including visibility modifiers and composition”. However, this subtopic would also fit under “[E1, T7] Working with Inheritance”. Since the latter topic is more specific than the former, we move [E2, T10, S2] into [E1, T7]. We also observed rare cases in which it made sense to move subtopics within an exam (as opposed to across exams). In the following, we describe all subtopic rearrangements in detail.

The subtopics of “[E1, T9] Working with selected classes from the Java API” were moved into several topics of E2:

  1. The subtopics “[E1, T9, S1] Manipulate data using the StringBuilder class and its methods” and “[E1, T9, S2] Create and manipulate strings” were moved into “[E2, T10] String Processing”.

  2. The subtopic “[E1, T9, S3] Create and manipulate calendar data using classes from java.time.LocalDateTime, java.time.LocalDate, java.time.LocalTime, java.time.format.DateTimeFormatter, java.time.Period” was moved into “[E2, T16] Use Java SE 8 Date/Time API”.

  3. The subtopic “[E1, T9, S4] Declare and use an ArrayList of a given type” was moved into “[E2, T12] Generics and Collection”.

  4. The subtopic “[E1, T9, S5] Write a simple Lambda expression that consumes a Lambda Predicate expression” was moved into “[E2, T13] Lambda Built-in Functional Interfaces”.

The subtopics “[E1, T8, S2] Create a try-catch block” and “[E1, T8, S4] Create and invoke a method that throws an exception” were moved from “[E1, T8] Handling Exception” into “[E2, T15] Exceptions and Assertions”.

All subtopics from the topic “[E2, T10] Java Class Design” were moved into other topics of E1 and E2 as follows:

  1. The subtopics “[E2, T10, S1] Implement encapsulation”, “[E2, T10, S4] Override hashCode, equals, and toString methods from Object class”, and “[E2, T10, S6] Develop code that uses the static keyword on initialize blocks, variables, and methods” were moved into “[E1, T6] Working with Methods and Encapsulation”.

  2. The subtopics “[E2, T10, S2] Implement inheritance including visibility modifiers and composition” and “[E2, T10, S3] Implement polymorphism” were moved into “[E1, T7] Working with Inheritance”.

  3. The subtopic “[E2, T10, S5] Create and use singleton classes and immutable classes” was moved into “[E2, T11] Advanced Class Design”.

In addition, three subtopics of the topic “[E2, T12] Generics and Collection” were moved into “[E2, T14] Java Stream API”: (1) “[E2, T12, S5] Iterate Collection using forEach methods of Streams”, (2) “[E2, T12, S7] Filter a collection by using Stream filter API with lambda expressions”, and (3) “[E2, T12, S8] Use method references with streams”.

One-to-one Mappings After performing the exclusions and subtopic rearrangements, we created a one-to-one mapping between topics and KUs. For example, the topic “[E1, T5] Using Loop Constructs” is mapped to the “[K4] Loop KU”. All of the subtopics in [E1, T5] are interpreted as key capabilities associated with [K4]. For instance, the subtopic “[E1, T5, S1] Create and use while loops” is considered a key capability associated with the Loop KU. As a result of our mapping process, we identified 28 KUs, which are shown in Table 9.

Table 9 The mapping from certification exams to Knowledge Unit (KUs)
Table 10 Knowledge Units derived from the Java SE 8 Programmer I Exam, Java SE 8 Programmer II Exam, and Java EE Developer Exam
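
For illustration, the sketch below shows one way the resulting KUs could be represented: each KU keeps its identifier, name, and the key capabilities derived from the retained subtopics of its source topic. The class and field names are illustrative assumptions rather than our implementation, and only the subtopic quoted above is spelled out for K4.

    // Illustrative sketch: a KU as an identifier, a name, and its key capabilities.
    import java.util.Arrays;
    import java.util.List;

    public final class KnowledgeUnit {
        final String id;                 // e.g., "K4"
        final String name;               // e.g., "Loop KU"
        final List<String> capabilities; // one entry per retained subtopic

        KnowledgeUnit(String id, String name, List<String> capabilities) {
            this.id = id;
            this.name = name;
            this.capabilities = capabilities;
        }

        public static void main(String[] args) {
            // One-to-one mapping example from the text: topic [E1, T5] "Using Loop Constructs" -> K4.
            KnowledgeUnit loopKu = new KnowledgeUnit("K4", "Loop KU",
                    Arrays.asList("Create and use while loops" /* remaining [E1, T5] subtopics omitted */));
            System.out.println(loopKu.id + " (" + loopKu.name + "): "
                    + loopKu.capabilities.size() + " key capability listed");
        }
    }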

Appendix B Sensitivity Analysis of Choosing a Threshold for Identifying Reasonable Recommendations

A developer who has more prior knowledge about the changed files of a PR is more capable of reviewing that PR.

A prior study also uses a threshold to identify a developer as a reasonable reviewer (Al-Zubaidi et al. 2020). Al-Zubaidi et al. (2020) consider a developer to be a reasonable reviewer if the developer previously reviewed at least one of the changed files of the PR in question. We believe that such a threshold can be too low, as the developer may not have recent knowledge of the majority of the remaining changed files of the PR. We therefore need a threshold that is neither too low nor too high: a low threshold means that developers need only minimal prior knowledge about the changed files, whereas a high threshold may not satisfy our objective of distributing workload to other capable reviewers. To identify an appropriate threshold, we perform a sensitivity analysis in which we calculate the percentage of reasonable recommendations across recommenders using three different threshold values: 30, 50, and 80 (see Table 11). We observe that our initial assumptions hold for all of these threshold values: KUREC and the combined recommender (e.g., AD_HYBRID) achieve the highest percentage of reasonable recommendations. Finally, we select 50 as the threshold for our study, since this value is neither too low nor too high.

Table 11 The percentage of reasonable reviewers across recommenders using different thresholds
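
To make the selection rule concrete, the following is a minimal sketch of the thresholding logic, assuming the threshold is applied to the percentage of a PR’s changed files about which the candidate reviewer has prior knowledge. The class name and the sets of file paths are illustrative assumptions, not the study’s actual implementation.

    // Illustrative sketch: a candidate is deemed a reasonable reviewer when they have
    // prior knowledge of at least THRESHOLD_PERCENT of the PR's changed files.
    import java.util.Set;

    public class ReasonableReviewerCheck {

        static final double THRESHOLD_PERCENT = 50.0; // value selected by the sensitivity analysis

        static boolean isReasonable(Set<String> changedFiles, Set<String> filesKnownByCandidate) {
            if (changedFiles.isEmpty()) {
                return false; // no changed files to assess
            }
            long known = changedFiles.stream()
                                     .filter(filesKnownByCandidate::contains)
                                     .count();
            double percentKnown = 100.0 * known / changedFiles.size();
            return percentKnown >= THRESHOLD_PERCENT;
        }
    }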

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ahasanuzzaman, M., Oliva, G.A. & Hassan, A.E. Using knowledge units of programming languages to recommend reviewers for pull requests: an empirical study. Empir Software Eng 29, 33 (2024). https://doi.org/10.1007/s10664-023-10421-9
