Abstract
Change-based code review is a software quality assurance technique that is widely used in practice. A better understanding of what influences performance in code reviews, and of ways to improve it, can therefore have a large impact. In this study, we examine the association of working memory capacity and cognitive load with code review performance, and we test the predictions of a recent theory regarding improved code review efficiency with certain orders of code change parts. We performed a confirmatory experiment with 50 participants, mostly professional software developers. The participants reviewed one small and two larger code changes from an open source software system into which we had seeded additional defects. We measured their efficiency and effectiveness in defect detection, their working memory capacity, and several potential confounding factors. We find a moderate association between working memory capacity and the effectiveness of finding delocalized defects, influenced by other factors, whereas the association with other defect types is almost non-existent. We also confirm that review effectiveness is significantly higher for small code changes. We cannot conclude reliably whether the order in which the code change parts are presented influences the efficiency of code review.
Notes
For consistency, we will stick to the term ‘code change’ (Baum et al. 2017b) throughout this article. We could also use ‘patch’ since every code change in the study corresponds to a single patch.
Stated simply, the ‘human cognitive system’ is the part of the human brain responsible for thinking.
Defects that can only be found by combining knowledge of several parts of the code.
Size and complexity are often highly correlated (Hassan 2009); we therefore do not treat them separately in the current article.
We had to restrict ourselves to a specific defect type to keep the experiment duration manageable, and the set of defects that could be seeded into the small change was limited. We know from professional experience that swapping defects occur in practice and can have severe consequences. We cannot quantify how prevalent they are, as studies that count defect types usually use more general categories (such as “interface defects”).
We use the two-sided formulation for reasons of conservatism, even though the theory’s prediction is one-sided.
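The conservatism mentioned here follows from how the two p-values relate: for a symmetric null distribution, the two-sided p-value is twice the one-sided one when the effect lies in the predicted direction, so the two-sided test is strictly harder to pass. A minimal sketch (using a hypothetical z-statistic and the standard normal null, not the study's actual test statistics):

```python
from math import erfc, sqrt

def one_sided_p(z: float) -> float:
    # P(Z > z) under the standard normal null (effect in predicted direction)
    return 0.5 * erfc(z / sqrt(2))

def two_sided_p(z: float) -> float:
    # P(|Z| > |z|): exactly twice the one-sided tail probability
    return erfc(abs(z) / sqrt(2))

z = 1.8  # hypothetical test statistic in the predicted direction
p1, p2 = one_sided_p(z), two_sided_p(z)
# p1 ~ 0.036 would pass alpha = 0.05; p2 ~ 0.072 would not,
# illustrating why the two-sided choice is the conservative one.
```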
All these descriptions could be accessed again on demand by participants during the review.
The full text of these questions is contained in the replication package (Baum et al. 2018).
e.g., GitHub pull requests, Gerrit, and Atlassian Stash/Bitbucket.
Available as part of CoRT: https://github.com/tobiasbaum/reviewtool
“generateTestData.r” and “SimulateExperiment.java” in the replication package (Baum et al. 2018).
For further discussion, see ‘Statistical Conclusion Validity’ in Section 5.1.
The subscripts next to the citations are participant IDs, with POn from the online setting and PCn from the more controlled company setting.
This explanation is supported by the negative correlation between company/online setting and working memory (Table 9).
References
Abdelnabi Z, Cantone G, Ciolkowski M, Rombach D (2004) Comparing code reading techniques applied to object-oriented software frameworks with regard to effectiveness and defect detection rate. In: 2004 international symposium on empirical software engineering, 2004. ISESE’04. Proceedings. IEEE, pp 239–248
Agresti A (2007) An introduction to categorical data analysis, 2nd edn. Wiley, Hoboken
Agresti A (2010) Analysis of ordinal categorical data, 2nd edn. Wiley, Hoboken
Bacchelli A, Bird C (2013) Expectations, outcomes, and challenges of modern code review. In: Proceedings of the 2013 international conference on software engineering. IEEE Press, pp 712–721
Balachandran V (2013) Reducing human effort and improving quality in peer code reviews using automatic static analysis and reviewer recommendation. In: Proceedings of the 2013 international conference on software engineering. IEEE Press, pp 931–940
Barnett M, Bird C, Brunet J, Lahiri SK (2015) Helping developers help themselves: automatic decomposition of code review changesets. In: Proceedings of the 2015 international conference on software engineering. IEEE Press
Barton K (2018) MuMIn: Multi-Model Inference. https://CRAN.R-project.org/package=MuMIn, r package version 1.42.1
Basili V, Caldiera G, Lanubile F, Shull F (1996) Studies on reading techniques. In: Proceedings of the twenty-first annual software engineering workshop, vol 96, p 002
Bates D, Maechler M, Bolker B, Walker S et al (2014) lme4: Linear mixed-effects models using Eigen and S4. R package version 1.1-7
Baum T, Schneider K (2016) On the need for a new generation of code review tools. In: 17th international conference on product-focused software process improvement: PROFES 2016, Trondheim, Norway, November 22-24, 2016, Proceedings 17, Springer, pp 301–308. https://doi.org/10.1007/978-3-319-49094-6_19
Baum T, Liskin O, Niklas K, Schneider K (2016a) A faceted classification scheme for change-based industrial code review processes. In: 2016 IEEE international conference on software quality, reliability and security (QRS). IEEE, Vienna, Austria. https://doi.org/10.1109/QRS.2016.19
Baum T, Liskin O, Niklas K, Schneider K (2016b) Factors influencing code review processes in industry. In: Proceedings of the 2016 24th ACM SIGSOFT international symposium on foundations of software engineering. ACM, New York, NY, USA, FSE 2016, pp 85–96. https://doi.org/10.1145/2950290.2950323
Baum T, Leßmann H, Schneider K (2017a) The choice of code review process: a survey on the state of the practice. In: Felderer M, Méndez Fernández D, Turhan B, Kalinowski M, Sarro F, Winkler D (eds) Product-focused software process improvement. https://doi.org/10.1007/978-3-319-69926-4_9. Springer International Publishing, Cham, pp 111–127
Baum T, Schneider K, Bacchelli A (2017b) On the optimal order of reading source code changes for review. In: 33rd IEEE international conference on software maintenance and evolution (ICSME), Proceedings, pp 329–340. https://doi.org/10.1109/ICSME.2017.28
Baum T, Schneider K, Bacchelli A (2017c) Online material for on the optimal order of reading source code changes for review. https://doi.org/10.6084/m9.figshare.5236150
Baum T, Schneider K, Bacchelli A (2018) Online material for associating working memory capacity and code change ordering with code review performance. https://doi.org/10.6084/m9.figshare.5808609
Bergersen GR, Gustafsson JE (2011) Programming skill, knowledge, and working memory among professional software developers from an investment theory perspective. J Individ Differ 32(4):201–209
Bernhart M, Grechenig T (2013) On the understanding of programs with continuous code reviews. In: 2013 IEEE 21st international conference on program comprehension (ICPC). IEEE, San Francisco, CA, USA, pp 192–198
Biegel B, Beck F, Hornig W, Diehl S (2012) The order of things: how developers sort fields and methods. In: 2012 28th IEEE international conference on software maintenance (ICSM). IEEE, pp 88–97
Biffl S (2000) Analysis of the impact of reading technique and inspector capability on individual inspection performance. In: Software engineering conference, 2000. APSEC 2000. Proceedings. Seventh Asia-Pacific, IEEE, pp 136–145
Chen F, Zhou J, Wang Y, Yu K, Arshad SZ, Khawaji A, Conway D (2016) Robust multimodal cognitive load measurement. Springer, Cham
Cohen J (1977) Statistical power analysis for the behavioral sciences. Revised edition. Academic Press
Cowan N (2010) The magical mystery four: how is working memory capacity limited, and why? Curr Dir Psychol Sci 19(1):51–57
Crk I, Kluthe T, Stefik A (2016) Understanding programming expertise: an empirical study of phasic brain wave changes. ACM Transactions on Computer-Human Interaction (TOCHI) 23(1):2
Daneman M, Carpenter PA (1980) Individual differences in working memory and reading. J Verbal Learn Verbal Behav 19(4):450–466
Daneman M, Merikle PM (1996) Working memory and language comprehension: a meta-analysis. Psychon Bull Rev 3(4):422–433
Denger C, Ciolkowski M, Lanubile F (2004) Investigating the active guidance factor in reading techniques for defect detection. In: International symposium on empirical software engineering, 2004 Proceedings. IEEE, pp 219–228
DeStefano D, LeFevre JA (2007) Cognitive load in hypertext reading: a review. Comput Hum Behav 23(3):1616–1641
Dias M, Bacchelli A, Gousios G, Cassou D, Ducasse S (2015) Untangling fine-grained code changes. In: 2015 IEEE 22nd international conference on software analysis, evolution and reengineering. IEEE, pp 341–350
Dowell J, Long J (1998) Target paper: conception of the cognitive engineering design problem. Ergonomics 41(2):126–139
Dunsmore A, Roper M, Wood M (2000) Object-oriented inspection in the face of delocalisation. In: Proceedings of the 22nd international conference on software engineering. ACM, pp 467–476
Dunsmore A, Roper M, Wood M (2001) Systematic object-oriented inspection – an empirical study. In: Proceedings of the 23rd international conference on software engineering. IEEE Computer Society, pp 135–144
Dunsmore A, Roper M, Wood M (2003) The development and evaluation of three diverse techniques for object-oriented code inspection. IEEE Trans Softw Eng 29(8):677–686. https://doi.org/10.1109/TSE.2003.1223643
Ebert F, Castor F, Novielli N, Serebrenik A (2017) Confusion detection in code reviews. In: 33rd international conference on software maintenance and evolution (ICSME), Proceedings. ICSME, pp 549–553. https://doi.org/10.1109/ICSME.2017.40
Fagan ME (1976) Design and code inspections to reduce errors in program development. IBM Syst J 15(3):182–211
Falessi D, Juristo N, Wohlin C, Turhan B, Münch J, Jedlitschka A, Oivo M (2017) Empirical software engineering experts on the use of students and professionals in experiments. Empir Softw Eng, pp 1–38
Field A, Hole G (2002) How to design and report experiments. Sage
Fritz T, Begel A, Müller SC, Yigit-Elliott S, Züger M (2014) Using psycho-physiological measures to assess task difficulty in software development. In: Proceedings of the 36th international conference on software engineering. ACM, pp 402–413
Geffen Y, Maoz S (2016) On method ordering. In: 2016 IEEE 24th international conference on program comprehension (ICPC), pp 1–10. https://doi.org/10.1109/ICPC.2016.7503711
Gilb T, Graham D (1993) Software inspection. Addison-Wesley, Wokingham
Gousios G, Pinzger M, Deursen AV (2014) An exploratory study of the pull-based software development model. In: Proceedings of the 36th international conference on software engineering. ACM, Hyderabad, India, pp 345–355
Hassan AE (2009) Predicting faults using the complexity of code changes. In: Proceedings of the 31st international conference on software engineering. IEEE Computer Society, pp 78–88
Herzig K, Zeller A (2013) The impact of tangled code changes. In: 2013 10th IEEE working conference on mining software repositories (MSR). IEEE, pp 121–130
Humble J, Farley D (2011) Continuous delivery. Addison-Wesley, Upper Saddle River
Hungerford BC, Hevner AR, Collins RW (2004) Reviewing software diagrams: a cognitive study. IEEE Trans Softw Eng 30(2):82–96
ISO/IEC/IEEE (2010) Systems and software engineering — Vocabulary. Standard ISO/IEC/IEEE 24765:2010
Jaccard P (1912) The distribution of the Flora in the alpine zone. New Phytologist 11(2):37–50
Kalyan A, Chiam M, Sun J, Manoharan S (2016) A collaborative code review platform for GitHub. In: 2016 21st international conference on engineering of complex computer systems (ICECCS). IEEE, pp 191–196
Laguilles JS, Williams EA, Saunders DB (2011) Can lottery incentives boost web survey response rates? Findings from four experiments. Res High Educ 52 (5):537–553
Laitenberger O (2000) Cost-effective detection of software defects through perspective-based inspections. PhD thesis, Universität Kaiserslautern
MacLeod L, Greiler M, Storey MA, Bird C, Czerwonka J (2017) Code reviewing in the trenches: understanding challenges and best practices. IEEE Software 35(4):34–42. https://doi.org/10.1109/MS.2017.265100500
Mantyla MV, Lassenius C (2009) What types of defects are really discovered in code reviews? IEEE Trans Softw Eng 35(3):430–448
Matsuda J, Hayashi S, Saeki M (2015) Hierarchical categorization of edit operations for separately committing large refactoring results. In: Proceedings of the 14th international workshop on principles of software evolution. ACM, pp 19–27
McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng SE-2(4):308–320. https://doi.org/10.1109/TSE.1976.233837
McIntosh S, Kamei Y, Adams B, Hassan AE (2015) An empirical study of the impact of modern code review practices on software quality. Empir Softw Eng 21 (5):2146–2189
McMeekin DA, von Konsky BR, Chang E, Cooper DJ (2009) Evaluating software inspection cognition levels using bloom’s taxonomy. In: 22nd conference on software engineering education and training, 2009. CSEET’09. IEEE, pp 232–239
Miller GA (1956) The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol Rev 63(2):81
Oswald FL, McAbee ST, Redick TS, Hambrick DZ (2015) The development of a short domain-general measure of working memory capacity. Behav Res Methods 47(4):1343–1355
Paas FG, Van Merriënboer JJ (1994) Instructional control of cognitive load in the training of complex cognitive tasks. Educ Psychol Rev 6(4):351–371
Parnas DL (1972) On the criteria to be used in decomposing systems into modules. Commun ACM. https://doi.org/10.1145/361598.361623
Pearl J (2001) Causality: models, reasoning, and inference. Cambridge University Press, Cambridge
Perneger TV (1998) What’s wrong with Bonferroni adjustments. BMJ: Br Med J 316(7139):1236
Platz S, Taeumel M, Steinert B, Hirschfeld R, Masuhara H (2016) Unravel programming sessions with thresher: identifying coherent and complete sets of fine-granular source code changes. In: Proceedings of the 32nd JSSST annual conference, pp 24–39. https://doi.org/10.11185/imt.12.24
Pollock L, Vijay-Shanker K, Hill E, Sridhara G, Shepherd D (2009) Natural language-based software analyses and tools for software maintenance. In: Software engineering, Springer, pp 94–125
Porter A, Siy H, Mockus A, Votta L (1998) Understanding the sources of variation in software inspections. ACM Trans Softw Eng Methodol (TOSEM) 7(1):41–79
Rasmussen J (1983) Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models. IEEE Trans Syst Man Cybern SMC-13 (3):257–266. https://doi.org/10.1109/TSMC.1983.6313160
Raz T, Yaung AT (1997) Factors affecting design inspection effectiveness in software development. Inf Softw Technol 39(4):297–305
Rigby PC, Bird C (2013) Convergent contemporary software peer review practices. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering. ACM, Saint Petersburg, Russia, pp 202–212
Rigby PC, Storey MA (2011) Understanding broadcast based peer review on open source software projects. In: Proceedings of the 33rd international conference on software engineering. ACM, pp 541–550
Rigby PC, Cleary B, Painchaud F, Storey M, German DM (2012) Contemporary peer review in action: lessons from open source development. Software, IEEE 29(6):56–61
Rigby PC, German DM, Cowen L, Storey MA (2014) Peer review on open source software projects: parameters, statistical models, and theory. ACM Trans Softw Eng Methodol 23:35:1–35:33. https://doi.org/10.1145/2594458
Robbins B, Carver J (2009) Cognitive factors in perspective-based reading (PBR): a protocol analysis study. In: Proceedings of the 2009 3rd international symposium on empirical software engineering and measurement. IEEE Computer Society, pp 145–155
Röthlisberger D, Härry M, Binder W, Moret P, Ansaloni D, Villazón A, Nierstrasz O (2012) Exploiting dynamic information in IDEs improves speed and correctness of software maintenance tasks. IEEE Trans Softw Eng 38(3):579–591
Sauer C, Jeffery DR, Land L, Yetton P (2000) The effectiveness of software development technical reviews: A behaviorally motivated program of research. IEEE Trans Softw Eng 26(1):1–14
Siegmund J, Peitek N, Parnin C, Apel S, Hofmeister J, Kästner C, Begel A, Bethmann A, Brechmann A (2017) Measuring neural efficiency of program comprehension. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering. ACM, pp 140–150
Simon HA (1974) How big is a chunk? Science 183(4124):482–488
Singer E, Ye C (2013) The use and effects of incentives in surveys. The ANNALS of the American Academy of Political and Social Science 645(1):112–141
Sjøberg DI, Hannay JE, Hansen O, Kampenes VB, Karahasanovic A, Liborg NK, Rekdal AC (2005) A survey of controlled experiments in software engineering. IEEE Trans Softw Eng 31(9):733–753
Skoglund M, Kjellgren V (2004) An experimental comparison of the effectiveness and usefulness of inspection techniques for object-oriented programs. In: 8th international conference on empirical assessment in software engineering (EASE 2004). IET, pp 165–174. https://doi.org/10.1049/ic:20040409
Sweller J (1988) Cognitive load during problem solving: effects on learning. Cogn Sci 12(2):257–285
Tao Y, Kim S (2015) Partitioning composite code changes to facilitate code review. In: 2015 IEEE/ACM 12th working conference on mining software repositories (MSR). IEEE, pp 180–190
Thongtanunam P, McIntosh S, Hassan AE, Iida H (2015a) Investigating code review practices in defective files: An empirical study of the qt system. In: MSR ’15 Proceedings of the 12th working conference on mining software repositories, pp 168–179
Thongtanunam P, Tantithamthavorn C, Kula RG, Yoshida N, Iida H, Matsumoto KI (2015b) Who should review my code? A file location-based code-reviewer recommendation approach for modern code review. In: 2015 IEEE 22nd international conference on software analysis, evolution and reengineering (SANER), pp 141–150 https://doi.org/10.1109/SANER.2015.7081824
Unsworth N, Heitz RP, Schrock JC, Engle RW (2005) An automated version of the operation span task. Behav Res Methods 37(3):498–505
Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York. http://www.stats.ox.ac.uk/pub/MASS4, ISBN 0-387-95457-0
Walenstein A (2002) Theory-based analysis of cognitive support in software comprehension tools. In: Proceedings of the 10th international workshop on program comprehension, 2002. IEEE, pp 75–84
Walenstein A (2003) Observing and measuring cognitive support: steps toward systematic tool evaluation and engineering. In: 11th IEEE international workshop on program comprehension, 2003. IEEE, pp 185–194
Wilhelm O, Hildebrandt A, Oberauer K (2013) What is working memory capacity, and how can we measure it? Frontiers in Psychology 4. https://doi.org/10.3389/fpsyg.2013.00433
Zhang T, Song M, Pinedo J, Kim M (2015) Interactive code review for systematic changes. In: Proceedings of 37th IEEE/ACM international conference on software engineering. IEEE, pp 111–122 https://doi.org/10.1109/ICSE.2015.33
Acknowledgements
We thank all participants and all pre-testers for the time and effort they donated. We furthermore thank Sylvie Gasnier and Günter Faber for advice on the statistical procedures and Javad Ghofrani for help with double-checking the defect coding. We thank Bettina von Helversen from the psychology department at the University of Zurich for advice on the parts related to the theory of cognitive load. Bacchelli gratefully acknowledges the support of the Swiss National Science Foundation through the SNF Project No. PP00P2_170529.
Additional information
Communicated by: Yasutaka Kamei
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Baum, T., Schneider, K. & Bacchelli, A. Associating working memory capacity and code change ordering with code review performance. Empir Software Eng 24, 1762–1798 (2019). https://doi.org/10.1007/s10664-018-9676-8