Associating working memory capacity and code change ordering with code review performance

Empirical Software Engineering

Abstract

Change-based code review is a software quality assurance technique that is widely used in practice. Therefore, a better understanding of what influences performance in code reviews, and of ways to improve it, can have a large impact. In this study, we examine the association of working memory capacity and cognitive load with code review performance, and we test the predictions of a recent theory that certain orders of code change parts improve code review efficiency. We performed a confirmatory experiment with 50 participants, mostly professional software developers. The participants reviewed one small and two larger code changes from an open source software system into which we had seeded additional defects. We measured their efficiency and effectiveness in defect detection, their working memory capacity, and several potential confounding factors. We find a moderate association, influenced by other factors, between working memory capacity and the effectiveness of finding delocalized defects, whereas the association with other defect types is almost non-existent. We also confirm that review effectiveness is significantly higher for small code changes. We cannot reliably conclude whether the order of presenting the code change parts influences the efficiency of code review.
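
The abstract uses ‘efficiency’ and ‘effectiveness’ without restating their definitions. The sketch below is our own illustration of a common operationalization in code review research, not the paper’s code, and the exact definitions used in the study may differ: effectiveness as the fraction of seeded defects found, efficiency as defects found per hour of review time.

    // Minimal sketch, assuming the common review-performance metrics named
    // above; class and method names are ours, not the paper's.
    public final class ReviewPerformance {

        /** Fraction of the seeded defects that a reviewer detected (0.0 to 1.0). */
        static double effectiveness(int defectsFound, int defectsSeeded) {
            return (double) defectsFound / defectsSeeded;
        }

        /** Defects detected per hour of review time. */
        static double efficiency(int defectsFound, double reviewMinutes) {
            return defectsFound / (reviewMinutes / 60.0);
        }

        public static void main(String[] args) {
            // Example: 4 of 9 seeded defects found in 75 minutes of review.
            System.out.printf("effectiveness = %.2f%n", effectiveness(4, 9));            // 0.44
            System.out.printf("efficiency    = %.2f defects/hour%n", efficiency(4, 75)); // 3.20
        }
    }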

Notes

  1. For consistency, we will stick to the term ‘code change’ (Baum et al. 2017b) throughout this article. We could also use ‘patch’ since every code change in the study corresponds to a single patch.

  2. Stated simply, the ‘human cognitive system’ is the part of the human brain responsible for thinking.

  3. Defects that can only be found by combining knowledge from several parts of the code; a hypothetical illustration follows after these notes.

  4. Size and complexity are often highly correlated (Hassan 2009); therefore, we do not treat them separately in the current article.

  5. We had to restrict ourselves to a specific defect type to keep the experiment duration manageable, and the set of defects that could be seeded into the small change was limited. We know from professional experience that swapping defects occur in practice and that they can have severe consequences. We cannot quantify how prevalent they are, as studies that count defect types usually use more general categories (like “interface defects”).

  6. We use the two-sided formulation for reasons of conservatism, even though the theory’s prediction is one-sided.

  7. All these descriptions could be accessed again on demand by participants during the review.

  8. The full text of these questions is contained in the replication package (Baum et al. 2018).

  9. e.g., GitHub pull requests, Gerrit, and Atlassian Stash/Bitbucket.

  10. http://jedit.sourceforge.net

  11. Available as part of CoRT: https://github.com/tobiasbaum/reviewtool

  12. “generateTestData.r” and “SimulateExperiment.java” in the replication package (Baum et al. 2018).

  13. For further discussion, see ‘Statistical Conclusion Validity’ in Section 5.1.

  14. The subscripts next to the citations are participant IDs, with POn from the online setting and PCn from the more controlled company setting.

  15. This explanation is supported by the negative correlation between company/online setting and working memory (Table 9).
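
To make notes 3 and 5 concrete, the following hypothetical Java fragment (our own illustration, not taken from jEdit or the experiment material) shows a swapping defect that is delocalized: each part of the change looks plausible in isolation, and the defect only becomes visible when the declared parameter order and the call site are combined.

    // Part 1 of the code change: the method declaration.
    class TextBuffer {
        /** Replaces the characters from 'start' (inclusive) to 'end' (exclusive). */
        void replace(int start, int end, String text) { /* ... */ }
    }

    // Part 2 of the code change, typically shown in a different file.
    class SearchAndReplace {
        void apply(TextBuffer buffer, int matchStart, int matchEnd, String replacement) {
            // Swapping defect: the first two arguments are in the wrong order.
            // A reviewer can only detect this by remembering the parameter
            // order declared in TextBuffer while reading this line.
            buffer.replace(matchEnd, matchStart, replacement);
        }
    }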

References

  • Abdelnabi Z, Cantone G, Ciolkowski M, Rombach D (2004) Comparing code reading techniques applied to object-oriented software frameworks with regard to effectiveness and defect detection rate. In: Proceedings of the 2004 international symposium on empirical software engineering (ISESE ’04). IEEE, pp 239–248

  • Agresti A (2007) An introduction to categorical data analysis, 2nd edn. Wiley, Hoboken

  • Agresti A (2010) Analysis of ordinal categorical data, 2nd edn. Wiley, Hoboken

  • Bacchelli A, Bird C (2013) Expectations, outcomes, and challenges of modern code review. In: Proceedings of the 2013 international conference on software engineering. IEEE Press, pp 712–721

  • Balachandran V (2013) Reducing human effort and improving quality in peer code reviews using automatic static analysis and reviewer recommendation. In: Proceedings of the 2013 international conference on software engineering. IEEE Press, pp 931–940

  • Barnett M, Bird C, Brunet J, Lahiri SK (2015) Helping developers help themselves: automatic decomposition of code review changesets. In: Proceedings of the 2015 international conference on software engineering. IEEE Press

  • Barton K (2018) MuMIn: Multi-Model Inference. https://CRAN.R-project.org/package=MuMIn, r package version 1.42.1

  • Basili V, Caldiera G, Lanubile F, Shull F (1996) Studies on reading techniques. In: Proceedings of the twenty-first annual software engineering workshop, vol 96, p 002

  • Bates D, Maechler M, Bolker B, Walker S et al (2014) lme4: linear mixed-effects models using Eigen and S4. R package version 1.1-7

  • Baum T, Schneider K (2016) On the need for a new generation of code review tools. In: Product-focused software process improvement: 17th international conference, PROFES 2016, Trondheim, Norway, November 22–24, 2016, proceedings. Springer, pp 301–308. https://doi.org/10.1007/978-3-319-49094-6_19

  • Baum T, Liskin O, Niklas K, Schneider K (2016a) A faceted classification scheme for change-based industrial code review processes. In: 2016 IEEE international conference on software quality, reliability and security (QRS). IEEE, Vienna, Austria. https://doi.org/10.1109/QRS.2016.19

  • Baum T, Liskin O, Niklas K, Schneider K (2016b) Factors influencing code review processes in industry. In: Proceedings of the 2016 24th ACM SIGSOFT international symposium on foundations of software engineering. ACM, New York, NY, USA, FSE 2016, pp 85–96. https://doi.org/10.1145/2950290.2950323

  • Baum T, Leßmann H, Schneider K (2017a) The choice of code review process: a survey on the state of the practice. In: Felderer M, Méndez Fernández D, Turhan B, Kalinowski M, Sarro F, Winkler D (eds) Product-focused software process improvement. https://doi.org/10.1007/978-3-319-69926-4_9. Springer International Publishing, Cham, pp 111–127

  • Baum T, Schneider K, Bacchelli A (2017b) On the optimal order of reading source code changes for review. In: 33rd IEEE international conference on software maintenance and evolution (ICSME), Proceedings, pp 329–340. https://doi.org/10.1109/ICSME.2017.28

  • Baum T, Schneider K, Bacchelli A (2017c) Online material for on the optimal order of reading source code changes for review. https://doi.org/10.6084/m9.figshare.5236150

  • Baum T, Schneider K, Bacchelli A (2018) Online material for associating working memory capacity and code change ordering with code review performance. https://doi.org/10.6084/m9.figshare.5808609

  • Bergersen GR, Gustafsson JE (2011) Programming skill, knowledge, and working memory among professional software developers from an investment theory perspective. J Individ Differ 32(4):201–209

  • Bernhart M, Grechenig T (2013) On the understanding of programs with continuous code reviews. In: 2013 IEEE 21st international conference on program comprehension (ICPC). IEEE, San Francisco, CA, USA, pp 192–198

  • Biegel B, Beck F, Hornig W, Diehl S (2012) The order of things: how developers sort fields and methods. In: 2012 28th IEEE international conference on software maintenance (ICSM). IEEE, pp 88–97

  • Biffl S (2000) Analysis of the impact of reading technique and inspector capability on individual inspection performance. In: Proceedings of the seventh Asia-Pacific software engineering conference (APSEC 2000). IEEE, pp 136–145

  • Chen F, Zhou J, Wang Y, Yu K, Arshad SZ, Khawaji A, Conway D (2016) Robust multimodal cognitive load measurement. Springer, Cham

  • Cohen J (1977) Statistical power analysis for the behavioral sciences. Revised edition. Academic Press

  • Cowan N (2010) The magical mystery four: how is working memory capacity limited, and why? Curr Dir Psychol Sci 19(1):51–57

  • Crk I, Kluthe T, Stefik A (2016) Understanding programming expertise: an empirical study of phasic brain wave changes. ACM Transactions on Computer-Human Interaction (TOCHI) 23(1):2

  • Daneman M, Carpenter PA (1980) Individual differences in working memory and reading. J Verbal Learn Verbal Behav 19(4):450–466

  • Daneman M, Merikle PM (1996) Working memory and language comprehension: a meta-analysis. Psychon Bull Rev 3(4):422–433

  • Denger C, Ciolkowski M, Lanubile F (2004) Investigating the active guidance factor in reading techniques for defect detection. In: Proceedings of the 2004 international symposium on empirical software engineering (ISESE ’04). IEEE, pp 219–228

  • DeStefano D, LeFevre JA (2007) Cognitive load in hypertext reading: a review. Comput Hum Behav 23(3):1616–1641

  • Dias M, Bacchelli A, Gousios G, Cassou D, Ducasse S (2015) Untangling fine-grained code changes. In: 2015 IEEE 22nd international conference on software analysis, evolution and reengineering. IEEE, pp 341–350

  • Dowell J, Long J (1998) Target paper: conception of the cognitive engineering design problem. Ergonomics 41(2):126–139

  • Dunsmore A, Roper M, Wood M (2000) Object-oriented inspection in the face of delocalisation. In: Proceedings of the 22nd international conference on software engineering. ACM, pp 467–476

  • Dunsmore A, Roper M, Wood M (2001) Systematic object-oriented inspection – an empirical study. In: Proceedings of the 23rd international conference on software engineering. IEEE Computer Society, pp 135–144

  • Dunsmore A, Roper M, Wood M (2003) The development and evaluation of three diverse techniques for object-oriented code inspection. IEEE Trans Softw Eng 29(8):677–686. https://doi.org/10.1109/TSE.2003.1223643

  • Ebert F, Castor F, Novielli N, Serebrenik A (2017) Confusion detection in code reviews. In: 33rd IEEE international conference on software maintenance and evolution (ICSME), Proceedings, pp 549–553. https://doi.org/10.1109/ICSME.2017.40

  • Fagan ME (1976) Design and code inspections to reduce errors in program development. IBM Syst J 15(3):182–211

  • Falessi D, Juristo N, Wohlin C, Turhan B, Münch J, Jedlitschka A, Oivo M (2017) Empirical software engineering experts on the use of students and professionals in experiments. Empir Softw Eng, pp 1–38

  • Field A, Hole G (2002) How to design and report experiments. Sage

  • Fritz T, Begel A, Müller SC, Yigit-Elliott S, Züger M (2014) Using psycho-physiological measures to assess task difficulty in software development. In: Proceedings of the 36th international conference on software engineering. ACM, pp 402–413

  • Geffen Y, Maoz S (2016) On method ordering. In: 2016 IEEE 24th international conference on program comprehension (ICPC), pp 1–10. https://doi.org/10.1109/ICPC.2016.7503711

  • Gilb T, Graham D (1993) Software inspection. Addison-Wesley, Wokingham

  • Gousios G, Pinzger M, van Deursen A (2014) An exploratory study of the pull-based software development model. In: Proceedings of the 36th international conference on software engineering. ACM, Hyderabad, India, pp 345–355

  • Hassan AE (2009) Predicting faults using the complexity of code changes. In: Proceedings of the 31st international conference on software engineering. IEEE Computer Society, pp 78–88

  • Herzig K, Zeller A (2013) The impact of tangled code changes. In: 2013 10th IEEE working conference on mining software repositories (MSR). IEEE, pp 121–130

  • Humble J, Farley D (2011) Continuous delivery. Addison-Wesley, Upper Saddle River

  • Hungerford BC, Hevner AR, Collins RW (2004) Reviewing software diagrams: a cognitive study. IEEE Trans Softw Eng 30(2):82–96

  • IEEE 24765 (2010) Systems and software engineering – Vocabulary. Standard ISO/IEC/IEEE 24765:2010, ISO/IEC/IEEE

  • Jaccard P (1912) The distribution of the flora in the alpine zone. New Phytologist 11(2):37–50

  • Kalyan A, Chiam M, Sun J, Manoharan S (2016) A collaborative code review platform for GitHub. In: 2016 21st international conference on engineering of complex computer systems (ICECCS). IEEE, pp 191–196

  • Laguilles JS, Williams EA, Saunders DB (2011) Can lottery incentives boost web survey response rates? Findings from four experiments. Res High Educ 52(5):537–553

  • Laitenberger O (2000) Cost-effective detection of software defects through perspective-based inspections. PhD thesis, Universität Kaiserslautern

  • MacLeod L, Greiler M, Storey MA, Bird C, Czerwonka J (2017) Code reviewing in the trenches: understanding challenges and best practices. IEEE Software 35(4):34–42. https://doi.org/10.1109/MS.2017.265100500

  • Mantyla MV, Lassenius C (2009) What types of defects are really discovered in code reviews? IEEE Trans Softw Eng 35(3):430–448

  • Matsuda J, Hayashi S, Saeki M (2015) Hierarchical categorization of edit operations for separately committing large refactoring results. In: Proceedings of the 14th international workshop on principles of software evolution. ACM, pp 19–27

  • McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng SE-2(4):308–320. https://doi.org/10.1109/TSE.1976.233837

  • McIntosh S, Kamei Y, Adams B, Hassan AE (2015) An empirical study of the impact of modern code review practices on software quality. Empir Softw Eng 21(5):2146–2189

  • McMeekin DA, von Konsky BR, Chang E, Cooper DJ (2009) Evaluating software inspection cognition levels using Bloom’s taxonomy. In: 22nd conference on software engineering education and training (CSEET ’09). IEEE, pp 232–239

  • Miller GA (1956) The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol Rev 63(2):81

  • Oswald FL, McAbee ST, Redick TS, Hambrick DZ (2015) The development of a short domain-general measure of working memory capacity. Behav Res Methods 47(4):1343–1355

  • Paas FG, Van Merriënboer JJ (1994) Instructional control of cognitive load in the training of complex cognitive tasks. Educ Psychol Rev 6(4):351–371

  • Parnas DL (1972) On the criteria to be used in decomposing systems into modules. Commun ACM 15(12):1053–1058. https://doi.org/10.1145/361598.361623

  • Pearl J (2001) Causality: models, reasoning, and inference. Cambridge University Press, Cambridge

  • Perneger TV (1998) What’s wrong with Bonferroni adjustments. BMJ: Br Med J 316(7139):1236

  • Platz S, Taeumel M, Steinert B, Hirschfeld R, Masuhara H (2016) Unravel programming sessions with thresher: identifying coherent and complete sets of fine-granular source code changes. In: Proceedings of the 32nd JSSST annual conference, pp 24–39. https://doi.org/10.11185/imt.12.24

  • Pollock L, Vijay-Shanker K, Hill E, Sridhara G, Shepherd D (2009) Natural language-based software analyses and tools for software maintenance. In: Software engineering, Springer, pp 94–125

  • Porter A, Siy H, Mockus A, Votta L (1998) Understanding the sources of variation in software inspections. ACM Trans Softw Eng Methodol (TOSEM) 7(1):41–79

  • Rasmussen J (1983) Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models. IEEE Trans Syst Man Cybern SMC-13(3):257–266. https://doi.org/10.1109/TSMC.1983.6313160

  • Raz T, Yaung AT (1997) Factors affecting design inspection effectiveness in software development. Inf Softw Technol 39(4):297–305

  • Rigby PC, Bird C (2013) Convergent contemporary software peer review practices. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering. ACM, Saint Petersburg, Russia, pp 202–212

  • Rigby PC, Storey MA (2011) Understanding broadcast based peer review on open source software projects. In: Proceedings of the 33rd international conference on software engineering. ACM, pp 541–550

  • Rigby PC, Cleary B, Painchaud F, Storey M, German DM (2012) Contemporary peer review in action: lessons from open source development. IEEE Software 29(6):56–61

  • Rigby PC, German DM, Cowen L, Storey MA (2014) Peer review on open source software projects: parameters, statistical models, and theory. ACM Trans Softw Eng Methodol 23:35:1–35:33. https://doi.org/10.1145/2594458

  • Robbins B, Carver J (2009) Cognitive factors in perspective-based reading (PBR): a protocol analysis study. In: Proceedings of the 2009 3rd international symposium on empirical software engineering and measurement. IEEE Computer Society, pp 145–155

  • Röthlisberger D, Härry M, Binder W, Moret P, Ansaloni D, Villazón A, Nierstrasz O (2012) Exploiting dynamic information in IDEs improves speed and correctness of software maintenance tasks. IEEE Trans Softw Eng 38(3):579–591

  • Sauer C, Jeffery DR, Land L, Yetton P (2000) The effectiveness of software development technical reviews: A behaviorally motivated program of research. IEEE Trans Softw Eng 26(1):1–14

  • Siegmund J, Peitek N, Parnin C, Apel S, Hofmeister J, Kästner C, Begel A, Bethmann A, Brechmann A (2017) Measuring neural efficiency of program comprehension. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering. ACM, pp 140–150

  • Simon HA (1974) How big is a chunk? Science 183(4124):482–488

  • Singer E, Ye C (2013) The use and effects of incentives in surveys. The ANNALS of the American Academy of Political and Social Science 645(1):112–141

  • Sjøberg DI, Hannay JE, Hansen O, Kampenes VB, Karahasanovic A, Liborg NK, Rekdal AC (2005) A survey of controlled experiments in software engineering. IEEE Trans Softw Eng 31(9):733–753

  • Skoglund M, Kjellgren V (2004) An experimental comparison of the effectiveness and usefulness of inspection techniques for object-oriented programs. In: 8th international conference on empirical assessment in software engineering (EASE 2004). IET, pp 165–174. https://doi.org/10.1049/ic:20040409

  • Sweller J (1988) Cognitive load during problem solving: effects on learning. Cogn Sci 12(2):257–285

  • Tao Y, Kim S (2015) Partitioning composite code changes to facilitate code review. In: 2015 IEEE/ACM 12th working conference on mining software repositories (MSR). IEEE, pp 180–190

  • Thongtanunam P, McIntosh S, Hassan AE, Iida H (2015a) Investigating code review practices in defective files: an empirical study of the Qt system. In: MSR ’15: Proceedings of the 12th working conference on mining software repositories, pp 168–179

  • Thongtanunam P, Tantithamthavorn C, Kula RG, Yoshida N, Iida H, Matsumoto KI (2015b) Who should review my code? A file location-based code-reviewer recommendation approach for modern code review. In: 2015 IEEE 22nd international conference on software analysis, evolution and reengineering (SANER), pp 141–150. https://doi.org/10.1109/SANER.2015.7081824

  • Unsworth N, Heitz RP, Schrock JC, Engle RW (2005) An automated version of the operation span task. Behav Res Methods 37(3):498–505

  • Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York. http://www.stats.ox.ac.uk/pub/MASS4, ISBN 0-387-95457-0

  • Walenstein A (2002) Theory-based analysis of cognitive support in software comprehension tools. In: Proceedings of the 10th international workshop on program comprehension, 2002. IEEE, pp 75–84

  • Walenstein A (2003) Observing and measuring cognitive support: steps toward systematic tool evaluation and engineering. In: 11th IEEE international workshop on program comprehension, 2003. IEEE, pp 185–194

  • Wilhelm O, Hildebrandt A, Oberauer K (2013) What is working memory capacity, and how can we measure it? Frontiers in Psychology 4. https://doi.org/10.3389/fpsyg.2013.00433

  • Zhang T, Song M, Pinedo J, Kim M (2015) Interactive code review for systematic changes. In: Proceedings of the 37th IEEE/ACM international conference on software engineering. IEEE, pp 111–122. https://doi.org/10.1109/ICSE.2015.33

Acknowledgements

We thank all participants and all pre-testers for the time and effort they donated. We furthermore thank Sylvie Gasnier and Günter Faber for advice on the statistical procedures and Javad Ghofrani for help with double-checking the defect coding. We thank Bettina von Helversen from the psychology department at the University of Zurich for advice on the parts related to the theory of cognitive load. Bacchelli gratefully acknowledges the support of the Swiss National Science Foundation through the SNF Project No. PP00P2_170529.

Author information

Correspondence to Tobias Baum.

Additional information

Communicated by: Yasutaka Kamei

Cite this article

Baum, T., Schneider, K. & Bacchelli, A. Associating working memory capacity and code change ordering with code review performance. Empir Software Eng 24, 1762–1798 (2019). https://doi.org/10.1007/s10664-018-9676-8
