Measuring the impact of lexical and structural inconsistencies on developers’ cognitive load during bug localization

Empirical Software Engineering


A large portion of the cost of any software lies in the time developers spend understanding a program’s source code before they can make any changes. Measuring program comprehension is not trivial, and different studies use self-reported and various psycho-physiological measures as proxies. In this research, we propose a methodology that uses functional Near-Infrared Spectroscopy (fNIRS) and eye-tracking devices as an objective measure of program comprehension. The methodology allows researchers to conduct studies at the identifier level of granularity in environments close to real-world settings. We validate our methodology and apply it to study the impact of lexical, structural, and readability issues on developers’ cognitive load during bug localization tasks. Our study involves 25 undergraduate and graduate students and 21 metrics. Results show that lexical inconsistencies in the source code significantly increase the cognitive load that participants experience. This increase occurs not only on the identifiers involved in the inconsistencies but also throughout the entire code snippet. We did not find statistical evidence that structural inconsistencies increase the average cognitive load participants experience. However, both types of inconsistencies lower performance in terms of time and success rate. Finally, we observe that self-reported task difficulty, cognitive load, and fixation duration do not correlate and appear to measure different aspects of task difficulty.
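Identifier-level granularity depends on aligning two time-stamped streams. One is where the participant is looking, from the eye tracker. The other is the fNIRS signal recorded at the same moment. The Python sketch below shows one simple way such an alignment could be computed. It is an illustration under stated assumptions, not the authors' pipeline:

- It assumes fixations have already been mapped to the identifiers under the gaze.
- It assumes fNIRS samples arrive as `(timestamp_ms, oxygenation)` pairs on the same clock as the eye tracker.
- It ignores the lag of the hemodynamic response and any baseline correction, which a real analysis would likely need to handle.

`Fixation` and `load_per_identifier` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Fixation:
    """Hypothetical fixation record.

    Assumes gaze coordinates were already mapped to the identifier under the gaze.
    """
    start_ms: float
    end_ms: float
    identifier: str


def load_per_identifier(samples, fixations):
    """Average one fNIRS channel over the fixations that landed on each identifier.

    samples:   list of (timestamp_ms, oxygenation) pairs, assumed to share a
               clock with the eye tracker (no hemodynamic lag is modeled).
    fixations: list of Fixation objects.
    Returns a dict mapping identifier -> mean signal value during its fixations.
    Identifiers whose fixations contain no samples are omitted.
    """
    buckets = defaultdict(list)
    for fix in fixations:
        # Collect every sample inside this fixation window (start inclusive, end exclusive).
        buckets[fix.identifier].extend(
            value for t, value in samples if fix.start_ms <= t < fix.end_ms
        )
    return {ident: mean(vals) for ident, vals in buckets.items() if vals}


if __name__ == "__main__":
    samples = [(0, 0.1), (100, 0.3), (200, 0.5), (300, 0.2)]
    fixations = [
        Fixation(0, 150, "getSize"),
        Fixation(150, 350, "count"),
    ]
    print(load_per_identifier(samples, fixations))
```

Comparing such per-identifier averages between snippets with and without an inconsistency is one way to ask whether a specific identifier raises the measured load. The nested scan costs O(samples × fixations), which is acceptable for short tasks. For long recordings, a binary search over sorted sample timestamps (for example with `bisect`) would be the natural refinement.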


  1. The experiment was approved by the Institutional Review Board (IRB) at Washington State University through a full-board review for human subject research (IRB #16113).


  • Abebe SL, Arnaoudova V, Tonella P, Antoniol G, Guéhéneuc YG (2012) Can lexicon bad smells improve fault prediction? In: Proceedings of the Working Conference on Reverse Engineering (WCRE), pp 235–244

  • Afergan D, Peck EM, Solovey ET, Jenkins A, Hincks SW, Brown ET, Chang R, Jacob RJ (2014) Dynamic difficulty using brain metrics of workload. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, pp 3797–3806

  • Aghajani E, Nagy C, Bavota G, Lanza M (2018) A large-scale empirical study on linguistic antipatterns affecting APIs. In: 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, pp 25–35

  • Arnaoudova V, Di Penta M, Antoniol G, Guéhéneuc Y-G (2013) A new family of software anti-patterns: Linguistic anti-patterns. In: Proceedings of the European Conference on Software Maintenance and Reengineering (CSMR), pp 187–196

  • Arnaoudova V, Di Penta M, Antoniol G (2016) Linguistic antipatterns: What they are and how developers perceive them. Empir Softw Eng (EMSE) 21(1):104–158

  • Baker WB, Parthasarathy AB, Busch DR, Mesquita RC, Greenberg JH, Yodh A (2014) Modified Beer-Lambert law for blood flow. Biomed Opt Express 5(11):4053–4075

  • Binkley D, Davis M, Lawrie D, Maletic JI, Morrell C, Sharif B (2013) The impact of identifier style on effort and comprehension. Empir Softw Eng (EMSE) 18(2):219–276

  • Binkley D, Davis M, Lawrie D, Morrell C (2009a) To CamelCase or Under_score. In: Proceedings of the International Conference on Program Comprehension (ICPC), pp 158–167

  • Binkley D, Lawrie D, Maex S, Morrell C (2009b) Identifier length and limited programmer memory. Sci Comput Program 74(7):430–445

  • BIOPAC (2018a) Biopac homepage

  • BIOPAC (2018b) fnirsoft user manual

  • Blackwell AF (2006) Metaphors we program by: space, action and society in Java. In: PPIG, pp 8

  • Buse RP, Weimer W (2010) Learning a metric for code readability. IEEE Trans Softw Eng (TSE) 36(4):546–558

  • Butler S, Wermelinger M, Yu Y, Sharp H (2009) Relating identifier naming flaws and code quality: an empirical study. In: 2009 16th Working Conference on Reverse Engineering. IEEE, pp 31–35

  • Castelhano J, Duarte IC, Ferreira C, Duraes J, Madeira H, Castelo-Branco M (2018) The role of the insula in intuitive expert bug detection in computer code: an fMRI study. Brain Imaging and Behavior, pp 1–15

  • Causse M, Chua Z, Peysakhovich V, Del Campo N, Matton N (2017) Mental workload and neural efficiency quantified in the prefrontal cortex using fNIRS. Sci Rep 7(1):5222

  • Deissenboeck F, Pizka M (2006) Concise and consistent naming. Softw Qual J 14(3):261–282

  • Delpy DT, Cope M, van der Zee P, Arridge S, Wray S, Wyatt J (1988) Estimation of optical pathlength through tissue from direct time of flight measurement. Phys Med Biol 33(12):1433

  • Duraes J, Madeira H, Castelhano J, Duarte C, Branco MC (2016) Wap: Understanding the brain at software debugging. In: 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE). IEEE, pp 87–92

  • Eclipse (2018) Eclipse IDE

  • Ehlis A-C, Schneider S, Dresler T, Fallgatter AJ (2014) Application of functional near-infrared spectroscopy in psychiatry. Neuroimage 85:478–488

  • EyeTribe (2018) The Eye Tribe homepage

  • Fakhoury S (2018a) Online replication package

  • Fakhoury S, Ma Y, Arnaoudova V, Adesope O (2018b) The effect of poor source code lexicon and readability on developers’ cognitive load

  • Fishburn FA, Norr ME, Medvedev AV, Vaidya CJ (2014) Sensitivity of fNIRS to cognitive state and load. Front Hum Neurosci 8:76

  • Floyd B, Santander T, Weimer W (2017) Decoding the representation of code in the brain: an fMRI study of code review and expertise. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 175–186

  • Fritz T, Begel A, Müller SC, Yigit-Elliott S, Züger M (2014) Using psycho-physiological measures to assess task difficulty in software development. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 402–413

  • Girouard A, Solovey ET, Hirshfield LM, Chauncey K, Sassaroli A, Fantini S, Jacob RJ (2009) Distinguishing difficulty levels with non-invasive brain activity measurements. In: IFIP Conference on human-computer interaction. Springer, pp 440–452

  • Grissom RJ, Kim JJ (2005) Effect sizes for research: A broad practical approach, 2nd edn. Lawrence Erlbaum Associates

  • Halstead MH (1977) Elements of software science

  • Herff C, Heger D, Fortmann O, Hennrich J, Putze F, Schultz T (2014) Mental workload during n-back task-quantified in the prefrontal cortex using fNIRS. Front Hum Neurosci 7:935

  • Hochstein L, Basili VR, Zelkowitz MV, Hollingsworth JK, Carver J (2005) Combining self-reported and automatic data to improve programming effort measurement. SIGSOFT Softw Eng Notes 30(5):356–365

  • Ikutani Y, Uwano H (2014) Brain activity measurement during program comprehension with NIRS. In: Proceedings of the International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp 1–6

  • Jaafar F, Guéhéneuc Y-G, Hamel S, Khomh F (2013) Mining the relationship between anti-patterns dependencies and fault-proneness. In: 2013 20th Working Conference on Reverse Engineering (WCRE). IEEE, pp 351–360

  • Khomh F, Di Penta M, Guéhéneuc Y-G, Antoniol G (2012) An exploratory study of the impact of antipatterns on class change- and fault-proneness. Empir Softw Eng (EMSE) 17(3):243–275

  • Kruggel F, von Cramon DY (1999) Temporal properties of the hemodynamic response in functional MRI. Hum Brain Mapp 8(4):259–271

  • Lawrie D, Morrell C, Feild H, Binkley D (2006) What’s in a name? A study of identifiers. In: Proceedings of International Conference on Program Comprehension (ICPC), pp 3–12

  • Lee S, Hooshyar D, Ji H, Nam K, Lim H (2017) Mining biometric data to predict programmer expertise and task difficulty. Clust Comput 21:1–11

  • Liblit B, Begel A, Sweetser E (2006) Cognitive perspectives on the role of naming in computer programs. In: PPIG. Citeseer, pp 11

  • Marcus A, Poshyvanyk D, Ferenc R (2008) Using the conceptual cohesion of classes for fault prediction in object-oriented systems. IEEE Trans Softw Eng (TSE) 34(2):287–30

  • McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng (TSE) SE-2(4):308–320

  • Müller SC, Fritz T (2016) Using (bio)metrics to predict code quality online. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 452–463

  • Nakagawa T, Kamei Y, Uwano H, Monden A, Matsumoto K, German DM (2014) Quantifying programmers’ mental workload during program comprehension based on cerebral blood flow measurement: a controlled experiment. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 448–451

  • Ooms K, Dupont L, Lapon L, Popelka S (2015) Accuracy and precision of fixation locations recorded with the low-cost Eye Tribe tracker in different experimental setups. J Eye Mov Res 8(1):1–24

  • Peitek N, Siegmund J, Parnin C, Apel S, Hofmeister J, Brechmann A (2018) Simultaneous measurement of program comprehension with fMRI and eye tracking: a case study

  • Poshyvanyk D, Guéhéneuc Y-G, Marcus A, Antoniol G, Rajlich V (2006) Combining probabilistic ranking and latent semantic indexing for feature identification. In: Proceedings of the International Conference on Program Comprehension (ICPC), pp 137–148

  • Posnett D, Hindle A, Devanbu P (2011) A simpler model of software readability. In: Proceedings of the Working Conference on Mining Software Repositories (MSR), pp 73–82

  • Rayner K (1998) Eye movements in reading and information processing: 20 years of research. Psychol Bull 124(3):372–422

  • Salman I, Misirli AT, Juristo N (2015) Are students representatives of professionals in software engineering experiments? In: Proceedings of the International Conference on Software Engineering (ICSE), pp 666–676

  • Scanniello G, Risi M (2013) Dealing with faults in source code: Abbreviated vs. full-word identifier names. In: 2013 29th IEEE international conference on Software maintenance (ICSM). IEEE, pp 190–199

  • Scalabrino S, Linares-Vásquez M, Poshyvanyk D, Oliveto R (2016) Improving code readability models with textual features. In: Proceedings of the International Conference on Program Comprehension (ICPC), pp 1–10

  • Shaffer T, Wise JL, Walters BM, Müller SC, Falcone M, Sharif B (2015) iTrace: Enabling eye tracking on software artifacts within the IDE to support software engineering tasks. In: Proceedings of the Joint Meeting on Foundations of Software Engineering (ESEC/FSE), pp 954–957

  • Sharafi Z, Soh Z, Guéhéneuc Y-G, Antoniol G (2012) Women and men — different but equal: on the impact of identifier style on source code reading. In: IEEE International conference on program comprehension, pp 27–36

  • Sharafi Z, Shaffer T, Sharif B, Guéhéneuc Y-G (2015a) Eye-tracking metrics in software engineering. In: 2015 Asia-pacific software engineering conference (APSEC). IEEE, pp 96–103

  • Sharafi Z, Soh Z, Guéhéneuc Y-G (2015b) A systematic literature review on the usage of eye-tracking in software engineering. Inf Softw Technol 67:79–107

  • Sharif B, Falcone M, Maletic JI (2012) An eye-tracking study on the role of scan time in finding source code defects. In: Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA), pp 381–384

  • Siegmund J, Kästner C, Apel S, Parnin C, Bethmann A, Leich T, Saake G, Brechmann A (2014) Understanding understanding source code with functional magnetic resonance imaging. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 378–389

  • Siegmund J, Peitek N, Parnin C, Apel S, Hofmeister J, Kästner C, Begel A, Bethmann A, Brechmann A (2017) Measuring neural efficiency of program comprehension. In: Proceedings of the Joint Meeting on Foundations of Software Engineering (ESEC/FSE), pp 140–150

  • Sokal RR (1958) A statistical method for evaluating systematic relationship. Univ Kansas Sci Bullet 28:1409–1438

  • Takang AA, Grubb PA, Macredie RD (1996) The effects of comments and identifier names on program comprehensibility: an experimental investigation. J Prog Lang 4(3):143–167

  • Treacy Solovey E, Afergan D, Peck EM, Hincks SW, Jacob RJK (2015) Designing Implicit Interfaces for Physiological Computing. ACM Trans Comput-Hum Interact 21(6):1–27

  • Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2000) Experimentation in software engineering - an introduction. Kluwer Academic Publishers, Norwell

  • Yin RK (1994) Case Study Research: Design and Methods, 2nd edn. Sage Publications, New York


This work is supported by the NSF (award number CCF-1755995). The authors thank Thom Hemenway, Keon Sadatian, Nehemiah Salo, and Kyle Tilton for their help in developing tools for the environment in which we conducted the experiment. We also thank all students who participated in the experiment for their time and effort.

Author information

Corresponding authors

Correspondence to Sarah Fakhoury or Venera Arnaoudova.

Additional information

Communicated by: Chanchal Roy, Janet Siegmund, and David Lo

This work is an extension of our previous paper (Fakhoury et al. 2018b).

About this article

Cite this article

Fakhoury, S., Roy, D., Ma, Y. et al. Measuring the impact of lexical and structural inconsistencies on developers’ cognitive load during bug localization. Empir Software Eng 25, 2140–2178 (2020).
