Empirical Software Engineering

, Volume 19, Issue 6, pp 1706–1753 | Cite as

An experimental investigation on the effects of context on source code identifiers splitting and expansion

  • Latifa Guerrouj
  • Massimiliano Di Penta
  • Yann-Gaël Guéhéneuc
  • Giuliano Antoniol


Recent and past studies indicate that source code lexicon plays an important role in program comprehension. Developers often compose source code identifiers with abbreviated words and acronyms, and do not always use consistent mechanisms and explicit separators when creating identifiers. Such choices and inconsistencies impede the work of developers that must understand identifiers by decomposing them into their component terms, and mapping them onto dictionary, application or domain words. When software documentation is scarce, outdated or simply not available, developers must therefore use the available contextual information to understand the source code. This paper aims at investigating how developers split and expand source code identifiers, and, specifically, the extent to which different kinds of contextual information could support such a task. In particular, we consider (i) an internal context consisting of the content of functions and source code files in which the identifiers are located, and (ii) an external context involving external documentation. We conducted a family of two experiments with 63 participants, including bachelor, master, Ph.D. students, and post-docs. We randomly sampled a set of 50 identifiers from a corpus of open source C programs and we asked participants to split and expand them with the availability (or not) of internal and external contexts. We report evidence on the usefulness of contextual information for identifier splitting and acronym/abbreviation expansion. We observe that the source code files are more helpful than just looking at function source code, and that the application-level contextual information does not help any further. The availability of external sources of information only helps in some circumstances. Also, in some cases, we observe that participants better expanded acronyms than abbreviations, although in most cases both exhibit the same level of accuracy. Finally, results indicated that the knowledge of English plays a significant effect in identifier splitting/expansion. The obtained results confirm the conjecture that contextual information is useful in program comprehension, including when developers split and expand identifiers to understand them. We hypothesize that the integration of identifier splitting and expansion tools with IDE could help to improve developers’ productivity.


Program understanding Identifier splitting and expansion Task context 



Special thanks to all the participants as this work would not be ossible without your time. Many thanks also to all the reviewers for their thorough and well considered reviews.

Supplementary material


  1. Anquetil N, Lethbridge T (1998) Assessing the relevance of identifier names in a legacy software system. In: Proceedings of CASCON, pp 213–222Google Scholar
  2. Antoniol G, Canfora G, Casazza G, De Lucia A, Merlo E (2002) Recovering traceability links between code and documentation. IEEE Trans Softw Eng 28:970–983CrossRefGoogle Scholar
  3. Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison-WesleyGoogle Scholar
  4. Baker RD (1995) Modern permutation test software. In: Edgington EG (ed) Randomization tests. Marcel DeckerGoogle Scholar
  5. Basili V, Caldiera G, Rombach DH (1994) The goal question metric paradigm encyclopedia of software engineering. John Wiley and SonsGoogle Scholar
  6. Binkley D, Davis M, Lawrie D, Morrell C (2009) To camelcase or under_score. In: The 17th IEEE international conference on program comprehension, ICPC 2009. Vancouver, British Columbia, Canada, May 17–19, 2009. IEEE Computer Society, pp 158–167Google Scholar
  7. Binkley D, Davis M, Lawrie D, Maletic JI, Morrell C, Sharif B (2013) The impact of identifier style on effort and comprehension. Empir Software Eng 2(18):219–276CrossRefGoogle Scholar
  8. Caprile B, Tonella P (1999) Nomen est omen: analyzing the language of function identifiers. In: Proc. of the working conference on reverse engineering (WCRE). Atlanta, Georgia, USA, pp 112–122Google Scholar
  9. Caprile B, Tonella P (2000) Restructuring program identifier names. In: Proc. of the International Conference on Software Maintenance (ICSM), pp 97–107Google Scholar
  10. Deißenböck F, Pizka M (2005) Concise and consistent naming. In: Proc. of the International Workshop on Program Comprehension (IWPC)Google Scholar
  11. Dit B, Guerrouj L, Poshyvanyk D, Antoniol G (2011) Can better identifier splitting techniques help feature location? In: Proc. of the International Conference on Program Comprehension (ICPC). Kingston, pp 11–20Google Scholar
  12. Enslen E, Hill E, Pollock LL, Vijay-Shanker K (2009) Mining source code to automatically split identifiers for software analysis. In: Proceedings of the 6th international working conference on mining software repositories, MSR 2009. Vancouver, BC, Canada, May 16–17, pp 71–80Google Scholar
  13. Grissom RJ, Kim JJ (2005) Effect sizes for research: a broad practical approach, 2nd edn. Lawrence Earlbaum AssociatesGoogle Scholar
  14. Guerrouj L, Di Penta M, Antoniol G, Guéhéneuc YG (2013) TIDIER: an identifier splitting approach using speech recognition techniques. J Softw Evol Process 25(6):569–661CrossRefGoogle Scholar
  15. Holm A (1979) A simple sequentially rejective Bonferroni test procedure. Scand J Stat 6:65–70MathSciNetzbMATHGoogle Scholar
  16. Kersten M, Murphy GC (2006) Using task context to improve programmer productivity. In: SIGSOFT ’06/FSE-14: proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering. ACM Press, Portland, Oregon, pp 1–11CrossRefGoogle Scholar
  17. Lawrie D, Binkley D (2011) Expanding identifiers to normalize source code vocabulary. In: Proc. of the International Conference on Software Maintenance (ICSM), pp 113–122Google Scholar
  18. Lawrie D, Feild H, Binkley D (2006a) Syntactic identifier conciseness and consistency. In: 6th IEEE international workshop on source code analysis and manipulation. Philadelphia, Pennsylvania, USA, pp 139–148Google Scholar
  19. Lawrie D, Morrell C, Feild H, Binkley D (2006b) What’s in a name? A study of identifiers. In: Proceedings of 14th IEEE international conference on program comprehension. IEEE CS Press, Athens, pp 3–12CrossRefGoogle Scholar
  20. Lawrie D, Morrell C, Feild H, Binkley D (2007) Effective identifier names for comprehension and memory. Innov Syst Softw Eng 3(4):303–318CrossRefGoogle Scholar
  21. Lawrie DJ, Binkley D, Morrell C (2010) Normalizing source code vocabulary. In: Proc. of the Working Conference on Reverse Engineering (WCRE), pp 112–122Google Scholar
  22. Liu D, Marcus A, Poshyvanyk D, Rajlich V (2007) Feature location via information retrieval based filtering of a single scenario execution trace. In: Proceedings of the 22nd IEEE/ACM international conference on automated software engineering. ACM, New York, NY, pp 234–243Google Scholar
  23. Madani N, Guerrouj L, Di Penta M, Guéhéneuc Y-G, Antoniol G (2010) Recognizing words from source code identifiers using speech recognition techniques. In: Proceedings of the conference on software maintenance and reengineering. IEEE, pp 69–78Google Scholar
  24. Maletic JI, Marcus A (2001) Supporting program comprehension using semantic and structural information. In: Proc. of 23rd international conference on software engineering. Toronto, pp 103–112Google Scholar
  25. Marc E, Alfred A, Giuliano A, Guéhéneuc Y-G (2008) Cerberus: tracing requirements to source code using information retrieval dynamic analysis and program analysis. In: ICPC ’08: Proceedings of the 2008 the 16th IEEE international conference on program comprehension. IEEE Computer Society, Washington DC pp 53–62Google Scholar
  26. Marcus A, Maletic JI, Sergeyev A (2005) Recovery of traceability links between software documentation and source code. Int J Softw Eng Knowl Eng 15(5):811–836CrossRefGoogle Scholar
  27. Merlo E, McAdam I, De Mori R (2003) Feed-forward and recurrent neural networks for source code informal information analysis. J Softw Maint 15(4):205–244CrossRefGoogle Scholar
  28. Ney H (1984) The use of a one-stage dynamic programming algorithm for connected word recognition. IEEE Trans Acoust Speech Signal Process 32(2):263–271CrossRefGoogle Scholar
  29. Poshyvanyk D, Guéhéneuc Y-G, Marcus A, Antoniol G, Rajlich V (2007) Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval. IEEE Trans Software Eng 33(6):420–432CrossRefGoogle Scholar
  30. R Core Team (2012) R: a language and environment for statistical computing. Vienna, Austria. ISBN 3-900051-07-0Google Scholar
  31. Ricca F, Di Penta M, Torchiano M, Tonella P, Ceccato M (2010) How developers’ experience and ability influence web application comprehension tasks supported by uml stereotypes: a series of four experiments. IEEE Trans Softw Eng 36(1):96–118CrossRefGoogle Scholar
  32. Robillard MP, Coelho W, Murphy GC (2004) How effective developers investigate source code: anexploratory study. IEEE Trans Softw Eng 30(12):889–903CrossRefGoogle Scholar
  33. Sharif B, Maletic JI (2010) An eye tracking study on camelcase and under_score identifier styles. In: Proceedings of the international conference on program comprehension, pp 196–205Google Scholar
  34. Sheskin DJ (2007) Handbook of parametric and nonparametric statistical procedures, 4th edn. Chapman & HallGoogle Scholar
  35. Sillito J, Murphy GC, De Volder K (2008) Asking and answering questions during a programming change task. IEEE Trans Softw Eng 34:434–451CrossRefGoogle Scholar
  36. Soloway E, Bonar J, Ehrlich K (1983) Cognitive strategies and looping constructs: an empirical study. Commun ACM 26(11):853–860CrossRefGoogle Scholar
  37. Storey MAD (1998) A cognitive framework for describing and evaluating software exploration tools. PhD thesis Simon Fraser UniversityGoogle Scholar
  38. Takang A, Grubb PA, Macredie RD (1996) The effects of comments and identifier names on program comprehensibility: an experiential study. J Program Lang 4(3):143–167Google Scholar
  39. von Mayrhauser A, Vans AM (1995) Program comprehension during software maintenance and evolution. IEEE Comput 28(8):44–55CrossRefGoogle Scholar
  40. Wohlin C, Runeson P, Host M, Ohlsson MC, Regnell B, Wesslen A (2000) Experimentation in software engineering—an introduction. Kluwer Academic PublishersGoogle Scholar

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Latifa Guerrouj
    • 1
  • Massimiliano Di Penta
    • 2
  • Yann-Gaël Guéhéneuc
    • 1
  • Giuliano Antoniol
    • 1
  1. 1.SOCCER Lab., DGIGLÉcole Polytechnique de MontréalQCCanada
  2. 2.University of SannioSannioItaly

Personalised recommendations