Shorter identifier names take longer to comprehend

Empirical Software Engineering

Abstract

Developers spend the majority of their time reading code, a process in which identifier names play a key role. Although many identifier naming styles exist, they often lack an empirical basis and it is not clear whether short or long identifier names facilitate comprehension. In this paper, we investigate the effect of different identifier naming styles (single letters, abbreviations, and words) on program comprehension. We conducted an experimental study with 72 professional C# developers who had to locate defects in source code snippets. We used a within-subjects design, such that each developer worked with all three versions of identifier naming styles, and we measured the time it took them to find a defect. We found that word identifiers led to a 19% increase in speed to find defects compared to meaningless single letters and abbreviations, but we did not find a difference between letters and abbreviations. The results of our study suggest that code is more difficult to comprehend when it contains only letters and abbreviations as identifier names. Words as identifier names facilitate program comprehension and may help to save costs and improve software quality.
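To make the three naming styles concrete, here is a small sketch of one function written three ways. It is a hypothetical illustration only: the study used C# snippets, and none of the names or logic below are taken from its materials. Python is used purely for brevity.

```python
# Hypothetical example: the same logic with three identifier naming styles.
# These are NOT the snippets used in the study.

def f(a):
    # Style 1: single letters
    s = 0
    n = 0
    for x in a:
        if x > 0:
            s += x
            n += 1
    return s / n if n else 0.0


def avg_pos(vals):
    # Style 2: abbreviations
    tot = 0
    cnt = 0
    for val in vals:
        if val > 0:
            tot += val
            cnt += 1
    return tot / cnt if cnt else 0.0


def average_of_positives(values):
    # Style 3: full words
    total = 0
    count = 0
    for value in values:
        if value > 0:
            total += value
            count += 1
    return total / count if count else 0.0
```

All three functions behave identically; only the identifiers differ, which is the variable the experiment manipulates.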

Notes

  1. http://brains-on-code.org/

  2. Miller (1994) originally argued for a capacity limit of about 7 ± 2 items, while newer research shows that core working memory capacity is more likely limited to 3 to 5 items (Cowan 2001).

  3. The IQR is defined as Q3 − Q1, where the fastest 25% of response times lie below Q1 (first quartile) and the slowest 25% lie above Q3 (third quartile).

  4. The t-values of these two tests are by chance identical when rounded to two decimal places. The standardized effect sizes differ due to the correction for correlated observations.
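
Notes 3 and 4 can be illustrated with a minimal sketch. The function names and sample numbers are hypothetical, the quartile method is one of several conventions, and the effect-size formula is one common correction for correlated observations; the text above does not say which variants the authors used.

```python
import math
import statistics


def interquartile_range(response_times):
    # statistics.quantiles with n=4 returns the three quartile cut points.
    # "inclusive" treats the data as the whole population; other methods
    # can give slightly different quartiles.
    q1, _median, q3 = statistics.quantiles(
        response_times, n=4, method="inclusive"
    )
    return q3 - q1


def corrected_effect_size(t_value, n_pairs, correlation):
    # Assumed correction: d = t * sqrt(2 * (1 - r) / n), where r is the
    # correlation between the paired measurements. Two tests with the same
    # t and n give different d whenever their correlations differ.
    return t_value * math.sqrt(2 * (1 - correlation) / n_pairs)
```

With this formula, two paired comparisons that share a t-value can still yield different standardized effect sizes if the correlation between conditions differs, which matches the situation described in note 4.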

Acknowledgements

This work has been supported by the DFG grant SI 2045/2-1. Janet Siegmund’s work is further funded by the Bavarian State Ministry of Education, Science and the Arts in the framework of the Centre Digitisation.Bavaria (ZD.B).

Author information

Corresponding author

Correspondence to Johannes C. Hofmeister.

Ethics declarations

This study was performed in accordance with the ethical standards of the Department of Psychology, Heidelberg University, Germany.

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Communicated by: Andrian Marcus and Gabriele Bavota

This article extends a previous conference paper presented at the 24th International Conference on Software Analysis, Evolution and Reengineering (Hofmeister et al. 2017). See the end of Section 1 for details.

About this article

Cite this article

Hofmeister, J.C., Siegmund, J. & Holt, D.V. Shorter identifier names take longer to comprehend. Empir Software Eng 24, 417–443 (2019). https://doi.org/10.1007/s10664-018-9621-x

  • DOI: https://doi.org/10.1007/s10664-018-9621-x
