
A field study of how developers locate features in source code

Empirical Software Engineering

Abstract

Our current understanding of how programmers perform feature location during software maintenance is based on controlled studies or interviews, which are inherently limited in size, scope and realism. Replicating controlled studies in the field can both explore the findings of these studies in wider contexts and study new factors that have not been previously encountered in the laboratory setting. In this paper, we report on a field study of how software developers perform feature location within source code during their daily development activities. Our study is based on two complementary field data sets: one that reflects the complete IDE activity of 67 professional developers over approximately one month, and another that reflects usage of an IR-based code search tool by nearly 600 developers. Analyzing this data, we report results on how often developers use each type of code search tool, on the types of queries and retrieval strategies used by developers, and on patterns of developer feature location behavior following code search. The results of the study suggest that there is (1) a need for helping developers to devise better code search queries; (2) a lack of adoption of niche code search tools; (3) a need for code search tools to handle both lookup and exploratory queries; and (4) a need for better integration between code search, structured navigation, and debugging tools in feature location tasks.




Notes

  1. In this paper, we use the Visual Studio term solution to refer to a software project or a code base. A solution is a container consisting of one or more Visual Studio projects, each of which, in turn, contains a number of source code files.

  2. Sando usage data spanning from 05/2013 to 06/2014 was included in the dataset.

  3. The interaction monitoring extension, called Blaze, was implemented by researchers at ABB, Inc. We refer to the dataset it produced as the Blaze dataset.

  4. http://visualstudiogallery.msdn.microsoft.com

  5. Editing sessions were identified by applying the session clustering algorithm to editing events.


Acknowledgements

The authors gratefully acknowledge developers at ABB, Inc. and users of the Sando search tool who allowed anonymous data collection during their daily work. We also acknowledge Will Snipes for collecting and sharing the Blaze dataset and data collection tool with us.

Author information


Corresponding author

Correspondence to Kostadin Damevski.

Additional information

Communicated by: Andrea De Lucia

Appendix A: List of Relevant Events in Blaze and Sando Datasets


Table 2 Blaze Dataset Events
Table 3 Sando Dataset Events


About this article


Cite this article

Damevski, K., Shepherd, D. & Pollock, L. A field study of how developers locate features in source code. Empir Software Eng 21, 724–747 (2016). https://doi.org/10.1007/s10664-015-9373-9


  • DOI: https://doi.org/10.1007/s10664-015-9373-9
