An empirical analysis of information retrieval based concept location techniques in software comprehension

Empirical Software Engineering

Abstract

Concept location, the problem of associating human-oriented concepts with their counterpart solution domain concepts, is a fundamental problem that lies at the heart of software comprehension. Recent research has attempted to alleviate the impact of the concept location problem through the application of methods drawn from the information retrieval (IR) community. Here we present a new approach based on a complementary IR method which also has a sound basis in cognitive theory. We compare our approach to related work through an experiment and present our conclusions. This research adapts and expands upon existing language modelling frameworks in IR for use in concept location in software systems. In doing so it is novel in that it leverages implicit information available in system documentation. Surprisingly, empirical evaluation of this approach showed little performance benefit overall, and several possible explanations are put forward for this finding.

Author information

Corresponding author

Correspondence to Brendan Cleary.

Additional information

Editors: Tim Menzies and Letha Etzkorn

About this article

Cite this article

Cleary, B., Exton, C., Buckley, J. et al. An empirical analysis of information retrieval based concept location techniques in software comprehension. Empir Software Eng 14, 93–130 (2009). https://doi.org/10.1007/s10664-008-9095-3

  • DOI: https://doi.org/10.1007/s10664-008-9095-3
