Abstract
Concept location, the problem of associating human-oriented concepts with their counterpart solution-domain concepts, is a fundamental problem that lies at the heart of software comprehension. Recent research has attempted to alleviate the impact of the concept location problem by applying methods drawn from the information retrieval (IR) community. Here we present a new approach based on a complementary IR method that also has a sound basis in cognitive theory. We compare our approach to related work through an experiment and present our conclusions. This research adapts and expands upon existing language modelling frameworks in IR for use in concept location in software systems. In doing so, it is novel in that it leverages implicit information available in system documentation. Surprisingly, empirical evaluation of this approach showed little performance benefit overall, and several possible explanations are put forward for this finding.
Additional information
Editors: Tim Menzies and Letha Etzkorn
Cite this article
Cleary, B., Exton, C., Buckley, J. et al. An empirical analysis of information retrieval based concept location techniques in software comprehension. Empir Software Eng 14, 93–130 (2009). https://doi.org/10.1007/s10664-008-9095-3
DOI: https://doi.org/10.1007/s10664-008-9095-3