Empirical Software Engineering, Volume 19, Issue 3, pp 465–500

Configuring latent Dirichlet allocation based feature location

  • Lauren R. Biggers
  • Cecylia Bocovich
  • Riley Capshaw
  • Brian P. Eddy
  • Letha H. Etzkorn
  • Nicholas A. Kraft
Article

DOI: 10.1007/s10664-012-9224-x

Cite this article as:
Biggers, L.R., Bocovich, C., Capshaw, R. et al. Empir Software Eng (2014) 19: 465. doi:10.1007/s10664-012-9224-x

Abstract

Feature location is a program comprehension activity, the goal of which is to identify source code entities that implement a functionality. Recent feature location techniques apply text retrieval models such as latent Dirichlet allocation (LDA) to corpora built from text embedded in source code. These techniques are highly configurable, and the literature offers little insight into how different configurations affect their performance. In this paper we present a study of an LDA based feature location technique (FLT) in which we measure the performance effects of using different configurations to index corpora and to retrieve 618 features from 6 open source Java systems. In particular, we measure the effects of the query, the text extractor configuration, and the LDA parameter values on the accuracy of the LDA based FLT. Our key findings are that exclusion of comments and literals from the corpus lowers accuracy and that heuristics for selecting LDA parameter values in the natural language context are suboptimal in the source code context. Based on the results of our case study, we offer specific recommendations for configuring the LDA based FLT.

Keywords

  • Software evolution
  • Program comprehension
  • Feature location
  • Static analysis
  • Text retrieval

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Lauren R. Biggers (1)
  • Cecylia Bocovich (2, 5)
  • Riley Capshaw (3)
  • Brian P. Eddy (1)
  • Letha H. Etzkorn (4)
  • Nicholas A. Kraft (1)

  1. Department of Computer Science, The University of Alabama, Tuscaloosa, USA
  2. Department of Mathematics, Statistics, and Computer Science, Macalester College, Saint Paul, USA
  3. Department of Mathematics & Computer Science, Hendrix College, Conway, USA
  4. Department of Computer Science, The University of Alabama in Huntsville, Huntsville, USA
  5. David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
