Knowledge and Information Systems, Volume 49, Issue 2, pp 629–659

A feature location approach supported by time-aware weighting of terms associated with developer expertise profiles

  • Sima Zamani
  • Sai Peck Lee
  • Ramin Shokripour
  • John Anvik
Regular Paper

Abstract

Feature location is a frequent software maintenance activity that aims to identify the initial source code location pertinent to a software feature. Most feature location approaches are based, at least in part, on text analysis methods that originate from the natural language context. However, natural language text and the text data in software repositories have different properties, so the methods need to be adapted before they are applied to software repositories. One difference is that repository data carries associated metadata, such as developer information and time stamps. Previous feature location studies have not fully considered this difference. This study proposes a feature location approach that analyzes developer expertise profiles, which contain the source code entities modified by the associated software developers, to identify the location most similar to a desired feature. The approach uses a time-aware term-weighting technique to determine the similarity. An experimental evaluation on four open-source projects shows improvements in accuracy, performance, and effectiveness of up to 55%, 39%, and 29%, respectively, compared to the high-performing information retrieval methods used in feature location. Moreover, the proposed time-aware technique increases the accuracy, performance, and effectiveness of the typical term-weighting technique, tf-idf, by as much as 15%, 11%, and 13%, respectively. Finally, the proposed approach outperforms our previous approach, noun-based feature location, by as much as 17%. These experimental results demonstrate that time-aware analysis of developers’ expertise significantly improves the feature location process.
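
The abstract does not state the weighting formula, so the following is only an illustrative sketch of what a time-aware variant of tf-idf could look like, not the authors' method. It assumes that each candidate source code location has a bag of timestamped terms (for example, drawn from the expertise profiles of the developers who modified it) and that older term occurrences are discounted by an exponential decay. The function names, the `half_life_days` parameter, the decay form, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def time_aware_tfidf(profiles, now, half_life_days=365.0):
    """Weight terms per location, discounting old occurrences.

    profiles: {location: [(term, day_of_occurrence), ...]}
    now: current day on the same scale as day_of_occurrence
    half_life_days: assumed decay parameter (hypothetical, not from the paper)
    """
    n = len(profiles)
    # Document frequency: number of locations in which each term appears.
    df = Counter()
    for uses in profiles.values():
        df.update({term for term, _ in uses})

    weights = {}
    for location, uses in profiles.items():
        decayed_tf = Counter()
        for term, day in uses:
            age = now - day
            # Each occurrence counts less the older it is (halves per half-life).
            decayed_tf[term] += 0.5 ** (age / half_life_days)
        # Smoothed idf so that terms present in every location keep a weight.
        weights[location] = {
            term: tf * math.log(1 + n / df[term])
            for term, tf in decayed_tf.items()
        }
    return weights


def rank_locations(query_terms, weights):
    """Rank locations by the summed weight of the query terms they contain."""
    scores = {
        location: sum(w.get(term, 0.0) for term in query_terms)
        for location, w in weights.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: "Legacy.java" used "parse" more often, but long ago.
profiles = {
    "Parser.java": [("parse", 990), ("token", 995)],
    "Legacy.java": [("parse", 10), ("parse", 20)],
}
recent_first = rank_locations(
    ["parse"], time_aware_tfidf(profiles, now=1000, half_life_days=100)
)
```

With a short half-life, the recently modified location ranks first despite its lower raw term count; with a very long half-life the decay is negligible and the ranking falls back to plain term frequency.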

Keywords

Mining software repositories, Text analysis, Term weighting, Time aware, Developer expertise

Notes

Acknowledgments

This work was carried out within the framework of the research project supported by the High Impact Research Grant with reference UM.C/625/1/HIR/MOHE/FCSIT/13, funded by the Ministry of Education, Malaysia.

Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  1. Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
  2. Information Technology Department, Iran Telecommunication Research Center, Tehran, Iran
  3. Mathematics and Computer Science, University of Lethbridge, Lethbridge, Canada
