Do topics make sense to managers and developers?

Hindle, Abram; Bird, Christian; Zimmermann, Thomas; Nagappan, Nachiappan

doi:10.1007/s10664-014-9312-1

Published: 26 June 2014

Volume 20, pages 479–515 (2015)

Empirical Software Engineering

Abstract

Large organizations like Microsoft tend to rely on formal requirements documentation to specify and design the software products that they develop. These documents are meant to be tightly coupled with the actual implementation of the features they describe. In this paper, we evaluate the value of high-level topic-based requirements traceability and issue report traceability in the version control system, using Latent Dirichlet Allocation (LDA). We evaluate LDA topics with practitioners and check whether the extracted topics and trends match the perception that industrial Program Managers and Developers have of the effort put into addressing certain topics. We then replicate this study with Open Source Developers, using issue reports from issue trackers instead of requirements, and confirm our earlier industrial conclusions. We found that efforts extracted as commits from version control systems relevant to a topic often matched the perception of the managers and developers of what actually occurred at that time. Furthermore, we found evidence that many of the identified topics made sense to practitioners and matched their perception of what occurred. For some topics, however, practitioners had difficulty interpreting and labelling them. In summary, we investigate the high-level traceability of requirements topics and issue/bug report topics to version control commits via topic analysis, and we validate with the actual stakeholders the relevance of the topics extracted from requirements and issues.
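As a rough illustration of the kind of topic extraction the abstract refers to, the following minimal sketch fits LDA by collapsed Gibbs sampling to a handful of toy, requirement-like token lists. The documents, the hyperparameter values, and the function name lda_gibbs are assumptions made for illustration only; this is not the authors' pipeline or tooling.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (theta, top_words): per-document topic proportions and the
    three most frequently assigned words of each topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # all tokens in topic k

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic proportions per document; each row sums to 1.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    top_words = [
        [w for w, count in topic_word[k].most_common(3) if count > 0]
        for k in range(num_topics)
    ]
    return theta, top_words


# Hypothetical toy "requirements" documents, already tokenized.
docs = [
    "user login password authentication session".split(),
    "password reset login email authentication".split(),
    "render image cache thumbnail display".split(),
    "image thumbnail resize render cache".split(),
]
theta, top_words = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {' '.join(words)}")
```

In a setting like the one described above, such a model would be fitted to requirements or issue reports, and the word lists per topic are what practitioners would be asked to interpret and label.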

Notes

The set of stop words used: http://softwareprocess.es/b/stop_words
git-svn man page: https://www.kernel.org/pub/software/scm/git/docs/git-svn.html
git-grep.pl is located here: https://github.com/abramhindle/gh-lda-extractor
JSON Definition: http://JSON.org
Github Issue Extractor: https://github.com/abramhindle/github-issues-to-json
Google Code Issue Extractor: https://github.com/abramhindle/google-code-bug-tracker-downloader
Vowpal Wabbit: https://github.com/JohnLangford/vowpal_wabbit/wiki
Our Github LDA Extractor: https://github.com/abramhindle/gh-lda-extractor
FreeNode: http://freenode.org
IRSSI IRC Client: http://irssi.org/
Asterisk issue tracker guidelines: https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

Acknowledgments

Thanks to the many managers and developers at Microsoft who volunteered their time to participate in our research and provide their valuable insights and feedback. Abram Hindle performed some of this work as a visiting researcher at Microsoft Research. Thanks to the Natural Sciences and Engineering Research Council of Canada for partially funding this work. Thanks to Abram Hindle’s first student, Zhang Chenlei, for his feedback. Thanks to the FLOSS developers who chose to participate: Julian Harty, Lisa Milne, Tobias Leich, Ian Cordasco, Ricky Elrod, Anthony Grimes, Geoffrey Greer, Nicolas J. Bouliane, Drew DeVault, Daniel Huckstep, Chad Whitacre, Devin Joel Austin, and Gerson Goulart.

Author information

Authors and Affiliations

Department of Computing Science, University of Alberta, Edmonton, Canada
Abram Hindle
Microsoft Research, Redmond, WA, USA
Christian Bird, Thomas Zimmermann & Nachiappan Nagappan

Corresponding author

Correspondence to Abram Hindle.

Additional information

Communicated by: Massimiliano Di Penta and Jonathan Maletic

About this article

Cite this article

Hindle, A., Bird, C., Zimmermann, T. et al. Do topics make sense to managers and developers? Empir Software Eng 20, 479–515 (2015). https://doi.org/10.1007/s10664-014-9312-1

Issue Date: April 2015
DOI: https://doi.org/10.1007/s10664-014-9312-1
