Visual exploration of software evolution via topic modeling

  • Regular Paper
Journal of Visualization

Abstract

Software projects evolve for various reasons, such as new requirements, architecture refactoring, and bug fixing, usually with the aim of better quality and performance. All changes produced during the development process are reflected in the source code, which provides an opportunity to explore software evolution. In this paper, we propose a visual analytics system to support evolution analysis based on topic modeling. We focus on three aspects: (1) when significant changes to source code occur, (2) how software features evolve, and (3) why software evolution occurs. Each source file is regarded as a document and represented by its topic vector. The files of every two successive versions are classified into four types to quantify version differences, and the number of files associated with a topic is used as that topic's assignment to characterize feature evolution. Finally, we inspect the causes of software evolution through visual comparison between versions. Two case studies on JavaScript libraries demonstrate the usefulness and effectiveness of our system.
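To make the two quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that per-file topic vectors are already available (for example from an LDA model), that the four file types are added, deleted, modified, and unchanged, and that a file counts as associated with a topic when its weight for that topic reaches a threshold. The abstract does not name the four types or the association rule, so the labels, the `tol` and `threshold` parameters, the function names, and the sample data are all hypothetical.

```python
from typing import Dict, List

TopicVector = List[float]
Version = Dict[str, TopicVector]  # file path -> topic vector


def classify_files(prev: Version, curr: Version, tol: float = 1e-6) -> Dict[str, str]:
    """Label every file seen in either version.

    Assumed labels (not stated in the abstract):
    added, deleted, modified, unchanged.
    """
    labels: Dict[str, str] = {}
    for path in prev.keys() | curr.keys():
        if path not in prev:
            labels[path] = "added"
        elif path not in curr:
            labels[path] = "deleted"
        else:
            # Compare topic vectors component by component.
            diff = max(abs(a - b) for a, b in zip(prev[path], curr[path]))
            labels[path] = "modified" if diff > tol else "unchanged"
    return labels


def topic_assignment(version: Version, threshold: float = 0.3) -> List[int]:
    """Count, per topic, the files whose weight for it is >= threshold.

    The threshold rule is an assumption made for illustration.
    """
    if not version:
        return []
    n_topics = len(next(iter(version.values())))
    counts = [0] * n_topics
    for vector in version.values():
        for k, weight in enumerate(vector):
            if weight >= threshold:
                counts[k] += 1
    return counts


# Hypothetical two-topic vectors for two successive versions.
v1: Version = {"a.js": [0.9, 0.1], "b.js": [0.2, 0.8], "c.js": [0.5, 0.5]}
v2: Version = {"a.js": [0.9, 0.1], "b.js": [0.8, 0.2], "d.js": [0.1, 0.9]}

labels = classify_files(v1, v2)
before = topic_assignment(v1)
after = topic_assignment(v2)
```

Comparing `before` and `after` per topic gives one number per topic and version, which is the kind of series a feature-evolution view could plot. The counts of each label in `labels` give a simple measure of how much two versions differ.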

Notes

  1. https://github.com/d3/d3.

  2. https://github.com/vuejs/vue.

  3. https://medium.com/the-vue-point/vue-2-0-is-here-ef1f26acf4b8.

Acknowledgements

This work was supported by the National Key Research & Development Program of China (2017YFB0202203) and the National Natural Science Foundation of China (61672452, 61890954, and 61972343).

Author information

Corresponding authors

Correspondence to Yubo Tao or Hai Lin.

About this article

Cite this article

Liu, H., Tao, Y., Qiu, Y. et al. Visual exploration of software evolution via topic modeling. J Vis 24, 827–844 (2021). https://doi.org/10.1007/s12650-020-00739-7

  • DOI: https://doi.org/10.1007/s12650-020-00739-7