Visual exploration of software evolution via topic modeling


Software projects evolve for many reasons, such as new requirements, architecture refactoring, and bug fixing, usually with the aim of improving quality and performance. Every change made during development is reflected in the source code, which makes the code a record from which software evolution can be explored. In this paper, we propose a visual analytics system that supports evolution analysis based on topic modeling. We focus on three questions: (1) when significant changes to the source code occur, (2) how software features evolve, and (3) why software evolution occurs. Each source file is regarded as a document and represented by its topic vector. To quantify the differences between versions, the files of every pair of successive versions are classified into four types. To characterize feature evolution, the number of files associated with a topic is taken as that topic's assignment. Finally, we inspect the causes of software evolution through visual comparison between versions. Two case studies on JavaScript libraries demonstrate the usefulness and effectiveness of our system.
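The file-level quantities described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's method: the abstract does not name the four file types, so the sketch uses added, deleted, modified, and unchanged; the topic vectors are hand-written rather than produced by a topic model; and the rule that a file is associated with a topic when its weight reaches a threshold of 0.3 is hypothetical, as are the file names and function names.

```python
from collections import Counter

# Hypothetical topic vectors: file path -> per-topic weights (e.g. from a topic model).
V1 = {
    "core.js": [0.8, 0.1, 0.1],
    "ajax.js": [0.1, 0.8, 0.1],
    "css.js": [0.2, 0.2, 0.6],
}
V2 = {
    "core.js": [0.8, 0.1, 0.1],   # same vector as in V1
    "ajax.js": [0.5, 0.4, 0.1],   # topic mix shifted
    "event.js": [0.1, 0.1, 0.8],  # file only present in V2
}


def classify_files(old, new, tol=1e-6):
    """Label each file across two successive versions.

    The four labels (added, deleted, modified, unchanged) are an assumption;
    the abstract does not name the paper's four types.
    """
    labels = {}
    for path in sorted(set(old) | set(new)):
        if path not in old:
            labels[path] = "added"
        elif path not in new:
            labels[path] = "deleted"
        else:
            # L1 distance between the two topic vectors of the same file.
            dist = sum(abs(a - b) for a, b in zip(old[path], new[path]))
            labels[path] = "modified" if dist > tol else "unchanged"
    return labels


def topic_assignment(version, threshold=0.3):
    """Count, per topic, the files whose weight reaches the threshold (assumed rule)."""
    n_topics = len(next(iter(version.values())))
    counts = [0] * n_topics
    for weights in version.values():
        for k, w in enumerate(weights):
            if w >= threshold:
                counts[k] += 1
    return counts


labels = classify_files(V1, V2)
type_counts = Counter(labels.values())
print(labels)
print(type_counts)
print(topic_assignment(V1), topic_assignment(V2))
```

Under these assumptions, the per-type counts give a rough measure of how much changed between two versions, and comparing the per-topic counts across versions shows which features gained or lost files.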








This work was supported by the National Key Research & Development Program of China (2017YFB0202203) and the National Natural Science Foundation of China (61672452, 61890954, and 61972343).

Author information



Corresponding authors

Correspondence to Yubo Tao or Hai Lin.


About this article


Cite this article

Liu, H., Tao, Y., Qiu, Y. et al. Visual exploration of software evolution via topic modeling. J Vis (2021).



  • Software evolution
  • Code topics
  • Software visualization