Building the MSR Tool Kaiaulu: Design Principles and Experiences

  • Conference paper
Software Architecture (ECSA 2021)


Background: Since Alitheia Core was proposed and subsequently retired, tools that support empirical studies of software projects have continued to appear, such as Codeface, Codeface4Smells, GrimoireLab and SmartSHARK, but they all make different design choices and provide overlapping functionality. Aims: We seek to understand the design decisions adopted by these tools, both good and bad, along with their consequences; to understand why their authors reinvented functionality already present in other tools; and to help inform the design of future tools. Method: We used action research to evaluate the tools and to determine a set of principles and anti-patterns to motivate a new tool design. Results: We identified 7 major design choices among the tools: 1) Abstraction Debt, 2) the use of Project Configuration Files, 3) the choice of Batch or Interactive Mode, 4) Minimal Paths to Data, 5) Familiar Software Abstractions, 6) Licensing and 7) the Perils of Code Reuse. Building on the observed good and bad design decisions, we created our own tool architecture and implemented it as an R package. Conclusions: Tools should not require onerous setup for users to obtain data. Authors should consider the conventions and abstractions used by their chosen language and build upon these instead of redefining them. Tools should encourage best practices in experiment reproducibility by leveraging self-contained and readable schemas that are used for tool automation. Reuse must be done with care to avoid depending on dead code.


Corresponding author

Correspondence to Carlos Paradis.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Paradis, C., Kazman, R. (2022). Building the MSR Tool Kaiaulu: Design Principles and Experiences. In: Scandurra, P., Galster, M., Mirandola, R., Weyns, D. (eds) Software Architecture. ECSA 2021. Lecture Notes in Computer Science, vol 13365. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15115-6

  • Online ISBN: 978-3-031-15116-3

  • eBook Packages: Computer Science, Computer Science (R0)
