Software & Systems Modeling

Volume 14, Issue 2, pp 889–904

Corpus-based analysis of domain-specific languages

Regular Paper


As more domain-specific languages (DSLs) are designed and developed, evaluating these languages becomes an essential part of the overall DSL life cycle. Corpus-based analysis can serve as an evaluation mechanism that identifies characteristics of a language after it has been deployed, by examining how end-users employ it in practice. Because this analysis is grounded in actual usage, it offers a perspective that a language engineer can draw on when working to improve the language. In this paper, we describe our use of corpus-based analysis techniques and exemplify them in the evaluation of the Puppet and ATL DSLs. We also outline an Eclipse plug-in, a generic corpus-based DSL analysis tool that can accommodate the evaluation of different DSLs.
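To illustrate the kind of corpus-based usage analysis the abstract describes, here is a minimal, hypothetical sketch (not the paper's actual tool): it counts how often each language construct appears across a small corpus of Puppet-like manifests, the sort of frequency data a language engineer could use to see which constructs end-users actually rely on. The mini-corpus and the regex-based construct extraction are assumptions for illustration only.

```python
import re
from collections import Counter

# Hypothetical mini-corpus of Puppet-like manifests; a real analysis
# would read files collected from end-user projects.
corpus = [
    "package { 'nginx': ensure => installed }",
    "service { 'nginx': ensure => running }",
    "package { 'git': ensure => installed }",
    "file { '/etc/motd': content => 'hello' }",
]

def construct_frequencies(files):
    """Count resource-type keywords (the identifier preceding '{')
    across all files in the corpus."""
    counts = Counter()
    for text in files:
        counts.update(re.findall(r"(\w+)\s*\{", text))
    return counts

freqs = construct_frequencies(corpus)
print(freqs.most_common())
```

On this toy corpus, `package` is the most frequent construct; over a real deployed corpus, such tallies reveal which parts of the DSL see heavy use and which may be candidates for simplification or removal.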


Keywords: Domain-specific languages · DSL · Corpus · Analysis · ATL · Puppet



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

AtlanMod, École des Mines de Nantes – INRIA – LINA, Nantes, France
