
Recommending metamodel concepts during modeling activities with pre-trained language models

  • Theme Section Paper
  • Published in: Software and Systems Modeling

Abstract

The design of conceptually sound metamodels that embody proper semantics in relation to the application domain is particularly tedious in model-driven engineering. As metamodels define complex relationships between domain concepts, it is crucial for a modeler to define these concepts thoroughly and consistently with respect to the application domain. We propose an approach that assists a modeler in the design of metamodels by recommending relevant domain concepts in several modeling scenarios. Our approach requires neither domain knowledge nor hand-designed completion rules. Instead, we design a fully data-driven approach using a deep learning model that abstracts domain concepts by learning from both structural and lexical metamodel properties in a corpus of thousands of independent metamodels. We evaluate our approach on a test set of 166 metamodels, unseen during model training, with more than 5000 test samples. Our preliminary results show that the trained model provides accurate top-5 lists of relevant recommendations in concept renaming scenarios. Although promising, the results are less compelling for the iterative construction of a metamodel, in part because of the conservative strategy we use to evaluate the recommendations.
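To make the recommendation mechanism concrete: the replication package (Appendix A) indicates the approach builds on a BERT-style masked language model fine-tuned on serialized metamodels. The snippet below is a minimal illustrative sketch of masked concept recommendation with the HuggingFace transformers library; the "roberta-base" checkpoint and the Emfatic-like metamodel fragment are placeholders introduced here for illustration, not the authors' released artifacts.

    # Illustrative sketch: "roberta-base" and the metamodel fragment are
    # placeholders, not the checkpoint or data released with the paper.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    model = AutoModelForMaskedLM.from_pretrained("roberta-base")
    model.eval()

    # A metamodel fragment serialized as text, with the concept name to
    # recommend replaced by the mask token.
    fragment = f"class Library {{ attr String name; val {tokenizer.mask_token}[*] books; }}"

    inputs = tokenizer(fragment, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Rank the vocabulary at the masked position and keep the top-5 candidates,
    # mirroring the top-5 recommendation lists evaluated in the paper.
    mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1][0]
    top5_ids = torch.topk(logits[0, mask_index], k=5).indices
    print([tokenizer.decode([int(i)]).strip() for i in top5_ids])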



Notes

  1. In this paper, the term “model” refers to a machine learning model rather than an instance of a metamodel, as is customary in the MDE literature. All MDE artifacts we operate on are metamodels.

  2. The illustration is inspired by the following blog post: http://jalammar.github.io/illustrated-bert/.

  3. http://mar-search.org/experiments/models20/.

  4. https://www.eclipse.org/xtend/.

  5. https://www.eclipse.org/modeling/emf/.

  6. https://wordnet.princeton.edu/.


Author information

Corresponding author

Correspondence to Martin Weyssow.

Additional information

Communicated by L. Burgueño, J. Cabot, M. Wimmer and S. Zschaler.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Replication package

We make our code, datasets, and models publicly available to ease the replication of our experiments and to help researchers who are interested in extending our work:

https://github.com/martin-wey/metamodel-concepts-bert.

The data and models are available on Zenodo:

https://doi.org/10.5281/zenodo.5579980.
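As a hypothetical usage sketch, assuming the archive ships a HuggingFace-compatible checkpoint (the local path below is a placeholder; the repository README documents the actual layout and entry points):

    # Hypothetical loading sketch: the checkpoint path is a placeholder.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    checkpoint = "./metamodel-concepts-bert/model"  # placeholder path
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForMaskedLM.from_pretrained(checkpoint)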

B Model hyperparameters

See Table 3.

Table 3 Hyperparameters (HP) used for training our model


About this article


Cite this article

Weyssow, M., Sahraoui, H. & Syriani, E. Recommending metamodel concepts during modeling activities with pre-trained language models. Softw Syst Model 21, 1071–1089 (2022). https://doi.org/10.1007/s10270-022-00975-5

