A UML Profile for the Design, Quality Assessment and Deployment of Data-intensive Applications

  • Diego Perez-Palacin
  • José Merseguer
  • José I. Requeno
  • M. Guerriero
  • Elisabetta Di Nitto
  • D. A. Tamburri
Regular Paper


Big Data or data-intensive applications (DIAs) seek to mine, manipulate, extract or otherwise exploit the potential intelligence hidden behind Big Data. However, several practitioner surveys remark that the potential of DIAs is still untapped because their design, quality assessment and continuous refinement are very difficult and costly. To address this shortcoming, we propose a UML domain-specific modeling language, or profile, specifically tailored to support the design, assessment and continuous deployment of DIAs. This article illustrates our DIA-specific profile and outlines its usage in the context of DIA performance engineering and deployment. For DIA performance engineering, we rely on the Apache Hadoop technology, while for DIA deployment, we leverage the TOSCA language. We conclude that the proposed profile offers a powerful language for modeling data-intensive software and systems, evaluating their quality, and automating the deployment of DIAs on private or public clouds.


UML Profile; Data-intensive applications; Software design; Big Data; Performance assessment; Model-driven deployment; Apache Hadoop; TOSCA language



This work is supported by the European Commission Grant No. 644869 (H2020, Call 1), DICE. D. Perez-Palacin, J. Merseguer and J.I. Requeno have been supported by the project CyCriSec [TIN2014-58457-R] and Aragon Government Ref. T27-DISCO Research Group.


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Diego Perez-Palacin (1)
  • José Merseguer (2), email author
  • José I. Requeno (2)
  • M. Guerriero (3)
  • Elisabetta Di Nitto (3)
  • D. A. Tamburri (3)
  1. Department of Computer Science, Linnaeus University, Växjö, Sweden
  2. Departamento de Informática e Ingeniería de Sistemas, Universidad de Zaragoza, Zaragoza, Spain
  3. Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
