On Warehouses, Lakes, and Spaces: The Changing Role of Conceptual Modeling for Data Integration

  • Chapter
Conceptual Modeling Perspectives

Abstract

The role of conceptual models, their formalization and implementation as knowledge bases, and the related metadata and metamodel management has evolved continuously since their inception in the late 1970s. In this paper, we trace this evolution from traditional database design, to data warehouse integration, to recent data lake architectures. Concerning future developments, we argue that much of the research has perhaps focused too heavily on the design perspective of individual companies or strongly managed, centralized company networks, culminating in today’s huge oligopolistic web players. We propose a vision of interacting data spaces, which seems to offer small and medium enterprises more sovereignty over their own data.

Author information

Corresponding author

Correspondence to Matthias Jarke.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Jarke, M., Quix, C. (2017). On Warehouses, Lakes, and Spaces: The Changing Role of Conceptual Modeling for Data Integration. In: Cabot, J., Gómez, C., Pastor, O., Sancho, M., Teniente, E. (eds) Conceptual Modeling Perspectives. Springer, Cham. https://doi.org/10.1007/978-3-319-67271-7_16

  • DOI: https://doi.org/10.1007/978-3-319-67271-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67270-0

  • Online ISBN: 978-3-319-67271-7

  • eBook Packages: Computer Science, Computer Science (R0)