Two approaches to the integration of heterogeneous data warehouses

Distributed and Parallel Databases

Abstract

In this paper we address the problem of integrating independent and possibly heterogeneous data warehouses, a problem that has received little attention so far but arises very often in practice.

We start by tackling the basic issue of matching heterogeneous dimensions and provide a number of general properties that a dimension matching should fulfill. We then propose two different approaches to the problem of integration that try to enforce matchings satisfying these properties. The first approach refers to a scenario of loosely coupled integration, in which we just need to identify the common information between data sources and perform join operations over the original sources. The second approach aims to derive a materialized view built by merging the sources; it refers to a scenario of tightly coupled integration, in which queries are performed against the view.

We also illustrate the architecture and functionality of a practical system that we have developed to demonstrate the effectiveness of our integration strategies.
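
The difference between the two integration scenarios described above can be illustrated with a small sketch. This is not the algorithm of the paper: the fact tables, the member names, the `matching` dictionary and the functions `loosely_coupled` and `tightly_coupled` are all hypothetical, and the dimension matching is assumed to be given rather than derived. The sketch only shows that the loosely coupled scenario joins the original sources on their common members, while the tightly coupled scenario builds a merged view that is queried in place of the sources.

```python
from collections import defaultdict

# Hypothetical fact tables from two independent data marts.
# Each row is (city member, measure value); names and figures are made up.
sales_a = [("Rome", 100), ("Milan", 80), ("Turin", 40)]   # revenue by city
sales_b = [("ROMA", 7), ("MILANO", 5), ("NAPOLI", 3)]     # complaints by city

# Assumed dimension matching: members judged to denote the same city.
matching = {"Rome": "ROMA", "Milan": "MILANO"}


def loosely_coupled(fact_a, fact_b, matching):
    """Join the original sources on the matched (common) members only."""
    b_index = dict(fact_b)
    return [
        (a_member, a_value, b_index[matching[a_member]])
        for a_member, a_value in fact_a
        if a_member in matching and matching[a_member] in b_index
    ]


def tightly_coupled(fact_a, fact_b, matching):
    """Build a merged view that holds every member of both sources."""
    to_a = {b: a for a, b in matching.items()}  # rename B members to A's names
    view = defaultdict(lambda: {"a": None, "b": None})
    for member, value in fact_a:
        view[member]["a"] = value
    for member, value in fact_b:
        view[to_a.get(member, member)]["b"] = value
    return dict(view)


joined = loosely_coupled(sales_a, sales_b, matching)
merged = tightly_coupled(sales_a, sales_b, matching)
```

In this sketch `joined` contains only the two matched cities, whereas `merged` also keeps the unmatched members of each source, with a missing measure recorded as `None`.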

References

  1. Abelló, A., Samos, J., Saltor, F.: On relationships offering new drill-across possibilities. In: Proc. of 5th ACM Int. Workshop on Data Warehousing and OLAP (DOLAP 2002), pp. 7–13, 2002

  2. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison–Wesley, Reading (1995)

  3. Aho, A.V., Sagiv, Y., Ullman, J.D.: Efficient optimization of a class of relational expressions. ACM Trans. Database Syst. 4(4), 435–454 (1979)

  4. Atzeni, P., Ceri, S., Paraboschi, S., Torlone, R.: Database Systems: Concepts, Languages and Architectures. McGraw–Hill, New York (1999)

  5. Batini, C., Lenzerini, M., Navathe, S.B.: A comparative analysis of methodologies for database scheme integration. ACM Comput. Surv. 18(4), 323–364 (1986)

  6. Cabibbo, L., Torlone, R.: A logical approach to multidimensional databases. In: Proc. of 6th Int. Conference on Extending Database Technology (EDBT’98), pp. 183–197. Springer, Berlin (1998)

  7. Cabibbo, L., Torlone, R.: From a procedural to a visual query language for OLAP. In: Proc. of 10th Int. Conference on Scientific and Statistical Database Management (SSDBM’98), pp. 74–83, 1998

  8. Cabibbo, L., Torlone, R.: On the integration of autonomous data marts. In: Proc. of 16th Int. Conference on Scientific and Statistical Database Management (SSDBM’04), pp. 223–234, 2004

  9. Cabibbo, L., Torlone, R.: Integrating heterogeneous multidimensional databases. In: Proc. of 17th Int. Conference on Scientific and Statistical Database Management (SSDBM’05), pp. 205–214, 2005

  10. Cabibbo, L., Panella, I., Torlone, R.: DaWaII: a tool for the integration of autonomous data marts. In: Proc. of 22nd Int. Conference on Data Engineering (ICDE’06), Demo session, 2006

  11. Elmagarmid, A., Rusinkiewicz, M., Sheth, A.: Management of Heterogeneous and Autonomous Database Systems. Morgan Kaufmann, San Mateo (1999)

  12. Fellbaum, C. (ed.): WordNet: a Lexical Database for the English Language. MIT Press, Cambridge (1998)

  13. Gal, A., Anaby-Tavor, A., Trombetta, A., Montesi, D.: A framework for modeling and evaluating automatic semantic reconciliation. VLDB J. 14(1), 50–67 (2005)

  14. Honeyman, P.: Testing satisfaction of functional dependencies. J. ACM 29(3), 668–677 (1982)

  15. Hull, R.: Managing semantic heterogeneity in databases: a theoretical perspective. In: Proc. of 16th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems, pp. 51–61, 1997

  16. Jensen, M.R., Møller, T.H., Pedersen, T.B.: Specifying OLAP Cubes on XML Data. J. Intell. Inf. Syst. 17(2-3), 255–280 (2001)

  17. Kimball, R., Ross, M.: The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. 2nd edn. Wiley, New York (2002)

  18. Lenzerini, M.: Data integration: a theoretical perspective. In: Proc. of 21st ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems, pp. 233–246, 2002

  19. Maier, D., Mendelzon, A.O., Sagiv, Y.: Testing implications of data dependencies. ACM Trans. Database Syst. 4(4), 455–468 (1979)

  20. Malvestuto, F.M.: The classification problem with semantically heterogeneous data. In: Proc. of ACM SIGMOD Int. Conference on Management of Data, pp. 157–176, 1988

  21. Malvestuto, F.M., Zuffada, C.: The derivation problem for summary data. In: Proc. of 4th Int. Conference on Scientific and Statistical Database Management (SSDBM’88), pp. 82–89, 1988

  22. Miller, R.J. (ed.): Special issue on integration management. IEEE Bull. Tech. Comm. Data Eng. 25(3) (2002)

  23. Miller, R.J., Hernández, M.A., Haas, L.M., Yan, L., Ho, C.T.H., Fagin, R., Popa, L.: The Clio project: managing heterogeneity. SIGMOD Rec. 30(1), 78–83 (2001)

  24. Pedersen, T.B., Shoshani, A., Gu, J., Jensen, C.S.: Extending OLAP querying to external object databases. In: Proc. of 9th Int. Conference on Information and Knowledge Management, pp. 405–413, 2000

  25. Pedersen, D., Riis, K., Pedersen, T.B.: XML-Extended OLAP Querying. In: Proc. of 14th Int. Conference on Scientific and Statistical Database Management (SSDBM’02), pp. 195–206, 2002

  26. Redner, R., Walker, H.: Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26(2), 195–239 (1984)

  27. Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. VLDB J. 10(4), 334–350 (2001)

  28. Sato, H.: Handling summary information in a database: derivability. In: Proc. of ACM SIGMOD International Conference on Management of Data, pp. 98–107, 1981

  29. Torlone, R.: Conceptual models for multidimensional databases. In: M. Rafanelli (ed.) Multidimensional Databases, pp. 69–90, Idea Group Publ. (2002)

  30. Torlone, R., Panella, I.: Design and development of a tool for integrating heterogeneous data warehouses. In: Proc. of 7th Int. Conference on Data Warehousing and Knowledge Discovery (DaWaK 2005), pp. 105–114, 2005

  31. Yin, X., Pedersen, T.B.: Evaluating XML-extended OLAP queries based on a physical algebra. In: Proc. of 7th ACM Int. Workshop on Data Warehousing and OLAP (DOLAP’04), pp. 73–82, 2004

Author information

Corresponding author

Correspondence to Riccardo Torlone.

Additional information

A preliminary version of this paper appeared, under the title “Integrating Heterogeneous Multidimensional Databases” [9], in the 17th Int. Conference on Scientific and Statistical Database Management, 2005.

About this article

Cite this article

Torlone, R. Two approaches to the integration of heterogeneous data warehouses. Distrib Parallel Databases 23, 69–97 (2008). https://doi.org/10.1007/s10619-007-7022-z

  • DOI: https://doi.org/10.1007/s10619-007-7022-z
