An ETL Process for OLAP Using RDF/OWL Ontologies

  • Marko Niinimäki
  • Tapio Niemi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5530)

Abstract

In this paper, we present an advanced method for on-demand construction of OLAP cubes for ROLAP systems. The method contains the steps from cube design to ETL but focuses on ETL. Actual data analysis can then be done using the tools and methods of the OLAP software at hand. The method is based on RDF/OWL ontologies and design tools. The ontology serves as a basis for designing and creating the OLAP schema, its corresponding database tables, and finally populating the database.
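As a rough illustration of how a schema description can drive table creation, the sketch below derives star-schema tables from a small dictionary. The dictionary is a hypothetical, hand-written stand-in for an OLAP ontology, and the cube, dimension, and table names are assumptions made for this example; it is not the paper's algorithm, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical stand-in for an OLAP ontology: one measure and the
# dimensions it is analysed by. All names here are illustrative.
OLAP_ONTOLOGY = {
    "cube": "sales",
    "measures": {"amount": "REAL"},
    "dimensions": {
        "product": ["name", "category"],
        "region": ["country"],
    },
}


def schema_ddl(ontology):
    """Return CREATE TABLE statements for a simple star schema."""
    statements = []
    # One table per dimension, with a surrogate key and text attributes.
    for dim, attributes in ontology["dimensions"].items():
        columns = [f"{dim}_id INTEGER PRIMARY KEY"]
        columns += [f"{attr} TEXT" for attr in attributes]
        statements.append(f"CREATE TABLE dim_{dim} ({', '.join(columns)})")
    # One fact table referencing every dimension and holding the measures.
    fact_columns = [
        f"{dim}_id INTEGER REFERENCES dim_{dim}({dim}_id)"
        for dim in ontology["dimensions"]
    ]
    fact_columns += [
        f"{name} {sqltype}" for name, sqltype in ontology["measures"].items()
    ]
    statements.append(
        f"CREATE TABLE fact_{ontology['cube']} ({', '.join(fact_columns)})"
    )
    return statements


def create_schema(connection, ontology):
    """Execute the generated DDL against an open database connection."""
    for statement in schema_ddl(ontology):
        connection.execute(statement)


connection = sqlite3.connect(":memory:")
create_schema(connection, OLAP_ONTOLOGY)
tables = sorted(
    row[0]
    for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
)
```

In a real setting the dictionary would be read from an RDF/OWL file rather than written by hand, and the target would be the ROLAP system's own database server.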

Our starting point is heterogeneous and distributed data sources that are eventually used to populate the OLAP cubes. The source data are mapped to their OLAP form by first converting them to RDF using ontology maps. The data are then extracted from their RDF form by queries generated from the ontology of the OLAP schema. Finally, the extracted data are stored in the database tables and analysed using OLAP software. Algorithms and examples are provided for all these steps.
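The three steps described above (conversion to RDF, query-based extraction, loading into tables) can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the paper's algorithms: the namespace, source records, property names, and table layout are all hypothetical, triples are held as plain Python tuples rather than in an RDF store, and a simple property lookup stands in for a generated query.

```python
import sqlite3

# Hypothetical namespace and source records, for illustration only.
EX = "http://example.org/olap#"

SOURCE_ROWS = [
    {"id": "s1", "prod": "laptop", "country": "FI", "eur": 900.0},
    {"id": "s2", "prod": "phone", "country": "CH", "eur": 400.0},
    {"id": "s3", "prod": "laptop", "country": "CH", "eur": 950.0},
]

# Ontology map: source field name -> RDF property in the target ontology.
ONTOLOGY_MAP = {
    "prod": EX + "product",
    "country": EX + "country",
    "eur": EX + "amount",
}


def to_triples(rows, mapping):
    """Step 1: convert source records to (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = EX + row["id"]
        for field, prop in mapping.items():
            triples.append((subject, prop, row[field]))
    return triples


def extract(triples, properties):
    """Step 2: collect, per subject, the values of the requested properties.

    This stands in for a generated query; subjects that lack any of the
    requested properties are skipped.
    """
    by_subject = {}
    for subject, predicate, obj in triples:
        by_subject.setdefault(subject, {})[predicate] = obj
    rows = []
    for subject in sorted(by_subject):
        values = by_subject[subject]
        if all(prop in values for prop in properties):
            rows.append(tuple(values[prop] for prop in properties))
    return rows


def load(connection, rows):
    """Step 3: store the extracted rows in a fact table."""
    connection.execute(
        "CREATE TABLE fact_sales (product TEXT, country TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)


triples = to_triples(SOURCE_ROWS, ONTOLOGY_MAP)
facts = extract(triples, [EX + "product", EX + "country", EX + "amount"])
connection = sqlite3.connect(":memory:")
load(connection, facts)
# A typical OLAP-style roll-up over the loaded facts.
totals = connection.execute(
    "SELECT product, SUM(amount) FROM fact_sales GROUP BY product ORDER BY product"
).fetchall()
```

Keeping the ontology map as data rather than code means a new source can be added by supplying another mapping, without changing the extraction or loading steps.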

In our tests, we used an open source OLAP implementation and a database server. The performance of the system was found satisfactory when testing with a data source of 450 000 RDF statements. We also propose an ontology-based tool that will work as a user interface to the system, from design to actual analysis.

Keywords

OLAP, ontology, ETL

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Marko Niinimäki (1)
  • Tapio Niemi (1)
  1. Helsinki Institute of Physics, Technology Programme, CERN, Geneva 23
