
An ETL Process for OLAP Using RDF/OWL Ontologies

Journal on Data Semantics XIII

Part of the book series: Lecture Notes in Computer Science (JODS, volume 5530)

Abstract

In this paper, we present an advanced method for on-demand construction of OLAP cubes for ROLAP systems. The method covers the steps from cube design to ETL, with a focus on ETL. Actual data analysis can then be done using the tools and methods of the OLAP software at hand. The method is based on RDF/OWL ontologies and design tools. The ontology serves as the basis for designing and creating the OLAP schema, creating its corresponding database tables, and finally populating the database.
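
As a rough illustration of how a schema description can drive table creation, the following minimal sketch derives star-schema tables from a small cube description. The `CUBE` dictionary is a hypothetical stand-in for the OLAP ontology, and the table layout, the names `star_schema_ddl` and `create_tables`, and the use of SQLite are all assumptions made for this example; none of them are taken from the chapter.

```python
import sqlite3

# Hypothetical cube description standing in for the OLAP ontology;
# the dimension and measure names are illustrative only.
CUBE = {
    "name": "sales",
    "dimensions": {"product": ["name", "category"], "region": ["name", "country"]},
    "measures": ["amount"],
}

def star_schema_ddl(cube):
    """Return CREATE TABLE statements: one per dimension, then the fact table."""
    statements = []
    for dim, attributes in cube["dimensions"].items():
        columns = ", ".join(f"{a} TEXT" for a in attributes)
        statements.append(
            f"CREATE TABLE dim_{dim} (id INTEGER PRIMARY KEY, {columns})"
        )
    # The fact table holds one foreign key per dimension plus the measures.
    keys = ", ".join(
        f"{d}_id INTEGER REFERENCES dim_{d}(id)" for d in cube["dimensions"]
    )
    measures = ", ".join(f"{m} REAL" for m in cube["measures"])
    statements.append(f"CREATE TABLE fact_{cube['name']} ({keys}, {measures})")
    return statements

def create_tables(conn, cube):
    """Execute the generated statements on an open database connection."""
    for ddl in star_schema_ddl(cube):
        conn.execute(ddl)
```

Calling `create_tables(sqlite3.connect(":memory:"), CUBE)` would create two dimension tables and one fact table. A real implementation would read the dimensions and measures from the ontology rather than from a hand-written dictionary.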

Our starting point is heterogeneous and distributed data sources that are eventually used to populate the OLAP cubes. The source data are mapped to their OLAP form by first converting them to RDF using ontology maps. The data are then extracted from their RDF form by queries that are generated using the ontology of the OLAP schema. Finally, the extracted data are stored in the database tables and analysed using OLAP software. Algorithms and examples are provided for all these steps.
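
The three steps described above (conversion to RDF, query-based extraction, and loading) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the chapter's algorithms: triples are plain Python tuples rather than a real RDF store, the "query" is a simple list of required properties rather than an RDF query language, and `ONTOLOGY_MAP`, the `ex:` names, and every function name are hypothetical.

```python
import sqlite3

# Hypothetical ontology map: source field name -> RDF property name.
ONTOLOGY_MAP = {"prod": "ex:product", "reg": "ex:region", "amt": "ex:amount"}

def to_triples(records, ontology_map):
    """Step 1: convert source records to (subject, predicate, object) triples."""
    triples = []
    for i, record in enumerate(records):
        subject = f"ex:row{i}"
        for field, value in record.items():
            if field in ontology_map:
                triples.append((subject, ontology_map[field], value))
    return triples

def generate_query(dimensions, measures):
    """Step 2: derive the properties to fetch from the cube description."""
    return [f"ex:{name}" for name in dimensions + measures]

def run_query(triples, properties):
    """Evaluate the query: one row per subject that has every property."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    return [tuple(props[p] for p in properties)
            for s, props in sorted(by_subject.items())
            if all(p in props for p in properties)]

def load(conn, rows):
    """Step 3: store the extracted rows in a relational table."""
    conn.execute("CREATE TABLE fact (product TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO fact VALUES (?, ?, ?)", rows)

records = [{"prod": "tea", "reg": "north", "amt": 10.0},
           {"prod": "tea", "reg": "south", "amt": 5.0},
           {"prod": "coffee", "reg": "north"}]  # incomplete record: no amount

conn = sqlite3.connect(":memory:")
query = generate_query(["product", "region"], ["amount"])
load(conn, run_query(to_triples(records, ONTOLOGY_MAP), query))
totals = conn.execute(
    "SELECT product, SUM(amount) FROM fact GROUP BY product"
).fetchall()
```

In this sketch the incomplete record is dropped at the extraction step because it lacks a measure value. How missing values should actually be handled is a design decision that the abstract does not address.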

In our tests, we used an open source OLAP implementation and a database server. The performance of the system was found to be satisfactory when tested with a data source of 450 000 RDF statements. We also propose an ontology-based tool that will work as a user interface to the system, from design to actual analysis.




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Niinimäki, M., Niemi, T. (2009). An ETL Process for OLAP Using RDF/OWL Ontologies. In: Spaccapietra, S., Zimányi, E., Song, IY. (eds) Journal on Data Semantics XIII. Lecture Notes in Computer Science, vol 5530. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03098-7_4


  • DOI: https://doi.org/10.1007/978-3-642-03098-7_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03097-0

  • Online ISBN: 978-3-642-03098-7

  • eBook Packages: Computer Science, Computer Science (R0)
