PowerDB-XML: A Platform for Data-Centric and Document-Centric XML Processing

  • Torsten Grabs
  • Hans-Jörg Schek
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2824)

Abstract

Relational database systems are well-suited as a platform for data-centric XML processing. Data-centric applications process regularly structured XML documents using precise predicates. However, relational approaches fall short when XML applications also require document-centric processing, i.e., processing of less rigidly structured documents using vague predicates in the sense of information retrieval. The PowerDB-XML project at ETH Zurich aims to address this drawback and to cover both types of XML applications on a single platform. In this paper, we investigate the requirements of document-centric XML processing and propose to refine state-of-the-art retrieval models for unstructured flat documents so that they match the flexibility of the XML format. To do so, we rely on so-called query-specific statistics, which are computed dynamically at query runtime to reflect the query scope. Moreover, we show that document-centric XML processing can be implemented efficiently using relational database systems for storage management and standard SQL. This allows us to combine document-centric processing with data-centric XML-to-database mappings. Our XML engine, named PowerDB-XML, therefore supports the full range of XML applications on the same integrated platform.
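
As a rough illustration of what query-specific statistics could look like on top of a relational store, the following sketch computes document frequencies only over the elements inside the query scope, at query time, using plain SQL. It is not taken from the paper: the `postings` table, its columns, the `rank` function, and the tf * log(1 + N/df) weighting are all assumptions chosen for brevity, and SQLite stands in for the relational database system.

```python
import math
import sqlite3

# Hypothetical schema (an assumption, not the paper's mapping):
# one row per (XML element, term) with the term frequency in that element.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE postings (elem_id INTEGER, path TEXT, term TEXT, tf INTEGER)"
)
conn.executemany(
    "INSERT INTO postings VALUES (?, ?, ?, ?)",
    [
        (1, "/book/chapter", "xml", 3),
        (1, "/book/chapter", "database", 1),
        (2, "/book/chapter", "database", 2),
        (3, "/book/abstract", "xml", 1),
        (4, "/book/abstract", "xml", 2),
        (4, "/book/abstract", "retrieval", 1),
    ],
)


def rank(scope_path, terms):
    """Rank elements at scope_path by summed tf * idf.

    The element count and the document frequencies are computed inside the
    query, restricted to the scope, instead of being precomputed globally.
    """
    placeholders = ",".join("?" for _ in terms)
    sql = f"""
        WITH scope AS (SELECT * FROM postings WHERE path = ?),
             total AS (SELECT COUNT(DISTINCT elem_id) AS n FROM scope),
             freq  AS (SELECT term, COUNT(DISTINCT elem_id) AS df
                       FROM scope GROUP BY term)
        SELECT s.elem_id, s.tf, freq.df, total.n
        FROM scope s
        JOIN freq ON freq.term = s.term
        CROSS JOIN total
        WHERE s.term IN ({placeholders})
    """
    scores = {}
    for elem_id, tf, df, n in conn.execute(sql, [scope_path, *terms]):
        # Illustrative weighting only; the paper's retrieval model may differ.
        scores[elem_id] = scores.get(elem_id, 0.0) + tf * math.log(1 + n / df)
    # Highest score first; ties broken by element id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

In this toy data, the term "xml" occurs in one of the two chapter elements but in both abstract elements, so its weight differs depending on whether `rank("/book/chapter", ["xml"])` or `rank("/book/abstract", ["xml"])` is called. That dependence on the scope is the point of computing the statistics at query time.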

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Torsten Grabs (1)
  • Hans-Jörg Schek (1)
  1. Database Research Group, Institute of Information Systems, Swiss Federal Institute of Technology, Zurich, Switzerland
