Abstract
Relational database systems are well suited as a platform for data-centric XML processing, where applications process regularly structured XML documents using precise predicates. However, such relational approaches fall short when XML applications also require document-centric processing, i.e., processing of less rigidly structured documents using vague predicates in the sense of information retrieval. The PowerDB-XML project at ETH Zurich aims to address this drawback and to cover both types of XML applications on a single platform. In this paper, we investigate the requirements of document-centric XML processing and propose to refine state-of-the-art retrieval models for unstructured flat documents so that they match the flexibility of the XML format. To do so, we rely on so-called query-specific statistics, which are computed dynamically at query runtime to reflect the query scope. Moreover, we show that document-centric XML processing can be implemented efficiently using relational database systems for storage management and standard SQL. This allows us to combine document-centric processing with data-centric XML-to-database mappings. Our XML engine, PowerDB-XML, therefore supports the full range of XML applications on the same integrated platform.
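The abstract does not state the exact scoring formula. As a minimal sketch, assuming a plain tf-idf vector-space model with idf = ln(N/df), the Python/SQLite example below illustrates the idea of query-specific statistics: the element count N and the document frequencies df are computed in standard SQL at query time over only the elements in the query scope, rather than over the whole collection. The `postings` table, its columns, the path-equality scope test, the `py_log` function, and the helpers `connect` and `rank` are hypothetical illustrations, not PowerDB-XML's actual schema or API.

```python
import math
import sqlite3

# Hypothetical schema for illustration only (not PowerDB-XML's actual mapping):
# one row per (element, term) pair with the element's label path and term frequency.
SCHEMA = """
CREATE TABLE postings (
    elem_id INTEGER NOT NULL,
    path    TEXT    NOT NULL,
    term    TEXT    NOT NULL,
    tf      INTEGER NOT NULL,
    PRIMARY KEY (elem_id, term)
);
CREATE TEMP TABLE query_terms (term TEXT PRIMARY KEY);
"""

# Query-specific statistics: the element count N and the document frequencies df
# are derived at query time from the in-scope elements only, so elements outside
# the scope do not influence the idf weights.
RANK_SQL = """
WITH scope AS (
    SELECT elem_id, term, tf FROM postings WHERE path = :scope
),
scope_size AS (
    SELECT COUNT(DISTINCT elem_id) AS n FROM scope
),
df AS (
    SELECT term, COUNT(*) AS df FROM scope GROUP BY term
)
SELECT s.elem_id,
       SUM(s.tf * py_log(CAST(scope_size.n AS REAL) / df.df)) AS score
FROM scope AS s
JOIN df ON df.term = s.term
CROSS JOIN scope_size
WHERE s.term IN (SELECT term FROM query_terms)
GROUP BY s.elem_id
ORDER BY score DESC, s.elem_id
"""

# Toy rows (hypothetical): three title elements form the scope; element 4 is a paragraph.
SAMPLE_ROWS = [
    (1, "/book/chapter/title", "xml", 1),
    (1, "/book/chapter/title", "retrieval", 1),
    (2, "/book/chapter/title", "xml", 1),
    (2, "/book/chapter/title", "database", 2),
    (3, "/book/chapter/title", "database", 1),
    (4, "/book/chapter/para", "xml", 5),
    (4, "/book/chapter/para", "retrieval", 3),
]


def connect() -> sqlite3.Connection:
    """Open an in-memory database with the schema and a natural-log function."""
    conn = sqlite3.connect(":memory:")
    # Register math.log so the query does not rely on SQLite's optional math functions.
    conn.create_function("py_log", 1, math.log)
    conn.executescript(SCHEMA)
    return conn


def rank(conn: sqlite3.Connection, scope: str, terms: list) -> list:
    """Return (elem_id, score) for in-scope elements containing any query term."""
    conn.execute("DELETE FROM query_terms")
    conn.executemany("INSERT INTO query_terms (term) VALUES (?)",
                     [(t,) for t in set(terms)])
    return conn.execute(RANK_SQL, {"scope": scope}).fetchall()


if __name__ == "__main__":
    db = connect()
    db.executemany("INSERT INTO postings VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    for elem_id, score in rank(db, "/book/chapter/title", ["xml", "retrieval"]):
        print(elem_id, round(score, 3))
```

With the toy data, the title scope has N = 3, df(xml) = 2 and df(retrieval) = 1, so element 1 scores about 1.504 and element 2 about 0.405; element 3 contains no query term and element 4 lies outside the scope, so neither is returned. Had N and df been taken over all four elements instead, element 4's terms would have shifted the idf weights even though it can never be a result, which is the effect query-specific statistics are meant to avoid.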
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grabs, T., Schek, HJ. (2003). PowerDB-XML: A Platform for Data–Centric and Document–Centric XML Processing. In: Bellahsène, Z., Chaudhri, A.B., Rahm, E., Rys, M., Unland, R. (eds) Database and XML Technologies. XSym 2003. Lecture Notes in Computer Science, vol 2824. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39429-7_7
DOI: https://doi.org/10.1007/978-3-540-39429-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20055-0
Online ISBN: 978-3-540-39429-7
eBook Packages: Springer Book Archive