Abstract
The standard XML query languages, XPath and XQuery, are built on the assumption of a regular structure with well-defined parent/child relationships between nodes and exact conditions on nodes. Full-text extensions to both languages allow Information Retrieval (IR) style queries over text-rich documents. Important applications exist, however, in which purely textual information is not predominant and documents exhibit a structure that is not particularly regular. Approaches have therefore been proposed to relax both content and structure conditions in queries on XML document collections and to rank results according to a similarity measure, together with processing approaches to evaluate such queries efficiently. This chapter discusses the various dimensions of query relaxation and alternative approaches to approximate processing.
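As a rough illustration of what relaxing a structure condition and ranking the results can look like, the following minimal sketch relaxes a parent/child condition to an ancestor/descendant one using Python's standard `xml.etree.ElementTree` module. The sample document, the function name `relaxed_matches`, and the scores 1.0 and 0.5 are assumptions made for this example only; they are not taken from the chapter, which surveys considerably more refined relaxations and scoring schemes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample collection: the second book nests <author> one level deeper.
DOC = """
<library>
  <book><title>Data on the Web</title><author>Abiteboul</author></book>
  <book><title>Modern IR</title><info><author>Baeza-Yates</author></info></book>
  <article><title>Tree Edit</title></article>
</library>
"""

def relaxed_matches(root, parent_tag, child_tag):
    """Return (score, element) pairs for parent_tag elements, best first.

    Score 1.0: child_tag occurs as a direct child (exact parent/child match).
    Score 0.5: child_tag occurs only as a deeper descendant (relaxed match).
    The scores are illustrative placeholders, not a published measure.
    """
    results = []
    for parent in root.iter(parent_tag):
        if parent.find(child_tag) is not None:
            # Exact condition: parent/child, as in the XPath step parent/child.
            results.append((1.0, parent))
        elif parent.find(".//" + child_tag) is not None:
            # Relaxed condition: ancestor/descendant, as in parent//child.
            results.append((0.5, parent))
    # Rank results so that exact matches precede relaxed ones.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results

root = ET.fromstring(DOC)
for score, book in relaxed_matches(root, "book", "author"):
    print(score, book.findtext("title"))
```

An exact query for `book/author` would return only the first book; the relaxed version also returns the second one, ranked lower because its structure deviates from the query.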
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Guerrini, G. (2013). Approximate XML Query Processing. In: Catania, B., Jain, L. (eds) Advanced Query Processing. Intelligent Systems Reference Library, vol 36. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28323-9_6
DOI: https://doi.org/10.1007/978-3-642-28323-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28322-2
Online ISBN: 978-3-642-28323-9
eBook Packages: Engineering, Engineering (R0)