Efficient keyword search over virtual XML views

  • Special Issue Paper
The VLDB Journal

Abstract

Emerging applications such as personalized portals, enterprise search, and web integration systems often require keyword search over semi-structured views. Traditional information retrieval techniques are likely to be expensive in this context because they assume that the set of documents being searched is materialized. In this paper, we present a system architecture and algorithm that efficiently evaluate keyword search queries over virtual (unmaterialized) XML views. Our approach exploits indices present on the base data and thereby avoids materializing large parts of the view that are not relevant to the query results. Moreover, using the indices alone, the algorithm can still score the results of queries over the virtual view, and the resulting scores are the same as if the view were materialized. Our performance evaluation, using the INEX data set in the open-source Quark XML database system (Bhaskar et al., Quark: an efficient XQuery full-text implementation. In: SIGMOD, 2006), indicates that the proposed approach is scalable and efficient.
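The central claim of the abstract, that index statistics on the base data suffice to score a virtual view exactly as if it had been materialized, can be illustrated with a toy sketch. This is not the paper's algorithm: the data, the join-by-key view (one view element per book, nesting its reviews), the simple TF-IDF formula, and all function names below are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy base data (hypothetical): element id -> (join key, text).
BOOKS = {"b1": ("isbn1", "xml query processing"),
         "b2": ("isbn2", "cooking basics")}
REVIEWS = {"r1": ("isbn1", "great xml search coverage"),
           "r2": ("isbn1", "keyword search explained"),
           "r3": ("isbn2", "tasty")}

def build_indices(*collections):
    """Build an inverted index (term -> {element id: tf}) and a join-key index."""
    inverted, keys = defaultdict(dict), {}
    for coll in collections:
        for eid, (key, text) in coll.items():
            keys[eid] = key
            for term, tf in Counter(text.split()).items():
                inverted[term][eid] = tf
    return inverted, keys

def score(tf_by_term, n_view, df_by_term):
    """Assumed scoring function: sum of tf * log(1 + N / df) over matched terms."""
    return sum(tf * math.log(1 + n_view / df_by_term[t])
               for t, tf in tf_by_term.items())

def search_virtual(query, inverted, keys, book_ids):
    """Score view elements from the indices only; no view text is ever built."""
    # Assumes each join key identifies exactly one book.
    book_of_key = {keys[b]: b for b in book_ids}
    tf = defaultdict(Counter)  # view element -> term -> aggregated tf
    for term in set(query):
        for eid, count in inverted.get(term, {}).items():
            view_elem = book_of_key.get(keys[eid])
            if view_elem is not None:
                tf[view_elem][term] += count
    # Document frequency counted over view elements, not base elements.
    df = Counter(t for terms in tf.values() for t in terms)
    return {v: score(terms, len(book_ids), df) for v, terms in tf.items()}

def search_materialized(query, books, reviews):
    """Reference: concatenate each view element's text, then score it."""
    view = {b: text + " " + " ".join(t for k2, t in reviews.values() if k2 == k)
            for b, (k, text) in books.items()}
    tf = {v: Counter(w for w in text.split() if w in query)
          for v, text in view.items()}
    df = Counter(t for terms in tf.values() for t in terms)
    return {v: score(terms, len(view), df) for v, terms in tf.items() if terms}
```

Under these assumptions, calling `search_virtual(["xml", "search"], *build_indices(BOOKS, REVIEWS), BOOKS.keys())` and `search_materialized(["xml", "search"], BOOKS, REVIEWS)` should agree on which view elements match and on their scores, because the term and document frequencies of a view element are recoverable from the per-element postings and the join keys.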

References

  1. Aboulnaga, A., Naughton, J.F., Zhang, C.: Generating synthetic complex-structured XML data. In: WebDB, pp. 79–84 (2001)

  2. Al-Khalifa, S., Yu, C., Jagadish, H.V.: Querying structured text in an XML database. In: SIGMOD (2003)

  3. Amer-Yahia, S. et al.: Structure and content scoring for XML. In: VLDB (2005)

  4. Theobald, A., Weikum, G.: The index-based XXL search engine for querying XML data with relevance rankings (2002)

  5. Baeza-Yates, R., Ribeiro-Neto, B.: Modern information retrieval. ACM Press, New York (1999)

  6. Bhaskar, A., et al.: Quark: an efficient XQuery full-text implementation. In: SIGMOD (2006)

  7. Botev, C., Shanmugasundaram, J.: Context-sensitive keyword search and ranking for XML. In: WebDB (2005)

  8. Bressan, S., Catania, B., Lacroix, Z., Li, Y.G., Maddalena, A.: Accelerating queries by pruning XML documents. Data Knowl. Eng. 54(2), 211–240 (2005)

  9. Carey, J.M.: XPERANTO: middleware for publishing object-relational data as XML documents. In: VLDB, pp. 646–648 (2000)

  10. Chan, C.Y., Felber, P., Garofalakis, M.N., Rastogi, R.: Efficient filtering of XML documents with XPath expressions. VLDB J. 11(4), 354–379 (2002)

  11. Chaudhuri, S., Gravano, L., Marian, A.: Optimizing top-k selection queries over multimedia repositories. IEEE Trans. Knowl. Data Eng. 16(8), 992–1009 (2004)

  12. Chen, Z., et al.: Index structures for matching XML twigs using relational query processors. Data Knowl. Eng. 60(2), 283–302 (2007)

  13. Chen, Z., Jagadish, H.V., Lakshmanan, L.V.S., Paparizos, S.: From tree patterns to generalized tree patterns: on efficient evaluation of XQuery. In: VLDB (2003)

  14. Cho, S.: Indexing for XML siblings. In: WebDB (2005)

  15. Christophides, V., Cluet, S., Simeon, J.: On wrapping query languages and efficient XML integration. In: SIGMOD (2000)

  16. Curtmola, E., Amer-Yahia, S., Brown, P., Fernandez, M.: GalaTex: a conformant implementation of the XQuery full-text language. In: XIME-P (2005)

  17. Diao, Y., Fischer, P., Franklin, M., To, R.: YFilter: efficient and scalable filtering of XML documents. In: ICDE (2002)

  18. Fagin, R.: Combining fuzzy information from multiple systems. In: PODS (1996)

  19. Fahl, G., Risch, T.: Query processing over object views of relational data. VLDB J. 6(4), 261–281 (1997)

  20. Fernandez, M.F., Tan, W.C., Suciu, D.: SilkRoute: trading between relations and XML. Comput. Netw. 33(1–6), 723–745 (2000)

  21. Fuhr, N., Großjohann, K.: XIRQL: a query language for information retrieval in XML documents (2001)

  22. Guo, L., Shao, F., Botev, C., Shanmugasundaram, J.: XRANK: ranked keyword search over XML documents. In: SIGMOD (2003)

  23. Hristidis, V., Gravano, L., Papakonstantinou, Y.: Efficient IR-style keyword search over relational databases. In: VLDB (2003)

  24. Hristidis, V., Papakonstantinou, Y.: Discover: keyword search in relational databases. In: VLDB (2002)

  25. Ilyas, I.F., et al.: Rank-aware query optimization. In: SIGMOD (2004)

  26. Jagadish, H.V., et al.: TIMBER: a native XML database. VLDB J. 11(4), 274–291 (2002)

  27. Kaushik, R., Krishnamurthy, R., Naughton, J.F., Ramakrishnan, R.: On the integration of structure indexes and inverted lists. In: ICDE (2004)

  28. Marian, A., Siméon, J.: Projecting XML documents. In: VLDB (2003)

  29. Mass, Y., et al.: JuruXML—an XML retrieval system at INEX’02. In: INEX (2002)

  30. Myaeng, S.-H., Jang, D.-H., Kim, M.-S., Zhoo, Z.-C.: A flexible model for retrieval of SGML documents. In: SIGIR (1998)

  31. Naughton, J.F., et al.: The Niagara Internet query system. IEEE Data Eng. Bull. 24(2), 27–33 (2001)

  32. O’Neil, P., et al.: ORDPATHs: insert-friendly XML node labels. In: SIGMOD (2004)

  33. Paparizos, S., Wu, Y., Lakshmanan, L.V.S., Jagadish, H.V.: Tree logical classes for efficient evaluation of XQuery. In: SIGMOD. ACM Press, New York, pp. 71–82 (2004)

  34. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: PODS (2001)

  35. Salton, G.: Automatic text processing: the transformation, analysis and retrieval of information by computer. Addison-Wesley, Reading (1989)

  36. Schott, S., Noga, M.L.: Lazy XSL transformations. In: DocEng 2003. ACM Press, Grenoble (2003)

  37. Shanmugasundaram, J., et al.: Querying XML views of relational data. In: VLDB (2001)

  38. Shao, F., et al.: Efficient ranked keyword search over virtual XML views, technical report TR2007-2077, Cornell University (2007)

  39. Shao, F., Guo, L., Botev, C., Bhaskar, A., Chettiar, M.M.M., Yang, F., Shanmugasundaram, J.: Efficient keyword search over virtual XML views. In: VLDB, pp. 1057–1068 (2007)

  40. Witten, I.H., Moffat, A., Bell, T.C.: Managing gigabytes: compressing and indexing documents and images. Morgan Kaufmann Publishers, San Francisco (1999)

  41. Yoshikawa, M., Amagasa, T.: XRel: a path-based approach to storage and retrieval of XML documents using relational databases. ACM Trans. Inter. Tech. 1(1), 110–141 (2001)

  42. Zhang, C., et al.: On supporting containment queries in relational database management systems. In: SIGMOD (2001)

  43. Zobel, J., Moffat, A.: Exploring the similarity space. SIGIR Forum 32(1), 18–34 (1998)

Author information

Corresponding author

Correspondence to Feng Shao.

About this article

Cite this article

Shao, F., Guo, L., Botev, C. et al. Efficient keyword search over virtual XML views. The VLDB Journal 18, 543–570 (2009). https://doi.org/10.1007/s00778-008-0126-x

  • DOI: https://doi.org/10.1007/s00778-008-0126-x
