Encoding XML in Vector Spaces

  • Conference paper
Advances in Information Retrieval (ECIR 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3408)

Abstract

We develop a framework for representing XML documents and queries in vector spaces and build indexes for processing text-centric semi-structured queries that support a proximity measure between XML documents. The idea of using vector spaces for XML retrieval is not new. In this paper we (i) unify prior approaches into a single framework; (ii) develop techniques to eliminate the special-purpose auxiliary computations (outside the vector space) used previously; (iii) give experimental evidence on benchmark queries that our approach is competitive in retrieval quality; and (iv), as an immediate consequence of the framework, are able to classify and cluster XML documents.
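
As a rough illustration of what representing XML in a vector space can look like, the sketch below assumes that each axis of the space is an (element path, term) pair, that weights are raw counts, and that the proximity measure is cosine similarity. These choices, and the names `encode` and `cosine`, are assumptions made for illustration only; the abstract does not specify the paper's actual encoding or weighting.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def encode(xml_text):
    """Map an XML string to a sparse vector keyed by (path, term) pairs.

    Hypothetical encoding: one axis per (element path, term), raw counts.
    """
    vector = Counter()

    def walk(node, path):
        here = path + "/" + node.tag
        # Text directly inside this element, plus text trailing each child.
        chunks = [node.text or ""] + [child.tail or "" for child in node]
        for term in " ".join(chunks).lower().split():
            vector[(here, term)] += 1
        for child in node:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


doc = encode("<book><title>vector spaces</title><author>salton</author></book>")
query = encode("<book><title>vector</title></book>")
other = encode("<book><author>vector</author></book>")

print(cosine(doc, query))  # term matches in the same structural context
print(cosine(doc, other))  # same term, different context: no shared axis
```

Because documents and queries live in the same space under this assumed encoding, the same `cosine` function could also serve as the distance for off-the-shelf classification or clustering, which is the kind of consequence the abstract points to in (iv).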

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kakade, V., Raghavan, P. (2005). Encoding XML in Vector Spaces. In: Losada, D.E., Fernández-Luna, J.M. (eds) Advances in Information Retrieval. ECIR 2005. Lecture Notes in Computer Science, vol 3408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_8

  • DOI: https://doi.org/10.1007/978-3-540-31865-1_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25295-5

  • Online ISBN: 978-3-540-31865-1

  • eBook Packages: Computer Science, Computer Science (R0)
