Applying the Divergence from Randomness Approach for Content-Only Search in XML Documents

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2997))

Abstract

Content-only retrieval of XML documents deals with the problem of locating the smallest XML elements that satisfy the query. In this paper, we investigate the application of a specific language model to this task, namely Amati’s divergence from randomness approach. First, we investigate different ways of applying this model without modification, by redefining the concept of an (atomic) document for the XML setting. However, this approach yields a retrieval quality lower than that of the best previously known method. We then improve the retrieval quality by extending the basic model with an additional factor that reflects the hierarchical structure of XML documents.


Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abolhassani, M., Fuhr, N. (2004). Applying the Divergence from Randomness Approach for Content-Only Search in XML Documents. In: McDonald, S., Tait, J. (eds) Advances in Information Retrieval. ECIR 2004. Lecture Notes in Computer Science, vol 2997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24752-4_30

  • DOI: https://doi.org/10.1007/978-3-540-24752-4_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21382-6

  • Online ISBN: 978-3-540-24752-4

  • eBook Packages: Springer Book Archive
