Abstract
Content-only retrieval of XML documents deals with the problem of locating the smallest XML elements that satisfy the query. In this paper, we investigate the application of a specific language model to this task, namely Amati’s divergence from randomness approach. First, we investigate different ways of applying this model without modification by redefining the concept of an (atomic) document for the XML setting. However, this approach yields a retrieval quality lower than that of the best previously known method. We then improve the retrieval quality by extending the basic model with an additional factor that refers to the hierarchical structure of XML documents.
References
Amati, G., van Rijsbergen, C.: Probabilistic models of Information Retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems 20(4), 357–389 (2002)
Amati, G.: Probability Models for Information Retrieval based on Divergence from Randomness. PhD thesis, University of Glasgow (2003)
Chiaramella, Y., Mulhem, P., Fourel, F.: A Model for Multimedia Information Retrieval. Technical report, FERMI ESPRIT BRA 8134, University of Glasgow (1996)
Fuhr, N., Großjohann, K.: XIRQL: A Query Language for Information Retrieval in XML Documents. In: Croft, W., Harper, D., Kraft, D., Zobel, J. (eds.) Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval, pp. 172–180. ACM, New York (2001)
Fuhr, N., Gövert, N., Kazai, G., Lalmas, M.: INEX: INitiative for the Evaluation of XML Retrieval. In: Baeza-Yates, R., Fuhr, N., Maarek, Y.S. (eds.) Proceedings of the SIGIR 2002 Workshop on XML and Information Retrieval (2002), http://www.is.informatik.uni-duisburg.de/bib/xml/Fuhr_etal_02a.html
Fuhr, N., Gövert, N., Kazai, G., Lalmas, M. (eds.): INitiative for the Evaluation of XML Retrieval (INEX). Proceedings of the First INEX Workshop, Dagstuhl, Germany, December 8-11 (2002); ERCIM Workshop Proceedings, Sophia Antipolis, France, ERCIM (2003), http://www.ercim.org/publication/ws-proceedings/INEX2002.pdf
Gövert, N., Kazai, G.: Overview of the INitiative for the Evaluation of XML retrieval (INEX). In: [6], pp. 1–17 (2003), http://www.ercim.org/publication/ws-proceedings/INEX2002.pdf
Gövert, N., Fuhr, N., Abolhassani, M., Großjohann, K.: Content-oriented XML retrieval with HyREX. In: [6], pp. 26–32 (2003), http://www.ercim.org/publication/ws-proceedings/INEX2002.pdf
Grabs, T., Schek, H.-J.: Flexible Information Retrieval from XML with PowerDBXML. In: [6], pp. 141–148 (2003), http://www.ercim.org/publication/ws-proceedings/INEX2002.pdf
Ogilvie, P., Callan, J.: Language Models and Structured Document Retrieval. In: [6], pp. 33–40 (2003), http://www.ercim.org/publication/ws-proceedings/INEX2002.pdf
Piwowarski, B., Faure, G.-E., Gallinari, P.: Bayesian Networks and INEX. In: [6], pp. 149–154 (2003), http://www.ercim.org/publication/ws-proceedings/INEX2002.pdf
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abolhassani, M., Fuhr, N. (2004). Applying the Divergence from Randomness Approach for Content-Only Search in XML Documents. In: McDonald, S., Tait, J. (eds) Advances in Information Retrieval. ECIR 2004. Lecture Notes in Computer Science, vol 2997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24752-4_30
DOI: https://doi.org/10.1007/978-3-540-24752-4_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21382-6
Online ISBN: 978-3-540-24752-4
eBook Packages: Springer Book Archive