Filtering and Clustering XML Retrieval Results

Kamps, Jaap; Koolen, Marijn; Sigurbjörnsson, Börkur

doi:10.1007/978-3-540-73888-6_13

Jaap Kamps¹,², Marijn Koolen¹ & Börkur Sigurbjörnsson²,³

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4518)

Included in the following conference series:

International Workshop of the Initiative for the Evaluation of XML Retrieval

Abstract

As part of the INEX 2006 Adhoc Track, we conducted a range of experiments with filtering and clustering XML element retrieval results. Our basic retrieval engine retrieves arbitrary elements from the collection (corresponding to the Thorough Task). These runs are filtered to remove textual overlap between elements (corresponding to the Focused Task). The resulting runs can be clustered per article (corresponding to the All in Context Task). Finally, we select the “best” element for each article (corresponding to the Best in Context Task). Our main findings are the following. First, a complete element index outperforms a restricted index based on section structure, although the differences are small. Second, grouping non-overlapping elements per article does not lead to performance degradation and may improve scores. Third, all restrictions of the “pure” element runs (removing overlap, grouping elements per article, or selecting a single element per article) lead to some, but only moderate, loss of precision.
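
The chain of runs described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Hit` record, the path-prefix overlap test, the greedy top-down overlap filter, and the choice of the highest-scoring element as the "best" one are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    article: str   # article identifier (hypothetical)
    path: str      # XPath-like element path inside the article (hypothetical)
    score: float   # retrieval score taken from the element run


def overlaps(a: Hit, b: Hit) -> bool:
    """True if both hits are the same element or one contains the other."""
    if a.article != b.article:
        return False
    short, long_ = sorted((a.path, b.path), key=len)
    return long_ == short or long_.startswith(short + "/")


def remove_overlap(run: list[Hit]) -> list[Hit]:
    """Assumed greedy filter: go down the ranking and keep an element only
    if it does not overlap with any element kept so far."""
    kept: list[Hit] = []
    for hit in sorted(run, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return kept


def group_by_article(run: list[Hit]) -> list[tuple[str, list[Hit]]]:
    """Cluster hits per article; articles are ordered by their top element."""
    groups: dict[str, list[Hit]] = defaultdict(list)
    for hit in sorted(run, key=lambda h: h.score, reverse=True):
        groups[hit.article].append(hit)
    return list(groups.items())  # dicts preserve insertion order


def best_per_article(run: list[Hit]) -> list[Hit]:
    """Assumed criterion: the highest-scoring element of each article."""
    return [hits[0] for _, hits in group_by_article(run)]


# Toy element run standing in for a Thorough Task result list.
thorough = [
    Hit("a1", "/article[1]", 0.40),
    Hit("a1", "/article[1]/sec[1]", 0.90),
    Hit("a1", "/article[1]/sec[1]/p[2]", 0.70),
    Hit("a1", "/article[1]/sec[2]", 0.50),
    Hit("a2", "/article[1]/sec[3]", 0.80),
]
focused = remove_overlap(thorough)         # Focused Task style run
in_context = group_by_article(focused)     # All in Context style run
best = best_per_article(focused)           # Best in Context style run
```

In this toy run the paragraph and the whole article of `a1` are dropped because they overlap with the higher-scoring `sec[1]`, while `sec[2]` survives as a non-overlapping sibling. Other overlap strategies, or a different notion of "best" element, would slot into the same structure.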

Author information

Authors and Affiliations

1. Archives and Information Science, Faculty of Humanities, University of Amsterdam: Jaap Kamps & Marijn Koolen
2. ISLA, Faculty of Science, University of Amsterdam: Jaap Kamps & Börkur Sigurbjörnsson
3. Yahoo! Research, Barcelona: Börkur Sigurbjörnsson

Editor information

Norbert Fuhr, Mounia Lalmas, Andrew Trotman

About this paper

Cite this paper

Kamps, J., Koolen, M., Sigurbjörnsson, B. (2007). Filtering and Clustering XML Retrieval Results. In: Fuhr, N., Lalmas, M., Trotman, A. (eds) Comparative Evaluation of XML Information Retrieval Systems. INEX 2006. Lecture Notes in Computer Science, vol 4518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73888-6_13

DOI: https://doi.org/10.1007/978-3-540-73888-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73887-9
Online ISBN: 978-3-540-73888-6
eBook Packages: Computer Science, Computer Science (R0)
