
XML Retrieval Using Pruned Element-Index Files

  • Conference paper
Advances in Information Retrieval (ECIR 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5993)

Abstract

An element-index is a crucial mechanism for supporting content-only (CO) queries over XML collections. A full element-index, which indexes each element along with the content of its descendants, involves high redundancy and reduces query processing efficiency. A direct index, on the other hand, indexes only the content directly under each element and disregards the descendants. This results in a smaller index, possibly at the cost of some reduction in system effectiveness. In this paper, we propose using static index pruning techniques to obtain more compact index files that still yield retrieval performance comparable to that of a full index. We also compare the retrieval performance of these pruning-based approaches to other strategies that make use of a direct element-index. Our experiments, conducted along the lines of the INEX evaluation framework, reveal that pruned index files yield retrieval performance comparable to, or even better than, that of the full index and the direct index for several tasks in the ad hoc track.
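
The difference between the two index types described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy document `DOC`, the path-style element identifiers, the helper names `direct_text`, `build_indexes`, and `prune_top_k`, and the rule of keeping the `k` postings with the highest term frequency per term are all assumptions made for illustration. The pruning strategies actually evaluated in the paper may differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical toy document; not taken from the paper's collection.
DOC = ("<article><title>xml retrieval</title>"
       "<sec><p>index pruning</p><p>xml index</p></sec></article>")


def direct_text(elem):
    """Text directly under elem: its own text plus the tails of its children."""
    parts = [elem.text or ""]
    parts.extend(child.tail or "" for child in elem)
    return " ".join(parts)


def build_indexes(root):
    """Return (full, direct), each mapping term -> Counter {element path: tf}."""
    full = defaultdict(Counter)
    direct = defaultdict(Counter)

    def visit(elem, path):
        own = Counter(direct_text(elem).lower().split())
        total = Counter(own)  # will also accumulate all descendants' terms
        seen = Counter()      # sibling position per tag, used to build paths
        for child in elem:
            seen[child.tag] += 1
            total.update(visit(child, f"{path}/{child.tag}[{seen[child.tag]}]"))
        for term, tf in own.items():
            direct[term][path] = tf   # direct index: own content only
        for term, tf in total.items():
            full[term][path] = tf     # full index: own + descendant content
        return total

    visit(root, f"/{root.tag}[1]")
    return full, direct


def prune_top_k(index, k):
    """Assumed pruning rule: keep the k highest-tf postings of each term."""
    return {term: dict(postings.most_common(k)) for term, postings in index.items()}


full, direct = build_indexes(ET.fromstring(DOC))
print("full postings:  ", sum(len(p) for p in full.values()))
print("direct postings:", sum(len(p) for p in direct.values()))
print("pruned postings:", sum(len(p) for p in prune_top_k(full, 2).values()))
```

In this sketch every term is posted once per ancestor element in the full index, which is the redundancy the abstract refers to. The direct index posts each term only at the element that contains it, and the pruned index falls between the two in size.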




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Altingovde, I.S., Atilgan, D., Ulusoy, Ö. (2010). XML Retrieval Using Pruned Element-Index Files. In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_28


  • DOI: https://doi.org/10.1007/978-3-642-12275-0_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12274-3

  • Online ISBN: 978-3-642-12275-0

  • eBook Packages: Computer Science, Computer Science (R0)
