EXTIRP 2004: Towards Heterogeneity

  • Conference paper
Advances in XML Information Retrieval (INEX 2004)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3493)

Abstract

Our work on EXTIRP 2004 focused on the heterogeneity of XML document collections. The subcollections of the heterogeneous track (het-track) did not offer us a suitable testbed, but we successfully applied methods that are independent of any document type to the original INEX test collection. By ignoring the element names defined in the DTD, we created comparable runs and observed an improvement in the results. This was the evidence we had anticipated for our hypothesis that we do not need to know the element names when indexing the collection or when returning full-text answers to Content-Only queries. Some problematic areas were also identified. One of them is score combination, which enables us to combine elements of any size into one ranked list of results, given the relevance scores of the leaf-level elements. However, finding a suitable score combination method remains part of our future work.
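
The score combination problem named in the abstract can be illustrated with a minimal sketch. It assumes a length-weighted mean as the combination rule, which is a placeholder chosen only for illustration and is not the method of EXTIRP; the abstract leaves the choice of method open. The `Node` class, the positional paths, and the function names are hypothetical. Elements are identified by position only, in keeping with the idea that element names need not be known.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """An XML element identified by its position, never by its tag name."""
    path: str                 # positional path such as "/1/2" (hypothetical)
    length: int = 0           # text length; given for leaves, computed otherwise
    score: float = 0.0        # relevance score; given for leaves, computed otherwise
    children: List["Node"] = field(default_factory=list)


def combine(node: Node) -> Tuple[int, float]:
    """Fill in length and score of inner nodes bottom-up.

    Assumed rule (illustrative only): the score of an inner node is the
    length-weighted mean of the scores of its children.
    """
    if not node.children:
        return node.length, node.score
    total_length = 0
    weighted_sum = 0.0
    for child in node.children:
        child_length, child_score = combine(child)
        total_length += child_length
        weighted_sum += child_length * child_score
    node.length = total_length
    node.score = weighted_sum / total_length if total_length else 0.0
    return node.length, node.score


def flatten(node: Node) -> List[Tuple[str, float]]:
    """Collect (path, score) pairs for the node and all its descendants."""
    pairs = [(node.path, node.score)]
    for child in node.children:
        pairs.extend(flatten(child))
    return pairs


def ranked_list(root: Node) -> List[Tuple[str, float]]:
    """Score every element, then rank elements of all sizes in one list."""
    combine(root)
    return sorted(flatten(root), key=lambda pair: pair[1], reverse=True)


# Two scored leaves under one unscored parent.
example_root = Node("/1", children=[
    Node("/1/1", length=100, score=0.5),
    Node("/1/2", length=300, score=0.25),
])
example_ranking = ranked_list(example_root)
```

With these inputs the parent receives the score (100 × 0.5 + 300 × 0.25) / 400 = 0.3125, so it ranks between its two children. Any other rule could be substituted inside `combine` without changing the rest of the sketch.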

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lehtonen, M. (2005). EXTIRP 2004: Towards Heterogeneity. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds) Advances in XML Information Retrieval. INEX 2004. Lecture Notes in Computer Science, vol 3493. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424550_30

  • DOI: https://doi.org/10.1007/11424550_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26166-7

  • Online ISBN: 978-3-540-32053-1

  • eBook Packages: Computer Science (R0)
