Using Rich Document Representation in XML Information Retrieval

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4518)

Abstract

In this paper we present a document representation method called Rich Document Representation (RDR) for building XML retrieval engines with high specificity. RDR represents documents using single words, phrases, logical terms and logical statements. The Vector Space model is used to compute index term weights and the similarity between each element and the query. The system participated in INEX 2006 and was tested with the Content Only queries of the given collection. The results were very weak, but a failure analysis revealed that this was caused by an error in document processing, which produced inconsistent IDs and caused a mismatch between the IDs assigned to document elements such as single terms, phrases and logical terms. However, a similar experiment on the INEX 2004 collection yielded very good precision on the high-specificity task with s3e123 quantization.
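To make the Vector Space step concrete, the following is a minimal sketch of ranking XML elements against a query. The abstract does not state the exact weighting scheme, so tf-idf weights and cosine similarity are assumed here. Adjacent word pairs stand in for RDR phrases, logical terms and logical statements are omitted, and all function names are hypothetical.

```python
import math
from collections import Counter


def tokens_and_phrases(text):
    """Return single words plus adjacent-word pairs as stand-in 'phrases'."""
    words = text.lower().split()
    phrases = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + phrases


def tfidf_vectors(elements):
    """Build one sparse tf-idf vector (a dict) per element, plus the idf table."""
    term_lists = [tokens_and_phrases(e) for e in elements]
    n = len(term_lists)
    # Document frequency: number of elements in which each term occurs.
    df = Counter(t for terms in term_lists for t in set(terms))
    # Smoothed idf (an assumption; the paper's formula is not given in the abstract).
    idf = {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(terms).items()}
        for terms in term_lists
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_elements(elements, query):
    """Return (score, element_index) pairs, best match first."""
    vectors, idf = tfidf_vectors(elements)
    query_terms = Counter(tokens_and_phrases(query))
    # Query terms unseen in the collection carry no weight.
    q = {t: tf * idf[t] for t, tf in query_terms.items() if t in idf}
    scores = [(cosine(vec, q), i) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


elements = [
    "xml retrieval engines",
    "vector space model",
    "plausible reasoning for retrieval",
]
ranking = rank_elements(elements, "xml retrieval")
```

In this toy collection the first element shares two words and one phrase with the query, so it ranks ahead of the element that shares only one word. Treating phrases as additional index terms is one simple way to reward more specific matches.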



Editor information

Norbert Fuhr, Mounia Lalmas, Andrew Trotman

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Raja, F., Keikha, M., Rahgozar, M., Oroumchian, F. (2007). Using Rich Document Representation in XML Information Retrieval. In: Fuhr, N., Lalmas, M., Trotman, A. (eds) Comparative Evaluation of XML Information Retrieval Systems. INEX 2006. Lecture Notes in Computer Science, vol 4518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73888-6_29

  • DOI: https://doi.org/10.1007/978-3-540-73888-6_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73887-9

  • Online ISBN: 978-3-540-73888-6

  • eBook Packages: Computer Science, Computer Science (R0)
