Abstract
In this paper we present a method of document representation called Rich Document Representation (RDR) for building XML retrieval engines with high specificity. RDR represents documents using single words, phrases, logical terms and logical statements. The Vector Space model is used to compute index term weights and the similarity between each element and the query. The system participated in INEX 2006 and was tested with the Content Only queries of the given collection. The results were very weak, but a failure analysis revealed that the cause was an error in document processing, which produced inconsistent IDs and caused a mismatch between the IDs assigned to document elements such as single terms, phrases and logical terms. However, a similar experiment on the INEX 2004 collection yielded very good precision on the high specificity task with s3e123 quantization.
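To make the Vector Space step concrete, the following is a minimal sketch of weighting index terms and ranking XML elements against a query. The abstract does not state the exact weighting or similarity formula, so the smoothed TF-IDF weights and cosine similarity used here are assumptions, as are the function names, the element paths, and the convention of writing a phrase as a single underscore-joined token.

```python
import math
from collections import Counter


def build_index(elements):
    """Compute document frequencies over a {element_id: [terms]} mapping."""
    df = Counter()
    for terms in elements.values():
        df.update(set(terms))
    return df, len(elements)


def weight_vector(terms, df, n):
    """TF-IDF weights (assumed scheme); terms unseen in the index are skipped."""
    tf = Counter(terms)
    return {t: tf[t] * math.log((n + 1) / df[t]) for t in tf if t in df}


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(elements, query_terms):
    """Return (element_id, score) pairs sorted by descending similarity."""
    df, n = build_index(elements)
    q = weight_vector(query_terms, df, n)
    scored = [(eid, cosine(weight_vector(terms, df, n), q))
              for eid, terms in elements.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical elements: single words plus one phrase term ("xml_retrieval").
elements = {
    "article[1]/sec[1]": ["xml", "retrieval", "xml_retrieval"],
    "article[1]/sec[2]": ["document", "clustering"],
    "article[1]/sec[3]": ["xml", "document"],
}
ranked = rank(elements, ["xml", "retrieval"])
for element_id, score in ranked:
    print(f"{element_id}\t{score:.3f}")
```

In this sketch every element is scored independently, so a small, focused element can outrank its larger ancestors, which is the behaviour a high-specificity task rewards.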
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Raja, F., Keikha, M., Rahgozar, M., Oroumchian, F. (2007). Using Rich Document Representation in XML Information Retrieval. In: Fuhr, N., Lalmas, M., Trotman, A. (eds) Comparative Evaluation of XML Information Retrieval Systems. INEX 2006. Lecture Notes in Computer Science, vol 4518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73888-6_29
DOI: https://doi.org/10.1007/978-3-540-73888-6_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73887-9
Online ISBN: 978-3-540-73888-6
eBook Packages: Computer Science, Computer Science (R0)