IBERAMIA 2008: Advances in Artificial Intelligence – IBERAMIA 2008 pp 362-371 | Cite as
Text Retrieval through Corrupted Queries
Conference paper
Abstract
Our work relies on the design and evaluation of experimental information retrieval systems able to cope with textual misspellings in queries. In contrast to previous proposals, commonly based on the consideration of spelling correction strategies and a word language model, we also report on the use of character n-grams as indexing support.
Keywords
Degraded text information retrievalPreview
Unable to display preview. Download preview PDF.
References
- 1.Amati, G., van Rijsbergen, C.-J.: Probabilistic models of Information Retrieval based on measuring divergence from randomness. ACM Transactions on Information Systems 20(4), 357–389 (2002)CrossRefGoogle Scholar
- 2.Cross-Language Evaluation Forum (visited, July 2008), http://www.clef-campaign.org
- 3.Collins-Thompson, K., Schweizer, C., Dumais, S.: Improved string matching under noisy channel conditions. In: Proc. of the 10th Int. Conf. on Information and Knowledge Management, pp. 357–364 (2001)Google Scholar
- 4.Damerau, F.: A technique for computer detection and correction of spelling errors. Communications of the ACM 7(3) (March 1964)Google Scholar
- 5.Graña, J., Alonso, M.A., Vilares, M.: A common solution for tokenization and part-of-speech tagging: One-pass Viterbi algorithm vs. iterative approaches. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2002. LNCS (LNAI), vol. 2448, pp. 3–10. Springer, Heidelberg (2002)CrossRefGoogle Scholar
- 6.Lam-Adesina, A.M., Jones, G.J.F.: Examining and improving the effectiveness of relevance feedback for retrieval of scanned text documents. Information Processing Management 42(3), 633–649 (2006)CrossRefGoogle Scholar
- 7.McNamee, P., Mayfield, J.: Character N-gram Tokenization for European Language Text Retrieval. Information Retrieval 7(1-2), 73–97 (2004)CrossRefGoogle Scholar
- 8.McNamee, P., Mayfield, J.: jhu/apl experiments in tokenization and non-word translation. In: Peters, C., Gonzalo, J., Braschler, M., Kluck, M. (eds.) CLEF 2003. LNCS, vol. 3237, pp. 85–97. Springer, Heidelberg (2004)Google Scholar
- 9.Mittendorf, E., Schauble, P.: Measuring the effects of data corruption on information retrieval. In: Symposium on Document Analysis and Information Retrieval, p. XX (1996)Google Scholar
- 10.Mittendorf, E., Schäuble, P.: Information retrieval can cope with many errors. Information Retrieval 3(3), 189–216 (2000)MATHCrossRefGoogle Scholar
- 11.Mittendorfer, M., Winiwarter, W.: A simple way of improving traditional ir methods by structuring queries. In: Proc. of the 2001 IEEE Int. Workshop on Natural Language Processing and Knowledge Engineering (NLPKE 2001) (2001)Google Scholar
- 12.Mittendorfer, M., Winiwarter, W.: Exploiting syntactic analysis of queries for information retrieval. Data & Knowledge Engineering 42(3), 315–325 (2002)MATHCrossRefGoogle Scholar
- 13.Nardi, A., Peters, C., Vicedo, J.L.: Results of the CLEF 2006 Cross-Language System Evaluation Campaign, Working Notes of the CLEF 2006 Workshop, Alicante, Spain, September 20-22 (2006) [2] Google Scholar
- 14.Otero, J., Graña, J., Vilares, M.: Contextual Spelling Correction. In: Moreno Díaz, R., Pichler, F., Quesada Arencibia, A. (eds.) EUROCAST 2007. LNCS, vol. 4739, pp. 290–296. Springer, Heidelberg (2007)CrossRefGoogle Scholar
- 15.Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)Google Scholar
- 16.Ruch, P.: Using contextual spelling correction to improve retrieval effectiveness in degraded text collections. In: Proc. of the 19th Int. Conf. on Computational Linguistics, pp. 1–7 (2002)Google Scholar
- 17.Savary, A.: Typographical nearest-neighbor search in a finite-state lexicon and its application to spelling correction. In: Watson, B.W., Wood, D. (eds.) CIAA 2001. LNCS, vol. 2494, pp. 251–260. Springer, Heidelberg (2003)CrossRefGoogle Scholar
- 18.Taghva, K., Borsack, J., Condit, A.: Results of applying probabilistic ir to ocr text. In: Proc. of the 17th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval. Performance Evaluation, pp. 202–211 (1994)Google Scholar
- 19.Takasu, A.: An approximate multi-word matching algorithm for robust document retrieval. In: CIKM 2006: Proc. of the 15th ACM Int. Conf. on Information and Knowledge Management, pp. 34–42 (2006)Google Scholar
- 20.http://ir.dcs.gla.ac.uk/terrier/ (visited, July 2008)
- 21.Vilares, M., Otero, J., Graña, J.: On asymptotic finite-state error repair. In: Apostolico, A., Melucci, M. (eds.) SPIRE 2004. LNCS, vol. 3246, pp. 271–272. Springer, Heidelberg (2004)Google Scholar
- 22.Viterbi, A.: Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Trans. Information Theory IT-13, 260–269 (1967)CrossRefGoogle Scholar
- 23.Véronis, J.: Multext-corpora: An annotated corpus for five European languages. cd-rom, Distributed by elra/elda (1999)Google Scholar
Copyright information
© Springer-Verlag Berlin Heidelberg 2008