Using String Comparison in Context for Improved Relevance Feedback in Different Text Media

Lam-Adesina, Adenike M.; Jones, Gareth J. F.

doi:10.1007/11880561_19

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 4209)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

Query expansion is a long-standing relevance feedback technique for improving the effectiveness of information retrieval systems. Previous investigations have shown it to be generally effective for electronic text, to give proportionally greater improvement for automatic transcriptions of spoken documents, and to be of questionable utility at best for scanned text documents processed by optical character recognition. We introduce two corpus-based methods that use a string-edit distance measure in context to automatically detect and correct transcription errors. One method operates at query time and requires no modification of the document index file; the other operates at index time and uses the standard query-time expansion process. Experimental investigations show that these methods improve relevance feedback for all three media types and, most significantly, that relevance feedback can now be applied successfully to scanned text documents.
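As an illustration of the general idea only, the following Python sketch computes a standard Levenshtein string-edit distance and uses it to map a rare term to a much more frequent near neighbour in the same corpus. The abstract does not specify the authors' distance measure, thresholds, or use of context, so everything here is an assumption: the function names `edit_distance` and `suggest_correction`, the `max_distance` parameter, and the frequency-based choice of correction are all hypothetical, and the contextual component of the methods described above is not modelled.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def suggest_correction(term: str, corpus_counts: Counter, max_distance: int = 1) -> str:
    """Hypothetical helper: return the most frequent corpus term within
    max_distance edits of `term`, but only if it is more frequent than
    `term` itself; otherwise return `term` unchanged."""
    candidates = [(count, word) for word, count in corpus_counts.items()
                  if word != term and edit_distance(term, word) <= max_distance]
    if not candidates:
        return term
    best_count, best_word = max(candidates)
    return best_word if best_count > corpus_counts.get(term, 0) else term


# Toy term frequencies (made up for illustration, not data from the paper).
counts = Counter({"retrieval": 40, "retrieva1": 1, "relevance": 25})
corrected = suggest_correction("retrieva1", counts)
unchanged = suggest_correction("retrieval", counts)
```

In this toy vocabulary the OCR-style variant `retrieva1` is one substitution away from the far more frequent `retrieval`, so it is mapped to it, while `retrieval` itself is left alone. A helper of this kind could in principle be applied either to expansion terms at query time or to indexed terms at index time, mirroring the two settings named in the abstract.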

Author information

Authors and Affiliations

Centre for Digital Video Processing & School of Computing, Dublin City University, Dublin 9, Ireland
Adenike M. Lam-Adesina & Gareth J. F. Jones

Editor information

Editors and Affiliations

Department of Computer and Information Science, University of Strathclyde, Scotland
Fabio Crestani
Dipartimento di Informatica, University of Pisa, Largo B. Pontecorvo 3, 56127, Pisa, Italy
Paolo Ferragina
Department of Information Studies, University of Sheffield, Sheffield, UK
Mark Sanderson

About this paper

Cite this paper

Lam-Adesina, A.M., Jones, G.J.F. (2006). Using String Comparison in Context for Improved Relevance Feedback in Different Text Media. In: Crestani, F., Ferragina, P., Sanderson, M. (eds) String Processing and Information Retrieval. SPIRE 2006. Lecture Notes in Computer Science, vol 4209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880561_19

DOI: https://doi.org/10.1007/11880561_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45774-9
Online ISBN: 978-3-540-45775-6
eBook Packages: Computer Science, Computer Science (R0)