A Language-Independent Approach to European Text Retrieval

  • Paul McNamee
  • James Mayfield
  • Christine Piatko
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2069)

Abstract

We present an approach to multilingual information retrieval that does not depend on the existence of specific linguistic resources such as stemmers or thesauri. Using the HAIRCUT system, we participated in the monolingual, bilingual, and multilingual tasks of the CLEF-2000 evaluation. Our approach, based on combining the benefits of words and character n-grams, was effective both for language-independent monolingual retrieval and for cross-language retrieval using translated queries. After describing our monolingual retrieval approach, we compare a translation method using aligned parallel corpora to commercial machine translation software.
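The abstract does not spell out how words and character n-grams are extracted, so the following is only a minimal sketch of the general idea, not HAIRCUT's actual tokenizer. It assumes lowercasing, Unicode-aware word splitting, overlapping n-grams that may span word boundaries, and an illustrative n = 4. The `w:`/`g:` prefixes that keep the two vocabularies apart are a hypothetical convention. Because no stemmer is involved, related forms such as "retrieval" and "retrieve" still match through the n-grams they share.

```python
import re
from collections import Counter


def words(text: str) -> list[str]:
    """Lowercase and split on non-word characters; re's \\w is Unicode-aware,
    so accented European letters stay inside words."""
    return re.findall(r"\w+", text.lower())


def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Overlapping character n-grams over the normalized word stream.

    Words are joined by single spaces and the stream is padded with a space
    on each side, so some n-grams span word boundaries (an assumption made
    for illustration)."""
    toks = words(text)
    if not toks:
        return []
    stream = " " + " ".join(toks) + " "
    if len(stream) < n:
        # Too short for a full n-gram: keep the whole padded string.
        return [stream]
    return [stream[i:i + n] for i in range(len(stream) - n + 1)]


def index_terms(text: str, n: int = 4) -> Counter:
    """Hypothetical combined bag of indexing terms: whole words plus n-grams."""
    terms = Counter(f"w:{w}" for w in words(text))
    terms.update(f"g:{g}" for g in char_ngrams(text, n))
    return terms


if __name__ == "__main__":
    print(char_ngrams("ab cd"))  # [' ab ', 'ab c', 'b cd', ' cd ']
    shared = set(char_ngrams("Retrieval")) & set(char_ngrams("retrieve"))
    print(sorted(shared))
```

Putting both term types into a single bag is one simple way to combine them. A system could instead build separate word and n-gram indexes and merge the ranked results, and the choice of n strongly affects index size and matching behavior. This abstract does not say which of these HAIRCUT does.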

Keywords

Machine Translation · Average Precision · Relevance Feedback · Parallel Corpus · Query Translation

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Paul McNamee (1)
  • James Mayfield (1)
  • Christine Piatko (1)
  1. Johns Hopkins University Applied Physics Laboratory, MD, USA

Personalised recommendations