Multilingual Information Access

  • Chapter
Lectures on Information Retrieval (ESSIR 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1980)

Abstract

The global information society has radically changed the way in which knowledge is acquired, disseminated and exchanged. Users of internationally distributed networks need to be able to find, retrieve and understand relevant information in whatever language and form it may have been stored. For this reason, much attention has been given over the past few years to the study and development of tools and technologies for multilingual information access (MLIA). This is a complex, multidisciplinary area in which methodologies and tools developed in the fields of information retrieval and natural language processing converge. Two main sectors are involved: multiple language recognition, manipulation and display; and cross-language search and retrieval. The paper provides an overview of the main issues of interest in both these areas. Topics covered include: multilingual document indexing, specific requirements of particular languages and scripts, techniques for cross-language information retrieval (CLIR), resources, and system and component evaluation.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Peters, C., Sheridan, P. (2000). Multilingual Information Access. In: Agosti, M., Crestani, F., Pasi, G. (eds) Lectures on Information Retrieval. ESSIR 2000. Lecture Notes in Computer Science, vol 1980. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45368-7_3

  • DOI: https://doi.org/10.1007/3-540-45368-7_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41933-4

  • Online ISBN: 978-3-540-45368-0

  • eBook Packages: Springer Book Archive
