Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Cross-Language Mining and Retrieval

  • Wei Gao
  • Cheng Niu
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_89

Synonyms

Cross-language information retrieval; Cross-language text mining; Cross-language web mining; Translingual information retrieval

Definition

Cross-language mining is a text mining task concerned with extracting entities together with their counterparts expressed in other languages. The entities of interest vary in granularity, ranging from acronyms, synonyms, cognates, and proper names up to comparable or parallel corpora. Cross-Language Information Retrieval (CLIR) is a subfield of information retrieval concerned with retrieving documents across language boundaries, i.e., the retrieved documents are written in a language different from that of the query. Cross-language mining often serves as an effective means of improving CLIR performance by supplementing the translation resources that CLIR systems rely on.
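To make the relationship between the two tasks concrete, here is a minimal sketch of dictionary-based query translation for CLIR. It assumes a toy English-to-Chinese dictionary, documents that are already segmented into tokens, and a plain term-frequency overlap score; the names `DICTIONARY`, `MINED_PAIRS`, `translate_query`, and `retrieve` are hypothetical and do not come from any particular system. The sketch illustrates how a translation pair obtained by cross-language mining (here, a proper name missing from the dictionary) can fill a gap in the translation resources and change the ranking.

```python
from collections import Counter

# Hypothetical toy resources (assumptions for illustration only):
# a bilingual English -> Chinese dictionary, plus translation pairs
# assumed to have been mined from the Web.
DICTIONARY = {
    "olympic": ["奥运"],
    "games": ["比赛", "运动会"],  # several candidates: translation ambiguity
}
MINED_PAIRS = {
    "beijing": ["北京"],  # proper name absent from the dictionary
}

# Target-language documents, assumed to be pre-segmented into tokens.
DOCS = {
    "d1": ["北京", "奥运", "北京", "开幕"],
    "d2": ["伦敦", "奥运", "比赛"],
}


def translate_query(query_terms, dictionary, mined=None):
    """Replace each source-language term with all of its known translations.

    Terms with no known translation (out-of-vocabulary terms) are dropped.
    """
    translated = []
    for term in query_terms:
        candidates = list(dictionary.get(term, []))
        if mined is not None:
            candidates += mined.get(term, [])
        translated.extend(candidates)
    return translated


def score(doc_tokens, query_terms):
    """Simple term-frequency overlap between a document and a query."""
    tf = Counter(doc_tokens)
    return sum(tf[t] for t in query_terms)


def retrieve(query_terms, docs, dictionary, mined=None):
    """Translate the query, then rank documents by descending score."""
    target_query = translate_query(query_terms, dictionary, mined)
    scores = {doc_id: score(tokens, target_query) for doc_id, tokens in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    query = ["beijing", "olympic", "games"]
    print(retrieve(query, DOCS, DICTIONARY))               # dictionary only
    print(retrieve(query, DOCS, DICTIONARY, MINED_PAIRS))  # dictionary + mined pairs
```

Using only the dictionary, `beijing` is out of vocabulary and is dropped, so `d2` ranks first. Once the mined pair is added, `d1` ranks first. Keeping every dictionary translation of `games` also shows the translation ambiguity that practical CLIR systems try to resolve with disambiguation techniques.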

Historical Background

CLIR addresses the growing demand to access large volumes of documents across language barriers. Unlike monolingual information...


Recommended Reading

  1. Adriani M. Using statistical term similarity for sense disambiguation in cross-language information retrieval. Inf Retr. 2000;2(1):71–82.
  2. Ballesteros LA, Croft WB. Phrasal translation and query expansion techniques for cross-language information retrieval. In: Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1997. p. 84–91.
  3. Ballesteros LA, Croft WB. Resolving ambiguity for cross-language information retrieval. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1998. p. 64–71.
  4. Brown PF, Della Pietra SA, Della Pietra VJ, Mercer RL. The mathematics of statistical machine translation: parameter estimation. Comput Linguist. 1993;19(2):263–311.
  5. Cheng PJ, Teng JW, Chen RC, Wang JH, Lu WH, Chien LF. Translating unknown queries with Web corpora for cross-language information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2004. p. 146–53.
  6. Dumais ST, Landauer TK, Littman ML. Automatic cross-linguistic information retrieval using latent semantic indexing. In: Proceedings of the ACM SIGIR Workshop on Cross-Linguistic Information Retrieval; 1996. p. 16–23.
  7. Fujii A, Ishikawa T. Applying machine translation to two-stage cross-language information retrieval. In: Proceedings of the 4th Conference of the Association for Machine Translation in the Americas; 2000. p. 13–24.
  8. Gao J, Zhou M, Nie JY, He H, Chen W. Resolving query translation ambiguity using a decaying co-occurrence model and syntactic dependence relations. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2002. p. 183–90.
  9. Gao W, Niu C, Nie JY, Zhou M, Hu J, Wong KF, Hon HW. Cross-lingual query suggestion using query logs of different languages. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2007. p. 463–70.
  10. Jiang L, Zhou M, Chien LF, Niu C. Named entity translation with Web mining and transliteration. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence; 2007. p. 1629–34.
  11. Lu WH, Chien LF, Lee HJ. Translation of Web queries using anchor text mining. ACM Trans Asian Lang Inf Process. 2002;1(2):159–72.
  12. McCarley JS. Should we translate the documents or the queries in cross-language information retrieval? In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics; 1999. p. 208–14.
  13. McNamee P, Mayfield J. Comparing cross-language query expansion techniques by degrading translation resources. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2002. p. 159–66.
  14. Nie JY, Simard M, Isabelle P, Durand R. Cross-language information retrieval based on parallel text and automatic mining of parallel text from the Web. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1999. p. 74–81.
  15. Pirkola A, Hedlund T, Keskustalo H, Järvelin K. Dictionary-based cross-language information retrieval: problems, methods, and research findings. Inf Retr. 2001;3(3–4):209–30.
  16. Resnik P, Smith NA. The Web as a parallel corpus. Comput Linguist. 2003;29(3):349–80.
  17. Shi L, Niu C, Zhou M, Gao J. A DOM tree alignment model for mining parallel data from the Web. In: Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics; 2006. p. 489–96.
  18. Zhang Y, Vines P. Using the Web for automated translation extraction in cross-language information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2004. p. 162–9.
  19. Zhang Y, Vines P, Zobel J. An empirical comparison of translation disambiguation techniques for Chinese-English cross-language information retrieval. In: Proceedings of the 3rd Asia Information Retrieval Symposium; 2006. p. 666–72.
  20. Zhang Y, Vines P, Zobel J. Chinese OOV translation and post-translation query expansion in Chinese-English cross-lingual information retrieval. ACM Trans Asian Lang Inf Process. 2005;4(2):57–77.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Qatar Computing Research Institute, Doha, Qatar
  2. Microsoft Research Asia, Beijing, China

Section editors and affiliations

  • Zheng Chen (1)
  1. Microsoft Research Asia, Microsoft Corporation, Beijing, China