Cross-Lingual Text Mining
Cross-lingual text mining is a general category denoting tasks and methods for accessing the information in sets of documents written in several languages, or whenever the language used to express an information need differs from the language of the documents. A distinguishing feature of cross-lingual text mining is the need to overcome a language barrier, usually through some form of translation.
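As a minimal illustration of the language barrier, the sketch below assumes the simplest cross-lingual setting: an English query, French documents, and a small hypothetical bilingual dictionary (`EN_FR`) used to translate the query term by term before matching. The dictionary, function names, and documents are illustrative assumptions only. Practical systems rely on large bilingual lexicons or statistical translation models, and on much better retrieval scoring than raw term counts.

```python
from collections import Counter

# Hypothetical toy English-to-French dictionary (illustrative only).
EN_FR: dict[str, list[str]] = {
    "river": ["rivière", "fleuve"],
    "bank": ["rive", "banque"],
    "flood": ["inondation", "crue"],
}


def translate_query(query: str, lexicon: dict[str, list[str]]) -> list[str]:
    """Replace each source-language word by all of its dictionary translations.

    Words missing from the lexicon are kept unchanged (e.g. proper names).
    """
    terms: list[str] = []
    for word in query.lower().split():
        terms.extend(lexicon.get(word, [word]))
    return terms


def score(doc: str, query_terms: list[str]) -> int:
    """Count occurrences of the translated query terms in a document."""
    counts = Counter(doc.lower().split())
    return sum(counts[t] for t in query_terms)


def rank(query: str, docs: list[str], lexicon: dict[str, list[str]]) -> list[int]:
    """Return document indices sorted by decreasing score (ties keep input order)."""
    terms = translate_query(query, lexicon)
    return sorted(range(len(docs)), key=lambda i: score(docs[i], terms), reverse=True)


# Hypothetical French document collection.
docs_fr = [
    "la banque a augmenté ses taux",
    "la crue du fleuve a inondé la rive gauche",
    "le marché est fermé",
]

print(rank("river flood", docs_fr, EN_FR))  # [1, 0, 2]
```

Expanding each word to all of its translations preserves recall but introduces ambiguity. A query for "bank" matches both the financial sense (banque) and the riverbank sense (rive), so `rank("bank", docs_fr, EN_FR)` scores documents 0 and 1 equally. Resolving such ambiguity is one of the recurring difficulties of crossing the language barrier.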
Motivation and Background
Advances in mass storage and network connectivity make enormous amounts of information easily accessible to an increasingly large fraction of the world population. Such information is mostly encoded as running text, which in most cases is written in a language other than the user's native language. This state of affairs creates many situations in which the main barrier to the fulfillment of an information need is not technological but linguistic. For example, in some cases the user has some knowledge of the language in which the text containing a...