Abstract
In this presentation, we address the problem of processing multilingual collections for text mining applications such as cross-language clustering, categorisation and information retrieval. We review the different models proposed for this task, focusing on the most important problems that need to be solved.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gaussier, E. (2004). Processing Multilingual Collections for Text Mining Applications. In: Sirmakessis, S. (ed.) Text Mining and its Applications. Studies in Fuzziness and Soft Computing, vol 138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45219-5_9
DOI: https://doi.org/10.1007/978-3-540-45219-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05780-9
Online ISBN: 978-3-540-45219-5
eBook Packages: Springer Book Archive