Processing Multilingual Collections for Text Mining Applications

Gaussier, Eric

doi:10.1007/978-3-540-45219-5_9

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 138)

Abstract

In this presentation, we address the problem of processing multilingual collections for text mining applications such as cross-language clustering, categorisation and information retrieval. We review the different models proposed for this task, focusing on the most important problems that need to be solved.

Author information

Authors and Affiliations

Xerox Research Centre Europe, 6, Chemin de Maupertuis, 38240, Meylan, France
Eric Gaussier

Editor information

Editors and Affiliations

Computer Technology Institute, Research Academic, 61 Riga Feraiou Str, 26221, Patras, Greece
Spiros Sirmakessis (Assistant Professor)

About this paper

Cite this paper

Gaussier, E. (2004). Processing Multilingual Collections for Text Mining Applications. In: Sirmakessis, S. (ed.) Text Mining and its Applications. Studies in Fuzziness and Soft Computing, vol 138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45219-5_9

DOI: https://doi.org/10.1007/978-3-540-45219-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05780-9
Online ISBN: 978-3-540-45219-5
eBook Packages: Springer Book Archive
