
Processing Multilingual Collections for Text Mining Applications

  • Conference paper
Text Mining and its Applications

Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 138))

Abstract

In this presentation, we address the problem of processing multilingual collections for text mining applications such as cross-language clustering, categorisation and information retrieval. We review the different models proposed for this task, focusing on the most important problems that need to be solved.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gaussier, E. (2004). Processing Multilingual Collections for Text Mining Applications. In: Sirmakessis, S. (eds) Text Mining and its Applications. Studies in Fuzziness and Soft Computing, vol 138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45219-5_9

  • DOI: https://doi.org/10.1007/978-3-540-45219-5_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05780-9

  • Online ISBN: 978-3-540-45219-5

  • eBook Packages: Springer Book Archive
