Multilingual Documents Clustering Based on Closed Concepts Mining

  • Mohamed Chebel
  • Chiraz Latiri
  • Eric Gaussier
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9261)

Abstract

The scarcity of bilingual and multilingual parallel corpora has prompted many researchers to emphasize the need for new methods that enhance the quality of comparable corpora. In this paper, we highlight the interest and usefulness of Formal Concept Analysis in multilingual document clustering for improving corpus comparability. We propose a statistical approach for clustering multilingual documents based on multilingual Closed Concepts Mining. The approach partitions documents that belong to one or more collections and are written in more than one language into a set of classes. An experimental evaluation conducted on two collections showed a significant improvement in the comparability of the generated classes.
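The abstract does not detail the mining algorithm. As a rough illustration only, the sketch below does two things. First, it enumerates frequent closed concepts (formal concepts in the sense of Formal Concept Analysis) from a binary document-term context. Second, it groups documents by those concepts. It makes one hypothetical assumption: terms in different languages have already been mapped to shared identifiers (for example, through a bilingual lexicon). The names `closed_concepts`, `assign_clusters`, `min_support`, and the toy data are illustrative. The naive intersection-based enumeration and the "most specific concept" assignment rule are choices made for this sketch, not the authors' method.

```python
"""Minimal sketch: frequent closed concepts over a document x term context.

Hypothetical assumption: terms from different languages were already mapped
to shared, language-independent identifiers, so an English and a French
document can share attributes.
"""
from typing import Dict, FrozenSet, List, Tuple

Context = Dict[str, FrozenSet[str]]              # document id -> term identifiers
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def extent(context: Context, intent: FrozenSet[str]) -> FrozenSet[str]:
    """Documents containing every term of the intent (the derivation B')."""
    return frozenset(d for d, terms in context.items() if intent <= terms)


def closed_concepts(context: Context, min_support: int) -> List[Concept]:
    """Formal concepts whose extent holds at least min_support documents.

    Every concept intent is an intersection of document term sets (or the
    full term set), so the family of intents is closed under intersection.
    This naive enumeration is exponential in the worst case.
    """
    all_terms = frozenset().union(*context.values())
    intents = {all_terms}
    for terms in context.values():
        # Build the new intersections first, then merge them in.
        intents |= {i & terms for i in intents}
    concepts = []
    for intent in intents:
        ext = extent(context, intent)
        if len(ext) >= min_support:
            concepts.append((ext, intent))
    # Most specific concepts (largest intents) first; ties broken by extent.
    concepts.sort(key=lambda c: (-len(c[1]), sorted(c[0])))
    return concepts


def assign_clusters(context: Context,
                    concepts: List[Concept]) -> Dict[str, FrozenSet[str]]:
    """Hypothetical rule: each document joins the most specific frequent
    concept with a non-empty intent whose extent contains it. Documents with
    no such concept are left unassigned."""
    labels: Dict[str, FrozenSet[str]] = {}
    for doc in context:
        for ext, intent in concepts:
            if doc in ext and intent:
                labels[doc] = intent
                break
    return labels


# Toy bilingual collection (invented data, illustration only).
docs: Context = {
    "en1": frozenset({"ECON", "MARKET", "BANK"}),
    "fr1": frozenset({"ECON", "MARKET", "TAX"}),
    "en2": frozenset({"SPORT", "MATCH", "GOAL"}),
    "fr2": frozenset({"SPORT", "MATCH"}),
}
for doc, label in assign_clusters(docs, closed_concepts(docs, 2)).items():
    print(doc, sorted(label))
```

On the toy data, each English document lands in the same class as its French counterpart because the two share a frequent closed intent. For real collections, a dedicated closed-itemset miner such as CHARM would replace the naive enumeration.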

Notes

Acknowledgements

This work is partially funded by the DGRST-CNRS n° 14/R 1401 Franco-Tunisian project, entitled “Text mining for construction of bilingual lexicons and multilingual information retrieval”.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Research Laboratory LIPAH, Faculty of Sciences of Tunis, University Tunis El Manar, Tunis, Tunisia
  2. Research Laboratory LIG, AMA Group, University Joseph Fourier (Grenoble I), Grenoble, France