Abstract
Several approaches to bilingual lexicon extraction from comparable corpora have recently been proposed. This paper presents how to combine different methods for bilingual lexicon extraction based on standard context vectors and advanced text mining methods. In this respect, we focus on combining bilingual lexicons based on context vectors, association rules and contextual meta-rules. Combining the lexicons yields a less sparse representation, from which the most effective translations can be extracted to create an optimal bilingual lexicon. An experimental validation conducted on two language pairs from the CLEF 2003 evaluation campaign shows that the combination of the models gives a significant improvement over the standard approach.
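The idea of combining several lexicons into a less sparse one can be illustrated with a minimal sketch. The abstract does not specify the combination scheme, so everything below is an assumption made for illustration: each lexicon is taken to be a mapping from a source word to scored candidate translations, scores are min-max normalized per source word, and the lexicons are merged by a weighted sum. The function names, the uniform default weights and the toy French-English data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def normalize(candidates):
    """Min-max scale one source word's candidate scores into [0, 1]."""
    if not candidates:
        return {}
    lo, hi = min(candidates.values()), max(candidates.values())
    if hi == lo:
        # A single candidate (or all-equal scores) gets the top score.
        return {target: 1.0 for target in candidates}
    return {target: (score - lo) / (hi - lo) for target, score in candidates.items()}


def combine_lexicons(lexicons, weights=None):
    """Merge candidate translations from several lexicons by weighted sum.

    lexicons: list of dicts {source_word: {target_word: score}}
    weights:  one weight per lexicon (assumed uniform when omitted)
    Returns {source_word: [(target_word, combined_score), ...]}, best first.
    """
    if weights is None:
        weights = [1.0 / len(lexicons)] * len(lexicons)
    combined = defaultdict(lambda: defaultdict(float))
    for lexicon, weight in zip(lexicons, weights):
        for source, candidates in lexicon.items():
            for target, score in normalize(candidates).items():
                combined[source][target] += weight * score
    # Sort by descending score, breaking ties alphabetically for determinism.
    return {
        source: sorted(targets.items(), key=lambda kv: (-kv[1], kv[0]))
        for source, targets in combined.items()
    }


# Toy lexicons (hypothetical scores): one from context vectors,
# one from association rules. "chien" is covered by only one of them.
context_vector_lexicon = {"maison": {"house": 0.8, "home": 0.6, "building": 0.2}}
association_rule_lexicon = {
    "maison": {"house": 3.0, "building": 1.0},
    "chien": {"dog": 2.0},
}

best = combine_lexicons([context_vector_lexicon, association_rule_lexicon])
for source, ranked in best.items():
    print(source, ranked)
```

In this toy setting the merged lexicon covers a source word that one of the input lexicons misses, which is one way to read the "less sparse representation" mentioned above. A rank-based fusion or learned weights would be equally plausible choices.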
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Belhaj Rhouma, S., Latiri, C., Berrut, C. (2018). Combining Bilingual Lexicons Extracted from Comparable Corpora: The Complementary Approach Between Word Embedding and Text Mining. In: Hartmann, S., Ma, H., Hameurlain, A., Pernul, G., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2018. Lecture Notes in Computer Science, vol 11030. Springer, Cham. https://doi.org/10.1007/978-3-319-98812-2_47
DOI: https://doi.org/10.1007/978-3-319-98812-2_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98811-5
Online ISBN: 978-3-319-98812-2
eBook Packages: Computer Science (R0)