Advertisement

Bilingual Lexicon Extraction from Comparable Corpora Based on Closed Concepts Mining

  • Mohamed ChebelEmail author
  • Chiraz Latiri
  • Eric Gaussier
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10234)

Abstract

In this paper, we propose to complement the context vectors used in bilingual lexicon extraction from comparable corpora with concept vectors, that aim at capturing all the words related to the concepts associated with a given word. This allows one to rely on a representation that is less sparse, especially in specialized domains where the use of a general bilingual lexicon leaves many words untranslated. The concept vectors we are considering are based on closed concepts mining developed in Formal Concept Analysis (FCA). The obtained results on two different comparable corpora show that enriching context vectors with concept vectors leads to lexicons of higher quality, especially in specialized domains.

References

  1. 1.
    Andrade, D., Matsuzaki, T., Tsujii, J: Effective use of dependency structure for bilingual Lexicon creation. In: Gelbukh, A. (ed.) CICLing 2011. LNCS, vol. 6609, pp. 80–92. Springer, Heidelberg (2011). doi: 10.1007/978-3-642-19437-5_7
  2. 2.
    Barker, K., Cornacchia, N.: Using noun phrase heads to extract document keyphrases. In: Hamilton, H.J. (ed.) AI 2000. LNCS (LNAI), vol. 1822, pp. 40–52. Springer, Heidelberg (2000). doi: 10.1007/3-540-45486-1_4 CrossRefGoogle Scholar
  3. 3.
    Chebel, M., Latiri, C., Gaussier, E.: Extraction of interlingual documents clusters based on closed concepts mining. In: 19th International Conference KES 2015, Singapore, pp. 537–546 (2015)Google Scholar
  4. 4.
    Fung, P.: A statistical view on bilingual Lexicon extraction: from parallel corpora to non-parallel corpora. In: Farwell, D., Gerber, L., Hovy, E. (eds.) AMTA 1998. LNCS (LNAI), vol. 1529, pp. 1–17. Springer, Heidelberg (1998). doi: 10.1007/3-540-49478-2_1 CrossRefGoogle Scholar
  5. 5.
    Ganter, B., Wille, R.: Formal Concept Analysis. Springer, Heidelberg (1999)CrossRefzbMATHGoogle Scholar
  6. 6.
    Baroni, M., Georgiana, D., Kruszewski, G.: Don’t count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors. In: 52nd Annual Meeting ACL 2014, Baltimore, Maryland (2014)Google Scholar
  7. 7.
    Laroche, A., Langlais, P.: Revisiting context-based projection methods for term-translation spotting in comparable corpora. In: 23rd International Conference COLING 2010, Beijing, China, pp. 617–625 (2010)Google Scholar
  8. 8.
    Li, B., Gaussier, E.: An information-based cross-language information retrieval model. In: Baeza-Yates, R., Vries, A.P., Zaragoza, H., Cambazoglu, B.B., Murdock, V., Lempel, R., Silvestri, F. (eds.) ECIR 2012. LNCS, vol. 7224, pp. 281–292. Springer, Heidelberg (2012). doi: 10.1007/978-3-642-28997-2_24 CrossRefGoogle Scholar
  9. 9.
    Linard, A., Daille, B., Emmanuel, M.: Attempting to bypass alignment from comparable corpora via pivot language. In: 8th Workshop on BUCC, Beijing, pp. 32–37 (2015)Google Scholar
  10. 10.
    Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)CrossRefzbMATHGoogle Scholar
  11. 11.
    Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, vol. 2013, pp. 3111–3119 (2013)Google Scholar
  12. 12.
    Morin, E., Hazem, A.: Looking at unbalanced specialized comparable corpora for bilingual Lexicon extraction. In: ACL 2014, Baltimore, USA, pp. 284–293 (2014)Google Scholar
  13. 13.
    Gamallo Otero, P.: Comparing window and syntax based strategies for semantic extraction. In: Teixeira, A., Lima, V.L.S., Oliveira, L.C., Quaresma, P. (eds.) PROPOR 2008. LNCS (LNAI), vol. 5190, pp. 41–50. Springer, Heidelberg (2008). doi: 10.1007/978-3-540-85980-2_5 CrossRefGoogle Scholar
  14. 14.
    Pasquier, N., Taouil, R., Bastide, Y., Stumme, G., Lakhal, L.: Generating a condensed representation for association rules. J. Intell. Inf. Syst. 2005, 29–60 (2005)CrossRefzbMATHGoogle Scholar
  15. 15.
    Prochasson, E., Morin, E.l., Kageura, K.: Anchor points for bilingual Lexicon extraction from small comparable corpora. In: Machine Translation Summit, France (2009)Google Scholar
  16. 16.
    Ronan, C., Jason, W.: A unified architecture for natural language processing: deep neural networks with multitask learning. In: ICML2008, pp. 160–167 (2008)Google Scholar
  17. 17.
    Salton, G., Buckley, C.: Term-weighting Approaches in Automatic Text Retrieval. Information Processing Management. Pergamon Press Inc, Tarrytown (1988)Google Scholar
  18. 18.
    Zaki, M.J., Hsiao, C.J.: Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans. Knowl. Data Eng. 17, 462–478 (2005)CrossRefGoogle Scholar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. 1.Research Laboratory LIPAH, Faculty of Sciences of TunisUniversity Tunis El ManarTunisTunisia
  2. 2.Research Laboratory LIG, AMA Group, University Joseph FourierGrenoble IFrance

Personalised recommendations