Automatic Dictionary Creation by Sub-symbolic Encoding of Words

  • Filippo Vella
  • Giovanni Pilato
  • Ignazio Motisi
  • Salvatore Gaglio
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3931)


This paper describes a technique for the automatic creation of dictionaries using a sub-symbolic representation of words in a cross-language context. Semantic relationships among the words of two languages are extracted from aligned bilingual text corpora by applying Latent Semantic Analysis to the matrices that represent term co-occurrences in aligned text fragments. The technique finds the “best translation” of a word according to a properly defined geometric distance in an automatically created semantic space. Experiments show a correctness of 95% in the best case.
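The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy English/Italian fragments, the function names (`build_term_matrix`, `term_vectors`, `best_translation`), the raw-count weighting, the choice of `k`, and the use of cosine similarity as the geometric measure are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def build_term_matrix(pairs):
    """Rows are terms of both languages, columns are aligned fragment pairs.
    Raw co-occurrence counts are an assumption; the paper may weight terms."""
    src_vocab = sorted({w for s, _ in pairs for w in s.split()})
    tgt_vocab = sorted({w for _, t in pairs for w in t.split()})
    terms = [("src", w) for w in src_vocab] + [("tgt", w) for w in tgt_vocab]
    index = {term: i for i, term in enumerate(terms)}
    A = np.zeros((len(terms), len(pairs)))
    for j, (s, t) in enumerate(pairs):
        for w in s.split():
            A[index[("src", w)], j] += 1
        for w in t.split():
            A[index[("tgt", w)], j] += 1
    return A, terms

def term_vectors(A, k):
    """Truncated SVD (the core of LSA): keep the k largest singular values
    and represent each term by its row of U_k scaled by S_k."""
    U, S, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * S[:k]

def best_translation(word, terms, vecs):
    """Return the target-language term closest to `word` in the semantic
    space. Cosine similarity is assumed as the geometric measure."""
    q = vecs[terms.index(("src", word))]
    best, best_sim = None, -np.inf
    for j, (lang, w) in enumerate(terms):
        if lang != "tgt":
            continue
        denom = np.linalg.norm(q) * np.linalg.norm(vecs[j])
        sim = float(q @ vecs[j]) / denom if denom > 0 else 0.0
        if sim > best_sim:
            best, best_sim = w, sim
    return best

# Hypothetical aligned fragments (English, Italian), for illustration only.
pairs = [
    ("dog sleeps", "cane dorme"),
    ("cat sleeps", "gatto dorme"),
    ("dog eats", "cane mangia"),
    ("cat eats", "gatto mangia"),
]
A, terms = build_term_matrix(pairs)
vecs = term_vectors(A, k=3)
print(best_translation("dog", terms, vecs))
```

In this toy corpus each word and its translation occur in exactly the same fragments, so they map to the same point in the reduced space. Real corpora are far noisier, and the dimension `k` and the distance measure would need to be tuned empirically.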


Keywords: Information Retrieval; Latent Semantic Analysis; Semantic Relationship; Semantic Space; Geometric Distance





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Filippo Vella (1)
  • Giovanni Pilato (2)
  • Ignazio Motisi (1)
  • Salvatore Gaglio (1, 2)
  1. DINFO – Dipartimento di ingegneria INFOrmatica, University of Palermo, Palermo, Italy
  2. ICAR – Istituto di CAlcolo e Reti ad alte prestazioni, Italian National Research Council, Palermo, Italy
