A Non-linear Semantic Mapping Technique for Cross-Language Sentence Matching

  • Rafael E. Banchs
  • Marta R. Costa-jussà
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6233)


A non-linear semantic mapping procedure is implemented for cross-language text matching at the sentence level. The method relies on a non-linear space reduction technique that is used to construct semantic embeddings of multilingual sentence collections. In the proposed method, an independent embedding is constructed for each language in the multilingual collection, and the similarities among the resulting semantic representations are used for cross-language matching. It is shown that the proposed method outperforms conventional cross-language information retrieval methods.
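The matching idea in the abstract (one independent representation per language, then a comparison of those representations across languages) can be illustrated with a small sketch. It is not the authors' procedure: it assumes a tiny, invented set of parallel "anchor" sentences, and it skips the non-linear space reduction entirely, using each sentence's similarity profile against the anchors of its own language as a stand-in embedding. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def bag(sentence):
    """Bag-of-words term counts for a whitespace-tokenised sentence."""
    return Counter(sentence.lower().split())


def cosine_bags(a, b):
    """Cosine similarity between two term-count bags."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_vectors(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def profile(sentence, anchors):
    """Stand-in embedding (assumption): similarity to each same-language anchor."""
    s = bag(sentence)
    return [cosine_bags(s, bag(anchor)) for anchor in anchors]


def match(query, query_anchors, candidates, candidate_anchors):
    """Index of the candidate whose anchor profile is closest to the query's."""
    q = profile(query, query_anchors)
    scores = [cosine_vectors(q, profile(c, candidate_anchors)) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Hypothetical parallel anchors: anchors_en[i] is a translation of anchors_es[i].
anchors_en = [
    "the cat sleeps on the sofa",
    "the market prices rise today",
    "the team wins the match",
]
anchors_es = [
    "el gato duerme en el sofá",
    "los precios del mercado suben hoy",
    "el equipo gana el partido",
]

candidates_es = ["el equipo gana", "el gato duerme", "los precios suben"]

best = match("the cat sleeps", anchors_en, candidates_es, anchors_es)
print(candidates_es[best])
```

Each sentence is only ever compared with text in its own language; the two languages meet solely through the anchor profiles, which are comparable because the anchors are parallel. A dimensionality-reduction step of the kind the abstract describes would sit between building the monolingual similarities and the final comparison.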


Canonical Correlation Analysis; Sentence Level; Parallel Corpus; Query Translation; Majority Vote Strategy
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Rafael E. Banchs (1)
  • Marta R. Costa-jussà (1)
  1. Barcelona Media Innovation Centre, Barcelona, Spain
