Learning Clusters of Bilingual Suffixes Using Bilingual Translation Lexicon

  • K. M. Kavitha
  • Luís Gomes
  • José Gabriel P. Lopes
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9468)


By learning bilingual suffixation operations from translations, using an existing bilingual lexicon that contains near translation forms, we can improve the lexicon's coverage and hence deal with out-of-vocabulary (OOV) entries. From this perspective, we identify bilingual stems, their bilingual morphological extensions (bilingual suffixes), and subsequently clusters of bilingual suffixes, using known translation forms seen in an existing bilingual translation lexicon. We rely on clustering to enable safer translation generalisations. The degree of co-occurrence between two bilingual morphological extensions with respect to common bilingual stems determines whether they should fall in the same cluster. Results are discussed for the language pairs English-Portuguese (EN-PT) and English-Hindi (EN-HI).
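
The clustering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy EN-PT data, the pre-segmented input, the Jaccard overlap used as the co-occurrence measure, the 0.5 threshold, and all function names are assumptions made for illustration only; the abstract does not specify any of them.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, pre-segmented EN-PT toy data: bilingual stem -> bilingual suffixes
# observed with it. ("", "ão") means no EN suffix and the PT suffix "ão".
STEM_TO_SUFFIXES = {
    ("declar", "declar"): {("e", "ar"), ("ed", "ado"), ("ing", "ando")},
    ("us", "us"): {("e", "ar"), ("ed", "ado"), ("ing", "ando")},
    ("accus", "acus"): {("e", "ar"), ("ed", "ado")},
    ("nation", "naç"): {("", "ão"), ("s", "ões")},
    ("region", "regi"): {("", "ão"), ("s", "ões")},
}


def stems_per_suffix(stem_to_suffixes):
    """Invert the mapping: bilingual suffix -> set of bilingual stems it attaches to."""
    index = defaultdict(set)
    for stem, suffixes in stem_to_suffixes.items():
        for suffix in suffixes:
            index[suffix].add(stem)
    return index


def cooccurrence(stems_a, stems_b):
    """Jaccard overlap of two stem sets (an assumed measure, not the paper's)."""
    return len(stems_a & stems_b) / len(stems_a | stems_b)


def cluster_suffixes(stem_to_suffixes, threshold=0.5):
    """Group bilingual suffixes whose shared-stem overlap reaches the threshold."""
    index = stems_per_suffix(stem_to_suffixes)
    parent = {suffix: suffix for suffix in index}  # union-find forest

    def find(suffix):
        while parent[suffix] != suffix:
            parent[suffix] = parent[parent[suffix]]  # path halving
            suffix = parent[suffix]
        return suffix

    for a, b in combinations(index, 2):
        if cooccurrence(index[a], index[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for suffix in index:
        clusters[find(suffix)].add(suffix)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    for cluster in cluster_suffixes(STEM_TO_SUFFIXES):
        print(cluster)
```

On this toy input the verb-like suffix pairs end up in one cluster and the noun-like pairs in another, because the two groups share no bilingual stems. Merging through connected components is the simplest choice here; a stricter scheme could require every pair inside a cluster to pass the threshold.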


Keywords: Language Pair; Bilingual Lexicon; Bilingual Translation; Translation Form; Natural Language Processing System



K. M. Kavitha and Luís Gomes acknowledge the Research Fellowships granted by FCT/MCTES, with Ref. nos. SFRH/BD/64371/2009 and SFRH/BD/65059/2009, respectively, and the funded research project ISTRION (Ref. PTDC/EIA-EIA/114521/2009), which provided other means for the research carried out. The authors thank NOVA LINCS, FCT/UNL for the support, SJEC for providing the financial assistance to participate in MIKE 2015, and ISTRION BOX - Translation & Revision, Lda., for providing the data and valuable consultation.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • K. M. Kavitha (1, 3)
  • Luís Gomes (1, 2)
  • José Gabriel P. Lopes (1, 2)

  1. NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal
  2. ISTRION BOX-Translation & Revision, Lda., Parkurbis, Covilhã, Portugal
  3. Department of Computer Applications, St. Joseph Engineering College, Vamanjoor, Mangaluru, India
