Learning Clusters of Bilingual Suffixes Using Bilingual Translation Lexicon

Part of the Lecture Notes in Computer Science book series (LNAI, volume 9468)


By learning bilingual suffixation operations from translations, using an existing bilingual lexicon with near translation forms, we can improve its coverage and hence deal with out-of-vocabulary (OOV) entries. From this perspective, we identify bilingual stems, their bilingual morphological extensions (bilingual suffixes) and, subsequently, clusters of bilingual suffixes, using known translation forms seen in an existing bilingual translation lexicon. We rely on clustering to enable safer translation generalisations. The degree of co-occurrence between two bilingual morphological extensions with reference to common bilingual stems determines whether they fall in the same cluster. Results are discussed for the language pairs English-Portuguese (EN-PT) and English-Hindi (EN-HI).
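The pipeline outlined in the abstract (bilingual stems, then bilingual suffixes, then clusters of suffixes that co-occur on common stems) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the toy EN-PT lexicon, the longest-common-prefix heuristic for finding stems, the `min_stem` and `min_shared` thresholds, and the use of a simple shared-stem count with union-find grouping in place of whatever co-occurrence measure the authors actually use.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy lexicon of (EN, PT) translation pairs, for illustration only.
LEXICON = [
    ("declared", "declarou"), ("declaring", "declarando"),
    ("ensured", "assegurou"), ("ensuring", "assegurando"),
    ("book", "livro"), ("books", "livros"),
    ("nation", "nação"), ("national", "nacional"),
]


def common_prefix(a, b):
    """Return the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def bilingual_segments(lexicon, min_stem=3):
    """Map each bilingual suffix to the set of bilingual stems it attaches to.

    Assumption: two entries share a bilingual stem when both their EN sides
    and their PT sides have a common prefix of at least min_stem characters.
    """
    stems_of = defaultdict(set)
    for (e1, p1), (e2, p2) in combinations(lexicon, 2):
        se, sp = common_prefix(e1, e2), common_prefix(p1, p2)
        if len(se) < min_stem or len(sp) < min_stem:
            continue
        stem = (se, sp)
        stems_of[(e1[len(se):], p1[len(sp):])].add(stem)
        stems_of[(e2[len(se):], p2[len(sp):])].add(stem)
    return stems_of


def cluster_suffixes(stems_of, min_shared=2):
    """Group bilingual suffixes that share at least min_shared bilingual stems."""
    parent = {s: s for s in stems_of}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(stems_of, 2):
        if len(stems_of[a] & stems_of[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for s in stems_of:
        clusters[find(s)].add(s)
    # Keep only groups with more than one bilingual suffix.
    return [c for c in clusters.values() if len(c) > 1]


clusters = cluster_suffixes(bilingual_segments(LEXICON))
print(clusters)
```

On this toy data the bilingual suffixes ("ed", "ou") and ("ing", "ando") attach to the same two bilingual stems and so end up in one cluster, while ("", "") and ("s", "s") share only a single stem and stay unclustered. The nation/national pair is skipped entirely because its PT common prefix is shorter than `min_stem`. A threshold on shared stems is one simple way to make generalisations "safer" in the sense the abstract describes; the real criterion may differ.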


  • Language Pair
  • Bilingual Lexicon
  • Bilingual Translation
  • Translation Form
  • Natural Language Processing System

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

  • DOI: 10.1007/978-3-319-26832-3_57
  • Chapter length: 9 pages


  1. Note the null suffix in EN corresponding to gender and number suffixes in HI.

  2. Translations that are lexically similar.

  3. A suffix cluster may or may not correspond to a part of speech such as noun or adjective; in some cases the same suffix cluster aggregates nouns, adjectives and adverbs.

  4.

  5. The 2nd line in each row shows the transliterations for HI terms.

  6. \(\_\)unic.html/

  7. EMILLE Corpus -

  8. DGT-TM - Europarl - OPUS (EUconst, EMEA) -

  9. In Table 4, only two bilingual suffixes are shown per cluster, although the original clusters contain varying numbers of bilingual suffixes, ranging from 2 to 15 for EN-PT and from 2 to 5 for EN-HI.




K.M. Kavitha and Luís Gomes acknowledge the Research Fellowships from FCT/MCTES, Ref. nos. SFRH/BD/64371/2009 and SFRH/BD/65059/2009 respectively, and the funded research project ISTRION (Ref. PTDC/EIA-EIA/114521/2009), which provided other means for the research carried out. The authors thank NOVA LINCS, FCT/UNL for the support, SJEC for providing the financial assistance to participate in MIKE 2015, and ISTRION BOX - Translation&Revision, Lda., for providing the data and valuable consultation.

Author information

Corresponding author

Correspondence to K. M. Kavitha.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kavitha, K.M., Gomes, L., Lopes, J.G.P. (2015). Learning Clusters of Bilingual Suffixes Using Bilingual Translation Lexicon. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science, vol 9468. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26831-6

  • Online ISBN: 978-3-319-26832-3

  • eBook Packages: Computer Science, Computer Science (R0)