International Conference on Mining Intelligence and Knowledge Exploration

Mining Intelligence and Knowledge Exploration, pp. 607-615

Learning Clusters of Bilingual Suffixes Using Bilingual Translation Lexicon

  • K. M. Kavitha
  • Luís Gomes
  • José Gabriel P. Lopes
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9468)

Abstract

By learning bilingual suffixation operations from the near translation forms already present in an existing bilingual lexicon, we can improve the lexicon's coverage and thereby handle out-of-vocabulary (OOV) entries. To this end, we use the known translation forms in an existing bilingual translation lexicon to identify bilingual stems, their bilingual morphological extensions (bilingual suffixes) and, subsequently, clusters of bilingual suffixes. We rely on clustering to enable safer translation generalisations. Whether two bilingual morphological extensions fall in the same cluster is determined by the degree to which they co-occur with common bilingual stems. Results are discussed for the English-Portuguese (EN-PT) and English-Hindi (EN-HI) language pairs.
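The abstract does not specify how bilingual stems are extracted or which co-occurrence measure decides cluster membership, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes that a bilingual stem is the longest common prefix shared by two lexicon entries on both the source and the target side (with a hypothetical minimum length `min_stem`), and that two bilingual suffixes are linked when the Dice coefficient of their stem sets reaches a hypothetical `threshold`; clusters are then the connected components of those links (single-link clustering). The function names and the toy EN-PT lexicon are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def common_prefix(a: str, b: str) -> str:
    """Longest common prefix of two strings."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def bilingual_suffixes(lexicon, min_stem=3):
    """Map each bilingual stem (source stem, target stem) to the set of
    bilingual suffixes (source suffix, target suffix) observed with it.

    Assumption: for every pair of lexicon entries, the bilingual stem is the
    longest common prefix on each side; pairs whose prefix is shorter than
    min_stem on either side are ignored.
    """
    stems = defaultdict(set)
    for (s1, t1), (s2, t2) in combinations(lexicon, 2):
        s_stem, t_stem = common_prefix(s1, s2), common_prefix(t1, t2)
        if len(s_stem) < min_stem or len(t_stem) < min_stem:
            continue
        # Whatever follows the shared stem on each side is a bilingual suffix.
        for s, t in ((s1, t1), (s2, t2)):
            stems[(s_stem, t_stem)].add((s[len(s_stem):], t[len(t_stem):]))
    return stems


def cluster_suffixes(stems, threshold=0.5):
    """Single-link clustering of bilingual suffixes.

    Two suffixes are linked when the Dice coefficient of the sets of
    bilingual stems they attach to is at least `threshold` (an assumed
    co-occurrence measure); clusters are the connected components.
    """
    stems_of = defaultdict(set)  # bilingual suffix -> bilingual stems
    for stem, suffixes in stems.items():
        for suffix in suffixes:
            stems_of[suffix].add(stem)

    parent = {suffix: suffix for suffix in stems_of}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(sorted(stems_of), 2):
        shared = len(stems_of[a] & stems_of[b])
        dice = 2 * shared / (len(stems_of[a]) + len(stems_of[b]))
        if dice >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    clusters = defaultdict(list)
    for suffix in stems_of:
        clusters[find(suffix)].append(suffix)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy EN-PT lexicon, made up for illustration only.
TOY_LEXICON = [
    ("legal", "legal"), ("legally", "legalmente"), ("legality", "legalidade"),
    ("normal", "normal"), ("normally", "normalmente"), ("normality", "normalidade"),
    ("final", "final"), ("finally", "finalmente"), ("finality", "finalidade"),
    ("formal", "formal"), ("formally", "formalmente"),
    ("develop", "desenvolver"), ("developed", "desenvolvido"),
    ("protect", "proteger"), ("protected", "protegido"),
]

if __name__ == "__main__":
    for cluster in cluster_suffixes(bilingual_suffixes(TOY_LEXICON)):
        print(cluster)
```

On this toy input the sketch groups ("", ""), ("ity", "idade") and ("ly", "mente") in one cluster and ("", "er") with ("ed", "ido") in another. Raising `threshold` to 0.9 splits ("ity", "idade") off on its own, because it attaches to only three of the four adjectival stems; this is the kind of trade-off between coverage and safer generalisation that a co-occurrence threshold controls.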


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • K. M. Kavitha (1, 3)
  • Luís Gomes (1, 2)
  • José Gabriel P. Lopes (1, 2)

  1. NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal
  2. ISTRION BOX-Translation & Revision, Lda., Parkurbis, Covilhã, Portugal
  3. Department of Computer Applications, St. Joseph Engineering College, Vamanjoor, Mangaluru, India
