Using Kazakh Morphology Information to Improve Word Alignment for SMT

Kartbayev, Amandyk

doi:10.1007/978-3-319-29504-6_34

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 427)

Abstract

We propose a word alignment model with two core features: the ability to handle uncertainty in the morpheme matching process and in the selection of correct phrase alignments after they are created. Both processes rely on a morphological analysis tool and a large monolingual corpus, which is used to improve the correspondence between alignment elements. The tool is language-dependent and tied to a specific language pair, although Wikipedia data is an adequate source of training text that can be used in many cases and even allows unsupervised word segmentation. Based on these features, we propose an approach that captures the morphotactics common to the source text. The paper describes experiments in the general domain using a tagset, and the approach is compared with classical word alignment by means of human judgment.

Author information

Authors and Affiliations

Laboratory of Intelligent Information Systems, Al-Farabi Kazakh National University, Almaty, Kazakhstan
Amandyk Kartbayev

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, Washington, USA
Ajith Abraham
Membre du Groupe Efrei, ESIGETEL, Villejuif, France
Katarzyna Wegrzyn-Wolska
Faculty of Computers & Information, Cairo University, Giza, Egypt
Aboul Ella Hassanien
VSB-Technical University of Ostrava, Ostrava, Czech Republic
Vaclav Snasel
ENIS, University of Sfax, Sfax, Tunisia
Adel M. Alimi

About this paper

Cite this paper

Kartbayev, A. (2016). Using Kazakh Morphology Information to Improve Word Alignment for SMT. In: Abraham, A., Wegrzyn-Wolska, K., Hassanien, A., Snasel, V., Alimi, A. (eds) Proceedings of the Second International Afro-European Conference for Industrial Advancement AECIA 2015. Advances in Intelligent Systems and Computing, vol 427. Springer, Cham. https://doi.org/10.1007/978-3-319-29504-6_34

DOI: https://doi.org/10.1007/978-3-319-29504-6_34
Published: 29 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29503-9
Online ISBN: 978-3-319-29504-6
eBook Packages: Engineering, Engineering (R0)
