Enabling Medical Translation for Low-Resource Languages

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9624)

Abstract

We present research towards bridging the language gap between migrant workers in Qatar and medical staff. In particular, we describe the first steps towards a real-world Hindi-English machine translation system for doctor-patient communication. Because this is a low-resource language pair, especially for speech and for the medical domain, our initial focus has been on gathering suitable training data from various sources. We applied a variety of methods, ranging from fully automatic extraction from the Web to manual annotation of test data. Moreover, we developed a method for automatically augmenting the training data with synthetically generated variants, which yielded an absolute improvement of more than 3 BLEU points.

Notes

  1. http://qatar-weill.cornell.edu
  2. http://www.universaldoctor.com
  3. http://medibabble.com
  4. http://www.canopyapps.com
  5. http://mavroinc.com/medical.html
  6. http://duochart.com
  7. http://www.khresmoi.eu
  8. http://www.wikipedia.org
  9. http://www.wiktionary.org
  10. http://www.omegawiki.org
  11. https://github.com/tesseract-ocr
  12. http://www.opensubtitles.com
  13. http://www.ncbi.nlm.nih.gov/mesh
  14. EMILLE contains about 12,000 sentences of comparable data in Hindi and Urdu. We were able to align about 7,000 sentences to build an Urdu-to-Hindi system.
  15. We used mkcls to cluster the data into 50 clusters.

Acknowledgments

The authors would like to thank Naila Khalisha and Manisha Bansal for their contributions towards the project.

Author information

Corresponding author

Correspondence to Nadir Durrani.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

Cite this paper

Musleh, A., Durrani, N., Temnikova, I., Nakov, P., Vogel, S., Alsaad, O. (2018). Enabling Medical Translation for Low-Resource Languages. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_1

  • DOI: https://doi.org/10.1007/978-3-319-75487-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75486-4

  • Online ISBN: 978-3-319-75487-1

  • eBook Packages: Computer Science, Computer Science (R0)