Divergences in the Usage of Discourse Markers in English and Mandarin Chinese

  • David Steele
  • Lucia Specia
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8655)


Statistical machine translation (SMT) has, in recent years, improved the accuracy of automated translations. However, SMT systems often fail to deliver human-quality translations, especially for complex sentences and distant language pairs. Current SMT systems often translate single sentences and treat clauses in isolation, which leads to a loss of contextual information. Discourse markers (DMs) are vital contextual links between discourse segments, and this paper examines the divergences in their usage across English and Mandarin Chinese. We highlight important structural differences in composite sentences extracted from a number of parallel corpora, and show examples of how popular SMT systems deal with these cases. We observed numerous significant divergences, such as contextual omissions, which can lead to incoherent automatic translations. Our objective is to use these findings to guide the proposal of a framework that addresses divergences in DM usage in order to improve SMT output quality.


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • David Steele (1)
  • Lucia Specia (1)
  1. Department of Computer Science, The University of Sheffield, UK